Multimodal Text/Video Kaitlyn

Adversarial Video Promotion Against Text-to-Video Retrieval

Abstract: Thanks to the development of cross-modal models, text-to-video retrieval (T2VR) is advancing rapidly, but its robustness remains largely unexamined. Existing attacks against T2VR are ...

IEEE

Localizing Step-by-Step: Multimodal Long Video Temporal Grounding with LLM

Video Temporal Grounding (VTG) localizes moments in untrimmed videos using natural language queries. Most VTG datasets focus on short videos, and existing approaches excel in short-term cross-modal ...
