DSpace at KIST: Language-free Training for Zero-shot Video Grounding

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Dahye	-
dc.contributor.author	Park, Jungin	-
dc.contributor.author	Lee, Jiyoung	-
dc.contributor.author	Park, Seongheon	-
dc.contributor.author	Sohn, Kwanghoon	-
dc.date.accessioned	2024-01-12T02:47:42Z	-
dc.date.available	2024-01-12T02:47:42Z	-
dc.date.created	2023-07-28	-
dc.date.issued	2023-01	-
dc.identifier.issn	2472-6737	-
dc.identifier.uri	https://pubs.kist.re.kr/handle/201004/76511	-
dc.description.abstract	Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously. One of the most challenging issues is an extremely time- and cost-consuming annotation collection, including video captions in a natural language form and their corresponding temporal regions. In this paper, we present a simple yet novel training framework for video grounding in the zero-shot setting, which learns a network with only video data without any annotation. Inspired by the recent language-free paradigm, i.e. training without language data, we train the network without compelling the generation of fake (pseudo) text queries into a natural language form. Specifically, we propose a method for learning a video grounding model by selecting a temporal interval as a hypothetical correct answer and considering the visual feature selected by our method in the interval as a language feature, with the help of the well-aligned visual-language space of CLIP. Extensive experiments demonstrate the prominence of our language-free training framework, outperforming the existing zero-shot video grounding method and even several weakly-supervised approaches with large margins on two standard datasets.	-
dc.language	English	-
dc.publisher	IEEE COMPUTER SOC	-
dc.title	Language-free Training for Zero-shot Video Grounding	-
dc.type	Conference	-
dc.identifier.doi	10.1109/WACV56688.2023.00257	-
dc.description.journalClass	1	-
dc.identifier.bibliographicCitation	23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp.2538 - 2547	-
dc.citation.title	23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)	-
dc.citation.startPage	2538	-
dc.citation.endPage	2547	-
dc.citation.conferencePlace	US	-
dc.citation.conferencePlace	Waikoloa, HI	-
dc.citation.conferenceDate	2023-01-03	-
dc.relation.isPartOf	2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)	-
dc.identifier.wosid	000971500202064	-
dc.identifier.scopusid	2-s2.0-85149027937	-
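
The training idea summarized in the abstract (pick a temporal interval as a hypothetical answer, then treat a visual feature from that interval as the language feature) can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the uniform random interval sampling, and the mean-pooling used to pick the feature are all assumptions made for illustration, and the per-frame features are assumed to come from a CLIP-style image encoder that is not shown here.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sample_pseudo_interval(num_frames, rng, min_len=2):
    """Hypothetical helper: sample a [start, end) frame interval uniformly.

    The sampled interval plays the role of the 'hypothetical correct answer'.
    """
    length = rng.randint(min_len, num_frames)
    start = rng.randint(0, num_frames - length)
    return start, start + length


def pseudo_query_feature(frame_features, start, end):
    """Hypothetical helper: build a stand-in for the language feature.

    Assumption: mean-pool the frame features inside the interval. The paper's
    actual feature selection rule is not described in this record.
    """
    segment = frame_features[start:end]
    dim = len(segment[0])
    mean = [sum(f[d] for f in segment) / len(segment) for d in range(dim)]
    return l2_normalize(mean)


def temporal_iou(pred, target):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(pred[1], target[1]) - max(pred[0], target[0]))
    union = (pred[1] - pred[0]) + (target[1] - target[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 4-dimensional per-frame features standing in for image embeddings.
    features = [[rng.random() for _ in range(4)] for _ in range(16)]
    start, end = sample_pseudo_interval(len(features), rng)
    query = pseudo_query_feature(features, start, end)
    print("pseudo interval:", (start, end))
    print("pseudo query feature:", query)
```

In a setup like this, a grounding model would be trained to predict the sampled interval from the video and the pseudo query feature, with `temporal_iou` serving as one possible way to score a prediction against the pseudo target.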

Appears in Collections: KIST Conference Paper > 2023