Full metadata record

DC Field                              Value
dc.contributor.author                 Jang, Jinhyun
dc.contributor.author                 Park, Jungin
dc.contributor.author                 Kim, Jin
dc.contributor.author                 Kwon, Hyeongjun
dc.contributor.author                 Sohn, Kwanghoon
dc.date.accessioned                   2024-04-11T05:00:35Z
dc.date.available                     2024-04-11T05:00:35Z
dc.date.created                       2024-04-11
dc.date.issued                        2023-10
dc.identifier.issn                    1550-5499
dc.identifier.uri                     https://pubs.kist.re.kr/handle/201004/149641
dc.description.abstract               Recent DETR-based video grounding models have made the model directly predict moment timestamps without any hand-crafted components, such as a pre-defined proposal or non-maximum suppression, by learning moment queries. However, their input-agnostic moment queries inevitably overlook an intrinsic temporal structure of a video, providing limited positional information. In this paper, we formulate an event-aware dynamic moment query to enable the model to take the input-specific content and positional information of the video into account. To this end, we present two levels of reasoning: 1) Event reasoning that captures distinctive event units constituting a given video using a slot attention mechanism; and 2) moment reasoning that fuses the moment queries with a given sentence through a gated fusion transformer layer and learns interactions between the moment queries and video-sentence representations to predict moment timestamps. Extensive experiments demonstrate the effectiveness and efficiency of the event-aware dynamic moment queries, outperforming state-of-the-art approaches on several video grounding benchmarks. The code is publicly available at https://github.com/jinhyunj/EaTR.
dc.language                           English
dc.publisher                          IEEE COMPUTER SOC
dc.title                              Knowing Where to Focus: Event-aware Transformer for Video Grounding
dc.type                               Conference
dc.identifier.doi                     10.1109/ICCV51070.2023.01273
dc.description.journalClass           1
dc.identifier.bibliographicCitation   IEEE/CVF International Conference on Computer Vision (ICCV), pp.13800 - 13810
dc.citation.title                     IEEE/CVF International Conference on Computer Vision (ICCV)
dc.citation.startPage                 13800
dc.citation.endPage                   13810
dc.citation.conferencePlace           US
dc.citation.conferencePlace           Paris, FRANCE
dc.citation.conferenceDate            2023-10-02
dc.relation.isPartOf                  2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023)
dc.identifier.wosid                   001169499006025
dc.identifier.scopusid                2-s2.0-85180480805
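
The abstract above describes event reasoning that groups video clip features into distinctive event units via slot attention. The following is a minimal sketch of such a slot-attention module, not the authors' implementation: the module name, dimensions, slot count, and iteration count are illustrative assumptions; refer to https://github.com/jinhyunj/EaTR for the official code.

```python
import torch
import torch.nn as nn

class EventSlotAttention(nn.Module):
    """Sketch: group video clip features into a small set of event slots."""
    def __init__(self, num_slots=10, dim=256, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        # Learned Gaussian initialization for the event slots (assumption).
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, video_feats):          # video_feats: (B, T, dim)
        B, T, D = video_feats.shape
        x = self.norm_in(video_feats)
        k, v = self.to_k(x), self.to_v(x)
        # Sample initial slots from the learned Gaussian.
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            B, self.num_slots, D, device=x.device)
        for _ in range(self.iters):
            prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the slot axis, so clips compete for event slots.
            attn = torch.einsum('bsd,btd->bst', q, k) * self.scale
            attn = attn.softmax(dim=1) + 1e-8
            attn = attn / attn.sum(dim=-1, keepdim=True)  # per-slot weighted mean
            updates = torch.einsum('bst,btd->bsd', attn, v)
            slots = self.gru(updates.reshape(-1, D), prev.reshape(-1, D)).view(B, -1, D)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots                          # (B, num_slots, dim) event slots
```

In the paper's pipeline, such event slots would supply the input-specific content and position of the dynamic moment queries, which are then fused with the sentence representation in the moment-reasoning stage; the fusion layer is not sketched here.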
Appears in Collections:
KIST Conference Paper > 2023