Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Jang, Jinhyun | - |
dc.contributor.author | Park, Jungin | - |
dc.contributor.author | Kim, Jin | - |
dc.contributor.author | Kwon, Hyeongjun | - |
dc.contributor.author | Sohn, Kwanghoon | - |
dc.date.accessioned | 2024-04-11T05:00:35Z | - |
dc.date.available | 2024-04-11T05:00:35Z | - |
dc.date.created | 2024-04-11 | - |
dc.date.issued | 2023-10 | - |
dc.identifier.issn | 1550-5499 | - |
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/149641 | - |
dc.description.abstract | Recent DETR-based video grounding models have made the model directly predict moment timestamps without any hand-crafted components, such as pre-defined proposals or non-maximum suppression, by learning moment queries. However, their input-agnostic moment queries inevitably overlook the intrinsic temporal structure of a video, providing limited positional information. In this paper, we formulate an event-aware dynamic moment query to enable the model to take the input-specific content and positional information of the video into account. To this end, we present two levels of reasoning: 1) Event reasoning, which captures the distinctive event units constituting a given video using a slot attention mechanism; and 2) Moment reasoning, which fuses the moment queries with a given sentence through a gated fusion transformer layer and learns interactions between the moment queries and video-sentence representations to predict moment timestamps. Extensive experiments demonstrate the effectiveness and efficiency of the event-aware dynamic moment queries, outperforming state-of-the-art approaches on several video grounding benchmarks. The code is publicly available at https://github.com/jinhyunj/EaTR. | - |
dc.language | English | - |
dc.publisher | IEEE COMPUTER SOC | - |
dc.title | Knowing Where to Focus: Event-aware Transformer for Video Grounding | - |
dc.type | Conference | - |
dc.identifier.doi | 10.1109/ICCV51070.2023.01273 | - |
dc.description.journalClass | 1 | - |
dc.identifier.bibliographicCitation | IEEE/CVF International Conference on Computer Vision (ICCV), pp.13800 - 13810 | - |
dc.citation.title | IEEE/CVF International Conference on Computer Vision (ICCV) | - |
dc.citation.startPage | 13800 | - |
dc.citation.endPage | 13810 | - |
dc.citation.conferencePlace | Paris, FRANCE | - |
dc.citation.conferenceDate | 2023-10-02 | - |
dc.relation.isPartOf | 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | - |
dc.identifier.wosid | 001169499006025 | - |
dc.identifier.scopusid | 2-s2.0-85180480805 | - |
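The abstract's event reasoning step uses slot attention to group video frame features into distinctive event units. As a rough illustration of that mechanism (not the authors' implementation, which lives in the linked GitHub repository), the sketch below shows the core idea in NumPy: slots compete for input features via a softmax over the slot axis, then each slot is updated with a weighted mean of the features it wins. All names (`slot_attention`, `num_slots`, `iters`) and the plain-GRU-free update rule are simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots=4, iters=3, seed=0):
    """Minimal slot-attention sketch.

    inputs: (n, d) array of per-frame features.
    Returns: (num_slots, d) array of slot vectors, each ideally
    summarizing one "event unit" of the sequence.
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(num_slots, d))
    for _ in range(iters):
        # Scaled dot-product logits between every slot and every frame.
        logits = slots @ inputs.T / np.sqrt(d)          # (num_slots, n)
        # Softmax over the *slot* axis: frames are divided among slots,
        # which is what makes the slots compete (the key slot-attention trick).
        attn = softmax(logits, axis=0)
        # Normalize per slot so the update is a weighted mean of frames.
        attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = attn @ inputs                           # (num_slots, d)
    return slots
```

In the full method, the resulting event-level slots (together with their positions) would seed the dynamic moment queries instead of the fixed, input-agnostic queries used by earlier DETR-style grounding models.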