DSpace at KIST: Dual-path Adaptation from Image to Video Transformers

Full metadata record

DC Field	Value	Language
dc.contributor.author	Park, Jungin	-
dc.contributor.author	Lee, Jiyoung	-
dc.contributor.author	Sohn, Kwanghoon	-
dc.date.accessioned	2024-01-12T02:46:03Z	-
dc.date.available	2024-01-12T02:46:03Z	-
dc.date.created	2023-11-17	-
dc.date.issued	2023-06	-
dc.identifier.issn	1063-6919	-
dc.identifier.uri	https://pubs.kist.re.kr/handle/201004/76431	-
dc.description.abstract	In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters. Previous adaptation methods have considered spatial and temporal modeling simultaneously with a unified learnable module, but still struggled to fully leverage the representational capabilities of image transformers. We argue that the popular dual-path (two-stream) architecture in video models can mitigate this problem. We propose a novel DUALPATH adaptation separated into spatial and temporal adaptation paths, where a lightweight bottleneck adapter is employed in each transformer block. Especially for temporal dynamic modeling, we incorporate consecutive frames into a grid-like frameset to precisely imitate vision transformers' capability of extrapolating relationships between tokens. In addition, we extensively investigate multiple baselines from a unified perspective in video understanding and compare them with DUALPATH. Experimental results on four action recognition benchmarks prove that pretrained image transformers with DUALPATH can be effectively generalized beyond the data domain.	-
dc.language	English	-
dc.publisher	IEEE COMPUTER SOC	-
dc.title	Dual-path Adaptation from Image to Video Transformers	-
dc.type	Conference	-
dc.identifier.doi	10.1109/CVPR52729.2023.00219	-
dc.description.journalClass	1	-
dc.identifier.bibliographicCitation	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.2203 - 2213	-
dc.citation.title	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)	-
dc.citation.startPage	2203	-
dc.citation.endPage	2213	-
dc.citation.conferencePlace	US	-
dc.citation.conferencePlace	Vancouver, CANADA	-
dc.citation.conferenceDate	2023-06-17	-
dc.relation.isPartOf	2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR	-
dc.identifier.wosid	001058542602052	-
dc.identifier.scopusid	2-s2.0-85173910084	-
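
The abstract names two mechanisms: a lightweight bottleneck adapter in each transformer block, and a grid-like frameset that tiles consecutive frames into a single image so that an image transformer can relate tokens across time. The following is a minimal plain-Python sketch of both ideas under stated assumptions; it is not the authors' implementation. The helper names (make_grid_frameset, BottleneckAdapter), the row-major tiling order, the single-channel frames, the GELU nonlinearity, and the zero-initialized up-projection are illustrative choices, not details confirmed by this record.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Frame = List[List[float]]  # one single-channel frame: H rows x W columns (assumption)


def make_grid_frameset(frames: List[Frame], cols: int) -> Frame:
    """Tile T frames (each H x W) into one grid image (hypothetical helper).

    Assumption: T is divisible by `cols`; frames are placed left to right,
    top to bottom, so temporal order follows reading order in the grid.
    """
    if len(frames) % cols != 0:
        raise ValueError("number of frames must be divisible by cols")
    h, w = len(frames[0]), len(frames[0][0])
    rows = len(frames) // cols
    grid = [[0.0] * (cols * w) for _ in range(rows * h)]
    for t, frame in enumerate(frames):
        r, c = divmod(t, cols)  # grid cell (row, column) that holds frame t
        for i in range(h):
            for j in range(w):
                grid[r * h + i][c * w + j] = frame[i][j]
    return grid


def gelu(x: float) -> float:
    # tanh approximation of GELU (choice of nonlinearity is an assumption)
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(m: Matrix, v: List[float]) -> List[float]:
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Residual bottleneck adapter: x + W_up(GELU(W_down x)) (hypothetical sketch).

    In parameter-efficient adaptation only W_down and W_up would be trained;
    the frozen transformer block around the adapter is not modelled here.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection: dim -> bottleneck, small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # Up-projection: bottleneck -> dim, zero init so the adapter starts
        # as an identity map (a common choice, assumed here).
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x: List[float]) -> List[float]:
        hidden = [gelu(v) for v in matvec(self.w_down, x)]
        delta = matvec(self.w_up, hidden)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection

    def num_params(self) -> int:
        # Trainable parameters: 2 * dim * bottleneck (no biases in this sketch).
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)
```

For example, four 2x2 frames tiled with cols=2 give a 4x4 grid whose top-left quadrant is frame 0 and bottom-right quadrant is frame 3, and an adapter with dim=8 and bottleneck=2 has 32 trainable weights. The bottleneck keeps the trainable parameter count small relative to the frozen backbone, which is the property the abstract emphasizes.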

Appears in Collections: KIST Conference Paper > 2023

KIST Library Institutional Repository