Dual Prototype Attention for Unsupervised Video Object Segmentation

Authors: Cho, Suhwan; Lee, Minhyeok; Lee, Seunghoon; Lee, Dogyoon; Choi, Heeseung; Kim, Ig-Jae; Lee, Sangyoun
Type: Conference
Published in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19238-19247
Conference: Seattle, WA, US, 2024-06-16
Publisher: IEEE COMPUTER SOC
Date: 2024-06
ISSN: 1063-6919
DOI: 10.1109/CVPR52733.2024.01820
Language: English
URI: https://pubs.kist.re.kr/handle/201004/152283
Repository record dates: 2025-04-23T03:00:07Z, 2025-03-20
Web of Science ID: 001342515502055
Scopus EID: 2-s2.0-85201024586

Abstract: Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in a video. The primary techniques used in unsupervised VOS are 1) the collaboration of appearance and motion information and 2) temporal fusion across different frames. This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA), which incorporate these techniques via dense propagation across different modalities and frames. IMA densely integrates context information from different modalities through mutual refinement. IFA injects the global context of a video into the query frame, enabling full use of useful properties from multiple frames. Experimental results on public benchmark datasets show that the proposed approach outperforms all existing methods by a substantial margin. Both proposed components are also thoroughly validated through an ablation study. Code and models are available at https://github.com/Hydragon516/DPA.
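The abstract describes IMA and IFA only at a high level. The following is a minimal sketch of the general idea of prototype-based attention under stated assumptions. It is not the authors' implementation; their code is in the linked repository. Assumptions made here for illustration: prototypes are formed by averaging contiguous chunks of tokens, prototypes serve directly as keys and values with no learned projections, and the attended context is added back to each query as a residual. All function names (`make_prototypes`, `prototype_attention`, `inter_modality_attention`, `inter_frame_attention`) are hypothetical. In this sketch, each query attends to K prototypes rather than all N tokens, so its cost grows with K instead of N.

```python
import math
from typing import List, Tuple

Vector = List[float]


def make_prototypes(tokens: List[Vector], num_prototypes: int) -> List[Vector]:
    """Summarize tokens into prototypes by averaging contiguous chunks.

    Hypothetical stand-in for prototype generation (an assumption, not DPA's
    method). Requires 1 <= num_prototypes <= len(tokens) so no chunk is empty.
    """
    n, dim = len(tokens), len(tokens[0])
    if not 1 <= num_prototypes <= n:
        raise ValueError("need 1 <= num_prototypes <= number of tokens")
    prototypes = []
    for k in range(num_prototypes):
        chunk = tokens[k * n // num_prototypes:(k + 1) * n // num_prototypes]
        prototypes.append([sum(t[d] for t in chunk) / len(chunk) for d in range(dim)])
    return prototypes


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_attention(queries: List[Vector], prototypes: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention from each query token to a prototype set.

    Prototypes act as both keys and values (no learned projections, an
    assumption), and the attended context is added to the query as a residual.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, p)) for p in prototypes]
        weights = softmax(scores)
        context = [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(len(q))]
        out.append([a + c for a, c in zip(q, context)])
    return out


def inter_modality_attention(
    appearance: List[Vector], motion: List[Vector], num_prototypes: int
) -> Tuple[List[Vector], List[Vector]]:
    """Mutual refinement: each modality attends to the other modality's prototypes."""
    app_protos = make_prototypes(appearance, num_prototypes)
    mot_protos = make_prototypes(motion, num_prototypes)
    return (prototype_attention(appearance, mot_protos),
            prototype_attention(motion, app_protos))


def inter_frame_attention(
    query_frame: List[Vector], reference_frames: List[List[Vector]], num_prototypes: int
) -> List[Vector]:
    """Inject video-level context: the query frame attends to prototypes pooled over frames."""
    pooled: List[Vector] = []
    for frame in reference_frames:
        pooled.extend(make_prototypes(frame, num_prototypes))
    return prototype_attention(query_frame, pooled)


if __name__ == "__main__":
    # Toy 2-D token features for one query frame (appearance and motion).
    appearance = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    motion = [[0.5, 0.5], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    # Toy features for two other frames of the same video.
    prev_frame = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]
    next_frame = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.0, 1.0]]
    fused_app, fused_mot = inter_modality_attention(appearance, motion, num_prototypes=2)
    refined = inter_frame_attention(fused_app, [prev_frame, next_frame], num_prototypes=2)
    print(len(refined), len(refined[0]))  # 4 2
```

Running the file prints the number of refined tokens and their feature dimension. The chunk-averaging prototypes keep the sketch dependency-free; a learned or clustering-based prototype step would be a natural substitute, and nothing here should be read as the paper's exact design.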