Full metadata record

DC Field | Value | Language
dc.contributor.author | Lee, Minhyeok | -
dc.contributor.author | Cho, Suhwan | -
dc.contributor.author | Lee, Jungho | -
dc.contributor.author | Yang, Sunghun | -
dc.contributor.author | Choi, Heeseung | -
dc.contributor.author | Kim, Ig-Jae | -
dc.contributor.author | Lee, Sangyoun | -
dc.date.accessioned | 2025-12-30T02:00:52Z | -
dc.date.available | 2025-12-30T02:00:52Z | -
dc.date.created | 2025-11-25 | -
dc.date.issued | 2025-06-10 | -
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/153920 | -
dc.description.abstract | Open-vocabulary semantic segmentation aims to assign pixel-level labels to images across an unlimited range of classes. Traditional methods address this by sequentially connecting a powerful mask proposal generator, such as the Segment Anything Model (SAM), with a pre-trained vision-language model like CLIP. However, these two-stage approaches often suffer from high computational costs and memory inefficiencies. In this paper, we propose ESC-Net, a novel one-stage open-vocabulary segmentation model that leverages the SAM decoder blocks for class-agnostic segmentation within an efficient inference framework. By embedding pseudo prompts generated from image-text correlations into SAM’s promptable segmentation framework, ESC-Net achieves refined spatial aggregation for accurate mask predictions. Additionally, a Vision-Language Fusion (VLF) module enhances the final mask prediction through image and text guidance. ESC-Net achieves strong results on standard benchmarks, including PASCAL-Context, outperforming prior methods in both efficiency and accuracy. Comprehensive ablation studies further demonstrate its robustness across challenging conditions. | -
dc.publisher | IEEE | -
dc.title | Effective SAM Combination for Open-Vocabulary Semantic Segmentation | -
dc.type | Conference | -
dc.identifier.doi | 10.1109/cvpr52734.2025.02429 | -
dc.description.journalClass | 1 | -
dc.identifier.bibliographicCitation | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 26081-26090 | -
dc.citation.title | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | -
dc.citation.startPage | 26081 | -
dc.citation.endPage | 26090 | -
dc.citation.conferencePlace | US | -
dc.citation.conferencePlace | Nashville, TN, USA | -
dc.citation.conferenceDate | 2025-06-10 | -
dc.relation.isPartOf | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | -
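The abstract mentions pseudo prompts generated from image-text correlations. The sketch below is one simplified way such prompts could be derived. It is not ESC-Net's implementation. It assumes several things. Image patch embeddings and class text embeddings share one embedding space, as with CLIP features. Correlation is measured by plain cosine similarity. The k most correlated patch locations per class serve as point prompts. The function names `cosine`, `correlation_map`, and `pseudo_point_prompts` are hypothetical. In a real model, these points would be passed to a SAM-style prompt encoder, which this sketch does not include.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def correlation_map(patches: List[List[Vector]], text: Vector) -> List[List[float]]:
    """Hypothetical helper: H x W map of cosine similarity between each patch and one class text embedding."""
    return [[cosine(patch, text) for patch in row] for row in patches]


def pseudo_point_prompts(
    patches: List[List[Vector]], text_embeddings: List[Vector], k: int = 1
) -> List[List[Tuple[int, int]]]:
    """Hypothetical helper: for each class, return the (row, col) locations of the
    k patches most correlated with that class's text embedding.

    Assumption: top-k selection on a cosine-similarity map stands in for the
    paper's pseudo-prompt generation, whose exact form is not given here.
    """
    prompts: List[List[Tuple[int, int]]] = []
    for text in text_embeddings:
        corr = correlation_map(patches, text)
        # Flatten the map into (score, location) pairs and keep the k highest scores.
        scored = [
            (corr[r][c], (r, c))
            for r in range(len(corr))
            for c in range(len(corr[r]))
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        prompts.append([loc for _, loc in scored[:k]])
    return prompts


if __name__ == "__main__":
    # Toy 2 x 2 grid of 2-D "patch embeddings" and two toy "class text embeddings".
    patches = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.2, 0.8]]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    print(pseudo_point_prompts(patches, texts, k=1))  # [[(0, 0)], [(1, 0)]]
```

In this toy input, the first class embedding aligns with the top-left patch and the second with the bottom-left patch, so each class receives one point prompt at that location. Raising `k` returns additional, lower-scoring locations per class.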
