Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Hanjae | - |
dc.contributor.author | Lee, Jiyoung | - |
dc.contributor.author | Park, Seongheon | - |
dc.contributor.author | Sohn, Kwanghoon | - |
dc.date.accessioned | 2024-04-18T05:30:46Z | - |
dc.date.available | 2024-04-18T05:30:46Z | - |
dc.date.created | 2024-04-18 | - |
dc.date.issued | 2023-10 | - |
dc.identifier.issn | 1550-5499 | - |
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/149671 | - |
dc.description.abstract | Compositional zero-shot learning (CZSL) aims to recognize unseen compositions given prior knowledge of known primitives (attributes and objects). Previous CZSL works often struggle to capture the contextuality between attribute and object and the discriminability of visual features, and to handle the long-tailed distribution of real-world compositional data. We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. CoT employs object and attribute experts in distinct ways to generate representative embeddings, using the visual network hierarchically. The object expert extracts representative object embeddings from the final layer in a bottom-up manner, while the attribute expert builds attribute embeddings in a top-down manner with a proposed object-guided attention module that models contextuality explicitly. To remedy biased predictions caused by the imbalanced data distribution, we develop a simple minority attribute augmentation (MAA) that synthesizes virtual samples by mixing two images and oversampling minority attribute classes. Our method achieves state-of-the-art performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL. We also demonstrate the effectiveness of CoT in improving visual discrimination and mitigating model bias induced by the imbalanced data distribution. The code is available at https://github.com/HanjaeKim98/CoT. | - |
dc.language | English | - |
dc.publisher | IEEE COMPUTER SOC | - |
dc.title | Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning | - |
dc.type | Conference | - |
dc.identifier.doi | 10.1109/ICCV51070.2023.00522 | - |
dc.description.journalClass | 1 | - |
dc.identifier.bibliographicCitation | IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5652-5662 | - |
dc.citation.title | IEEE/CVF International Conference on Computer Vision (ICCV) | - |
dc.citation.startPage | 5652 | - |
dc.citation.endPage | 5662 | - |
dc.citation.conferencePlace | Paris, FRANCE | - |
dc.citation.conferenceDate | 2023-10-02 | - |
dc.relation.isPartOf | 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | - |
dc.identifier.wosid | 001159644305085 | - |
dc.identifier.scopusid | 2-s2.0-85179035883 | - |
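
The abstract above describes minority attribute augmentation (MAA) as synthesizing virtual samples by mixing two images and oversampling minority attribute classes. Below is a minimal, illustrative sketch of that general idea in PyTorch. It is not the authors' implementation (see https://github.com/HanjaeKim98/CoT for the official code); the function names, the inverse-frequency sampling weights, and the Beta-distributed mixing coefficient are assumptions made here for clarity.

```python
# Illustrative sketch of minority attribute augmentation (MAA) as summarized in
# the abstract: oversample images from rare attribute classes and mix image
# pairs to synthesize virtual training samples. This is NOT the official CoT
# code (https://github.com/HanjaeKim98/CoT); all names and details are assumed.
from collections import Counter

import torch


def attribute_sampling_weights(attr_labels: torch.Tensor) -> torch.Tensor:
    """Inverse-frequency weights so minority attribute classes are drawn more often."""
    counts = Counter(attr_labels.tolist())
    return torch.tensor([1.0 / counts[a] for a in attr_labels.tolist()])


def minority_attribute_mix(images: torch.Tensor, attr_labels: torch.Tensor, alpha: float = 1.0):
    """Mix each image with a partner sampled in proportion to attribute rarity.

    Returns the mixed (virtual) samples, the partner indices, and the mixing
    coefficient. How the mixed samples are labeled and used in the loss is
    omitted here.
    """
    weights = attribute_sampling_weights(attr_labels)
    # Partners are drawn with probability proportional to attribute rarity,
    # so minority attribute classes are effectively oversampled.
    partner_idx = torch.multinomial(weights, num_samples=images.size(0), replacement=True)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * images + (1.0 - lam) * images[partner_idx]
    return mixed, partner_idx, lam


if __name__ == "__main__":
    # Toy batch: 8 RGB images with an imbalanced attribute-label distribution.
    images = torch.randn(8, 3, 224, 224)
    attr_labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 2])  # class 2 is the minority
    mixed, partner_idx, lam = minority_attribute_mix(images, attr_labels)
    print(mixed.shape, partner_idx.tolist(), round(lam, 3))
```

The sketch only shows the sampling-and-mixing step; the paper's actual formulation of MAA and its integration with the object and attribute experts are given in the publication and the linked repository.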