Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Leem, Saebom | - |
dc.contributor.author | Seo, Hyunseok | - |
dc.date.accessioned | 2024-05-31T08:30:09Z | - |
dc.date.available | 2024-05-31T08:30:09Z | - |
dc.date.created | 2024-05-31 | - |
dc.date.issued | 2024-03 | - |
dc.identifier.issn | 2159-5399 | - |
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/149987 | - |
dc.description.abstract | The Vision Transformer (ViT) is one of the most widely used models in the computer vision field, with great performance on various tasks. In order to fully utilize ViT-based architectures in various applications, proper visualization methods with decent localization performance are necessary, but the methods employed in CNN-based models cannot be directly applied to ViT due to its unique structure. In this work, we propose an attention-guided visualization method for ViT that provides a high-level semantic explanation for its decision. Our method selectively aggregates the gradients directly propagated from the classification output to each self-attention layer, collecting the contribution of image features extracted from each location of the input image. These gradients are additionally guided by the normalized self-attention scores, which are pairwise patch-correlation scores. These scores supplement the gradients with the patch-level context information efficiently detected by the self-attention mechanism. As a result, our method provides elaborate high-level semantic explanations with great localization performance using only class labels, outperforms the previous leading explainability methods for ViT on the weakly supervised localization task, and shows strong capability in capturing all instances of the target class object. Meanwhile, our method provides a visualization that faithfully explains the model, as demonstrated in the perturbation comparison test. | - |
dc.language | English | - |
dc.publisher | Association for the Advancement of Artificial Intelligence (AAAI) | - |
dc.title | Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention | - |
dc.type | Article | - |
dc.identifier.doi | 10.1609/aaai.v38i4.28077 | - |
dc.description.journalClass | 3 | - |
dc.identifier.bibliographicCitation | Proceedings of the AAAI Conference on Artificial Intelligence, v.38, no.4, pp.2956 - 2964 | - |
dc.citation.title | Proceedings of the AAAI Conference on Artificial Intelligence | - |
dc.citation.volume | 38 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 2956 | - |
dc.citation.endPage | 2964 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | other | - |
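The abstract describes gradients propagated from the classification output to each self-attention layer, guided by normalized self-attention scores. A minimal sketch of that kind of attention-times-gradient aggregation is shown below; it is an illustrative reconstruction, not the authors' published implementation, and the function name, the ReLU gating, and the use of the CLS-token row are assumptions made for this example.

```python
import numpy as np

def attention_guided_map(attn, grad):
    """Hypothetical sketch: combine self-attention scores with their
    gradients to produce a patch-level relevance map for one block.

    attn, grad: arrays of shape (heads, tokens, tokens), where token 0
    is assumed to be the CLS token and the rest are image patches.
    """
    # Row-normalize the attention into pairwise patch-correlation scores.
    attn_norm = attn / (attn.sum(axis=-1, keepdims=True) + 1e-8)
    # Keep only positively contributing gradients (ReLU gating is an
    # assumption here), then guide them by the normalized attention.
    guided = attn_norm * np.maximum(grad, 0.0)
    # Average over heads and read off the CLS-token row as the
    # relevance assigned to each image patch.
    return guided.mean(axis=0)[0, 1:]

rng = np.random.default_rng(0)
heads, tokens = 4, 5  # 1 CLS token + 4 image patches
attn = rng.random((heads, tokens, tokens))
grad = rng.standard_normal((heads, tokens, tokens))
relevance = attention_guided_map(attn, grad)
print(relevance.shape)  # one relevance score per image patch: (4,)
```

In practice such per-block maps would be accumulated across the transformer's layers and reshaped to the patch grid for visualization.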
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.