Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yun, Guhnoo | - |
dc.contributor.author | Yoo, Juhan | - |
dc.contributor.author | Kim, Kijung | - |
dc.contributor.author | Lee, Jeong ho | - |
dc.contributor.author | Kim, Dong Hwan | - |
dc.date.accessioned | 2024-01-12T02:44:48Z | - |
dc.date.available | 2024-01-12T02:44:48Z | - |
dc.date.created | 2023-10-25 | - |
dc.date.issued | 2023-10-04 | - |
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/76367 | - |
dc.identifier.uri | https://doranlyong.github.io/projects/spanet/ | - |
dc.description.abstract | Recent studies show that self-attentions behave like low-pass filters (as opposed to convolutions) and that enhancing their high-pass filtering capability improves model performance. Contrary to this idea, we investigate existing convolution-based models with spectral analysis and observe that improving the low-pass filtering in convolution operations also leads to performance improvement. To account for this observation, we hypothesize that utilizing optimal token mixers that capture balanced representations of both high- and low-frequency components can enhance the performance of models. We verify this by decomposing visual features into the frequency domain and combining them in a balanced manner. To handle this, we replace the balancing problem with a mask filtering problem in the frequency domain. Then, we introduce a novel token mixer named SPAM and leverage it to derive a MetaFormer model termed SPANet. Experimental results show that the proposed method provides a way to achieve this balance, and the balanced representations of both high- and low-frequency components can improve the performance of models on multiple computer vision tasks. Our code is available at https://doranlyong.github.io/projects/spanet/. | - |
dc.publisher | IEEE | - |
dc.title | SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation | - |
dc.type | Conference | - |
dc.identifier.doi | 10.1109/ICCV51070.2023.00562 | - |
dc.description.journalClass | 1 | - |
dc.identifier.bibliographicCitation | International Conference on Computer Vision (ICCV) | - |
dc.citation.title | International Conference on Computer Vision (ICCV) | - |
dc.citation.conferencePlace | FR | - |
dc.citation.conferencePlace | Paris Convention Centre | - |
dc.citation.conferenceDate | 2023-10-02 | - |
dc.relation.isPartOf | Proc. International Conference on Computer Vision (ICCV) | - |
dc.identifier.wosid | 001159644306035 | - |
dc.identifier.scopusid | 2-s2.0-85185868980 | - |
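
The abstract describes decomposing visual features in the frequency domain and recasting the balancing of high- and low-frequency components as a mask filtering problem. The following is a minimal NumPy sketch of that general idea only, not the authors' SPAM token mixer: the function names (`radial_mask`, `split_frequencies`, `balance`), the circular binary mask, and the scalar weight `alpha` are all assumptions introduced here for illustration.

```python
import numpy as np

def radial_mask(h, w, radius):
    """Binary low-pass mask: 1 where the frequency bin lies within `radius` of DC.

    The circular shape is an illustrative assumption, not taken from the paper.
    """
    fy = np.fft.fftshift(np.fft.fftfreq(h)) * h  # integer frequency indices, DC centred
    fx = np.fft.fftshift(np.fft.fftfreq(w)) * w
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return (dist <= radius).astype(np.float64)

def split_frequencies(x, radius):
    """Split a real 2D feature map into low- and high-frequency parts."""
    spec = np.fft.fftshift(np.fft.fft2(x))        # centred spectrum
    mask = radial_mask(x.shape[0], x.shape[1], radius)
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spec * (1.0 - mask))).real
    return low, high

def balance(x, radius, alpha):
    """Recombine the two bands; `alpha` (hypothetical) weights the low band."""
    low, high = split_frequencies(x, radius)
    return alpha * low + (1.0 - alpha) * high

# Example on a random 8x8 "feature map"
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
low, high = split_frequencies(x, radius=2)
y = balance(x, radius=2, alpha=0.7)  # emphasise low frequencies
```

Because the two masks are complementary, `low + high` reconstructs the input, so `alpha` only shifts the relative weight of the two bands. A learned or per-channel mask would be a natural extension, but that goes beyond what the abstract states.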