Full metadata record

DC field: value

dc.contributor.author: Kim, Dongjin
dc.contributor.author: Kim, Woojeong
dc.contributor.author: Kim, Suhyun
dc.date.accessioned: 2024-08-26T01:00:37Z
dc.date.available: 2024-08-26T01:00:37Z
dc.date.created: 2024-08-26
dc.date.issued: 2023-12
dc.identifier.issn: 1049-5258
dc.identifier.uri: https://pubs.kist.re.kr/handle/201004/150504
dc.identifier.uri: https://neurips.cc/virtual/2023/poster/73014
dc.description.abstract: Batch Normalization is commonly located in front of activation functions, as proposed by the original paper. Swapping the order, i.e., using Batch Normalization after activation functions, has also been attempted, but its performance is generally not much different from the conventional order when ReLU or a similar activation function is used. However, in the case of bounded activation functions like Tanh, we discovered that the swapped order achieves considerably better performance than the conventional order on various benchmarks and architectures. This paper reports this remarkable phenomenon and closely examines what contributes to the performance improvement. By looking at the output distributions of individual activation functions rather than whole layers, we found that many of them are asymmetrically saturated. Experiments designed to induce different degrees of asymmetric saturation support the hypothesis that asymmetric saturation helps improve performance. In addition, Batch Normalization after bounded activation functions relocates the asymmetrically saturated outputs of the activation functions near zero, giving the swapped model high sparsity and further improving performance. Extensive experiments with Tanh, LeCun Tanh, and Softsign show that the swapped models achieve improved performance with a high degree of asymmetric saturation. Finally, based on this investigation, we test a Tanh function shifted to be asymmetric. This shifted Tanh, manipulated to have consistent asymmetry, shows even higher accuracy than the original Tanh used in the swapped order, confirming the importance of asymmetry. The code is available at https://github.com/hipros/tanh_works_better_with_asymmetry.
dc.language: English
dc.publisher: NEURAL INFORMATION PROCESSING SYSTEMS (NIPS)
dc.title: Tanh Works Better With Asymmetry
dc.type: Conference
dc.description.journalClass: 1
dc.identifier.bibliographicCitation: 37th Conference on Neural Information Processing Systems (NeurIPS)
dc.citation.title: 37th Conference on Neural Information Processing Systems (NeurIPS)
dc.citation.conferencePlace: US
dc.citation.conferencePlace: New Orleans, LA
dc.citation.conferenceDate: 2023-12-10
dc.relation.isPartOf: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023)
dc.identifier.wosid: 001229826601040
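
The swapped ordering and the shifted Tanh described in the abstract above can be illustrated with a short sketch. This is not the authors' code (see their repository linked in the abstract for that); PyTorch is assumed as the framework, and the module names ConventionalBlock, SwappedBlock, and ShiftedTanh, as well as the shift value 1.0, are illustrative placeholders rather than values taken from the paper.

```python
# Minimal sketch contrasting the conventional Conv -> BN -> Tanh ordering
# with the swapped Conv -> Tanh -> BN ordering discussed in the abstract.
# Assumptions: PyTorch; all module names and the shift value are hypothetical.
import torch
import torch.nn as nn


class ConventionalBlock(nn.Module):
    """Conv -> BatchNorm -> Tanh: BN in front of the activation."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Tanh()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class SwappedBlock(nn.Module):
    """Conv -> Tanh -> BatchNorm: BN after the activation.

    Per the abstract, BN placed after a bounded activation relocates the
    asymmetrically saturated outputs near zero, which is linked to higher
    sparsity and better accuracy.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.act = nn.Tanh()
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))


class ShiftedTanh(nn.Module):
    """Tanh with a fixed input shift to induce consistent asymmetric
    saturation; shift=1.0 is an illustrative value, not one from the paper."""

    def __init__(self, shift: float = 1.0):
        super().__init__()
        self.shift = shift

    def forward(self, x):
        return torch.tanh(x + self.shift)


if __name__ == "__main__":
    x = torch.randn(8, 3, 32, 32)
    print(SwappedBlock(3, 16)(x).shape)  # torch.Size([8, 16, 32, 32])
```

The only difference between the two blocks is the position of BatchNorm relative to the activation; swapping them changes no parameter counts, which is what makes the reported accuracy gap with bounded activations notable.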
Appears in Collections:
KIST Conference Paper > 2023
Files in This Item:
There are no files associated with this item.