Full metadata record

DC Field: Value
dc.contributor.author: Shin, Donghyuk
dc.contributor.author: Jo, Hyeongcheol
dc.contributor.author: Jang, Hyeseung
dc.contributor.author: Jeong, Yoo Ho
dc.contributor.author: Jeong, Yeonjoo
dc.contributor.author: Kwak, Joon Young
dc.contributor.author: Park, Jongkil
dc.contributor.author: Lee, Suyoun
dc.contributor.author: Kim, Inho
dc.contributor.author: Park, Jong-Keuk
dc.contributor.author: Park, Seongsik
dc.contributor.author: Jang, Hyun Jae
dc.contributor.author: Lee, Hyung-Min
dc.contributor.author: Kim, Jaewook
dc.date.accessioned: 2026-03-27T08:00:43Z
dc.date.available: 2026-03-27T08:00:43Z
dc.date.created: 2026-03-24
dc.date.issued: 2026-02
dc.identifier.issn: 1662-4548
dc.identifier.uri: https://pubs.kist.re.kr/handle/201004/154526
dc.description.abstract: Non-von Neumann architectures overcome the memory-compute separation of von Neumann systems by distributing computation and memory locally, thereby reducing data-transfer bottlenecks and power consumption. These features are particularly advantageous for reinforcement learning (RL) workloads that rely on frequent value-function updates across large state-action spaces. When combined with event-driven spiking neural networks (SNNs), non-von Neumann architectures can further improve overall computational efficiency by leveraging the sparse nature of spike-based processing. In this study, we propose a hardware-feasible SNN-based non-von Neumann architecture that performs Q-learning, one of the most widely known reinforcement learning algorithms. The proposed architecture maps states and actions to individual neurons using one-hot encoding and locally stores each state–action pair's Q-value in the corresponding synapse. To enable each synapse to update its local Q-value based on the next-state maximum Q-value stored in other synapses, a neuron group connected through a lateral-inhibition structure is employed to produce the maximum Q-value, which is then globally transmitted to all synapses. A delay circuit is also added to align the next-state and current-state values, ensuring temporally consistent updates. Each synapse locally generates a learning-selection signal and combines it with the globally transmitted signals so that only the target synapse is updated. The proposed architecture was validated through simulations on the Cart-pole benchmark, showing stable learning performance under low-bit precision and achieving accuracy comparable to software-based Q-learning when sufficient bit precision is available.
dc.language: English
dc.publisher: Frontiers Media S.A.
dc.title: Spike-based Q-learning in a non-von Neumann architecture
dc.type: Article
dc.identifier.doi: 10.3389/fnins.2026.1738140
dc.description.journalClass: 1
dc.identifier.bibliographicCitation: Frontiers in Neuroscience, v.20
dc.citation.title: Frontiers in Neuroscience
dc.citation.volume: 20
dc.description.isOpenAccess: Y
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.identifier.wosid: 001706552700001
dc.identifier.scopusid: 2-s2.0-105031896694
dc.relation.journalWebOfScienceCategory: Neurosciences
dc.relation.journalResearchArea: Neurosciences & Neurology
dc.type.docType: Article
dc.subject.keywordPlus: IMPLEMENTATION
dc.subject.keywordAuthor: non-von Neumann architecture
dc.subject.keywordAuthor: neuromorphic architecture
dc.subject.keywordAuthor: SNN
dc.subject.keywordAuthor: reinforcement learning
dc.subject.keywordAuthor: Q-learning
dc.subject.keywordAuthor: cart-pole
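For context, the abstract describes mapping the standard tabular Q-learning update into synapses, with the next-state maximum Q-value broadcast globally to all of them. A minimal software-level sketch of that update rule is shown below; it is illustrative only and does not model the hardware mechanisms (lateral inhibition, delay circuit, learning-selection signals). The function names, learning rate, and toy dimensions are our own assumptions, not taken from the paper.

```python
import numpy as np

def one_hot(i, n):
    """One-hot encoding: each state/action corresponds to a single neuron,
    so selecting (s, a) activates exactly one neuron pair."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: the synapse storing Q[s, a] moves toward
    r + gamma * max_a' Q[s_next, a'], where the max over next-state
    Q-values plays the role of the globally broadcast signal."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy example: 4 states, 2 actions, all Q-values initialized to zero.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

In the proposed architecture, `Q[s, a]` lives in the synapse addressed by the one-hot state and action neurons, and only that synapse applies the update, which the sketch mirrors by modifying a single table entry per step.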
Appears in Collections:
KIST Article > 2026