Full metadata record

DC Field: Value
dc.contributor.author: Shin, Donghyuk
dc.contributor.author: Jo, Hyeongcheol
dc.contributor.author: Jang, Hyeseung
dc.contributor.author: Jeong, Yoo Ho
dc.contributor.author: Jeong, Yeonjoo
dc.contributor.author: Kwak, Joon Young
dc.contributor.author: Park, Jongkil
dc.contributor.author: Lee, Suyoun
dc.contributor.author: Kim, Inho
dc.contributor.author: Park, Jong-Keuk
dc.contributor.author: Park, Seongsik
dc.contributor.author: Jang, Hyun Jae
dc.contributor.author: Lee, Hyung-Min
dc.contributor.author: Kim, Jaewook
dc.date.accessioned: 2026-03-27T08:00:43Z
dc.date.available: 2026-03-27T08:00:43Z
dc.date.created: 2026-03-24
dc.date.issued: 2026-02
dc.identifier.issn: 1662-4548
dc.identifier.uri: https://pubs.kist.re.kr/handle/201004/154526
dc.description.abstract: Non-von Neumann architectures overcome the memory-compute separation of von Neumann systems by distributing computation and memory locally, thereby reducing data-transfer bottlenecks and power consumption. These features are particularly advantageous for reinforcement learning (RL) workloads that rely on frequent value-function updates across large state-action spaces. When combined with event-driven spiking neural networks (SNNs), non-von Neumann architectures can further improve overall computational efficiency by leveraging the sparse nature of spike-based processing. In this study, we propose a hardware-feasible SNN-based non-von Neumann architecture that performs Q-learning, one of the most widely known reinforcement learning algorithms. The proposed architecture maps states and actions to individual neurons using one-hot encoding and locally stores each state–action pair's Q-value in the corresponding synapse. To enable each synapse to update its local Q-value based on the next-state maximum Q-value stored in other synapses, a neuron group connected through a lateral-inhibition structure is employed to produce the maximum Q-value, which is then globally transmitted to all synapses. A delay circuit is also added to align the next-state and current-state values, ensuring temporally consistent updates. Each synapse locally generates a learning-selection signal and combines it with the globally transmitted signals so that only the target synapse is updated. The proposed architecture was validated through simulations on the Cart-pole benchmark, showing stable learning performance under low-bit precision and achieving accuracy comparable to software-based Q-learning when sufficient bit precision is available.
dc.language: English
dc.publisher: Frontiers Media S.A.
dc.title: Spike-based Q-learning in a non-von Neumann architecture
dc.type: Article
dc.identifier.doi: 10.3389/fnins.2026.1738140
dc.description.journalClass: 1
dc.identifier.bibliographicCitation: Frontiers in Neuroscience, v.20
dc.citation.title: Frontiers in Neuroscience
dc.citation.volume: 20
dc.description.isOpenAccess: Y
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.identifier.wosid: 001706552700001
dc.identifier.scopusid: 2-s2.0-105031896694
dc.relation.journalWebOfScienceCategory: Neurosciences
dc.relation.journalResearchArea: Neurosciences & Neurology
dc.type.docType: Article
dc.subject.keywordPlus: IMPLEMENTATION
dc.subject.keywordAuthor: non-von Neumann architecture
dc.subject.keywordAuthor: neuromorphic architecture
dc.subject.keywordAuthor: SNN
dc.subject.keywordAuthor: reinforcement learning
dc.subject.keywordAuthor: Q-learning
dc.subject.keywordAuthor: cart-pole
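For context, the abstract describes mapping the standard tabular Q-learning update into synapses, with the next-state maximum Q-value broadcast globally to all of them. A minimal software-level sketch of that update rule is shown below; it is illustrative only and does not model the hardware mechanisms (lateral inhibition, delay circuit, learning-selection signals). The function names, learning rate, and toy dimensions are our own assumptions, not taken from the paper.

```python
import numpy as np

def one_hot(i, n):
    """One-hot encoding: each state/action corresponds to a single neuron,
    so selecting (s, a) activates exactly one neuron pair."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: the synapse storing Q[s, a] moves toward
    r + gamma * max_a' Q[s_next, a'], where the max over next-state
    Q-values plays the role of the globally broadcast signal."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy example: 4 states, 2 actions, all Q-values initialized to zero.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

In the proposed architecture, `Q[s, a]` lives in the synapse addressed by the one-hot state and action neurons, and only that synapse applies the update, which the sketch mirrors by modifying a single table entry per step.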
Appears in Collections:
KIST Article > 2026