DSpace at KIST: A Cost-Efficient KV-Cache Method Using a Heterogeneous Memory System

Full metadata record

DC Field	Value	Language
dc.contributor.author	박소희	-
dc.contributor.author	권준구	-
dc.contributor.author	조정희	-
dc.contributor.author	박성식	-
dc.date.accessioned	2025-02-21T01:00:15Z	-
dc.date.available	2025-02-21T01:00:15Z	-
dc.date.created	2025-02-11	-
dc.date.issued	2024-12-19	-
dc.identifier.uri	https://pubs.kist.re.kr/handle/201004/151776	-
dc.identifier.uri	https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE12042191	-
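dc.description.abstract	Generative AI models based on large language models (LLMs) are having a major impact on everyday life, well beyond the field of artificial intelligence. Despite the broad applicability of LLMs, their inference efficiency urgently needs to improve before they can be widely deployed. The KV-cache has drawn considerable attention as a technique for improving LLM inference efficiency. However, as the token size of LLMs grows, using a KV-cache on current homogeneous memory systems such as high bandwidth memory (HBM) is becoming difficult in terms of cost, power, and other aspects. To address this, we propose a method for using a KV-cache in a system that employs heterogeneous memory.	-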
dc.language	Korean	-
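dc.publisher	Korean Institute of Information Scientists and Engineers (한국정보과학회)	-
dc.title	A Cost-Efficient KV-Cache Method Using a Heterogeneous Memory System (이종 메모리 시스템을 활용한 비용 효율적인 KV-Cache 방법)	-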
dc.type	Conference	-
dc.description.journalClass	2	-
dc.identifier.bibliographicCitation	Korea Software Congress 2024 (2024 한국소프트웨어종합학술대회), pp. 1136-1138	-
dc.citation.title	Korea Software Congress 2024 (2024 한국소프트웨어종합학술대회)	-
dc.citation.startPage	1136	-
dc.citation.endPage	1138	-
dc.citation.conferencePlace	KO	-
dc.citation.conferencePlace	Yeosu Expo Convention Center	-
dc.citation.conferenceDate	2024-12-18	-
dc.relation.isPartOf	Proceedings of the Korea Software Congress 2024 (2024 한국소프트웨어종합학술대회 논문집)	-

Appears in Collections: KIST Conference Paper > 2024

KIST Library Institutional Repository