RIDGE: Rule-Infused Deep Learning for Realistic Co-Speech Gesture Generation
- Authors
- Ali, Ghazanfar; Kim, HwangYoun; Hwang, Jae-In
- Issue Date
- 2025-07
- Publisher
- John Wiley & Sons Inc.
- Citation
- Computer Animation & Virtual Worlds, v.36, no.4
- Abstract
- Co-speech gestures are essential for natural human communication, yet existing synthesis methods fall short in delivering semantically aligned and contextually appropriate motions. In this paper, we present RIDGE, a hybrid system that combines rule-based and deep learning approaches to generate realistic gestures for virtual avatars and human-computer interaction. RIDGE employs a high-fidelity rule base, generated from motion capture data with the assistance of large language models, to select reliable gesture mappings. When a high-confidence match is not available, a contrastively trained deep learning model steps in to produce semantically appropriate gestures. Evaluated using a novel Gesture Cluster Affinity (GCA) metric, our system outperforms existing baselines, achieving a GCA score of 0.73, compared with 0.6 for a rule-based baseline and 0.52 for an end-to-end baseline; the ground truth scored 0.90. Detailed analyses of system architecture, data preprocessing, and evaluation methodologies demonstrate RIDGE's potential to enhance gesture synthesis. Project Url: .
- Keywords
- HCI; virtual worlds; computer animation; co-speech gestures
- ISSN
- 1546-4261
- URI
- https://pubs.kist.re.kr/handle/201004/153004
- DOI
- 10.1002/cav.70034
- Appears in Collections:
- KIST Article > Others