RIDGE: Rule-Infused Deep Learning for Realistic Co-Speech Gesture Generation

Authors
Ali, Ghazanfar; Kim, HwangYoun; Hwang, Jae-In
Issue Date
2025-07
Publisher
John Wiley & Sons Inc.
Citation
Computer Animation & Virtual Worlds, v.36, no.4
Abstract
Co-speech gestures are essential for natural human communication, yet existing synthesis methods fall short in delivering semantically aligned and contextually appropriate motions. In this paper, we present RIDGE, a hybrid system that combines rule-based and deep learning approaches to generate realistic gestures for virtual avatars and human-computer interaction. RIDGE employs a high-fidelity rule base, generated from motion capture data with the assistance of large language models, to select reliable gesture mappings. When no high-confidence match is available, a contrastively trained deep learning model steps in to produce semantically appropriate gestures. Evaluated with a novel Gesture Cluster Affinity (GCA) metric, our system outperforms existing baselines, achieving a GCA score of 0.73 compared with 0.6 for a rule-based baseline and 0.52 for an end-to-end baseline; the ground truth scored 0.90. Detailed analyses of the system architecture, data preprocessing, and evaluation methodology demonstrate RIDGE's potential to enhance gesture synthesis. Project Url: .
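The abstract describes a confidence-gated dispatch: use the rule base when it yields a high-confidence gesture mapping, otherwise defer to the learned model. The sketch below illustrates only that control flow, under loudly stated assumptions. The rule base is a hypothetical dictionary from trigger word to (gesture label, confidence); the 0.8 threshold is a placeholder; and the deep learning fallback is stood in for by nearest-neighbour retrieval over precomputed embeddings in an assumed shared speech-gesture space, which is one common use of contrastive training. None of the names (`match_rule`, `model_fallback`, `select_gesture`) or values come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical rule base: trigger word -> (gesture label, confidence in [0, 1]).
RuleBase = Dict[str, Tuple[str, float]]


@dataclass
class RuleMatch:
    gesture: str
    confidence: float


def match_rule(text: str, rule_base: RuleBase) -> Optional[RuleMatch]:
    """Return the highest-confidence rule whose trigger word occurs in the text."""
    best: Optional[RuleMatch] = None
    for token in text.lower().split():
        token = token.strip(".,!?;:")  # drop trailing punctuation
        if token in rule_base:
            gesture, conf = rule_base[token]
            if best is None or conf > best.confidence:
                best = RuleMatch(gesture, conf)
    return best


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def model_fallback(speech_emb: List[float], gesture_embs: Dict[str, List[float]]) -> str:
    """Stand-in for the learned model (an assumption): pick the gesture whose
    embedding is most similar to the speech embedding in a shared space."""
    return max(gesture_embs, key=lambda g: cosine(speech_emb, gesture_embs[g]))


def select_gesture(
    text: str,
    speech_emb: List[float],
    rule_base: RuleBase,
    gesture_embs: Dict[str, List[float]],
    threshold: float = 0.8,  # placeholder value, not taken from the paper
) -> Tuple[str, str]:
    """Use the rule base on a high-confidence match; otherwise fall back to the model."""
    match = match_rule(text, rule_base)
    if match is not None and match.confidence >= threshold:
        return match.gesture, "rule"
    return model_fallback(speech_emb, gesture_embs), "model"


# Usage with toy, made-up data.
rules = {"hello": ("wave", 0.95), "big": ("arms_wide", 0.6)}
embs = {"wave": [1.0, 0.0], "arms_wide": [0.0, 1.0], "point": [0.7, 0.7]}
print(select_gesture("Hello there!", [0.2, 0.9], rules, embs))  # ('wave', 'rule')
print(select_gesture("That is big", [0.1, 1.0], rules, embs))   # ('arms_wide', 'model')
```

In the second call the rule for "big" exists but its confidence (0.6) is below the threshold, so the dispatcher defers to the fallback, mirroring the "steps in when no high-confidence match is available" behaviour described above.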
Keywords
HCI; virtual worlds; computer animation; co-speech gestures
ISSN
1546-4261
URI
https://pubs.kist.re.kr/handle/201004/153004
DOI
10.1002/cav.70034
Appears in Collections:
KIST Article > Others