DSpace at KIST: RIDGE: Rule-Infused Deep Learning for Realistic Co-Speech Gesture Generation

Abstract: Co-speech gestures are essential for natural human communication, yet existing synthesis methods fall short of delivering semantically aligned and contextually appropriate motions. In this paper, we present RIDGE, a hybrid system that combines rule-based and deep learning approaches to generate realistic gestures for virtual avatars and human-computer interaction. RIDGE employs a high-fidelity rule base, generated from motion capture data with the assistance of large language models, to select reliable gesture mappings. When a high-confidence match is not available, a contrastively trained deep learning model steps in to produce semantically appropriate gestures. Evaluated using a novel Gesture Cluster Affinity (GCA) metric, our system outperforms existing baselines, achieving a GCA score of 0.73, compared with 0.6 for a rule-based baseline and 0.52 for an end-to-end baseline; the ground-truth score was 0.90. Detailed analyses of the system architecture, data preprocessing, and evaluation methodology demonstrate RIDGE's potential to enhance gesture synthesis. Project Url: .
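
The rule-first, model-fallback selection described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `RuleMatch`, `select_gesture`, `toy_rule_lookup`, and `toy_model`, the toy rule entries, and the confidence threshold of 0.8 are all hypothetical, since the abstract does not specify how confidence is computed or thresholded.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RuleMatch:
    """A candidate gesture from the rule base (hypothetical structure)."""
    gesture_id: str
    confidence: float


def select_gesture(
    text: str,
    rule_lookup: Callable[[str], Optional[RuleMatch]],
    fallback_model: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; not given in the abstract
) -> Tuple[str, str]:
    """Return (gesture_id, source), preferring a high-confidence rule match."""
    match = rule_lookup(text)
    if match is not None and match.confidence >= threshold:
        return match.gesture_id, "rule"
    # No reliable rule match: defer to the learned model.
    return fallback_model(text), "model"


# Toy stand-ins for the rule base and the learned model (both hypothetical).
TOY_RULES = {
    "hello": RuleMatch("wave", 0.95),
    "maybe": RuleMatch("shrug", 0.40),
}


def toy_rule_lookup(text: str) -> Optional[RuleMatch]:
    """Return the most confident rule triggered by any word in the text."""
    hits = [TOY_RULES[w] for w in text.lower().split() if w in TOY_RULES]
    return max(hits, key=lambda m: m.confidence) if hits else None


def toy_model(text: str) -> str:
    """Placeholder for a learned generator; always returns a generic gesture."""
    return "beat"


print(select_gesture("Hello there", toy_rule_lookup, toy_model))
print(select_gesture("maybe later", toy_rule_lookup, toy_model))
```

In this sketch the first call resolves through the rule base, while the second falls back to the model because its only rule match is below the assumed threshold.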
