Authors: Ali, Ghazanfar; Kim, Woojoo; Anwar, Muhammad Shahid; Hwang, Jae-In; Choi, Ahyoung
Dates: 2025-09-17T01:35:19Z; 2025-09-17T01:35:19Z; 2025-09-16; 2025-08
URI: https://pubs.kist.re.kr/handle/201004/153162
Abstract: In this study, we explore the effects of co-speech gesture generation on user experience in 3D digital human interaction by testing two hypotheses. The first posits that increasing the number of gestures enhances the user experience across five criteria: naturalness, human-likeness, temporal consistency, semantic consistency, and social presence. The second posits that language translation does not degrade the user experience across these criteria. To test the first hypothesis, we compared three conditions using a digital human: voice only with no gestures, limited co-speech gestures (56 gestures), and full system functionality with over 2000 unique gestures. To test the second hypothesis, we provided multilingual support by translating non-English input to English for text matching and retrieving gestures from an English rule base. We obtained text and pose from English videos and matched the pose with gesture units derived from Korean speakers' motion-capture sequences, yielding an enhanced, comprehensive rule base that we used to retrieve gestures for a given text input. Our method uses an improved pipeline to extract text, 2D pose data, and 3D gesture units. Using a gesture-pose matching model based on deep contrastive learning, we retrieved gestures from a rule base containing 210,000 rules. This approach optimizes alignment and generates realistic, semantically consistent co-speech gestures adaptable to various languages. A comprehensive user study evaluated both hypotheses. The results underscored the positive impact of diverse gestures, supporting the first hypothesis. Multilingual capabilities did not degrade the user experience, confirming the second hypothesis. Highlighting the scalability and flexibility of our method, this study provides insights into cross-lingual data and expert systems for gesture generation, contributing to more engaging and immersive digital human interactions and to the broader field of human-computer interaction.
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
Title: Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units in Text-to-Gesture Synthesis for Digital Humans
Type: Article
DOI: 10.1109/ACCESS.2025.3596328
1
Citation: IEEE Access, v.13, pp.145144 - 145157
Journal: IEEE Access
Volume: 13
Pages: 145144 - 145157
Y
Indexing: scie, scopus
Web of Science ID: 001556092100022
Scopus EID: 2-s2.0-105013052990
Subject categories: Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications
Research areas: Computer Science; Engineering; Telecommunications
Document type: Article
Keywords: NONVERBAL BEHAVIOR APPEARANCE BEAT
Keywords: User experience; Videos; Motion capture; Digital humans; Three-dimensional displays; Translation; Semantics; Multilingual; Animation; Contrastive learning
Keywords: Co-speech gestures; gesture generation; HCI; machine learning; augmented/virtual/mixed realities