Visual Speech Recognition using 3D Lip Shape from Stereo Video

Title
Visual Speech Recognition using 3D Lip Shape from Stereo Video
Authors
이연주; 고혜승; 최귀원; 윤인찬
Keywords
visual speech recognition; 3D lip shape model; 3D reconstruction
Issue Date
2013-08
Publisher
The 7th Asian Pacific Conference on Biomechanics
Abstract
Visual speech recognition has recently been an active research topic because visual information such as lip movement improves the performance of automatic speech recognition in noisy environments. Visual speech recognition can also serve as a communication tool for people with voice impairments or hearing loss. Most current visual speech recognition methods focus on two-dimensional (2D) lip features obtained from a single lip image. This paper presents a novel method for visual speech recognition that uses 3D lip shape from stereo video. A calibrated stereo camera was used to obtain 3D information about the lip shape. The proposed 3D lip shape feature is extracted by a model-based method to minimize the effects of head movement and of the correspondence problem in stereo-based 3D reconstruction. To build the 3D lip shape model, 3D motion marker data, which served as ground truth, were acquired with multiple motion cameras, and Principal Component Analysis was applied to the aligned 3D motion data. Figure 1 shows the process of the proposed 3D lip shape feature extraction. Lip feature points (LFPs) were extracted separately from the left and right images using a point extraction algorithm. From the corresponding LFPs, the 3D lip shape was reconstructed by triangulation [2]. Finally, the 3D lip shape feature was extracted by fitting the 3D shape model. A Hidden Markov Model was used for word recognition. The experiments used stereo video data from two subjects, and the speech vocabulary consisted of five consecutive digits (0 to 4), pronounced in Korean. In the experimental results, the proposed 3D shape feature achieved better word recognition performance than a 2D shape feature.
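The reconstruction and model-fitting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes linear (DLT) triangulation from two known 3x4 projection matrices, a PCA shape model computed by SVD on already-aligned training shapes, and a plain least-squares projection as the fitting step. Pose alignment for head movement is omitted. All function names and parameters are hypothetical.

```python
import numpy as np


def triangulate_point(P_left, P_right, x_left, x_right):
    """Linear (DLT) triangulation of one 3D point.

    P_left, P_right: assumed 3x4 projection matrices of a calibrated stereo pair.
    x_left, x_right: corresponding 2D image points (pixels) in each view.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        x_left[0] * P_left[2] - P_left[0],
        x_left[1] * P_left[2] - P_left[1],
        x_right[0] * P_right[2] - P_right[0],
        x_right[1] * P_right[2] - P_right[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def build_shape_model(shapes, n_components):
    """PCA shape model from aligned training shapes.

    shapes: array of shape (n_samples, 3 * n_points), one flattened 3D shape per row.
    Returns the mean shape and the first n_components principal directions (rows).
    """
    mean = shapes.mean(axis=0)
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_components]


def fit_shape_model(mean, components, shape):
    """Project a reconstructed shape onto the orthonormal PCA basis.

    The returned coefficient vector plays the role of the shape feature
    in this sketch.
    """
    return components @ (shape - mean)
```

In this sketch, each pair of corresponding LFPs would be passed to `triangulate_point`, the resulting 3D points flattened into one vector, and `fit_shape_model` applied to obtain a low-dimensional feature vector per frame. A sequence of such vectors could then be fed to a word-level recognizer such as the Hidden Markov Model mentioned in the abstract.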
URI
http://pubs.kist.re.kr/handle/201004/45920
Appears in Collections:
KIST Publication > Conference Paper
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
