Robust visual speakingness detection using bi-level HMM
- Authors
- Tiawongsombat, P.; Jeong, Mun-Ho; Yun, Joo-Seop; You, Bum-Jae; Oh, Sang-Rok
- Issue Date
- 2012-02
- Publisher
- ELSEVIER SCI LTD
- Citation
- PATTERN RECOGNITION, v.45, no.2, pp.783 - 793
- Abstract
- Visual voice activity detection (V-VAD) plays an important role in both HCI and HRI, affecting both the conversation strategy and the synchronization between humans and robots/computers. The typical speakingness decision in V-VAD consists of post-processing for signal smoothing followed by classification using thresholding. Several parameters, which must ensure a good trade-off between hit rate and false alarm rate, are usually defined heuristically. This makes V-VAD approaches vulnerable to noisy observations and changing environmental conditions, resulting in poor performance and poor robustness against undesired, frequent speaking-state changes. To overcome these difficulties, this paper proposes a new probabilistic approach for V-VAD in HRI, named bi-level HMM, which analyzes lip activity energy. The design is based on assumptions about lip movement and speaking, and combines two essential procedures in a single model. A bi-level HMM is an HMM with two state variables at different levels, where the state occurring at the lower level conditionally depends on the state at the upper level. The approach works online with low-resolution images and under various lighting conditions, and has been successfully tested on 21 image sequences (22,927 frames). It achieved detection probabilities of over 90%, an improvement of almost 20% compared to four other V-VAD approaches. (C) 2011 Elsevier Ltd. All rights reserved.
- Keywords
- Visual voice activity detection; Mouth image energy; Speakingness detection; Bi-level HMM
- ISSN
- 0031-3203
- URI
- https://pubs.kist.re.kr/handle/201004/129590
- DOI
- 10.1016/j.patcog.2011.07.011
- Appears in Collections:
- KIST Article > 2012
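The abstract describes the bi-level HMM only at a high level: it has two state variables, and the lower-level state conditionally depends on the upper-level state. The Python sketch illustrates that dependency under explicit assumptions that are not taken from the paper. Both levels are binary (upper: not speaking/speaking; lower: lips still/lips moving). The lip activity energy is a single scalar per frame, and its Gaussian emission depends only on the lower state. All probabilities and energy levels are hypothetical. The two levels are flattened into one joint state so that ordinary online forward filtering applies. This is a generic technique and not necessarily the authors' inference procedure.

```python
import numpy as np

# ASSUMPTION (not from the paper): binary levels, joint index = upper * N_LOWER + lower.
# Upper level: 0 = not speaking, 1 = speaking.
# Lower level: 0 = lips still, 1 = lips moving.
N_UPPER, N_LOWER = 2, 2


def joint_transition(a_upper, a_lower):
    """Flatten a two-level HMM into one transition matrix over (upper, lower).

    a_upper[u, u2]     = P(upper_t = u2 | upper_{t-1} = u)
    a_lower[u2, l, l2] = P(lower_t = l2 | lower_{t-1} = l, upper_t = u2)
    The lower-level transition is conditioned on the current upper state.
    """
    t = np.zeros((N_UPPER * N_LOWER, N_UPPER * N_LOWER))
    for u in range(N_UPPER):
        for l in range(N_LOWER):
            for u2 in range(N_UPPER):
                for l2 in range(N_LOWER):
                    t[u * N_LOWER + l, u2 * N_LOWER + l2] = a_upper[u, u2] * a_lower[u2, l, l2]
    return t


def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def online_speakingness(energies, trans, prior, means, stds):
    """Forward filtering frame by frame; returns P(speaking | frames so far)."""
    # ASSUMPTION: emission depends only on the lower state, so repeat it per upper state.
    joint_means = np.tile(means, N_UPPER)
    joint_stds = np.tile(stds, N_UPPER)
    belief = np.asarray(prior, dtype=float)
    p_speaking = []
    for t, x in enumerate(energies):
        if t > 0:
            belief = belief @ trans  # predict step
        belief = belief * gaussian_pdf(x, joint_means, joint_stds)  # update step
        belief = belief / belief.sum()  # normalize to avoid underflow
        # Marginalize out the lower level: sum over joint states with upper = 1.
        p_speaking.append(belief.reshape(N_UPPER, N_LOWER)[1].sum())
    return np.array(p_speaking)


# Hypothetical parameters, chosen only for illustration.
a_upper = np.array([[0.98, 0.02],
                    [0.02, 0.98]])
a_lower = np.array([
    [[0.95, 0.05], [0.80, 0.20]],  # not speaking: lips mostly stay still
    [[0.40, 0.60], [0.60, 0.40]],  # speaking: lips switch state often
])
trans = joint_transition(a_upper, a_lower)
prior = np.full(N_UPPER * N_LOWER, 0.25)
means = np.array([0.1, 1.0])  # hypothetical energy levels: still, moving
stds = np.array([0.2, 0.3])

# Synthetic energy trace: silence, alternating lip motion, silence.
energies = [0.1] * 20 + [1.0, 0.1] * 10 + [0.1] * 20
probs = online_speakingness(energies, trans, prior, means, stds)
decisions = probs > 0.5
```

With these illustrative values, the speaking probability stays low during the still segments and rises during the alternating segment. In this sketch, the sticky upper-level transitions (0.98) suppress frequent speaking-state flips. This is one way a single model can absorb a smoothing step that would otherwise be a separate post-processing stage.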