Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tiawongsombat, P. | - |
dc.contributor.author | Jeong, Mun-Ho | - |
dc.contributor.author | Yun, Joo-Seop | - |
dc.contributor.author | You, Bum-Jae | - |
dc.contributor.author | Oh, Sang-Rok | - |
dc.date.accessioned | 2024-01-20T15:32:16Z | - |
dc.date.available | 2024-01-20T15:32:16Z | - |
dc.date.created | 2021-09-05 | - |
dc.date.issued | 2012-02 | - |
dc.identifier.issn | 0031-3203 | - |
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/129590 | - |
dc.description.abstract | Visual voice activity detection (V-VAD) plays an important role in both human-computer interaction (HCI) and human-robot interaction (HRI), affecting both the conversation strategy and the synchronization between humans and robots/computers. The typical speakingness decision in V-VAD consists of post-processing for signal smoothing and classification by thresholding. Several parameters, intended to ensure a good trade-off between hit rate and false-alarm rate, are usually defined heuristically. This makes V-VAD approaches vulnerable to noisy observations and changing environmental conditions, resulting in poor performance and poor robustness, with undesired frequent changes of the detected speaking state. To overcome these difficulties, this paper proposes a new probabilistic approach, named the bi-level HMM, which analyzes lip activity energy for V-VAD in HRI. The design is based on assumptions about lip movement and speaking, combining two essential procedures into a single model. A bi-level HMM is an HMM with two state variables at different levels, where the occurrence of a state at the lower level conditionally depends on the state at the upper level. The approach works online with low-resolution images and under various lighting conditions, and was successfully tested on 21 image sequences (22,927 frames). It achieved detection probabilities of over 90%, an improvement of almost 20% over four other V-VAD approaches. (C) 2011 Elsevier Ltd. All rights reserved. [A schematic code sketch of the bi-level HMM structure appears after this record.] | - |
dc.language | English | - |
dc.publisher | ELSEVIER SCI LTD | - |
dc.title | Robust visual speakingness detection using bi-level HMM | - |
dc.type | Article | - |
dc.identifier.doi | 10.1016/j.patcog.2011.07.011 | - |
dc.description.journalClass | 1 | - |
dc.identifier.bibliographicCitation | PATTERN RECOGNITION, v.45, no.2, pp.783 - 793 | - |
dc.citation.title | PATTERN RECOGNITION | - |
dc.citation.volume | 45 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 783 | - |
dc.citation.endPage | 793 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.identifier.wosid | 000296126000010 | - |
dc.identifier.scopusid | 2-s2.0-80052968371 | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Visual voice activity detection | - |
dc.subject.keywordAuthor | Mouth image energy | - |
dc.subject.keywordAuthor | Speakingness detection | - |
dc.subject.keywordAuthor | Bi-level HMM | - |
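The abstract describes a bi-level HMM as an HMM with two state variables at different levels, where the lower-level state depends conditionally on the upper-level state. The Python sketch below illustrates that structure only; it is not the paper's implementation. The state names (silent/speaking, still/moving lips), all transition probabilities, the Gaussian model of mouth-image energy, and the uniform initial distribution are invented for demonstration, since the record gives no parameterization, training procedure, or online inference details.

```python
import numpy as np

# Hypothetical sketch of a bi-level HMM for visual speakingness detection.
# All state names, transition values, and the Gaussian emission model are
# illustrative assumptions, not parameters from the paper.

# Upper level: speaking state u in {0: silent, 1: speaking}.
A_upper = np.array([[0.95, 0.05],          # P(u_t | u_{t-1} = silent)
                    [0.10, 0.90]])         # P(u_t | u_{t-1} = speaking)

# Lower level: lip-activity state l in {0: still, 1: moving}; its transition
# matrix depends on the current upper-level state, P(l_t | l_{t-1}, u_t).
A_lower = np.array([
    [[0.90, 0.10],                         # u = silent: lips mostly still
     [0.60, 0.40]],
    [[0.30, 0.70],                         # u = speaking: lips mostly moving
     [0.20, 0.80]],
])

# Mouth-image energy observation, modeled per lower-level state as a Gaussian.
MU = np.array([0.1, 0.8])
SIGMA = np.array([0.15, 0.25])

def log_emission(x):
    """Log N(x; MU[l], SIGMA[l]) for each lower-level state l."""
    return -0.5 * ((x - MU) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))

def viterbi_bilevel(energies):
    """Decode the upper-level (speaking) path via Viterbi on the joint state.

    The two levels are flattened into one HMM over (u, l) with
    P(u', l' | u, l) = P(u' | u) * P(l' | l, u'), so standard dynamic
    programming applies; emissions are tied to the lower-level state only.
    """
    n_u, n_l = 2, 2
    S = n_u * n_l                          # joint states indexed s = u*n_l + l
    T = np.zeros((S, S))
    for u in range(n_u):
        for l in range(n_l):
            for u2 in range(n_u):
                for l2 in range(n_l):
                    T[u * n_l + l, u2 * n_l + l2] = A_upper[u, u2] * A_lower[u2, l, l2]
    log_T = np.log(T)
    emis = lambda x: np.tile(log_emission(x), n_u)   # replicate over u
    delta = -np.log(S) + emis(energies[0])           # uniform initial state
    backptr = []
    for x in energies[1:]:
        scores = delta[:, None] + log_T              # scores[s, s'] per transition
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + emis(x)
    path = [int(delta.argmax())]                     # backtrack best joint path
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return [s // n_l for s in reversed(path)]        # keep only the u level

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic mouth-energy trace: 30 silent frames, then 30 speaking frames.
    energy = np.r_[rng.normal(0.1, 0.15, 30), rng.normal(0.8, 0.25, 30)]
    print(viterbi_bilevel(energy))                   # mostly 0s, then mostly 1s
```

The design choice in this sketch is to flatten the two levels into a single joint-state HMM, which keeps the conditional dependence of the lower level on the upper level while making ordinary Viterbi decoding applicable; reading off only the upper component of the decoded path yields a per-frame speaking/silent label without any heuristic smoothing or thresholding stage, matching the motivation stated in the abstract.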
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.