Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tiawongsombat, P. | - |
dc.contributor.author | Jeong, Mun-Ho | - |
dc.contributor.author | Yun, Joo-Seop | - |
dc.contributor.author | You, Bum-Jae | - |
dc.contributor.author | Oh, Sang-Rok | - |
dc.date.accessioned | 2024-01-20T15:32:16Z | - |
dc.date.available | 2024-01-20T15:32:16Z | - |
dc.date.created | 2021-09-05 | - |
dc.date.issued | 2012-02 | - |
dc.identifier.issn | 0031-3203 | - |
dc.identifier.uri | https://pubs.kist.re.kr/handle/201004/129590 | - |
dc.description.abstract | Visual voice activity detection (V-VAD) plays an important role in both human-computer interaction (HCI) and human-robot interaction (HRI), affecting both the conversation strategy and the synchronization between humans and robots/computers. The typical speakingness decision in V-VAD consists of post-processing for signal smoothing and classification by thresholding. Several parameters, intended to ensure a good trade-off between hit rate and false-alarm rate, are usually defined heuristically. This makes V-VAD approaches vulnerable to noisy observations and changing environmental conditions, resulting in poor performance and poor robustness, with undesired frequent changes of the detected speaking state. To overcome these difficulties, this paper proposes a new probabilistic approach, named the bi-level HMM, which analyzes lip activity energy for V-VAD in HRI. The design is based on assumptions about lip movement and speaking, combining two essential procedures into a single model. A bi-level HMM is an HMM with two state variables at different levels, where the occurrence of a state at the lower level conditionally depends on the state at the upper level. The approach works online with low-resolution images and under various lighting conditions, and was successfully tested on 21 image sequences (22,927 frames). It achieved detection probabilities of over 90%, an improvement of almost 20% over four other V-VAD approaches. (C) 2011 Elsevier Ltd. All rights reserved. [A schematic code sketch of the bi-level HMM structure appears after this record.] | - |
dc.language | English | - |
dc.publisher | ELSEVIER SCI LTD | - |
dc.title | Robust visual speakingness detection using bi-level HMM | - |
dc.type | Article | - |
dc.identifier.doi | 10.1016/j.patcog.2011.07.011 | - |
dc.description.journalClass | 1 | - |
dc.identifier.bibliographicCitation | PATTERN RECOGNITION, v.45, no.2, pp.783 - 793 | - |
dc.citation.title | PATTERN RECOGNITION | - |
dc.citation.volume | 45 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 783 | - |
dc.citation.endPage | 793 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.identifier.wosid | 000296126000010 | - |
dc.identifier.scopusid | 2-s2.0-80052968371 | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Visual voice activity detection | - |
dc.subject.keywordAuthor | Mouth image energy | - |
dc.subject.keywordAuthor | Speakingness detection | - |
dc.subject.keywordAuthor | Bi-level HMM | - |
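The abstract describes a bi-level HMM as an HMM with two state variables at different levels, where the lower-level state depends conditionally on the upper-level state. The Python sketch below illustrates that structure only; it is not the paper's implementation. The state names (silent/speaking, still/moving lips), all transition probabilities, the Gaussian model of mouth-image energy, and the uniform initial distribution are invented for demonstration, since the record gives no parameterization, training procedure, or online inference details.

```python
import numpy as np

# Hypothetical sketch of a bi-level HMM for visual speakingness detection.
# All state names, transition values, and the Gaussian emission model are
# illustrative assumptions, not parameters from the paper.

# Upper level: speaking state u in {0: silent, 1: speaking}.
A_upper = np.array([[0.95, 0.05],          # P(u_t | u_{t-1} = silent)
                    [0.10, 0.90]])         # P(u_t | u_{t-1} = speaking)

# Lower level: lip-activity state l in {0: still, 1: moving}; its transition
# matrix depends on the current upper-level state, P(l_t | l_{t-1}, u_t).
A_lower = np.array([
    [[0.90, 0.10],                         # u = silent: lips mostly still
     [0.60, 0.40]],
    [[0.30, 0.70],                         # u = speaking: lips mostly moving
     [0.20, 0.80]],
])

# Mouth-image energy observation, modeled per lower-level state as a Gaussian.
MU = np.array([0.1, 0.8])
SIGMA = np.array([0.15, 0.25])

def log_emission(x):
    """Log N(x; MU[l], SIGMA[l]) for each lower-level state l."""
    return -0.5 * ((x - MU) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))

def viterbi_bilevel(energies):
    """Decode the upper-level (speaking) path via Viterbi on the joint state.

    The two levels are flattened into one HMM over (u, l) with
    P(u', l' | u, l) = P(u' | u) * P(l' | l, u'), so standard dynamic
    programming applies; emissions are tied to the lower-level state only.
    """
    n_u, n_l = 2, 2
    S = n_u * n_l                          # joint states indexed s = u*n_l + l
    T = np.zeros((S, S))
    for u in range(n_u):
        for l in range(n_l):
            for u2 in range(n_u):
                for l2 in range(n_l):
                    T[u * n_l + l, u2 * n_l + l2] = A_upper[u, u2] * A_lower[u2, l, l2]
    log_T = np.log(T)
    emis = lambda x: np.tile(log_emission(x), n_u)   # replicate over u
    delta = -np.log(S) + emis(energies[0])           # uniform initial state
    backptr = []
    for x in energies[1:]:
        scores = delta[:, None] + log_T              # scores[s, s'] per transition
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + emis(x)
    path = [int(delta.argmax())]                     # backtrack best joint path
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return [s // n_l for s in reversed(path)]        # keep only the u level

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic mouth-energy trace: 30 silent frames, then 30 speaking frames.
    energy = np.r_[rng.normal(0.1, 0.15, 30), rng.normal(0.8, 0.25, 30)]
    print(viterbi_bilevel(energy))                   # mostly 0s, then mostly 1s
```

The design choice in this sketch is to flatten the two levels into a single joint-state HMM, which keeps the conditional dependence of the lower level on the upper level while making ordinary Viterbi decoding applicable; reading off only the upper component of the decoded path yields a per-frame speaking/silent label without any heuristic smoothing or thresholding stage, matching the motivation stated in the abstract.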
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.