Full metadata record

DC Field	Value	Language
dc.contributor.author	Tiawongsombat, P.	-
dc.contributor.author	Jeong, Mun-Ho	-
dc.contributor.author	Yun, Joo-Seop	-
dc.contributor.author	You, Bum-Jae	-
dc.contributor.author	Oh, Sang-Rok	-
dc.date.accessioned	2024-01-20T15:32:16Z	-
dc.date.available	2024-01-20T15:32:16Z	-
dc.date.created	2021-09-05	-
dc.date.issued	2012-02	-
dc.identifier.issn	0031-3203	-
dc.identifier.uri	https://pubs.kist.re.kr/handle/201004/129590	-
dc.description.abstract	Visual voice activity detection (V-VAD) plays an important role in both HCI and HRI, affecting both the conversation strategy and synchronization between humans and robots/computers. The typical speakingness decision in V-VAD consists of post-processing for signal smoothing and classification using thresholding. Several parameters, intended to ensure a good trade-off between hit rate and false alarm, are usually defined heuristically. This makes V-VAD approaches vulnerable to noisy observations and changes in environmental conditions, resulting in poor performance, weak robustness, and undesired frequent speaking-state changes. To overcome these difficulties, this paper proposes a new probabilistic approach, named the bi-level HMM, which analyzes lip-activity energy for V-VAD in HRI. The design is based on lip-movement and speaking assumptions, embracing two essential procedures in a single model. A bi-level HMM is an HMM with two state variables at different levels, where state occurrence at the lower level conditionally depends on the state at the upper level. The approach works online with low-resolution images and under various lighting conditions, and has been successfully tested on 21 image sequences (22,927 frames). It achieved a detection probability of over 90%, an improvement of almost 20% over four other V-VAD approaches. (C) 2011 Elsevier Ltd. All rights reserved.	-
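The abstract's central idea, an upper-level speaking-state Markov chain whose lower-level lip-activity state is conditioned on it, can be illustrated with a minimal generative sketch. All probability values below are hypothetical placeholders for illustration, not parameters from the paper:

```python
import random

# Upper level: speaking state (0 = silent, 1 = speaking), a Markov chain.
# Lower level: lip-activity state, sampled conditioned on the upper state.
# These tables are illustrative assumptions, not values from the paper.
UPPER_TRANS = {0: [0.9, 0.1], 1: [0.2, 0.8]}        # P(S_t | S_{t-1})
LOWER_GIVEN_UPPER = {0: [0.8, 0.2], 1: [0.3, 0.7]}  # P(L_t | S_t)

def sample(probs):
    """Draw 0 or 1 from a two-entry probability vector."""
    return 0 if random.random() < probs[0] else 1

def generate(n_frames, seed=0):
    """Generate (upper, lower) state pairs frame by frame."""
    random.seed(seed)
    s = 0  # start silent
    seq = []
    for _ in range(n_frames):
        s = sample(UPPER_TRANS[s])          # upper-level transition
        l = sample(LOWER_GIVEN_UPPER[s])    # lower state depends on upper state
        seq.append((s, l))
    return seq

frames = generate(10)
```

The key structural point from the abstract is that the lower-level state is not an independent chain: each `l` is drawn from a distribution selected by the current upper-level state `s`, coupling lip activity to the speaking state within a single model.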
dc.language	English	-
dc.publisher	ELSEVIER SCI LTD	-
dc.title	Robust visual speakingness detection using bi-level HMM	-
dc.type	Article	-
dc.identifier.doi	10.1016/j.patcog.2011.07.011	-
dc.description.journalClass	1	-
dc.identifier.bibliographicCitation	PATTERN RECOGNITION, v.45, no.2, pp.783 - 793	-
dc.citation.title	PATTERN RECOGNITION	-
dc.citation.volume	45	-
dc.citation.number	2	-
dc.citation.startPage	783	-
dc.citation.endPage	793	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.identifier.wosid	000296126000010	-
dc.identifier.scopusid	2-s2.0-80052968371	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.type.docType	Article	-
dc.subject.keywordAuthor	Visual voice activity detection	-
dc.subject.keywordAuthor	Mouth image energy	-
dc.subject.keywordAuthor	Speakingness detection	-
dc.subject.keywordAuthor	Bi-level HMM	-
Appears in Collections:
KIST Article > 2012
Files in This Item:
There are no files associated with this item.

