DSpace at KIST: Unsupervised speaker indexing using generic models


Full metadata record

DC Field	Value	Language
dc.contributor.author	Kwon, S	-
dc.contributor.author	Narayanan, S	-
dc.date.accessioned	2024-01-21T04:34:00Z	-
dc.date.available	2024-01-21T04:34:00Z	-
dc.date.created	2021-09-01	-
dc.date.issued	2005-09	-
dc.identifier.issn	1063-6676	-
dc.identifier.uri	https://pubs.kist.re.kr/handle/201004/136205	-
dc.description.abstract	Unsupervised speaker indexing sequentially detects the points where the speaker identity changes in a multispeaker audio stream and categorizes each speaker segment, without any prior knowledge about the speakers. This paper addresses two challenges: the first relates to sequential speaker change detection; the second relates to speaker modeling, given that the number and identity of the speakers are unknown. To address the second challenge, a predetermined generic speaker-independent model set, called the sample speaker models (SSM), is proposed. This set can support more accurate speaker modeling and clustering without requiring models to be trained on target-speaker data. Once a speaker-independent model is selected from the generic sample models, it is progressively adapted into a specific speaker-dependent model. Experiments were performed with data from the Speaker Recognition Benchmark NIST Speech corpus (1999) and the HUB-4 Broadcast News Evaluation English Test material (1999). Results showed that the proposed technique, using SSMs sampled with the Markov chain Monte Carlo (MCMC) method, gave 92.5% indexing accuracy on two-speaker telephone conversations, 89.6% on four-speaker conversations of telephone speech quality, and 87.2% on broadcast news. In the experiments of this paper, the SSMs outperformed the universal background model (UBM) by up to 29.4% and the universal gender models (UGM) by up to 22.5% in indexing accuracy.	-
dc.language	English	-
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	-
dc.title	Unsupervised speaker indexing using generic models	-
dc.type	Article	-
dc.identifier.doi	10.1109/TSA.2005.851981	-
dc.description.journalClass	1	-
dc.identifier.bibliographicCitation	IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, v.13, no.5, pp.1004 - 1013	-
dc.citation.title	IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING	-
dc.citation.volume	13	-
dc.citation.number	5	-
dc.citation.startPage	1004	-
dc.citation.endPage	1013	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.identifier.wosid	000231517400025	-
dc.relation.journalWebOfScienceCategory	Acoustics	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalResearchArea	Acoustics	-
dc.relation.journalResearchArea	Engineering	-
dc.type.docType	Article	-
dc.subject.keywordAuthor	generic models	-
dc.subject.keywordAuthor	localized search algorithm (LSA)	-
dc.subject.keywordAuthor	Markov chain Monte Carlo (MCMC) method	-
dc.subject.keywordAuthor	maximum a posteriori (MAP)	-
dc.subject.keywordAuthor	sample speaker models (SSM)	-
dc.subject.keywordAuthor	universal background model (UBM)	-
dc.subject.keywordAuthor	universal gender models (UGM)	-
dc.subject.keywordAuthor	unsupervised speaker indexing	-
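The abstract states that a speaker-independent model selected from the SSM set is progressively adapted into a speaker-dependent model, and the keywords list maximum a posteriori (MAP) adaptation. This record does not give the paper's adaptation equations, so the following is only a minimal sketch of the standard relevance-MAP mean update for a diagonal-covariance Gaussian mixture model, as commonly used in GMM-UBM speaker recognition. It assumes mean-only adaptation. The function names, the toy model, and the relevance factor of 16 are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
    frames: Sequence[Vector],
    relevance: float = 16.0,  # hypothetical relevance factor r (assumption)
) -> List[Vector]:
    """Relevance-MAP update of GMM means; weights and variances are kept fixed."""
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                        # soft counts n_j
    sums = [[0.0] * dim for _ in range(k)]    # sum_t gamma_tj * x_t
    for x in frames:
        # Posterior (responsibility) of each mixture component for this frame.
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)
        probs = [math.exp(lp - top) for lp in logp]  # shifted for numerical stability
        total = sum(probs)
        for j in range(k):
            g = probs[j] / total
            counts[j] += g
            for d in range(dim):
                sums[j][d] += g * x[d]
    adapted = []
    for j in range(k):
        denom = counts[j] + relevance
        alpha = counts[j] / denom if denom > 0 else 0.0  # weight of data vs. prior
        # E_j[x]: data mean for component j (falls back to the prior mean if unused).
        data_mean = [s / counts[j] for s in sums[j]] if counts[j] > 0 else list(means[j])
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[j])])
    return adapted


if __name__ == "__main__":
    # Toy 1-D, two-component speaker-independent model; frames from a "new speaker".
    si_w = [0.5, 0.5]
    si_mu = [[-1.0], [1.0]]
    si_var = [[1.0], [1.0]]
    segment = [[2.8], [3.1], [3.0], [2.9]]
    print(map_adapt_means(si_w, si_mu, si_var, segment, relevance=4.0))
```

Progressive adaptation could be imitated by calling map_adapt_means on each newly detected segment attributed to the same speaker, passing the previously adapted means back in as the prior. Whether the paper also updates mixture weights or variances is not stated in this record.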

Appears in Collections: KIST Article > 2005


KIST Library Institutional Repository
