Full metadata record

DC Field: Value
dc.contributor.author: Kwon, S
dc.contributor.author: Narayanan, S
dc.date.accessioned: 2024-01-21T04:34:00Z
dc.date.available: 2024-01-21T04:34:00Z
dc.date.created: 2021-09-01
dc.date.issued: 2005-09
dc.identifier.issn: 1063-6676
dc.identifier.uri: https://pubs.kist.re.kr/handle/201004/136205
dc.description.abstract: Unsupervised speaker indexing sequentially detects the points where speaker identity changes in a multispeaker audio stream and categorizes each speaker segment, without any prior knowledge about the speakers. This paper addresses two challenges: the first relates to sequential speaker change detection; the second relates to speaker modeling when the number and identities of the speakers are unknown. To address the latter, a predetermined generic speaker-independent model set, called the sample speaker models (SSM), is proposed. This set enables more accurate speaker modeling and clustering without requiring models trained on target speaker data. Once a speaker-independent model is selected from the generic sample models, it is progressively adapted into a specific speaker-dependent model. Experiments were performed with data from the Speaker Recognition Benchmark NIST Speech corpus (1999) and the HUB-4 Broadcast News Evaluation English Test material (1999). Results showed that the new technique, sampled using the Markov chain Monte Carlo method, gave 92.5% indexing accuracy on two-speaker telephone conversations, 89.6% on four-speaker conversations of telephone speech quality, and 87.2% on broadcast news. In these experiments, the SSMs outperformed the universal background model by up to 29.4% and the universal gender models by up to 22.5% in indexing accuracy.
dc.language: English
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: Unsupervised speaker indexing using generic models
dc.type: Article
dc.identifier.doi: 10.1109/TSA.2005.851981
dc.description.journalClass: 1
dc.identifier.bibliographicCitation: IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, v.13, no.5, pp.1004 - 1013
dc.citation.title: IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
dc.citation.volume: 13
dc.citation.number: 5
dc.citation.startPage: 1004
dc.citation.endPage: 1013
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.identifier.wosid: 000231517400025
dc.relation.journalWebOfScienceCategory: Acoustics
dc.relation.journalWebOfScienceCategory: Engineering, Electrical & Electronic
dc.relation.journalResearchArea: Acoustics
dc.relation.journalResearchArea: Engineering
dc.type.docType: Article
dc.subject.keywordAuthor: generic models
dc.subject.keywordAuthor: localized search algorithm (LSA)
dc.subject.keywordAuthor: Markov chain Monte Carlo (MCMC) method
dc.subject.keywordAuthor: maximum a posteriori (MAP)
dc.subject.keywordAuthor: sample speaker models (SSM)
dc.subject.keywordAuthor: universal background model (UBM)
dc.subject.keywordAuthor: universal gender models (UGM)
dc.subject.keywordAuthor: unsupervised speaker indexing
Appears in Collections:
KIST Article > 2005
Files in This Item:
There are no files associated with this item.


