Improving the accuracy of the GMM within a GMM-VSM system for spoken language recognition
Subject Areas: General
Fahimeh Ghasemian 1*, Mohamad Mahdi Homaion Por 2
Abstract:
The Gaussian mixture model (GMM) is one of the most widely used and successful models in automatic language recognition. This paper presents a new model, the Adapted Weight GMM (AW-GMM). It is similar to a standard GMM, except that, within the GMM-VSM system, the weight of each component is set according to how strongly that component distinguishes one language from the others. In addition, because the GMM-VSM system becomes computationally expensive when sequences of two components are considered, a technique for constructing these 2-component sequences is presented; the same technique can also be used to construct higher-order sequences. Evaluations on four languages (English, Persian, French, and German) from the OGI multi-language telephone speech corpus show the effectiveness of the proposed techniques.
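The abstract does not spell out the AW-GMM weighting rule or the sequence-construction technique, so the following is only a minimal sketch of a generic GMM-VSM front end under stated assumptions. It assumes a diagonal-covariance GMM, labels each frame with its top-scoring component, and counts 2-component sequences in a sparse dictionary so that only observed pairs are stored rather than all C × C possible pairs. The per-component discrimination score is a Fisher-style ratio of between-language to within-language variance of component frequencies; it is a stand-in chosen for illustration, not the criterion used in the paper. All function names (`tokenize`, `sparse_ngram_counts`, `discriminative_weights`) are hypothetical.

```python
import numpy as np
from collections import Counter


def log_gaussian_diag(X, means, variances):
    """Log density of each frame under each diagonal-covariance Gaussian.

    X: (T, D) feature frames; means, variances: (C, D). Returns (T, C).
    """
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (mahal + log_norm[None, :])


def tokenize(X, weights, means, variances):
    """Map each frame to the index of its highest-scoring weighted component."""
    scores = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    return np.argmax(scores, axis=1)


def sparse_ngram_counts(tokens):
    """Unigram and 2-component-sequence (bigram) counts, stored sparsely.

    Only bigrams that actually occur are stored, instead of a dense vector
    with C * C entries (a generic trick, not the paper's technique).
    """
    seq = [int(t) for t in tokens]
    unigrams = Counter(seq)
    bigrams = Counter(zip(seq[:-1], seq[1:]))
    return unigrams, bigrams


def discriminative_weights(occupancy, labels, eps=1e-12):
    """Hypothetical per-component discrimination score (Fisher-style ratio).

    occupancy: (N, C) normalized component frequencies per utterance.
    labels: (N,) language label per utterance.
    Returns non-negative weights summing to 1; components whose frequency
    varies more between languages than within them get larger weights.
    """
    overall = occupancy.mean(axis=0)
    between = np.zeros(occupancy.shape[1])
    within = np.zeros(occupancy.shape[1])
    for lang in np.unique(labels):
        sub = occupancy[labels == lang]
        mu = sub.mean(axis=0)
        between += len(sub) * (mu - overall) ** 2
        within += np.sum((sub - mu) ** 2, axis=0)
    ratio = between / (within + eps)
    total = ratio.sum()
    if total == 0.0:
        # No component separates the languages: fall back to uniform weights.
        return np.full_like(ratio, 1.0 / ratio.size)
    return ratio / total


if __name__ == "__main__":
    # Toy 1-D GMM with two well-separated components.
    means = np.array([[0.0], [10.0]])
    variances = np.ones((2, 1))
    weights = np.array([0.5, 0.5])
    X = np.array([[0.1], [9.8], [10.2], [-0.3]])
    tokens = tokenize(X, weights, means, variances)
    unigrams, bigrams = sparse_ngram_counts(tokens)
    # prints: [0, 1, 1, 0] {0: 2, 1: 2} {(0, 1): 1, (1, 1): 1, (1, 0): 1}
    print(tokens.tolist(), dict(unigrams), dict(bigrams))
```

The sparse counters grow only with the number of distinct pairs actually observed, which is one simple way to keep bigram (and higher-order) vectors manageable; the normalized counts could then be weighted by per-component scores before being passed to a vector-space classifier.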