Abstract:
Spoken language recognition (SLR) has attracted increasing interest in multilingual speech recognition as a pre-processing step that identifies the language of a speech utterance. Most existing SLR approaches apply statistical modeling techniques to acoustic, phonotactic, and prosodic features. Motivated by studies of the relationship between phonological features (PFs) and language, this thesis uses PFs as linguistic information to capture acoustic characteristics and to represent phonotactic information through the patterns of PF transitions in different languages. Current state-of-the-art systems fuse several sub-systems. The proposed SLR system combines four sub-systems: 1) phone sequence modeling followed by a vector space model (PRVSM), 2) a lattice-SVM system, 3) a phonotactic SLR approach using the co-occurrence of PFs, and 4) an SLR sub-system based on the latent-dynamic conditional random field (LDCRF) model using PFs. In phonotactic SLR systems based on support vector machine (SVM) modeling, term weighting of the supervector of n-gram probabilities is critical to recognition performance because it prevents the SVM kernel from being dominated by a few large probabilities. This thesis focuses on enhancing SLR performance by applying a term weighting function to the supervector entries. A combination of the redundancy of term frequency (rd) and the logarithm of term frequency (logtf) is proposed as an effective term weighting function that combines local and global weighting. It effectively eliminates the redundancy of unit frequency co-occurrence across languages. For the phonotactic approach using PFs, the statistics of PF co-occurrence across different languages are captured. For the SLR sub-system based on LDCRF using PFs, the LDCRF model is employed to capture the dynamics of the PF attribute sequences for constructing language models.
Experiments against baseline systems were conducted to evaluate the individual sub-systems and the fused SLR system. The results show that combining the sub-systems improves performance and that integrating PFs into the SLR system achieves better recognition performance.
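
As an illustration of the term weighting idea summarized above, the following is a minimal sketch and not the thesis's implementation. It assumes that logtf is the local weight log(1 + tf) and that rd is the entropy-based redundancy weight commonly used in text categorization, rd_k = log N + sum_i p_ik log p_ik, where p_ik is the share of term k's total count that falls in item i of N. The abstract gives neither formula, so both definitions, the use of raw counts in place of n-gram probabilities, and the function name `logtf_rd_weights` are assumptions made for illustration.

```python
import math


def logtf_rd_weights(counts):
    """Weight n-gram counts with an assumed logtf (local) * rd (global) scheme.

    counts: list of equal-length count vectors, one per language or utterance.
    Returns a list of weighted vectors with the same shape.
    """
    n_items = len(counts)
    n_terms = len(counts[0])

    # Global weight: redundancy of each term across items (assumed definition).
    # It is 0 when a term is spread uniformly and log(n_items) when the term
    # occurs in a single item only.
    rd = []
    for k in range(n_terms):
        total = sum(vec[k] for vec in counts)
        if total == 0:
            rd.append(0.0)
            continue
        neg_entropy = 0.0
        for vec in counts:
            p = vec[k] / total
            if p > 0:
                neg_entropy += p * math.log(p)
        rd.append(math.log(n_items) + neg_entropy)

    # Local weight: log(1 + tf) compresses large counts so that a few
    # dominant terms do not swamp the kernel.
    return [
        [math.log(1.0 + vec[k]) * rd[k] for k in range(n_terms)]
        for vec in counts
    ]


# Hypothetical toy data: term 0 occurs equally in both items,
# term 1 occurs only in the second item.
example = logtf_rd_weights([[4, 0], [4, 8]])
```

Under these assumptions, a term that is equally frequent everywhere receives a weight near zero, while a term concentrated in one language keeps a log-compressed weight, which matches the stated goal of suppressing redundant co-occurrence across languages.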