Abstract:
This research aims to study and select an efficient algorithm for speaker-independent Thai numeral word recognition from among Dynamic Time Warping (DTW), the Hidden Markov Model (HMM), and the Neural Network (NN). All three methods consist of four steps: Preprocessing, Feature Measurement, Pattern Classification, and Decision Making. The first main consideration is endpoint detection in the preprocessing step; the details differ among the three methods, but all of them are based on energy-level measurement. In the feature measurement step, DTW used the discrete Hartley transform to extract the required parameters, HMM used order-10 LPC coefficients together with vector quantization (VQ) using a codebook of size 64 to compute its features, and NN also used order-10 LPC to measure its parameters. In the pattern classification step, DTW used its time-warping algorithm to create patterns, HMM used a 3-state hidden Markov model to construct its patterns, and NN used the backpropagation algorithm to form its patterns. In the decision making step, DTW used the nearest-neighbor rule. For HMM, this step is more complicated than for the others because it uses the Viterbi algorithm. The simplest decision criterion is that of NN, which uses the minimum error distance. To test and compare the three methods, separate speech data were used for a training set and two testing sets (testing sets 1 and 2), all composed of both male and female speakers aged 18 to 25. The training set and testing set 1 contained the same group of speakers but different data, while testing set 2 contained a different group of speakers. The size of each set varied between methods: 20, 20, and 20 for DTW; 45, 45, and 10 for HMM; and 30, 30, and 12 for NN, respectively. The average recognition rates for the training set, testing set 1, and testing set 2 were 90.30%, 89.70%, and 84.00% for HMM with 45 reference samples, and 98.20%, 84.30%, and 89.40% for NN with 30 reference samples, respectively.
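The DTW pattern matching combined with a nearest-neighbor decision can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each utterance has already been reduced to a sequence of per-frame feature vectors by the preprocessing and feature measurement steps. It uses Euclidean frame distance and the basic step pattern (match, insertion, deletion) with no slope or band constraints, since the abstract does not specify these. The function names `dtw_distance` and `classify_nearest` and the toy word labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (assumed)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test: List[Frame], ref: List[Frame]) -> float:
    """Cumulative DTW cost of aligning `test` with `ref`.

    Basic step pattern without slope or band constraints; the constraints
    used in the original system are not stated, so this is an assumption.
    """
    n, m = len(test), len(ref)
    # cost[i][j] = minimal cumulative cost aligning test[:i] with ref[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in test only
                cost[i][j - 1],      # advance in reference only
                cost[i - 1][j - 1],  # advance in both (match)
            )
    return cost[n][m]


def classify_nearest(test: List[Frame],
                     references: Dict[str, List[List[Frame]]]) -> str:
    """Nearest-neighbor decision: return the label of the single reference
    template with the smallest DTW distance to `test`."""
    best_label, best_dist = "", math.inf
    for label, templates in references.items():
        for template in templates:
            dist = dtw_distance(test, template)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Toy one-dimensional "features"; the labels are illustrative only.
    refs = {
        "nueng": [[[0.0], [1.0], [2.0]]],
        "song": [[[5.0], [5.0], [0.0]]],
    }
    # The test utterance is a time-stretched copy of the "nueng" template.
    print(classify_nearest([[0.0], [0.0], [1.0], [2.0], [2.0]], refs))  # nueng
```

Because the time-stretched test sequence aligns frame-for-frame with the first template, its DTW cost is zero and the nearest-neighbor rule selects that label. Scores here are not normalized by warping-path length; whether the original system applied such normalization is not stated in the abstract.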