Abstract:
Spectral feature extraction at a fixed frame rate is currently a widely used technique for representing speech signals. However, some of the assumptions underlying this technique do not hold for natural speech, and the technique is limited in its ability to capture certain types of acoustic properties. To avoid these limitations, a segmental representation method divides an acoustic speech signal into small segments according to the underlying phonemes before performing feature extraction. In this thesis, an approach for extracting feature vectors using segmental representation is proposed. In this approach, a speech segment is represented by a 40-dimensional feature vector consisting of 12 Mel Frequency Cepstral Coefficients and an energy feature from each of three regions of the segment (the frontal region, the middle region, and the rear region), together with the segment duration. In experiments in which 52 Thai phonemes are classified using Linear Discriminant Analysis, the classification accuracy is 66.14% when prior probabilities are used and 61.41% without prior probabilities. The best accuracy obtained using our segment-based approach is 9.19 percentage points higher than that of a baseline frame-based approach, which achieves 56.95%. In addition, our contribution analysis finds that features extracted from the middle region contribute the most to the classification.
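
The stated dimensionality is consistent with 3 regions × (12 MFCCs + 1 energy) + 1 duration = 40. The following is a minimal sketch of how such a segment vector could be assembled, not the procedure used in the thesis. It assumes that each segment is already available as a matrix of per-frame features (12 MFCCs plus one energy value per frame), that the frontal, middle, and rear regions are taken as roughly equal thirds of the segment, and that each region is summarized by the mean of its frames. The function name `segment_vector`, the equal-thirds split, and the mean pooling are all illustrative assumptions; the abstract does not specify how the regions are delimited or pooled.

```python
import numpy as np


def segment_vector(frames: np.ndarray, duration_s: float) -> np.ndarray:
    """Build a 40-dimensional vector for one phoneme segment.

    frames: array of shape (n_frames, 13), one row per analysis frame,
            holding 12 MFCCs followed by one energy value.
    duration_s: segment duration in seconds.
    """
    if frames.ndim != 2 or frames.shape[1] != 13:
        raise ValueError("frames must have shape (n_frames, 13)")
    if frames.shape[0] < 3:
        raise ValueError("need at least one frame per region")

    # Assumption: frontal, middle, and rear regions are (nearly) equal thirds.
    regions = np.array_split(frames, 3, axis=0)

    # Assumption: each region is summarized by the mean over its frames.
    pooled = [region.mean(axis=0) for region in regions]

    # 3 regions x 13 features + 1 duration = 40 dimensions.
    return np.concatenate(pooled + [np.array([duration_s])])
```

Vectors built this way for all segments could then be passed to any Linear Discriminant Analysis implementation, with or without class prior probabilities, to reproduce the kind of comparison described above.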