Abstract:
Automatic speech recognition (ASR), also known as speech recognition, is a computer technology that enables a device to recognize and understand spoken words by digitizing the sound and matching its pattern against stored patterns. In short, it is the conversion of spoken words to text. Currently available devices are largely speaker-dependent and recognize discrete speech better than normal (continuous) speech. In our research, we have used a system that is speaker-independent (it recognizes the speech of an indefinite number of people) and can detect continuous speech. The major applications of such systems are in assistive technology, helping people work around their disabilities.
Our proposed Bangla word recognition system, based on LF-25, is a new approach in the field of Bangla ASR. For this thesis work, we have prepared a Bangla word recognition system for Bangla ASR. Most Bangla ASR systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, Mel-Frequency Cepstral Coefficients (MFCCs) and Local Features (LFs) are input to Hidden Markov Model (HMM) based classifiers to obtain word recognition performance.
In addition to the traditional MFCC-based triphone model, a new method that uses an LF-based triphone model was tested to obtain better ASR performance. We used k-means clustering for the proposed method. In the experimental results, word correct rate and word accuracy for male and female voices taken separately are much better for LF-25 than for MFCC-38 or MFCC-39. Our proposed system therefore favors gender independence. For male and female voices combined, the MFCC-39 based model gives better word accuracy and correct rate in some cases and the LF-25 based model in others.
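The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the definitions commonly used in HMM-based recognition toolkits, where word correct rate ignores insertion errors and word accuracy penalizes them; the abstract does not state which definitions the thesis adopts, and the function name and the error counts below are hypothetical.

```python
def word_scores(n_ref, deletions, substitutions, insertions):
    """Return (word correct rate, word accuracy) in percent.

    Assumed definitions (not confirmed by the abstract):
      correct  = (N - D - S) / N * 100
      accuracy = (N - D - S - I) / N * 100
    where N is the number of words in the reference transcription.
    """
    if n_ref <= 0:
        raise ValueError("reference must contain at least one word")
    hits = n_ref - deletions - substitutions
    correct = 100.0 * hits / n_ref
    accuracy = 100.0 * (hits - insertions) / n_ref
    return correct, accuracy


# Hypothetical counts, for illustration only.
correct, accuracy = word_scores(n_ref=100, deletions=5,
                                substitutions=10, insertions=3)
print(f"correct rate: {correct:.1f}%, accuracy: {accuracy:.1f}%")
```

Under these assumed definitions, word accuracy can never exceed word correct rate, and the gap between the two reflects how many spurious words the recognizer inserts.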