Abstract:
Automatic speech recognition (ASR), also simply called speech recognition, is a computer technology that enables a device to recognize and understand spoken words and sentences by digitizing the sound and matching its pattern against stored patterns; in short, it is the conversion of spoken speech to text. Currently available devices are largely speaker-dependent and recognize discrete speech better than normal (continuous) speech. A speaker-independent system recognizes the speech of an indefinite number of people. In our research, we have used a system that is speaker-independent and can detect continuous speech. The major applications of such systems are assistive: helping people work around their disabilities. Our proposed Bangla speech system, based on MFCC + Neural Network + Triphone, is a new approach in the field of Bangla ASR. For this thesis work, we have prepared a Bangla speech recognition system. Most existing Bangla ASR systems use a small number of speakers; here, 30 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved. In the experiments, Mel-Frequency Cepstral Coefficients (MFCCs) and the outputs recognized by a Neural Network are input to Hidden Markov Model (HMM) based classifiers to obtain speech recognition performance. In addition to the traditional MFCC triphone model, a new Neural-Network-based triphone model was experimented with to achieve better ASR performance. We used k-means clustering for the proposed method. The experimental results show that, for male and female voices separately, the proposed Neural-Network-based model provides distinctly better word correct rate and word accuracy than both MFCC-38 and MFCC-39; thus, our proposed system favors gender independence. For male and female voices combined, the MFCC-39-based model and the Neural-Network-based model each show better word accuracy and correct rate in different cases.
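As background for the MFCC-39 feature set named above: by convention it consists of 13 static cepstral coefficients extended with their delta (velocity) and delta-delta (acceleration) coefficients, giving 13 × 3 = 39 dimensions per frame. The following is a minimal NumPy sketch of that stacking, using the standard regression-based delta formula; the array names and the random placeholder features are illustrative, not taken from the thesis.

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta features over a +/-N frame window
    (the standard formula used in HTK-style front ends)."""
    T, D = feat.shape
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat edge frames so the window is defined at the boundaries.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(feat)
    for t in range(T):
        for n in range(1, N + 1):
            d[t] += n * (padded[t + N + n] - padded[t + N - n])
    return d / denom

# Hypothetical static features: 100 frames x 13 MFCCs.
static = np.random.randn(100, 13)
d1 = delta(static)            # delta (velocity)
d2 = delta(d1)                # delta-delta (acceleration)
mfcc39 = np.hstack([static, d1, d2])
print(mfcc39.shape)  # (100, 39)
```

An MFCC-38 variant, as referenced in the experiments, simply drops one of these 39 components; which component is dropped depends on the front-end configuration.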