Abstract:
Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to text. The term “voice recognition” is sometimes used to refer to recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech.
For the past two decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-vocabulary voice-interactive command and control systems on personal computers, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation.
In this paper, we build a Bangla phoneme recognition system for Bangla automatic speech recognition (ASR). Most Bangla ASR systems use a small number of speakers; here, 30 speakers selected from a wide area of Bangladesh, where Bangla is the native language, are involved. In the experiments, mel-frequency cepstral coefficients (MFCCs) are input to hidden Markov model (HMM) based classifiers to obtain phoneme recognition performance. The experimental results show that the MFCC-based method with 39 dimensions provides a higher phoneme correct rate and accuracy. Moreover, it requires fewer mixture components in the HMMs.
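As a point of reference for the two reported measures, the short sketch below computes a phoneme correct rate and a phoneme accuracy from alignment counts. It assumes the HTK-style definitions commonly used in phoneme recognition work (correct rate ignores insertions, accuracy penalizes them); the abstract itself does not state which definitions were used. The function name `phoneme_scores` and the counts in the example are hypothetical and are not results from this paper.

```python
def phoneme_scores(n_ref, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy) for a phoneme recognizer.

    Assumed (HTK-style) definitions, with N reference phonemes,
    D deletions, S substitutions, and I insertions:
        correct  = 100 * (N - D - S) / N
        accuracy = 100 * (N - D - S - I) / N
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    hits = n_ref - deletions - substitutions
    correct = 100.0 * hits / n_ref
    accuracy = 100.0 * (hits - insertions) / n_ref
    return correct, accuracy


# Made-up counts, for illustration only.
correct, accuracy = phoneme_scores(
    n_ref=1000, deletions=50, substitutions=150, insertions=40
)
print(f"Correct: {correct:.1f}%  Accuracy: {accuracy:.1f}%")
```

Under these definitions, accuracy can never exceed the correct rate, and it can become negative when the recognizer inserts many spurious phonemes.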
In addition, we review some of the key advances in several areas of automatic speech recognition. We also illustrate, by examples, how these key advances can be used for continuous speech recognition of the Bangla language. Finally, we elaborate on the requirements for designing successful real-world applications and address the technical challenges that must be overcome to reach the ultimate goal of providing an easy-to-use, natural, and flexible voice interface between people and machines.