Abstract:
Automatic speech recognition (ASR), also simply called speech recognition, is a computer technology that enables a device to recognize and understand spoken words and sentences by digitizing the sound and matching its pattern against stored patterns; in short, it is the conversion of spoken speech to text. Currently available devices are largely speaker-dependent and recognize discrete speech better than normal (continuous) speech. A speaker-independent system recognizes the speech of an indefinite number of people. In our research, we have used a system that is speaker-independent and can detect continuous speech. The major applications of such systems are assistive: helping people work around their disabilities. Our proposed Bangla speech system, based on MFCC + Neural Network + Triphone, is a new approach in the field of Bangla ASR. For this thesis work, we have prepared a Bangla speech recognition system. Most existing Bangla ASR systems use a small number of speakers; here, 30 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved. In the experiments, Mel-Frequency Cepstral Coefficients (MFCCs) and the outputs recognized by a Neural Network are input to Hidden Markov Model (HMM) based classifiers to obtain speech recognition performance. In addition to the traditional MFCC triphone model, a new Neural-Network-based triphone model was experimented with to achieve better ASR performance. We used k-means clustering for the proposed method. The experimental results show that, for male and female voices separately, the proposed Neural-Network-based model provides distinctly better word correct rate and word accuracy than both MFCC-38 and MFCC-39; thus, our proposed system favors gender independence. For male and female voices combined, the MFCC-39-based model and the Neural-Network-based model each show better word accuracy and correct rate in different cases.
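As background for the MFCC-39 feature set named above: by convention it consists of 13 static cepstral coefficients extended with their delta (velocity) and delta-delta (acceleration) coefficients, giving 13 × 3 = 39 dimensions per frame. The following is a minimal NumPy sketch of that stacking, using the standard regression-based delta formula; the array names and the random placeholder features are illustrative, not taken from the thesis.

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta features over a +/-N frame window
    (the standard formula used in HTK-style front ends)."""
    T, D = feat.shape
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat edge frames so the window is defined at the boundaries.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(feat)
    for t in range(T):
        for n in range(1, N + 1):
            d[t] += n * (padded[t + N + n] - padded[t + N - n])
    return d / denom

# Hypothetical static features: 100 frames x 13 MFCCs.
static = np.random.randn(100, 13)
d1 = delta(static)            # delta (velocity)
d2 = delta(d1)                # delta-delta (acceleration)
mfcc39 = np.hstack([static, d1, d2])
print(mfcc39.shape)  # (100, 39)
```

An MFCC-38 variant, as referenced in the experiments, simply drops one of these 39 components; which component is dropped depends on the front-end configuration.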