Visit: www.geocities.com/chinna_chetan05/forfriends.html
E-BANKING USING SPEAKER RECOGNITION SYSTEM

ABSTRACT: Many people today access their company's information systems by logging in from home. Internet services and telephone banking are also widely used by the corporate and private sectors. Protecting one's resources or information with a simple password is therefore no longer reliable or secure. Biometrics are methods for recognizing a user based on his/her unique physiological and/or behavioral characteristics. This paper presents the voice signal as a unique behavioral characteristic for speaker verification over telephone lines, using an artificial neural network (ANN), for a banking application. A multi-layer feed-forward ANN capable of verifying a speaker among a group of speakers is designed. The spectral density of the recorded voice signal is used for characterization. Finally, the feasibility of the speaker recognition system is tested: it reaches a success rate of 90% on the samples used for training and nearly 70% on samples not used for training.

1. INTRODUCTION

There is a vital need for speaker identification in all spheres of life. Most importantly, such a system enables people to have secure access to information and property. It offers a significant advantage in electronic banking and Internet access.

Large sums of money are lost each year to white-collar crime, fraud and embezzlement. In today's complex economic times, businesses and individuals alike fall victim to these devastating crimes. Employees embezzle funds or steal goods from employers, then disappear or hide behind legal issues. Individuals can easily become helpless victims of identity theft, stock schemes and other scams that rob them of their money. One way to prevent such white-collar crimes, and to shorten the lengthy process of locating perpetrators and serving them with a judgment, is to use biometric techniques for verifying individuals.
Artificial neural networks (ANNs) are intelligent systems that are related in some way to a simplified biological model of the human brain. Voice signals are attenuated and distorted over telephone lines, yet an artificial neural network, despite this nonlinear, noisy and non-stationary environment, is still good at recognizing and verifying the unique characteristics of a signal such as speech.

Speaker recognition involves identifying or verifying a speaker based on his/her voice in the form of speech. Speaker recognition is the generic term used for two related problems:
1. Speaker identification: the problem is to determine the identity of a speaker from a known group of N possible speakers.
2. Speaker verification: basically the same problem as speaker identification, except that a claimed identity is also given and the problem is "merely" to confirm or reject the identity claim.

The speaker recognition problem using an ANN is divided into two parts:
i) Feature extraction
ii) Pattern matching
1 Email:
[email protected]
Text-dependent audio signals are recorded over telephone lines for different speakers. In feature extraction, the Signal Processing Toolbox of MATLAB is used to convert the recorded sound files into a form that can be presented as an input vector to a neural network. In pattern matching, the output of the neural network identifies and verifies the unique characteristics of the features of the speech signal. This paper describes the feature extraction, the neural network architecture, and the software and hardware involved in developing the speaker identification and verification system. The first few sections are dedicated to the architecture of the speaker recognition system; its application in e-banking is discussed afterwards.
2. SYSTEM CONCEPT

The speaker recognition system over telephone lines investigated in this paper uses an artificial neural network, as shown in Figure 1.
Figure 1: Block diagram of the speaker recognition system using an ANN

The speaker recognition system reported in this paper is of the text-dependent type. The system is trained on a group of people to be identified, with each person speaking the same phrase. The voices are recorded on a standard 16-bit computer sound card from a telephone handset receiver. Although the frequency of the human voice ranges from 0 kHz to 20 kHz, most of the signal content lies in the 0.3 kHz to 4 kHz range. The frequency band over telephone lines is limited to 0.3 kHz to 3.4 kHz, and this is the band of interest in this paper. A sampling rate of 16 kHz is therefore used, which satisfies the Nyquist criterion (a rate of at least twice the highest frequency of interest, i.e. at least 6.8 kHz). The voices are stored as sound files on the computer. Digital processing techniques are used to convert the sound files into a form that can be presented as input vectors to the neural network. The output of the neural network verifies the speaker in the group.

3. FEATURE EXTRACTION

Speaker recognition over the telephone network presents many challenges, such as:
1. Variations in handset microphones, which result in severe mismatches between data gathered from these microphones.
2. Signal distortion due to the telephone channel.
3. Inadequate control over speaker/speaking conditions.
The raw audio signal cannot be fed directly into the neural network, because several speakers may produce similar signals. Feature extraction consists of obtaining characteristic parameters of a signal that can be used to classify it. For speaker recognition, the features extracted from a speech signal should be consistent for the desired speaker while deviating strongly from those of other speakers. Here, the Signal Processing Toolbox of MATLAB is used to convert the recorded sound files into a form that can be presented as an input vector to a neural network. A feature such as the spectral density gives a different representation for different speakers uttering the same text. The power spectral densities of two different speakers uttering the same word are shown in Figure 2 for speaker X and Figure 3 for speaker Y.
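The feature extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes a plain periodogram estimate of the PSD, keeps only the 0.3 kHz to 3.4 kHz telephone band, and scales the result to [0, 1]. The function names `periodogram` and `band_features` are hypothetical, and a synthetic tone stands in for a recorded phrase.

```python
import cmath
import math

def periodogram(samples, sample_rate):
    """Return (frequencies, psd) for a real signal using a direct DFT.

    Assumption: a plain, unwindowed periodogram; the paper does not say
    which PSD estimator was used.
    """
    n = len(samples)
    half = n // 2 + 1  # keep only the non-negative frequency bins
    freqs = [k * sample_rate / n for k in range(half)]
    psd = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        psd.append(abs(acc) ** 2 / (sample_rate * n))
    return freqs, psd

def band_features(freqs, psd, low_hz=300.0, high_hz=3400.0):
    """Keep PSD points inside the telephone band and scale them to [0, 1]."""
    band = [p for f, p in zip(freqs, psd) if low_hz <= f <= high_hz]
    peak = max(band)
    return [p / peak for p in band] if peak > 0 else band

# Synthetic stand-in for a recorded phrase: a 1 kHz tone sampled at 16 kHz.
rate = 16000
signal = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(160)]
freqs, psd = periodogram(signal, rate)
features = band_features(freqs, psd)  # input vector for the neural network
```

With 160 samples at 16 kHz the frequency bins are 100 Hz apart, so the band from 300 Hz to 3400 Hz yields 32 feature points. A real recording would be longer and would give a correspondingly longer vector.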
Figure 2: PSD of Speaker X
Figure 3: PSD of Speaker Y
From Figures 2 and 3 it can be seen that the power spectral densities (PSDs) of speaker X and speaker Y differ from each other.

4. PATTERN MATCHING

Artificial neural networks (ANNs) are intelligent systems that are related in some way to a simplified biological model of the human brain. They are composed of many simple elements, called neurons, operating in parallel and connected to each other by multipliers called connection weights or strengths. Neural networks are trained by adjusting the values of these connection weights between the neurons.
Neural networks have a self-learning capability, are fault tolerant and noise immune, and have applications in system identification, pattern recognition, classification, speech recognition, image processing, etc. In this speaker recognition application, the ANN is used for pattern matching, and the performance of a feed-forward artificial neural network is investigated.
A three-layer feed-forward neural network, with a sigmoidal hidden layer followed by a linear output layer, is used for pattern matching in this application. The network is trained with the error back-propagation algorithm. An adaptive learning rate is used, i.e. the learning rate is adjusted during training to speed up convergence.
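The idea of adjusting the learning rate during training can be sketched as follows. The paper does not state which adaptation rule was used, so this is only one common heuristic: grow the rate when the error falls, and shrink it and discard the step when the error rises by more than a small tolerance. The function name `adapt_learning_rate` and the constants 1.05, 0.7 and 1.04 are illustrative assumptions.

```python
def adapt_learning_rate(lr, prev_error, new_error,
                        inc=1.05, dec=0.7, max_increase=1.04):
    """Return (new_lr, accept_step) after one training epoch.

    Assumed heuristic, not taken from the paper:
    - error grew by more than max_increase: shrink the rate, reject the step
    - error fell: grow the rate, accept the step
    - otherwise: keep the rate, accept the step
    """
    if new_error > prev_error * max_increase:
        return lr * dec, False
    if new_error < prev_error:
        return lr * inc, True
    return lr, True

# Starting from the initial learning rate of 0.01 reported in the paper:
lr, accepted = adapt_learning_rate(0.01, prev_error=2.0, new_error=1.5)
```

In a training loop, a rejected step would mean restoring the weights from before the update and retrying with the smaller rate.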
Figure 4: The multi-layer feed-forward (MLP) neural network.

The MLP network in Figure 4 is constructed in the MATLAB 6.1 environment. The input to the MLP network is a vector containing the PSD points. Ten hidden nodes are used. The number of output nodes depends on the number of speakers. An initial learning rate, an allowable error and a maximum number of training cycles (epochs) are the parameters specified to the MATLAB neural network during the training phase.
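The forward pass of such a network can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB model: the weights are small random values rather than trained ones, the input length of 1000 and the ten output nodes are taken from the experiment reported later in the paper, and the helper names `init_layer` and `forward` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng, scale=0.05):
    """Random weights; the last entry of each row is that node's bias."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(x, hidden_w, output_w):
    """Sigmoidal hidden layer followed by a linear output layer."""
    # zip stops at len(x), so the bias (row[-1]) is added separately.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
              for row in hidden_w]
    return [sum(w * h for w, h in zip(row, hidden)) + row[-1]
            for row in output_w]

rng = random.Random(0)
n_inputs, n_hidden, n_speakers = 1000, 10, 10
hidden_w = init_layer(n_inputs, n_hidden, rng)
output_w = init_layer(n_hidden, n_speakers, rng)

x = [rng.random() for _ in range(n_inputs)]  # placeholder for a PSD vector
scores = forward(x, hidden_w, output_w)      # one score per enrolled speaker
```

Training would adjust `hidden_w` and `output_w` by back-propagating the output error; that step is omitted here to keep the sketch short.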
5. SPEAKER RECOGNITION APPLICATION IN E-BANKING

The most straightforward use of speaker recognition is in cases where one has to gain access to a secure bank account. Voice is completely compatible with the existing transmission protocols over telephone channels; therefore no special adaptation of the system is necessary, besides installing the recognition system itself. For the time being, such a service is restricted to operations within the accounts maintained by a single individual. One can check the status of one's account, transfer money between one's own savings accounts, etc.
Here, voice samples of different users are recorded uttering the same phrase under controlled and uncontrolled conditions. A user who wants to use his account utters the same phrase over the telephone line. The speaker recognition system determines whether the user is a particular account holder and, if so, allows him to access the account. If the user is not an account holder, i.e. his voice does not match that of any person in the group of users, the system rejects his identity claim and does not allow him to access the account.
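The accept/reject decision described above can be sketched as follows. The paper does not specify the decision rule, so this assumes one output node per account holder and a fixed acceptance threshold of 0.5; the threshold and the function names `verify` and `identify` are illustrative assumptions.

```python
def best_match(scores):
    """Index of the output node with the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

def verify(scores, claimed_index, threshold=0.5):
    """Accept an identity claim only if the claimed speaker's output node
    is both the largest output and at or above the threshold."""
    return (best_match(scores) == claimed_index
            and scores[claimed_index] >= threshold)

def identify(scores, threshold=0.5):
    """Return the best-matching speaker index, or None if unidentified."""
    best = best_match(scores)
    return best if scores[best] >= threshold else None

# Hypothetical network outputs for a caller claiming to be account holder 1:
outputs = [0.1, 0.9, 0.05]
grant_access = verify(outputs, claimed_index=1)
```

A caller whose outputs all stay below the threshold is treated as unidentified and is refused access, which corresponds to the rejection case described in the text.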
6. EXPERIMENTAL RESULTS

The MLP network is trained with the PSDs of ten voice samples, recorded at different instants of time under controlled and uncontrolled speaking conditions, of ten different speakers uttering the same phrase each time. Controlled speaking conditions are free of noise and distortion, whereas uncontrolled speaking conditions include noise and distortion over the transmission lines. Each sample has 1000 PSD points. An adaptive learning rate is used for the MLP network, with an initial learning rate of 0.01. The allowable sum squared error and the maximum number of epochs specified to the MATLAB neural network program are 0.01 and 10000 respectively. The sum squared error goal is reached within 1000 epochs.

A success rate of 90% is achieved when the trained MLP network is tested with the same samples used in the training phase. However, when samples not used in training are presented, a success rate of only nearly 70% is obtained. This is due to inconsistency between the PSDs of these input samples and those used in the training phase. The MLP network is also tested with unseen voice samples from people who are not included in the training set, and the network successfully classifies these voice samples as unidentified.

7. CONCLUSIONS

The use of an artificial neural network in a speaker recognition system has proved fairly successful. The success rate of the system can be increased by using features such as pitch, autocorrelation and cepstrum. This concept of speaker recognition has a variety of applications in fields such as e-banking.