PRACTICUM
MUSIC TECHNOLOGY GROUP – IUA (UPF)
Jaime Arroyo
Tutor: Enric Guaus
July 2007

ABSTRACT

This work compares two similar techniques for supervised automatic genre classification based on the timbre descriptor Mel Frequency Cepstrum Coefficients (MFCCs). The first method was proposed by Aucouturier [1] and is based on Gaussian Mixture Models; the second one, a variation of the first, is based on Hidden Markov Models. The database we use was originally proposed by Tzanetakis [3] and is also used in Guaus [2].

INTRODUCTION

The digitalisation of audio, the compression technologies developed over the last decades and the widespread use of computers connected to the Internet have made huge amounts of music accessible to millions of users. This is the main concern of Electronic Music Distribution (EMD). A major challenge of EMD is to allow the shift from a mass-market approach to a personalized distribution approach. In fact, the amount of digital music calls for efficient ways to browse, organise and dynamically update collections. Moreover, genre has traditionally been the main top-level descriptor used by the industry to organize its collections. That is why associating genre metadata with each title is a basic step towards helping users find what they are looking for. Providing this digital link between the commonly unknown language of music and the language people understand is not a trivial task, since genre remains a poorly defined concept and the boundaries between genres are often ambiguous. This is one of the problems dealt with by the Music Information Retrieval (MIR) research field.

Automatic classification algorithms are traditionally based on three basic steps, as suggested in [2]: taxonomy definition, feature extraction and classification techniques. There are two possible approaches to the classification problem:
– The supervised approach defines a set of genres and trains a learning machine with manually labelled data.
– The unsupervised approach analyzes the data and classifies it without additional manual annotations; the categories emerge from the data itself.

Although there are many descriptors based on different musical facets (timbre, rhythm, ...) (see [2] for an explanation and a comparison of some of them), the most commonly used and, at the moment, most effective are those based on timbre. This work focuses on one of the most used timbre descriptors: the Mel Frequency Cepstrum Coefficients (MFCCs). Finally, there are also various machine learning algorithms, used in the supervised approach, that map the collection of songs onto the given taxonomy. Some of them are: K-Nearest Neighbour (KNN), Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs). An overview of genre classification systems can be found in Scaringella [4]. Our study focuses on the last two techniques.

SCIENTIFIC BACKGROUND

Mel Frequency Cepstrum Coefficients (MFCCs)

The spectrum is defined as follows:

( )

S e jϖ = F {rx [m]} Where F{} denotes Fourier Transform and rx is the autocorrelation of x(n) and is defined as the following discrete convolution:

rx [m] = x N [m] ∗ x N [− m] Where:

x N [n] = x[n]w[n] Being x[n] the audio signal and w[n] a windowing function. Spectrum gives us information about energy of the signal x(n) in the different regions of the frequency domain. The cepstrum is the inverse Fourier Transform of the log-spectrum:

1 c[n] = 2π

ω = +π

log(S (e ω ))e ω dω ∫ ω π j

j n

=−

However, the Mel-cepstrum is obtained integrating in a ~ warped ω variable given by the Mel-scale function:

ω~ = g (ω ) , that must be normalized g : [− π , π ] → [− π , π ] and preserve the

to satisfy integration

limits, then we have:

1 c[ n] = 2π

ω~ = + π

log(S (e ω ))e ω ∫ ω π j

j ~ ·n

~ =−

To express the integral only in the variable

dω~

ω:

JULY 2007

Tutor: Enric Guaus

dω~ = g ' (ω )dω 1 c[n] = 2π

ω~ = + π

log(S (e ω ))e ∫ ω π j

jg (ω )· n

g ' (ω )dω

~ =−

That is approximately computed as:

c[n] ≅ 1 ≅ N

K −1 2

  j 2Kπk lg S  e ∑ k =0   

   2πk    2πk   cos  g  n g '       K    K 

The unnormalized mel-scale is given by:

$$\mu(\omega) = 2595 \cdot \lg\!\left(1 + \frac{\omega f_s}{2\pi\cdot 700\,\mathrm{Hz}}\right)$$

So, to normalize it:

$$g(\omega) = \frac{\pi\,\mu(\omega)}{\mu(\pi)} = \frac{\pi\,\lg\!\left(1 + \dfrac{\omega f_s}{2\pi\cdot 700\,\mathrm{Hz}}\right)}{\lg\!\left(1 + \dfrac{f_s}{2\cdot 700\,\mathrm{Hz}}\right)}$$

A further explanation and justification of this derivation of the MFCCs can be found in Molau [5]. The low order coefficients account for the slowly changing spectral envelope, while the higher order ones describe the fast variations of the spectrum. Therefore, to obtain a timbre measure that is independent of pitch, only the first few coefficients are of interest.

Gaussian Mixture Models

A one-dimensional Gaussian or normal probability density function is given by:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = N(\mu,\sigma)$$

where $\mu$ is the first moment (mean) and $\sigma$ the second one (standard deviation). In a GMM, the number of dimensions is not limited to one:

$$f_X(x_1, x_2, \ldots, x_N) = N(\vec{\mu}, \Gamma)$$

where the mean is now a vector and the second order moment is now called the covariance matrix. The Gaussian mixture model is then given by a weighted sum of $M$ $N$-dimensional Gaussian densities that model an arbitrary constellation of vectors in an $N$-dimensional space:

$$f_X(x) = \sum_{m=1}^{M} c_m\, N(\mu_m, \Gamma_m)$$

The parameters to estimate are then: $M$ weights $c_m$, $M$ mean vectors $\mu_m$ (of $N$ components each) and $M$ covariance matrices $\Gamma_m$ ($N\times N$). However, the covariance matrices are often assumed diagonal, so only $N$ parameters are needed for each one.
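As an illustration, a minimal MATLAB sketch of this weighted sum for the diagonal-covariance case (the function name and argument layout are ours, not the toolbox's):

function p = gmmdensity(x, c, mu, vars)
% GMMDENSITY  Evaluate a diagonal-covariance Gaussian mixture density.
%   x    : 1xN observation vector
%   c    : Mx1 mixture weights (summing to one)
%   mu   : MxN component means
%   vars : MxN diagonal variances of each component
M = length(c);
p = 0;
for m = 1:M
    % N-dimensional Gaussian with diagonal covariance
    g = exp(-0.5 * sum((x - mu(m,:)).^2 ./ vars(m,:))) ...
        / sqrt(prod(2*pi*vars(m,:)));
    p = p + c(m) * g;   % weighted sum of the M components
end
end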

There exist different algorithms to estimate these parameters, such as Expectation Maximization (EM), Markov Chain Monte Carlo, the spectral method, etc. In our work we use the EM algorithm with a K-means initialisation; the original explanation of the EM algorithm can be found in Dempster [6].

Hidden Markov Models

The state diagram of a Markov chain is represented as follows:

[Figure: state diagram of a hidden Markov chain, with hidden states x(t) and observations y(t).]

Here x(t) is a random variable that represents the hidden state at time t. The number of possible values of the state variable is a countable set (the state space). The Markov property establishes that the state at a given time t depends only on the state at time t−1. This dependence is given by a transition probability distribution; under some circumstances (such as time-homogeneity or a finite state space) it can be represented as an S×S transition matrix, where S is the number of states. Moreover, the random variable y(t) represents the observation of the process and depends only on the state at the same time t. In the same way, the dependence between the current state and the observation is given by an arbitrary PDF for each state. However, to calculate likelihoods between data and models we need to assume that these PDFs are Gaussians or mixtures of Gaussians. In the simpler case of a single Gaussian PDF, we only need to estimate the first two moments of the variable in each state. When a mixture of Gaussians is needed, we also have to estimate the M×S Gaussian mixture matrix (M being the number of Gaussians in the mixture), which represents the probability of each sample belonging to a specific Gaussian of the mixture, conditioned on the state. We can again use the EM algorithm to estimate all the parameters of the model.

Likelihood as a distance

The two models described above can calculate, for some new data, the probability of having been generated by a specific estimated model. This is useful as a measure of "similarity", but using this measure as a distance raises two formal problems: non-symmetry, and a non-zero distance from a song to itself (D(i,i) > 0). Both can easily be solved by computing distances as:

$$D(i,j) = \log P(i|i) + \log P(j|j) - \log P(i|j) - \log P(j|i)$$

where P(i|j) is the probability that the data of song i would be generated by the model of song j. Using this expression, as shown in Shao [7], we enforce both conditions. Note that the signs have been set to adapt the concept of maximum likelihood to the concept of minimum distance.
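A minimal sketch of this symmetrization, assuming a helper loglik(data, model) that returns log P(data | model) (a hypothetical handle; the annex uses gmmprob and mhmm_logprob for this role):

function D = songdistance(loglik, di, dj, mi, mj)
% SONGDISTANCE  Symmetrized likelihood-based distance between songs i and j.
%   loglik : function handle with loglik(data, model) = log P(data | model)
%   di, dj : data (e.g. MFCC frames or samples) of songs i and j
%   mi, mj : estimated models of songs i and j
D = loglik(di, mi) + loglik(dj, mj) ...   % self-likelihood (bias) terms
  - loglik(di, mj) - loglik(dj, mi);      % cross-likelihood terms
% By construction D(i,j) = D(j,i) and D(i,i) = 0.
end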


EVALUATION

K-fold cross-validation

K-fold cross-validation is the statistical practice we have used to evaluate our classification systems. It consists of partitioning the database into K subsets and using K−1 of them to train the model while testing on the remaining one; the process is repeated K times so that every subset is tested. For example, with 90 songs of each genre, 10 genres and 10-fold cross-validation, the first pass takes 81 songs of each genre to train the statistical models and classifies the remaining 9 songs of each genre; this is repeated 10 times to classify the entire database.
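With the indexing convention used in the annex code (Y songs per genre, cf folds), the test block of fold z is a contiguous slice; a sketch:

Y  = 90;               % songs per genre
cf = 10;               % number of folds
for z = 1:cf
    testIdx  = (z-1)*(Y/cf)+1 : z*(Y/cf);   % 9 songs per genre to classify
    trainIdx = setdiff(1:Y, testIdx);       % remaining 81 songs to train on
    % ... train on trainIdx, classify testIdx ...
end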

Confusion matrix

The results are typically visualized in a confusion matrix, with one row and one column per genre. Rows represent the actual genre and columns the prediction. After computing all the songs of a genre, the percentage of correctly predicted songs is stored in the diagonal cell, and the wrong predictions are stored in the same row, in the column of the predicted genre. Hence, in the ideal case we would expect an identity matrix. Since this never happens, an error rate is computed as the percentage of songs with an incorrectly predicted genre.
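From a matrix of prediction counts, such as the one returned by confussionmatrix in the annex, the percentages and the error rate follow directly (a sketch; the normalization step is ours):

counts = confussionmatrix(genre);            % rows: actual, columns: predicted
perc = 100 * counts ./ repmat(sum(counts,2), 1, size(counts,2));  % row percentages
errorRate = 100 * (1 - trace(counts)/sum(counts(:)));  % off-diagonal share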

DESCRIPTION

GMM Algorithm

Here we briefly describe the implemented algorithm, as Aucouturier does in [1]. Every musical piece is cut into frames of length L and, for each frame, we calculate a vector of the first N MFC coefficients. A song is then represented by one vector per frame in an N-dimensional space, and we model this flock of points with a Gaussian mixture model of M weighted and summed simple Gaussian densities, called components or states of the mixture. To compute the distance between two songs, we draw S samples from one song's model and calculate the probability that these samples would be generated by the other song's model; finally, we adapt this measure to the distance definition given above. It has to be noted that with this implementation we can completely separate the modelling process from the classification process; the latter no longer needs access to the database.

Aucouturier [1] does not give some details about the decision criteria that we consider necessary to implement the classification algorithm. Once we have the distances from a song to all the others, how do we choose the predicted genre? Many options are possible between these two limit cases:
– Calculate a "genre distance" as the mean of all distances to songs of the same genre, and choose the genre that gives the lowest genre distance.
– Choose the genre of the song that gives the lowest single song distance.
In the results section it will be shown that the second option is better than the first one, but a third, better option, in fact a combination of these two, will be presented, as sketched below.
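A sketch of this family of decision rules (variable names are ours): k = 1 gives the nearest-song rule, k equal to the number of training songs gives the mean-distance rule, and a small k > 1 gives the combination used later:

function pred = decidegenre(dist, k)
% DECIDEGENRE  Predict a genre from song-to-song distances.
%   dist(g,s) : distance from the test song to training song s of genre g
%   k         : number of smallest distances averaged per genre
G = size(dist, 1);
gdist = zeros(G, 1);
for g = 1:G
    sorted = sort(dist(g, :));      % ascending distances within genre g
    gdist(g) = mean(sorted(1:k));   % mean of the k minimal distances
end
[~, pred] = min(gdist);             % choose the genre with lowest distance
end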

HMM Algorithm

This variation is based on an S-state hidden Markov model. First, a hidden Markov model is trained with all the training songs of a genre. To do this, we cut all of them into frames of length L and compute the first N MFC coefficients as before; we then train a randomly initialized model to obtain the genre model, that is: the S×S transition matrix, the S×M Gaussian mixture matrix, and the first and second order moments µ, σ of the observation distributions for each state. After doing this for each genre, we compute the likelihood between each test song (its set of MFCC frame vectors) and each genre model, and choose for each song the genre with maximum likelihood; no distance adaptation is needed here.

Alternatively, another algorithm has been implemented. Similarly to the GMM algorithm, we model every training song with an HMM and compute the likelihood between all test-train pairs. The decision problem is solved in the same way as in the previous section.
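A condensed sketch of the genre-model variant, using the mhmm_em and mhmm_logprob calls that appear in the annex (initialization and data layout omitted; trainData and testSong are hypothetical placeholders):

% Train one HMM per genre on the pooled training frames, then label each
% test song with the genre model of maximum log-likelihood.
for g = 1:ngenres
    [LL, prior{g}, trans{g}, mu{g}, Sigma{g}, mix{g}] = mhmm_em( ...
        trainData{g}, prior0, trans0, mu0, Sigma0, mix0, 'max_iter', 2);
end
for g = 1:ngenres
    loglik(g) = mhmm_logprob(testSong, prior{g}, trans{g}, mu{g}, Sigma{g}, mix{g});
end
[best, predicted] = max(loglik);   % maximum likelihood, no distance adaptation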

IMPLEMENTATION DETAILS

Here we explain some particular decisions we have taken to implement the algorithms described above.

The parameters of the GMM algorithm have been taken from Aucouturier [1]:
– Frame length (L): 50 ms (1024 samples at our 22050 Hz rate)
– Number of MFC coefficients (N): 8 (or 4 coefficients plus 4 delta coefficients)
– Number of mixed Gaussians (M): 3
– Number of samples in classification (S): 1000

The MATLAB functions we have used are:
– wavread: loads a WAV file from the hard disk.


– melcepst: calculates mel-cepstrum coefficients, cutting the audio file into frames of length L with an overlap of 50%.
– gmm: initializes the GMM model.
– gaussmix: calculates the model parameters.
– gmmsamp: creates samples from a given model.
– gmmprob: computes the probability that given data would belong to a given GMM model.
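Chained together, these calls turn one song into a 3-component GMM timbre model, roughly as in the annex listing (the parameter values mirror the annex; the file name is a placeholder):

% From waveform to a GMM song model (Voicebox/Netlab toolboxes).
y = wavread('song.wav');                             % audio at 22050 Hz
mfcc = melcepst(y, 22050, 'Nd', 4, 1024, ...         % 4 MFCCs + 4 deltas,
                floor(3*log(22050)), 512, 0, 0.5);   % 50% frame overlap
model = gmm(8, 3, 'diag');                           % 8-dim, 3 diagonal Gaussians
[model.centres, model.covars, model.priors] = ...
    gaussmix(mfcc, 0.001, 10.001, 3);                % EM fit, as in the annex
samples = gmmsamp(model, 1000);                      % draw S = 1000 samples
p = gmmprob(model, samples);                         % likelihood of the samples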

Apart from the common parameters, the parameters of the HMM algorithm are:
– Number of states (S): 5
– Number of mixtures (M): 3

And the functions used:
– mhmm_em: calculates the model parameters.
– mhmm_logprob: computes the probability that given data would belong to a given HMM model.

K-fold cross-validation (K): 10.

Database: 90 thirty-second fragments of each of 10 genres: blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae and rock.

The trickiest part of the source code deals with the K-fold cross-validation: many indices and loops have been used to get it right. However, it is not necessary to pay excessive attention to this part to understand the proposed algorithms.

RESULTS

The first computation has been done using the GMM algorithm; the results are shown in table 1. In this case we have taken the first four coefficients and the first four differential coefficients, and the likelihood has been computed as the mean of the three minimal distances. Table 2 shows the resulting error rates for other likelihood criteria; we observe the best results for the mean of the 2 and 3 minimal distances.

The next computation is shown in table 3. It has been obtained using the HMM algorithm and taking the first 8 MFC coefficients. Here, all the songs of a single genre have been used to train one genre model; hence the classification criterion is simply the minimal distance between a song and each genre.

Both table 2 and table 5 show that the results using means of minimal distances as the classification criterion are always better than those using the mean of all distances.

[Table 1 (10×10 confusion matrix over blu, cla, cou, dis, hip, jaz, met, pop, reg, roc; rows: actual genre, columns: predicted genre)]

Table 1. Results of the GMM algorithm for 4 coefficients plus 4 differential coefficients, with the mean of the 3 minimal distances as likelihood. Error = 51.11%.

Criterion   Mean   Min      2min     3min     4min     5min
Error       66%    53.77%   51.11%   51.11%   51.22%   51.11%

Table 2. Errors of the GMM algorithm for different likelihood measurements.

[Table 3 (10×10 confusion matrix over blu, cla, cou, dis, hip, jaz, met, pop, reg, roc; rows: actual genre, columns: predicted genre)]

Table 3. Results of the HMM algorithm for 8 coefficients and genre modelling. Error = 62.11%.

The last computation is a combination of the two described above; the results are shown in table 4. The HMM algorithm has been used to model every song, and the distances between songs are then computed in the same way as in the first computation. In this case, however, the best result is achieved when we take the mean of the two minimal distances. Finally, table 5 shows the different error rates depending on the likelihood computation.

[Table 4 (10×10 confusion matrix over blu, cla, cou, dis, hip, jaz, met, pop, reg, roc; rows: actual genre, columns: predicted genre)]

Table 4. Results of the HMM algorithm for 8 coefficients and song modelling, with the mean of the 2 minimal distances as likelihood. Error = 45.53%.

Criterion   Mean     Min      2min     3min     4min     5min
Error       66.55%   48.55%   45.53%   45.67%   46%      46.33%

Table 5. Errors of the HMM algorithm for different likelihood measurements and song modelling.

CONCLUSIONS

In all the implemented methods, the error rates are considerable. This can be a consequence of the lack of sufficient timbre difference between some of the genres. In fact, the classical genre achieves very good results precisely for this reason, whereas rock is clearly confused, since its timbres and textures have influenced all the other styles. However, we can see that in the third computation the correct genre is always the most frequent decision. In the second computation the resulting error rate is worse, but for some genres the results achieved are the best. This can be interpreted as follows: in very consistently defined genres, taking the mean over many songs of the same genre is an advantage over finding a single very similar song. On the other hand, in very disperse genres (where the shared parameters are different from timbre), finding a few similar songs gives more benefit in the classification.

REFERENCES

[1] Aucouturier J, Pachet F, "Representing musical genre: A state of the art", Journal of New Music Research, Vol. 32, No. 1, 2003.
[2] Guaus E, Herrera P, "Comparing high level descriptors for automatic genre classification", ISMIR, 2007.
[3] Tzanetakis G, Cook P, "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 5, 2002.
[4] Scaringella N, Zoia G, Mlynek D, "Automatic genre classification of music content: a survey", IEEE Signal Processing Magazine, Vol. 23, No. 2, 2006.
[5] Molau S, Pitz M, Schlüter R, Ney H, "Computing Mel-frequency cepstral coefficients on the power spectrum", Proc. ICASSP, IEEE, 2001.
[6] Dempster A, Laird N, Rubin D, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

ANNEX

Source code.

1. Code used for the first computation (table 1)

test.m

function [dades,c1010o,c10101,c10102,c10103,c10104,c10105] = test;
% Run the whole GMM experiment: 10 folds, 10 genres.
[dades,c1010o,c10101,c10102,c10103,c10104,c10105] = globalprocedure(10,10);
end

globalprocedure.m

function [dades,c0,c1,c2,c3,c4,c5] = globalprocedure(cf,ngenres);
lframe = 1024;                               % frame length (~50 ms at 22050 Hz)
ntrain = 90;                                 % songs per genre
dades = loaddades(lframe,ngenres,ntrain);    % one GMM per song
[c0,c1,c2,c3,c4,c5] = samplingmethod(dades,cf);
end

loaddades.m

function genres = loaddades(lframe,ngenres,ntrain);
audioDir = 'C:\Documents and Settings\Jaime\My Documents\data\train';
userGenres = strvcat('blu','cla','cou','dis','hip','jaz','met','pop','reg','roc');
for i = 1:ngenres
    clear x;
    x = dir([audioDir,'\',userGenres(i,:),'\']);
    disp(num2str(i));
    for j = 1:ntrain
        clear Y; clear c;
        a = wavread([audioDir,'\',userGenres(i,:),'\',x(j+2).name]);
        Y(650000) = 0;
        Y(1:650000) = a(1:650000);
        % 4 MFCCs plus 4 delta coefficients, frames of length lframe, 50% overlap
        c(:,:) = melcepst(Y(:),22050,'Nd',4,lframe,floor(3*log(22050)),lframe/2,0,0.5);
        genres(i,j) = gmm(8,3,'diag');       % 8-dimensional GMM, 3 diagonal Gaussians
        [genres(i,j).centres,genres(i,j).covars,genres(i,j).priors,lp(i,j)] = ...
            gaussmix(c(:,:),0.001,10.001,3); % EM fit of the song model
    end
end

samplingmethod.m

function [c0,c1,c2,c3,c4,c5] = samplingmethod(dades,cf);
long = 1000;                                 % samples drawn per model (S)
[X,Y] = size(dades);
for z = 1:cf
    for i = 1:X
        for j = ((z-1)*(Y/cf)+1):(z*(Y/cf))  % test songs of fold z
            data = gmmsamp(dades(i,j),long);
            for k = 1:X
                if z > 1
                    for l = 1:((z-1)*(Y/cf))             % training songs before the fold
                        model = gmmsamp(dades(k,l),long);
                        s1 = sum(log(gmmprob2(dades(i,j),data)+1));  % data bias AA (A=test, B=train)
                        s2 = sum(log(gmmprob2(dades(k,l),model)+1)); % model bias BB
                        s3 = sum(log(gmmprob2(dades(k,l),data)+1));  % raw distance AB
                        s4 = sum(log(gmmprob2(dades(i,j),model)+1)); % raw distance BA
                        d(i,j,k,l) = s1+s2-s3-s4;
                    end
                end
                for l = z*(Y/cf)+1:Y                     % training songs after the fold
                    model = gmmsamp(dades(k,l),long);
                    s1 = sum(log(gmmprob2(dades(i,j),data)+1));  % data bias AA
                    s2 = sum(log(gmmprob2(dades(k,l),model)+1)); % model bias BB
                    s3 = sum(log(gmmprob2(dades(k,l),data)+1));  % raw distance AB
                    s4 = sum(log(gmmprob2(dades(i,j),model)+1)); % raw distance BA
                    d(i,j,k,l) = s1+s2-s3-s4;
                end
                % the first Y/cf entries of 'order' are the unset (zero) distances
                % of the fold's own songs, so they are skipped below
                order(1:Y) = sort(d(i,j,k,1:Y));
                dg0(i,j,k) = mean(order(Y/cf+1:length(order)));  % mean of all distances
                dg1(i,j,k) = mean(order(Y/cf+1:Y/cf+1));         % minimum distance
                dg2(i,j,k) = mean(order(Y/cf+1:Y/cf+2));         % mean of 2 minimal
                dg3(i,j,k) = mean(order(Y/cf+1:Y/cf+3));         % mean of 3 minimal
                dg4(i,j,k) = mean(order(Y/cf+1:Y/cf+4));         % mean of 4 minimal
                dg5(i,j,k) = mean(order(Y/cf+1:Y/cf+5));         % mean of 5 minimal
            end
            [caca,genre0(i,j)] = min(dg0(i,j,:));
            [caca,genre1(i,j)] = min(dg1(i,j,:));
            [caca,genre2(i,j)] = min(dg2(i,j,:));
            [caca,genre3(i,j)] = min(dg3(i,j,:));
            [caca,genre4(i,j)] = min(dg4(i,j,:));
            [caca,genre5(i,j)] = min(dg5(i,j,:));
            disp([num2str(i),',',num2str(j),'--> ',num2str(genre4(i,j))])
        end
    end
end
c0 = confussionmatrix(genre0);
c1 = confussionmatrix(genre1);
c2 = confussionmatrix(genre2);
c3 = confussionmatrix(genre3);
c4 = confussionmatrix(genre4);
c5 = confussionmatrix(genre5);

confussionmatrix.m

function matrix = confussionmatrix(genre);
[U,V] = size(genre);
matrix(U,U) = 0;
for i = 1:U
    for j = 1:U
        matrix(i,j) = length(find(genre(i,:)==j));  % songs of genre i predicted as j
    end
end

2. Code used for the second computation (table 3)

test.m

function [c3,c10] = test;
c10 = tellmegenre(2048,90,10,10);
end

tellmegenre.m

function c = tellmegenre(lframe,ntrain,ngenres,cf)
audioDir = 'C:\Documents and Settings\Jaime\My Documents\data\train';
userGenres = strvcat('blu','cla','cou','dis','hip','jaz','met','pop','reg','roc');
Y = ntrain;
N = ngenres;
for z = 1:cf
    % train the genre models on everything outside fold z
    models = HMMmodels(lframe,ngenres,1,((z-1)*(Y/cf)),z*(Y/cf)+1,Y);
    for i = 1:N
        clear x;
        x = dir([audioDir,'\',userGenres(i,:),'\']);
        for j = ((z-1)*(Y/cf)+1):(z*(Y/cf))
            a = wavread([audioDir,'\',userGenres(i,:),'\',x(j+2).name]);
            % disp(['testing... genre: ',num2str(i),' track: ',x(j+2).name,' loaded.']);
            F(650000) = 0;
            F(1:650000) = a(1:650000);
            c(:,:) = melcepst(F(:),22050,'N',8,lframe,floor(3*log(22050)),lframe/2,0,0.5);
            for t = 1:N
                loglik(t) = mhmm_logprob(c(:,:)', models(t).prior1, models(t).transmat1, ...
                    models(t).mu1, models(t).Sigma1, models(t).mixmat1);
            end
            [caca,genre(i,j)] = max(loglik);  % maximum-likelihood genre
            genre
        end
    end
    clear models;
end
c = confussionmatrix(genre);

HMMmodels.m

function models = HMMmodels(lframe,ngenres,si,sf,si2,sf2);
audioDir = 'C:\Documents and Settings\Jaime\My Documents\data\train';
userGenres = strvcat('blu','cla','cou','dis','hip','jaz','met','pop','reg','roc');
T = 648;
O = 8;                                       % observation dimension (8 MFCCs)
for i = 1:ngenres
    clear x;
    x = dir([audioDir,'\',userGenres(i,:),'\']);
    disp(num2str(i));
    for j = si:sf                            % training songs before the test fold
        clear Y; clear c;
        a = wavread([audioDir,'\',userGenres(i,:),'\',x(j+2).name]);
        Y(650000) = 0;
        Y(1:650000) = a(1:650000);
        c(:,:) = melcepst(Y(:),22050,'N',8,lframe,floor(3*log(22050)),lframe/2,0,0.5);
        data(:,:,j-si+1) = c(:,:)';
    end
    for j = si2:sf2                          % training songs after the test fold
        clear Y; clear c;
        a = wavread([audioDir,'\',userGenres(i,:),'\',x(j+2).name]);
        Y(650000) = 0;
        Y(1:650000) = a(1:650000);
        c(:,:) = melcepst(Y(:),22050,'N',8,lframe,floor(3*log(22050)),lframe/2,0,0.5);
        data(:,:,sf+(j-si2)+1) = c(:,:)';
    end
    M = 3;                                   % Gaussians per state
    Q = 5;                                   % number of states
    left_right = 0;
    prior0 = normalise(rand(Q,1));           % random initialization
    transmat0 = mk_stochastic(rand(Q,Q));
    [mu0, Sigma0] = mixgauss_init(Q*M,data,'diag');
    mu0 = reshape(mu0, [O Q M]);
    Sigma0 = reshape(Sigma0, [O O Q M]);
    mixmat0 = mk_stochastic(rand(Q,M));
    [models(i).LL, models(i).prior1, models(i).transmat1, models(i).mu1, ...
        models(i).Sigma1, models(i).mixmat1] = ...
        mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);
end

confussionmatrix.m: identical to the listing in section 1.

3. Code used for the third computation (table 4)

test.m

function [c0,c1,c2,c3,c4,c5] = test(lframe,ngenres,ntrain,cf);
[cc,models] = HMMmodels(lframe,ngenres,ntrain);
[c0,c1,c2,c3,c4,c5] = markovmethod(lframe,models,cf,cc);
end

HMMmodels.m

function [cc,models] = HMMmodels(lframe,ngenres,ntrain);
audioDir = 'C:\Documents and Settings\Jaime\My Documents\data\train';
userGenres = strvcat('blu','cla','cou','dis','hip','jaz','met','pop','reg','roc');
T = 648;
O = 8;
M = 3;
Q = 5;
left_right = 0;
% nex = ntrain;
for i = 1:ngenres
    clear x;
    x = dir([audioDir,'\',userGenres(i,:),'\']);
    disp(num2str(i));
    for j = 1:ntrain
        clear Y; clear c;
        a = wavread([audioDir,'\',userGenres(i,:),'\',x(j+2).name]);
        Y(650000) = 0;
        Y(1:650000) = a(1:650000);
        c(:,:) = melcepst(Y(:),22050,'N',8,lframe,floor(3*log(22050)),lframe/2,0,0.5);
        prior0 = normalise(rand(Q,1));
        transmat0 = mk_stochastic(rand(Q,Q));
        [mu0, Sigma0] = mixgauss_init(Q*M,c(:,:)','diag');
        mu0 = reshape(mu0, [O Q M]);
        Sigma0 = reshape(Sigma0, [O O Q M]);
        mixmat0 = mk_stochastic(rand(Q,M));
        % one HMM per song
        [models(i,j).LL, models(i,j).prior1, models(i,j).transmat1, models(i,j).mu1, ...
            models(i,j).Sigma1, models(i,j).mixmat1] = ...
            mhmm_em(c(:,:)', prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);
        cc(:,:,i,j) = c(:,:)';               % keep the frames for the distance step
    end
end

markovmethod.m

function [c0,c1,c2,c3,c4,c5] = markovmethod(lframe,models,cf,cc);
audioDir = 'C:\Documents and Settings\Jaime\My Documents\data\train';
userGenres = strvcat('blu','cla','cou','dis','hip','jaz','met','pop','reg','roc');
[X,Y] = size(models);
for z = 1:cf
    for i = 1:X
        clear x;
        x = dir([audioDir,'\',userGenres(i,:),'\']);
        for j = ((z-1)*(Y/cf)+1):(z*(Y/cf))
            for k = 1:X
                if z > 1
                    for l = 1:((z-1)*(Y/cf))
                        c(:,:) = cc(:,:,i,j);
                        c2(:,:) = cc(:,:,k,l);
                        s3 = mhmm_logprob(c(:,:), models(k,l).prior1, models(k,l).transmat1, ...
                            models(k,l).mu1, models(k,l).Sigma1, models(k,l).mixmat1);
                        s4 = mhmm_logprob(c2(:,:), models(i,j).prior1, models(i,j).transmat1, ...
                            models(i,j).mu1, models(i,j).Sigma1, models(i,j).mixmat1);
                        s1 = mhmm_logprob(c(:,:), models(i,j).prior1, models(i,j).transmat1, ...
                            models(i,j).mu1, models(i,j).Sigma1, models(i,j).mixmat1);
                        s2 = mhmm_logprob(c2(:,:), models(k,l).prior1, models(k,l).transmat1, ...
                            models(k,l).mu1, models(k,l).Sigma1, models(k,l).mixmat1);
                        d(i,j,k,l) = s1+s2-s3-s4;
                    end
                end
                for l = z*(Y/cf)+1:Y
                    % disp(['songs to compare: ',num2str(i),',',num2str(j),' <-> ',num2str(k),',',num2str(l)]);
                    c(:,:) = cc(:,:,i,j);
                    c2(:,:) = cc(:,:,k,l);
                    s3 = mhmm_logprob(c(:,:), models(k,l).prior1, models(k,l).transmat1, ...
                        models(k,l).mu1, models(k,l).Sigma1, models(k,l).mixmat1);
                    s4 = mhmm_logprob(c2(:,:), models(i,j).prior1, models(i,j).transmat1, ...
                        models(i,j).mu1, models(i,j).Sigma1, models(i,j).mixmat1);
                    s1 = mhmm_logprob(c(:,:), models(i,j).prior1, models(i,j).transmat1, ...
                        models(i,j).mu1, models(i,j).Sigma1, models(i,j).mixmat1);
                    s2 = mhmm_logprob(c2(:,:), models(k,l).prior1, models(k,l).transmat1, ...
                        models(k,l).mu1, models(k,l).Sigma1, models(k,l).mixmat1);
                    d(i,j,k,l) = s1+s2-s3-s4;
                end
                order(1:Y) = sort(d(i,j,k,1:Y));
                dg0(i,j,k) = mean(order(Y/cf+1:length(order)));
                dg1(i,j,k) = mean(order(Y/cf+1:Y/cf+1));
                dg2(i,j,k) = mean(order(Y/cf+1:Y/cf+2));
                dg3(i,j,k) = mean(order(Y/cf+1:Y/cf+3));
                dg4(i,j,k) = mean(order(Y/cf+1:Y/cf+4));
                dg5(i,j,k) = mean(order(Y/cf+1:Y/cf+5));
            end
            [caca,genre0(i,j)] = min(dg0(i,j,:));
            [caca,genre1(i,j)] = min(dg1(i,j,:));
            [caca,genre2(i,j)] = min(dg2(i,j,:));
            [caca,genre3(i,j)] = min(dg3(i,j,:));
            [caca,genre4(i,j)] = min(dg4(i,j,:));
            [caca,genre5(i,j)] = min(dg5(i,j,:));
            disp([num2str(i),',',num2str(j),'--> ',num2str(genre0(i,j))])
        end
    end
end
c0 = confussionmatrix(genre0);
c1 = confussionmatrix(genre1);
c2 = confussionmatrix(genre2);
c3 = confussionmatrix(genre3);
c4 = confussionmatrix(genre4);
c5 = confussionmatrix(genre5);

confussionmatrix.m: identical to the listing in section 1.
