Music 421
Spring 2004-2005
Homework #8
Overlap-Add STFT Processing, Filter Banks
60 points
Due in 5 days (5/31/2005)
1. (10 pts) Draw a block diagram of the filter bank interpretation of DFT, and briefly explain the functions of each of the blocks.

2. Define the signal $y_k(m) = X_m(\omega_k)\, e^{j\omega_k mR}$, with $k$ viewed as a fixed parameter, and $m$ viewed as the independent variable.

(a) (10 pts) Show that
$$\frac{1}{N} \sum_{k=0}^{N-1} y_k(m) = w(0)\, x(mR)$$
if $N \ge M$, or if $N < M$ and $w(mN) = 0$, $m = \pm 1, \pm 2, \ldots$.

(b) (2 pts) What does the term $e^{j\omega_k mR}$ do in the reconstruction?

(c) (8 pts) What are the disadvantages of using the case $N < M$?

(d) (10 pts) How do we recover $x(n)$ for all $n$ when $R > 1$?

3. (20 pts) Suppose the window transform $W(\omega)$ is a lowpass filter with cut-off frequency $\omega_c = 2\pi/R$. That is, $W(\omega) \approx 0$ for $|\omega| \ge \omega_c$. In this case, show that
$$\sum_{m=0}^{M-1} w(n - mR) \approx \frac{1}{R} W(0).$$
If these approximations were exact equalities, specify the set of useable frame step sizes $R_0$ such that
$$\sum_{m=0}^{M-1} w(n - mR_0) = \text{constant}.$$
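The overlap-add sum in Problem 3 can be checked numerically before attempting the proof. The sketch below is only an illustration under stated assumptions: it uses a periodic Hann window of length M = 64 and a hop of R = 16, neither of which is specified in the assignment, and the function names are hypothetical. It sums shifted copies of the window and compares the steady-state value with W(0)/R, where W(0) is the sum of the window samples.

```python
import math

def hann_periodic(M):
    """Periodic Hann window of length M (an assumed example window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / M) for n in range(M)]

def overlap_add_sum(w, R, num_frames):
    """Sum num_frames copies of w, each shifted by R samples from the last."""
    M = len(w)
    total = [0.0] * ((num_frames - 1) * R + M)
    for m in range(num_frames):
        for n in range(M):
            total[m * R + n] += w[n]
    return total

M, R = 64, 16                      # assumed window length and hop size
w = hann_periodic(M)
s = overlap_add_sum(w, R, num_frames=20)

# Ignore the first and last M samples, where fewer frames overlap.
interior = s[M:len(s) - M]
predicted = sum(w) / R             # W(0) / R, with W(0) = sum of w(n)
max_dev = max(abs(v - predicted) for v in interior)

print("predicted W(0)/R:", predicted)
print("largest deviation in the interior:", max_dev)
```

Changing R or the window in this sketch is a quick way to see when the sum stops being constant, which may help in forming a conjecture for the last part of the problem.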
4. (Optional) Cross-Synthesis

Download the skeleton program hw8xsynth.m and the sound source files, SteveJobs.wav and motorcycle.wav:

http://www-ccrma.stanford.edu/~jos/hw421/hw8/hw8xsynth.m
http://www-ccrma.stanford.edu/~jos/hw421/hw8/SteveJobs.wav
http://www-ccrma.stanford.edu/~jos/hw421/hw8/motorcycle.wav

The program analyzes the spectral envelope of the speech, which is then imposed on the spectrum of a broadband signal, here a motorcycle sound.

(a) (5 pts) Fill in the comments (5 of them) in the program to explain what the code in the next few lines does and why we might want to do that.
(b) (25 pts) Fill in the unfinished lines to make an alias-free cross synthesizer. Turn in your code with all the comments completed and a sample of your cross-synthesis result between SteveJobs.wav and motorcycle.wav. Name the cross-synthesis wave file xxxxhw8.wav, where xxxx are the first four letters of your last name.

(c) (5 pts) For an arbitrary time n, plot the following:
i. The short-time speech spectrum magnitude (dB).
ii. The amplitude response of the all-pole filter 1/A(z) obtained by linear prediction analysis at that time.
iii. The short-time spectral magnitude (dB) of the synthesized sample.

(d) (5 pts) Discuss the criteria for selecting the two signals to be fed into the cross-synthesis so that the synthesized speech is clearly intelligible.

Remark: One good thing to do is to create bad examples as well as good examples, and investigate why they are good or bad.
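For part (c)ii, the amplitude response of an all-pole filter 1/A(z) can be evaluated directly from the prediction coefficients. The following is a minimal sketch, assuming the autocorrelation method with the Levinson-Durbin recursion; the skeleton program hw8xsynth.m may obtain A(z) differently, and all function names here are hypothetical. The test signal is a decaying exponential rather than speech, chosen because its ideal predictor is known to be A(z) = 1 - 0.5 z^{-1}.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame x."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns the coefficient list a (with a[0] = 1) and the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                    # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def allpole_magnitude_db(a, gain, omega):
    """Magnitude in dB of gain / A(e^{j omega})."""
    A = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(a))
    return 20.0 * math.log10(gain / abs(A))

# Assumed test frame: impulse response of 1 / (1 - 0.5 z^-1).
x = [0.5 ** n for n in range(200)]
r = autocorrelation(x, 2)
a, err = levinson_durbin(r, 2)
db_at_dc = allpole_magnitude_db(a, math.sqrt(err), 0.0)

print("A(z) coefficients:", a)
print("response at omega = 0 (dB):", db_at_dc)
```

In practice the speech frame would typically be windowed before the autocorrelation is computed, and the response would be evaluated on a grid of frequencies so it can be overlaid on the short-time spectrum from part (c)i.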