Introduction to Signal Processing and Some Applications in Audio Analysis

Md. Khademul Islam Molla JSPS Research Fellow Hirose-Minematsu Laboratory Email: [email protected]

Outline of the presentation
• Basics of discrete time signals
• Frequency domain signal analysis
• Basic transformations: Fourier transform (FT), short-time FT (STFT), wavelet transform (WT), empirical mode decomposition (EMD) and the Hilbert spectrum (HS)
• Notable comparisons among FT, WT, HS
• Some applications in audio processing
• Some open problems to work on

Discrete time signal
• It is not possible to process continuous signals digitally; we need a discrete time signal, obtained with a suitable sampling frequency and quantization
• The sampling theorem: Fs ≥ 2fc, where fc is the highest expected signal frequency and Fs is the required sampling frequency
• Quantization is required along with sampling
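A minimal numpy sketch of the sampling theorem; the 5 Hz tone, the two sampling rates, and the 8-bit-style quantizer are illustrative choices, not values from the slides.

```python
import numpy as np

fc = 5.0                          # highest expected signal frequency (Hz)
for fs in (50.0, 8.0):            # 50 Hz satisfies Fs >= 2*fc; 8 Hz does not
    n = np.arange(0, 1, 1 / fs)   # sampling instants over one second
    s = np.cos(2 * np.pi * fc * n)
    q = np.round(s * 127) / 127   # crude 8-bit-style quantization
    # at fs = 8 Hz the 5 Hz tone is under-sampled and aliases to |5 - 8| = 3 Hz
    print(fs, q[:4].round(3))
```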

Discrete time signal
[Figures: signal sampling; signal quantization; effects of under-sampling; effects of the required sampling frequency]

Discrete time signal
• Telephone speech is usually sampled at 8 kHz to capture frequencies up to 4 kHz
• 16 kHz is generally regarded as sufficient for speech recognition and synthesis
• The audio standard is a sample rate of 44.1 kHz (CD) or 48 kHz (Digital Audio Tape), representing frequencies up to 20 kHz

Discrete time signal
[Figure: cosine plots illustrating amplitude, phase, and frequency: f(x) = 5 cos(x); with a phase shift, f(x) = 5 cos(x + 3.14); with a higher frequency, f(x) = 5 cos(3x + 3.14)]

Time-domain signals
• The independent variable is time; the dependent variable is the amplitude
• Most of the information is hidden in the frequency content
[Figure: time-domain plots of 2 Hz, 10 Hz, and 20 Hz sinusoids and of their sum, 2 Hz + 10 Hz + 20 Hz]

Signal Transformation
Why  To obtain further information from the signal that is not readily available in the raw signal.

Raw Signal  Normally the time-domain signal

Processed Signal  A signal that has been "transformed" by any of the available mathematical transformations

Fourier Transformation  The most popular transformation between time and frequency domains

Frequency domain analysis

Why Frequency Information is Needed  To see information that is not obvious in the time domain

Types of Frequency Transformation  Fourier Transform, Hilbert Transform, Short-time Fourier Transform, the Radon Transform, the Wavelet Transform, …

Frequency domain analysis
• Powerful and complementary to time domain analysis methods
• A frequency domain representation shows the signal energy and phase with respect to frequency
• A fast and efficient way to view a signal's information

General Transform as problem-solving tool
[Diagram: basic block diagram of signal transformation; analysis maps the time domain signal s(t) to the frequency domain S(f) = F[s(t)], and synthesis maps S(f) back to s(t); s(t), S(f) form a transform pair]

Frequency domain analysis
Complex numbers: 4.2 + 3.7i, 9.4447 − 6.7i, −5.2 (= −5.2 + 0i)

General form: Z = a + bi, with Re(Z) = a and Im(Z) = b
Amplitude: A = |Z| = √(a² + b²)
Phase: φ = ∠Z = tan⁻¹(b/a)

Frequency domain analysis
Polar coordinates: Z = a + bi has amplitude A = √(a² + b²) and phase φ = tan⁻¹(b/a)

[Diagram: Z in the complex plane, with real part a, imaginary part b, amplitude A, and phase φ]

Frequency domain analysis
Frequency Spectrum
 Basically the frequency (spectral) components of the signal
 Shows what frequencies exist in the signal

Fourier Transform (FT)
 One way to find the frequency content
 Tells how much of each frequency exists in a signal

Spectrum of speech signal

Fourier Transform
• The Fourier transform decomposes a function into a spectrum of its frequency components
• The inverse transform synthesizes a function from its spectrum of frequency components
• The discrete Fourier transform pair is defined as:

X_k = Σ_{n=0}^{N−1} x_n e^{−j2πkn/N}    x_n = (1/N) Σ_{k=0}^{N−1} X_k e^{j2πkn/N}

where X_k represents the kth frequency component and x_n represents the nth sample in the time domain
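A quick numerical check of this pair with numpy.fft, which uses the same convention (no 1/N on the forward transform); the tone frequency and amplitude are illustrative.

```python
import numpy as np

fs = N = 64                              # 1-second record, 1 Hz bin spacing
n = np.arange(N)
x = 5 * np.cos(2 * np.pi * 10 * n / fs)  # 10 Hz cosine, amplitude 5

X = np.fft.fft(x)                        # forward: X_k = sum_n x_n e^{-j2*pi*k*n/N}
print(np.abs(X[10]) * 2 / N)             # 5.0: the amplitude at the 10 Hz bin

x_back = np.fft.ifft(X)                  # inverse: x_n = (1/N) sum_k X_k e^{j2*pi*k*n/N}
print(np.allclose(x, x_back.real))       # True: the pair is lossless
```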

Fourier Transform
[Figure: Fourier transform of a 1-D signal; the time-domain waveform and its amplitude-only Fourier spectrum plotted against frequency (Hz)]

Fourier Transform
Fourier analysis uses sinusoids as the basis functions in the decomposition. Fourier transforms give the frequency information, smearing time; samples of a function give the temporal information, smearing frequency.

FS synthesis
Square wave reconstruction from spectral terms:

sw_N(t) = Σ_{k=1}^{N} b_k sin(kt)

[Figure: the square signal sw(t) overlaid with truncated sums for N = 5, 7, 9, 11, 13 over t ∈ [0, 10]]

Convergence may be slow (~1/k); ideally infinitely many terms are needed. Practically, the series is truncated when the remainder falls below the computer tolerance (⇒ error). BUT … Gibbs' phenomenon.
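A short sketch of the truncated synthesis, assuming the standard unit square wave coefficients b_k = 4/(πk) for odd k (the slides do not state b_k); the partial-sum peak settles near 1.18 instead of 1, the Gibbs overshoot that truncation never removes.

```python
import numpy as np

t = np.linspace(0, 10, 20000)

def square_partial_sum(t, N):
    """sw_N(t) = sum of b_k*sin(kt) over odd k <= N, with b_k = 4/(pi*k)."""
    sw = np.zeros_like(t)
    for k in range(1, N + 1, 2):
        sw += (4 / (np.pi * k)) * np.sin(k * t)
    return sw

for N in (1, 5, 13, 101):
    # peak of the partial sum: converges to ~1.179, not 1 (Gibbs' phenomenon)
    print(N, round(square_partial_sum(t, N).max(), 3))
```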

Stationarity of the signal
Stationary signal
 Frequency content unchanged over the entire time
 All frequency components exist at all times

Non-stationary signal
 Frequency changes in time
 One example: the "chirp signal"

Stationarity of the signal
[Figure: a stationary signal, 2 Hz + 10 Hz + 20 Hz, whose components occur at all times, and a non-stationary signal, 2 Hz over 0.0-0.4 s, 10 Hz over 0.4-0.7 s, 20 Hz over 0.7-1.0 s, whose components do not appear at all times; each is shown with its magnitude spectrum over 0-25 Hz]

Chirp signal
[Figure: two chirps, one sweeping from 2 Hz to 20 Hz and one from 20 Hz to 2 Hz; they are different in the time domain yet have the same magnitude spectrum]
Same in the frequency domain: at what time do the frequency components occur? The FT cannot tell!

Limitations of Fourier Transform
• The FT only gives what frequency components exist in the signal
• The time and frequency information cannot be seen at the same time
• A time-frequency representation of the signal is needed
• Most signals are non-stationary

ONE SOLUTION: SHORT-TIME FOURIER TRANSFORM (STFT)

Short Time Fourier Transform Dennis Gabor (1946) used STFT  To analyze only a small section of the signal at a time -- a technique called Windowing the Signal.

The segment of the signal is assumed stationary. The STFT maps the signal to a function of both time and frequency (often displayed as a 3-D surface):

STFT_x^{(ω)}(t′, f) = ∫_t [x(t) · ω*(t − t′)] e^{−j2πft} dt

where ω(t) is the window function
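A usage sketch of the STFT with scipy.signal.stft on a chirp; the Hann window and nperseg value are illustrative choices, not parameters from the slides.

```python
import numpy as np
from scipy.signal import stft, chirp

fs = 1000
t = np.arange(0, 1, 1 / fs)
x = chirp(t, f0=2, f1=20, t1=1)           # frequency sweeps 2 Hz -> 20 Hz

# Hann-windowed segments, one FFT per segment; larger nperseg means a wider
# window: better frequency resolution, poorer time resolution (and vice versa)
f, frames, Zxx = stft(x, fs=fs, nperseg=256)
print(Zxx.shape)                          # (frequency bins, time frames)
```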

Short time Fourier Transform
[Figure: an FT is computed over each windowed segment of the signal; a speech signal and its STFT]

Drawbacks of STFT
• Unchanged window
• Dilemma of resolution
 Narrow window -> poor frequency resolution
 Wide window -> poor time resolution

Heisenberg Uncertainty Principle
 Cannot know what frequency exists at what time interval
[Figure: the same signal analyzed via a narrow window and via a wide window]

Wavelet Transform
To overcome some limitations of the Fourier transform
[Diagram: discrete wavelet decomposition tree; the signal S splits into approximation A1 and detail D1, A1 into A2 and D2, and A2 into A3 and D3]

Wavelet Overview

Wavelet

 A small wave

Wavelet Transforms

 Provide a way for analyzing waveforms, bounded in both frequency and duration  Allow signals to be stored more efficiently than by Fourier transform  Be able to better approximate real-world signals  Well-suited for approximating data with sharp discontinuities

“The Forest & the Trees”

 Notice gross features with a large "window"
 Notice small features with a small "window"

Multi-resolution analysis Wavelet Transform  An alternative approach to the short time Fourier transform to overcome the resolution problem  Similar to STFT: signal is multiplied with a function

Multi-resolution Analysis  Analyze the signal at different frequencies with different resolutions  Good time resolution and poor frequency resolution at high frequencies  Good frequency resolution and poor time resolution at low frequencies  More suitable for short duration of higher frequency; and longer duration of lower frequency components

Advantages of WT over STFT

The width of the window is changed as the transform is computed for every single spectral component, so the resolution is altered across frequencies

Principles of WT

Split the signal into a set of signals representing the same signal but corresponding to different frequency bands, providing only what frequency bands exist at what time intervals

Principles of WT 1 * t − τ  CWT ( τ, s ) = Ψ ( τ, s ) = x( t ) • ψ  dt ∫ s  s  ψ x

ψ x

Translation (The location of the window)

Wavelet

Scale Mother Wavelet

 Small wave  Means the window function is of finite length

Mother Wavelet  A prototype for generating the other window functions  All the used windows are its dilated or compressed and shifted versions

Wavelet bases

Principles of WT
[Figure: wavelet basis functions in the time and frequency domains]

Wavelet basis functions:
 Morlet (ω₀ = frequency): π^{−1/4} e^{jω₀η} e^{−η²/2}
 Paul (m = order): (2^m i^m m! / √(π(2m)!)) (1 − iη)^{−(m+1)}
 DOG (m = derivative): Derivative Of a Gaussian, ((−1)^{m+1} / √(Γ(m + ½))) (d^m/dη^m) e^{−η²/2}
 m = 2 gives the Marr or Mexican hat wavelet

Scale of wavelet Scale  S>1: dilate the signal  S<1: compress the signal

• Low frequency -> high scale -> non-detailed global view of the signal -> spans the entire signal
• High frequency -> low scale -> detailed view lasting only a short time
• Only a limited interval of scales is necessary

Computation of WT

CWT_x^ψ(τ, s) = Ψ_x^ψ(τ, s) = (1/√s) ∫ x(t) ψ*((t − τ)/s) dt

Step 1: The wavelet is placed at the beginning of the signal, and s is set to 1 (the most compressed wavelet);
Step 2: The wavelet function at scale 1 is multiplied by the signal, integrated over all times, and then multiplied by 1/√s;
Step 3: The wavelet is shifted to t = τ, and the transform value at t = τ and s = 1 is obtained;
Step 4: The procedure is repeated until the wavelet reaches the end of the signal;
Step 5: The scale s is increased by a sufficiently small value, and the above procedure is repeated for all s;
Step 6: Each computation for a given s fills a single row of the time-scale plane;
Step 7: The CWT is obtained once all scales s have been computed (see the sketch below).
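A direct numpy sketch of steps 1-7, using the Mexican hat (Marr) wavelet named above; since that wavelet is real and even, the correlation in the definition reduces to a convolution. The signal and scale grid are illustrative.

```python
import numpy as np

def mexican_hat(t):
    # Marr / Mexican hat wavelet (unnormalized)
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def cwt(x, scales, dt=1.0):
    n = len(x)
    k = (np.arange(n) - n // 2) * dt               # centered wavelet support
    rows = []
    for s in scales:                               # step 5: loop over scales
        psi = mexican_hat(k / s) / np.sqrt(s)      # scaled wavelet with 1/sqrt(s)
        # steps 2-4: multiply-and-integrate at every shift tau
        rows.append(np.convolve(x, psi, mode='same') * dt)
    return np.array(rows)                          # steps 6-7: one row per scale

x = np.sin(2 * np.pi * 0.05 * np.arange(512))
W = cwt(x, scales=2.0 ** np.arange(1, 6))          # logarithmic (dyadic) grid
print(W.shape)                                     # (5 scales, 512 shifts)
```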

Time & Frequency Resolution
[Figure: time-frequency tiling of the WT; better time resolution but poor frequency resolution at high frequencies, better frequency resolution but poor time resolution at low frequencies]
• Each box represents an equal-area portion of the time-frequency plane
• The resolution in the STFT is selected once for the entire analysis

Comparison of transformations

Discretization of WT
• It is necessary to sample the time-frequency (scale) plane
• At high scale s (lower frequency f), the sampling rate N can be decreased
• The scale parameter s is normally discretized on a logarithmic grid; the most common base is 2

N₂ = (s₁ / s₂) · N₁ = (f₁ / f₂) · N₁

S: 2, 4, 8, …
N: 32, 16, 8, …

Effective and Fast DWT
• The discretized WT is not a true discrete transform
• The Discrete Wavelet Transform (DWT):
 Provides sufficient information for both analysis and synthesis
 Reduces the computation time significantly
 Is easier to implement
 Analyzes the signal at different frequency bands with different resolutions
 Decomposes the signal into a coarse approximation and detail information
[Diagram: S splits into A1 and D1, A1 into A2 and D2, A2 into A3 and D3]

Decomposition with DWT
• Halves the time resolution
 Only half the number of samples results
• Doubles the frequency resolution
 The spanned frequency band is halved

[Diagram: X[n], 512 samples, 0-1000 Hz
 Filter 1 -> A1 (256 samples) and D1 (256 samples, 500-1000 Hz)
 Filter 2 -> A2 (128 samples) and D2 (128 samples, 250-500 Hz)
 Filter 3 -> A3 (64 samples, 0-125 Hz) and D3 (64 samples, 125-250 Hz)]
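A minimal one-stage sketch of this pyramid, assuming Haar analysis filters (the slides do not name the filters): filtering followed by downsampling by 2 halves the samples and splits the band in two.

```python
import numpy as np

def haar_analysis(x):
    h = np.array([1.0, 1.0]) / np.sqrt(2)   # lowpass  -> approximation A
    g = np.array([1.0, -1.0]) / np.sqrt(2)  # highpass -> detail D
    a = np.convolve(x, h)[1::2]             # filter, then keep every 2nd sample
    d = np.convolve(x, g)[1::2]
    return a, d

x = np.random.randn(512)                    # stands in for X[n] on 0-1000 Hz
a1, d1 = haar_analysis(x)                   # A1: 0-500 Hz,  D1: 500-1000 Hz
a2, d2 = haar_analysis(a1)                  # A2: 0-250 Hz,  D2: 250-500 Hz
print(len(d1), len(d2))                     # 256 128: time resolution halves
```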

Decomposition of a nonstationary signal
[Figure: a signal with 20 Hz over 0.0-0.4 s, 10 Hz over 0.4-0.7 s, and 2 Hz over 0.7-1.0 s, decomposed from fL to fH; wavelet: db4, level: 6]

Decomposition of a nonstationary signal
[Figure: a signal with 2 Hz over 0.0-0.4 s, 10 Hz over 0.4-0.7 s, and 20 Hz over 0.7-1.0 s, decomposed from fL to fH; wavelet: db4, level: 6]

Reconstruction from WT
What  How can those components be assembled back into the original signal without loss of information? A process after decomposition or analysis, also called synthesis

How  Reconstruct the signal from the wavelet coefficients: where wavelet analysis involves filtering and downsampling, the wavelet reconstruction process consists of upsampling and filtering

Reconstruction from WT
Upsampling lengthens a signal component by inserting zeros between samples. MATLAB commands: idwt and waverec.
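A round-trip sketch with PyWavelets as a Python analogue of the MATLAB commands above; the db4 wavelet and level 6 follow the choices shown in the earlier decomposition slides.

```python
import numpy as np
import pywt

x = np.random.randn(1024)
coeffs = pywt.wavedec(x, 'db4', level=6)   # analysis: [A6, D6, D5, ..., D1]
x_rec = pywt.waverec(coeffs, 'db4')        # synthesis: upsample and filter
print(np.allclose(x, x_rec))               # True: reconstruction is lossless
```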

Wavelet Applications Typical Application Fields  Astronomy, acoustics, nuclear engineering, subband coding, signal and image processing, neurophysiology, music, magnetic resonance imaging, speech discrimination, optics, fractals, turbulence, earthquake-prediction, radar, human vision, and pure mathematics applications

Sample Applications  De-noising signals  Breakdown detecting  Detecting self-similarity  Compressing images  Identifying pure tone

Signal De-noising
• The highest frequencies appear at the start of the original signal
• Successive approximations appear less and less noisy, but also lose progressively more high-frequency information
• In A5, about the first 20% of the signal is truncated

Breakdown Detection
The discontinuous signal consists of a slow sine wave abruptly followed by a medium sine wave. The 1st and 2nd level details (D1 and D2) show the discontinuity most clearly.
Things to be detected
 The site of the change
 The type of change (a rupture of the signal, or an abrupt change in its first or second derivative)
 The amplitude of the change
[Figure: discontinuity points]

Detecting Self-similarity Purpose

 How analysis by wavelets can detect a self-similar, or fractal, signal.  The signal here is the Koch curve -- a synthetic signal that is built recursively

Analysis

 If a signal is similar to itself at different scales, then the "resemblance index" or wavelet coefficients will also be similar at different scales.
 In the coefficients plot, which shows scale on the vertical axis, this self-similarity generates a characteristic pattern.

Image Compression Fingerprints

 FBI maintains a large database of fingerprints — about 30 million sets of them.  The cost of storing all this data runs to hundreds of millions of dollars.

Results

 Values under the threshold are forced to zero, achieving about 42% zeros while retaining almost all (99.96%) the energy of the original image.  By turning to wavelets, the FBI has achieved a 15:1 compression ratio  better than the more traditional JPEG compression

Identifying Pure Tone Purpose

 Resolving a signal into constituent sinusoids of different frequencies  The signal is a sum of three pure sine waves

Analysis

 D1 contains signal components whose period is between 1 and 2.
 Zooming in on detail D1 reveals that each "belly" is composed of 10 oscillations.
 D3 and D4 contain the medium sine frequencies.
 There is a breakdown between approximations A3 and A4 -> the medium frequency has been subtracted.
 Approximations A1 to A3 can be used to estimate the medium sine.
 Zooming in on A1 reveals a period of around 20.

Empirical Mode Decomposition Principle

Objective — From one observation of x(t), get an AM-FM type representation:

x(t) = Σ_{k=1}^{K} a_k(t) Ψ_k(t)

with a_k(·) amplitude modulating functions and Ψ_k(·) oscillating functions.
Idea — "signal = fast oscillations superimposed on slow oscillations".
Operating mode — ("EMD", Huang et al., '98)
(1) identify, locally in time, the fastest oscillation;
(2) subtract it from the original signal;
(3) iterate upon the residual.
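A compact sketch of one sifting pass implementing steps (1)-(3); the fixed iteration count is a simplified placeholder for Huang et al.'s stopping criterion.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift(x, n_iter=10):
    """Extract one IMF candidate by repeatedly removing the envelope mean."""
    h = x.copy()
    t = np.arange(len(x))
    for _ in range(n_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            break                                   # too few extrema: h is a residual
        upper = CubicSpline(maxima, h[maxima])(t)   # envelope through the maxima
        lower = CubicSpline(minima, h[minima])(t)   # envelope through the minima
        h = h - (upper + lower) / 2                 # subtract the local mean
    return h

t = np.arange(400)
x = np.sin(2 * np.pi * 0.05 * t) + 0.3 * np.sin(2 * np.pi * 0.005 * t)
imf1 = sift(x)          # (1)-(2): the fastest oscillation, subtracted from x
residual = x - imf1     # (3): iterate the same procedure on this residual
```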

Empirical Mode Decomposition Principle
[Figure: a low-frequency sawtooth plus a linear FM component sum to the composite signal]

Empirical Mode Decomposition Algorithmic definition
[Animation: the sifting process applied step by step, extracting in turn the first, second, and third intrinsic mode functions and leaving a residual]

Empirical Mode Decomposition Algorithmic definition
[Figure: the signal, its 1st, 2nd, and 3rd intrinsic mode functions, and the residual]

Empirical Mode Decomposition Intrinsic Mode Functions
— Quasi-monochromatic harmonic oscillations
 #{zero crossings} = #{extrema} ± 1
 symmetric envelopes around the y = 0 axis
— An IMF ≠ a Fourier mode; in nonlinear situations, one IMF = several Fourier modes
— Output of a self-adaptive time-varying filter (≠ a standard linear filter)
Example: 2 FM sinusoids + a Gaussian wave packet

Empirical Mode Decomposition Instantaneous frequency (IF)

— The analytic version of each IMF C_i(t) is computed using the Hilbert transform as:

z_i(t) = C_i(t) + jH[C_i(t)] = a_i(t) e^{jθ_i(t)}

— hence z_i(t) becomes complex, with phase and amplitude. The IF can then be computed as:

ω_i(t) = dθ_i(t)/dt

— The Hilbert spectrum (HS) is the triplet H(ω, t) = {t, ω_i(t), a_i(t)}
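A sketch of this computation with scipy.signal.hilbert; the constant 50 Hz tone standing in for an IMF is illustrative.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000
t = np.arange(0, 1, 1 / fs)
c = np.cos(2 * np.pi * 50 * t)             # stands in for one IMF C_i(t)

z = hilbert(c)                             # analytic signal C_i + jH[C_i]
a = np.abs(z)                              # instantaneous amplitude a_i(t)
theta = np.unwrap(np.angle(z))             # instantaneous phase theta_i(t)
inst_f = np.diff(theta) * fs / (2 * np.pi) # IF = d(theta)/dt, in Hz
print(inst_f[100:105].round(1))            # ~50.0 Hz at every instant
```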

Empirical Mode Decomposition Intrinsic Mode Functions
[Figure: a signal, its spectrum, and its time-frequency representation]

Empirical Mode Decomposition Intrinsic Mode Functions
[Figure: the signal and its 1st, 2nd, and 3rd IMFs, each localized in time and frequency]

Empirical Mode Decomposition Intrinsic Mode Functions

• EMD is a fully data-adaptive decomposition for the spectral and time-frequency representation of non-linear and non-stationary time series
• It does not employ any basis function for the decomposition
• It produces perfect localization of the signal components in a high-resolution time-frequency space

Empirical Mode Decomposition Comparison between HS and STFT
[Figure: time-frequency representation of two pure tones (100 Hz and 250 Hz) using the Hilbert spectrum (HS) and the STFT]

Empirical Mode Decomposition Comparison between Wavelet and HS
[Figure: the same signal represented with the wavelet transform and the Hilbert spectrum (HS)]

Remarks on FT
• The Fourier transform has a mathematical foundation
• It can be used in robust analysis, having phase information
• The detail of the signal information is limited by the (sinusoid) basis function
• STFT analysis includes some additional cross-spectral energy that degrades the performance in some applications

Remarks on WT
• The WT employs a data-adaptive basis function based on its time and frequency scales
• It can produce more detailed signal information in the T-F representation
• The WT also performs well in multi-band decomposition
• The reconstruction error of the multi-band representation is much smaller than with the FT
• It cannot preserve the phase information for perfect reconstruction from T-F space

Remarks on EMD and HS
• EMD is a fully adaptive multi-band decomposition method
• It produces perfect localization of the signal components in T-F space
• The HS can represent the instantaneous spectra of the signal
• The signal can be reconstructed with negligible error terms
• It does not have a mathematical foundation yet
• It is difficult to use EMD based decomposition in robust analysis

Application of DSP in audio analysis •Audio source separation from mixture using independent subspace analysis (ISA) •Audio source separation by spatial localization in underdetermined case •Robust pitch estimation using EMD

Application of DSP in audio analysis

Audio source separation from mixture using independent subspace analysis (ISA)

Source separation by Independent subspace analysis (ISA)
[Diagram: the audio mixture s(t) -> STFT (magnitude spectrogram and phase φ) -> PCA -> basis vector selection -> ICA -> basis vector clustering -> source spectrograms -> ISTFT -> individual sources]

T-F representation of the mixture
The mixture audio is transformed with the Short Time Fourier Transform (STFT), window 30 ms with 20 ms overlap, giving the magnitude spectrogram X and the phase information.

Proposed separation model
Mixture spectrogram: X = Σ_i x_i, with source spectrograms x_i = B_i A_i

B_i = [b_1^(i), b_2^(i), …, b_n^(i)]  invariant frequency n-component basis
A_i = [a_1^(i), a_2^(i), …, a_n^(i)]^T  corresponding amplitude envelopes

A_i = B_i^T X,  B_i = X A_i^T

** The goal is to find independent B_i or A_i

Dimension Reduction
• The number of rows or columns of X >> the number of sources, so the dimension of X must be reduced
• Singular value decomposition (SVD) is used:

X_{n×k} = U_{n×n} S_{n×k} V_{k×k}^T

 U and V: orthogonal matrices (column-wise)
 S: diagonal matrix of singular values σ₁ ≥ σ₂ ≥ σ₃ ≥ … ≥ σₙ ≥ 0

• p basis vectors (from U or V) are selected by setting φ = 0.5 to 0.6 in the inequality:

(Σ_{i=1}^{p} σ_i) / (Σ_{i=1}^{n} σ_i) ≥ φ
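A numpy sketch of this selection rule: keep the smallest p whose cumulative singular-value fraction reaches φ. The spectrogram stand-in is random data.

```python
import numpy as np

X = np.abs(np.random.randn(257, 400))      # stands in for a magnitude spectrogram
U, s, Vt = np.linalg.svd(X, full_matrices=False)

phi = 0.6
frac = np.cumsum(s) / np.sum(s)            # cumulative singular-value fraction
p = int(np.searchsorted(frac, phi)) + 1    # smallest p with frac >= phi
B = U[:, :p]                               # p frequency basis vectors from U
print(p, B.shape)
```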

Proposed separation model
• To derive the basis vectors, singular value decomposition (SVD) is applied as PCA
• Some principal components are selected as basis vectors
• Independent component analysis (ICA) is applied to make the bases independent

Independent basis vectors
[Figure: basis vectors before ICA and after ICA]
** The bases are independent along the time frames

Producing source subspaces
[Figure: the time bases and frequency bases of the speech signal]

Source Subspaces (cont.)
[Diagram: mixture spectrogram -> PCA + basis selection + ICA -> basis vectors B, A -> KLd-based clustering -> source subspaces B1A1 and B2A2 -> source spectrograms]

Source re-synthesis
The separated subspaces (spectrograms), here from a mixture of speech and a bip-bip sound, are resynthesized by appending the phase information and applying the inverse STFT:

S_i(n, k) = x_i · e^{jφ(n, k)}

Experimental results
[Audio demo: mixtures (speech + bip-bip, male + female speech) and the corresponding signals separated with the proposed algorithm]

Application of DSP in audio analysis

Audio source separation by spatial localization in underdetermined case

Localization based separation
• To avoid dependence on the spectral characteristics and signal content in separation, and to increase the number of separable sources, the spatial location is considered
• Binaural mixtures are used instead of a single mixture

Localization based separation (cont.)
• Consider a multi-source audio situation
• Humans can easily localize and separate the sources with the HAS (human auditory system)
• The binaural cues ITD and ILD are mainly used in source localization
• Separation is performed by applying beamforming and a binary mask

Source localization Cues
• Interaural time difference (ITD) between the two microphones' signals (like the two ears of a human)
• Interaural level difference (ILD)
[Figure: ITD and ILD]

Source localization
X_r(ω) and X_l(ω) are the STFTs of x_r(t) and x_l(t). ITD and ILD are calculated as:

ITD(ω) = (1/ω) {φ_l(ω) − φ_r(ω)}    ILD(ω) = 20 log₁₀( |X_l(ω)| / |X_r(ω)| )

where φ_r(ω) and φ_l(ω) are the unwrapped phases of X_r(ω) and X_l(ω), respectively, at frequency ω
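A per-frequency sketch of these two cues from left/right STFTs; the delayed, attenuated right channel and the frequency-axis unwrapping are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
xl = np.random.randn(fs)                   # stands in for the left mixture x_l(t)
xr = 0.8 * np.roll(xl, 3)                  # right channel: delayed and attenuated

f, frames, Xl = stft(xl, fs=fs, nperseg=512)
_, _, Xr = stft(xr, fs=fs, nperseg=512)

w = 2 * np.pi * f[1:, None]                # skip DC to avoid dividing by zero
phi_l = np.unwrap(np.angle(Xl[1:]), axis=0)
phi_r = np.unwrap(np.angle(Xr[1:]), axis=0)
itd = (phi_l - phi_r) / w                  # ITD = (phi_l - phi_r) / omega
ild = 20 * np.log10(np.abs(Xl[1:]) / (np.abs(Xr[1:]) + 1e-12))
```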

Source localization (cont.)
• ITD becomes ambiguous at higher frequencies (a factor of the microphones' spacing)
• At high frequencies, ILD dominates to resolve the problem; at low frequencies, ITD is used
[Figure: ITD ambiguity at high frequency]

Source localization (cont.)
• ITD and ILD are quantized into 50 levels
• Collecting the T-F points corresponding to each quantized ITD/ILD pair produces peaks

Separation by beamforming
• ITD is derived for each of the localized sources
• Spatial beamforming is applied: linearly constrained minimum variance beamforming (LCMVB) is used
• The gain is selected based on the spatial locations

Separation by binary mask with HS
• Required to avoid the limitations of spatial beamforming
• Separation is performed by binary mask estimation based on ITD/ILD
• The sources are considered disjoint orthogonal in T-F space
 not more than one source is active at any T-F point

Computing ITD and ILD
Each mixture is transformed to the T-F domain using Hilbert spectra (H_L and H_R). ITD and ILD are measured as:

{ITD(ω, t_f), ILD(ω, t_f)} = { (1/ω) ∠( H_L(ω, t_f) / H_R(ω, t_f) ),  |H_R(ω, t_f)|² / |H_L(ω, t_f)|² }

ILD_dB(ω, t_f) = 20 log₁₀( |H_R(ω, t_f)|² / |H_L(ω, t_f)|² )

where t_f is the time frame

ITD-ILD Space Localization ITD and ILD are quantized into 50 levels Collection of T-F points from HS corresponding to each ITD/ILD quantized pair produces peaks

Source Separation
• Each peak region in the histogram refers to one source of the binaural mixtures
• Construct a binary mask M_i(ω, t) (nullifying the T-F points of interfering sources)
• The HS of the ith source is separated as:

H_i(ω, t) = M_i(ω, t) H_L(ω, t)

• The time domain ith source is given as:

s_i(t) = Σ_ω H_i(ω, t) · cos[φ(ω, t)]
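A toy sketch of this mask-and-resynthesize step; the magnitudes, cue map, and phase are random stand-ins, since only the masking arithmetic is being illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)
H_L = np.abs(rng.standard_normal((257, 100)))  # stands in for the left-channel HS
cue = rng.standard_normal((257, 100))          # stands in for the ITD/ILD cue map

M_i = (cue > 0).astype(float)                  # toy binary mask for source i
H_i = M_i * H_L                                # H_i = M_i * H_L
phase = rng.uniform(-np.pi, np.pi, H_L.shape)  # stands in for phi(omega, t)
s_i = (H_i * np.cos(phase)).sum(axis=0)        # s_i(t) = sum_w H_i * cos(phi)
```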

Source disjoint orthogonality
Disjoint orthogonality (DO) of audio sources assumes that not more than one source is active at any T-F point:

F₁(ω, t) F₂(ω, t) = 0, ∀ ω, t

where F₁ and F₂ are the TFRs of two signals. The SIR (signal to interference ratio) is used as the basis for measuring DO.

Source disjoint orthogonality (cont.)
[Figure: three audio sources s1, s2, s3 captured by microphones, with their TFRs shown over time and frequency]

Source disjoint orthogonality (cont.)
The SIR of the jth source is defined as:

SIR_j = Σ_ω Σ_t X_j(ω, t) / Y_j(ω, t),  Y_j(ω, t) ≠ 0

Y_j(ω, t) = Σ_{i=1, i≠j}^{N} X_i(ω, t)

where Y_j is the sum of the interfering sources

Source disjoint orthogonality (cont.)
• The dimensions of the HS and the STFT of the same signal may differ
• DO is defined as a percentage over the entire TFR region
• The average DO (ADO) of all sources is:

ADO = (1/N) Σ_{j=1}^{N} SIR_j

where N is the number of sources

Experimental results The three mixtures are defined as m1{sp1(-40°, 0°), sp2(30°, 0°), ft(0°, 0°)}, m2{sp1(20°, 10°), sp2(0°, 10°), ft(-10°,10°)}, m3{sp1(40°, 20°), sp2(30°, 20°), ft(-20°, 20°)}

The separation efficiency is measured by the OSSR (original to separated signal ratio), defined as:

OSSR = (1/T) Σ_{t=1}^{T} log₁₀ [ Σ_{i=1}^{w} s²_original(t + i) / Σ_{i=1}^{w} s²_separated(t + i) ]

Experimental results (cont.)
The comparative separation efficiency (OSSR) using HS and STFT:

Mixture  TFR   OSSR of sp1  OSSR of sp2  OSSR of ft
m1       HS    -0.0271       0.0213       0.0264
m1       STFT   0.0621      -0.0721      -0.0531
m2       HS     0.0211      -0.0851      -0.0872
m2       STFT   0.0824       0.1202       0.1182
m3       HS     0.0941      -0.0832       0.0225
m3       STFT  -0.1261       0.1092      -0.0821

Experimental results
• This experiment also compares the DO using HS and STFT as the TFR
• The STFT is affected by many factors: the window function and its length, the overlapping, and the FFT points
• The HS is independent of such factors; it is slightly affected only by the number of frequency bins used in the TFR

Experimental results (cont.) The ADO of HS and STFT as a function of number of frequency bins (N=3):

Experimental results (cont.) Only the ADO of the STFT is affected by the window overlapping factor (%)

Experimental results (cont.)
The STFT includes more cross-spectral energy terms
[Figure: the TFR of two pure tones using HS and STFT]

Experimental results (cont.)
• HS always has better DO for audio signals
• DO depends on the resolution of the TFR; the STFT has to satisfy the inequality Δt · Δω ≥ 1/2
• The frequency resolution of HS extends up to the Nyquist frequency
• Its time resolution extends up to the sampling rate, hence offering better resolution

Remarks
• The separation efficiency is independent of the signal's spectral characteristics
• The performance is affected by the angles between the sources and their disjointness
• HS produces better disjointness in the T-F domain and hence better separation
• The binaural mixtures were recorded in the anechoic room of NTT

Application of DSP in audio analysis

Robust pitch estimation using EMD

Why EMD in pitch estimation?
• Pitch facilitates speech coding, enhancement, recognition, etc.
• An autocorrelation function is mostly used in pitch estimation algorithms
• The autocorrelation (AC) function recalls the periodic property of the speech

EMD in pitch estimation (cont.)

EMD in pitch estimation (cont.)
• Pitch is the sample difference between two consecutive peaks in the AC function
• Sometimes the pitch peak may be less prominent, especially due to noise

EMD in pitch estimation (cont.)
• EMD decomposes any signal into components from higher to lower frequency
• It produces the local and global oscillations of the signal
• The global oscillation nearly represents the envelope of the signal
• The IMF of the global oscillation is used to estimate the pitch

Pitch estimation with EMD

Pitch estimation with EMD (cont.)
• There exists an IMF in the EMD domain representing the global oscillation of the AC function
• That IMF represents the sinusoid of the pitch period
• Pitch is then the frequency of that IMF, rather than being found from the pitch peak

Pitch estimation with EMD (cont.)
• In the EMD, IMF-5 is the oscillation of the pitch period
• A crucial step is to determine the target IMF representing the sinusoid with the pitch period
• IMFs of lower-frequency oscillation (than the pitch period) can be discarded by energy thresholding

Pitch estimation with EMD (cont.)
• A reference pitch is computed by the weighted AC (WAC) method
• This pitch information is used to select the IMF with the pitch period
• The periodicity of the selected IMF is computed as the pitch period

Pitch estimation with EMD (cont.)
• The peak at zero lag is selected
• Two cycles are selected on both sides
• The average number of samples per cycle gives the periodicity

Proposed Pitch estimation Algorithm
• Compute the normalized autocorrelation (AC) of the speech frame
• Determine a rough pitch period using the WAC method
• Apply EMD to the AC function
• Select the IMF of the pitch period on the basis of the WAC-based estimate
• The period of the selected IMF is the estimated pitch (see the sketch below)
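A sketch of the autocorrelation front end of this algorithm; the EMD/IMF-selection stages are omitted and a plain peak pick over the 50-500 Hz range stands in for the WAC step.

```python
import numpy as np

fs = 20000                                 # 20 kHz, as in the experiments below
t = np.arange(int(0.0256 * fs)) / fs       # one 25.6 ms frame
x = np.sin(2 * np.pi * 150 * t)            # voiced frame with a 150 Hz pitch

ac = np.correlate(x, x, mode='full')[len(x) - 1:]
ac /= ac[0]                                # normalized autocorrelation

lo, hi = int(fs / 500), int(fs / 50)       # lags for the 50-500 Hz pitch range
period = lo + int(np.argmax(ac[lo:hi]))    # lag of the strongest AC peak
print(fs / period)                         # ~150 Hz
```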

Experimental results
• The Keele pitch database is used, at a 20 kHz sampling rate
• The frame length is 25.6 ms with 10 ms shifting
• Each frame is band-pass filtered to the pitch range (50-500 Hz)
• Gross pitch error (GPE) is used to measure the performance

Experimental results (cont.)
The %GPE of male and female speech at different SNRs (1823 frames in total):

SNR     30 dB  20 dB  10 dB  0 dB   -5 dB  -15 dB
Female  1.90   2.83   3.93   10.12  21.83  64.24
Male    2.15   3.78   5.22   11.89  23.56  66.76

Remarks
• The use of EMD makes the pitch estimation method more robust
• EMD of the AC function can extract the fundamental oscillation of the signal
• The pitch can easily be estimated from the single sinusoid of the fundamental oscillation
• It is not affected by prominent non-pitch peaks

Future works
• The open problem is to identify the IMF with the pitch period
• In the present algorithm, the error of the rough ACF-based pitch estimate can propagate to the final estimation
• The performance has not yet been compared with other existing algorithms

Open Problem-1

Instantaneous Pitch (IP) estimation using EMD •Frame based pitch estimation is already done •Paper is accepted by EUROSPEECH 2007

Three methods of pitch estimation

• We have used the WAC-based pitch information to select the exact pitch IMF from the EMD space
• The problem is to compute the IP from the EMD space alone

Open Problems-2

Voiced/Unvoiced Detection with EMD

•Useful in speech enhancement and speech/speaker recog. •Paper with preliminary results is published in ICICT2007

V/UV differentiation

• The problem is to derive a better separation region for V/UV and to conduct experiments with large speech data

Open Problems-3

Robust Audio Source Localization

• Localization is done by the delay-attenuation computed in the T-F space of binaural mixtures; this is NOT noise robust

Localization of three sources using TD-LD computed in T-F space

• The problem is to derive a mathematical model for robust localization in the underdetermined situation

Open Problems-4

Speech denoising using image processing

•Noisy speech can be represented as an image with time-frequency (T-F) representation e.g. Spectrogram

Speech with white noise

•Image processing algorithm can be used for denoising •It seems easy for musical/white noises

•Problem is to deal with other noise even by using binaural mixtures

Open Problems-5

Auditory segmentation with binaural mixtures •Auditory segmentation is the first stage of source separation using auditory scene analysis (ASA) Source separation by ASA

•Problem is to use of binaural mixtures for improved auditory segmentation as source separation •T-F representation other than FT can be employed

Open Problems-6

Two stage speech enhancement

• Single stage speech enhancement is not efficient in all noisy situations
• For example, musical noise is introduced with binary masking and some thresholding methods
• Noise may not be separated perfectly using ICA or ISA (independent subspace analysis) based techniques
• Multi-stage enhancement with a suitable ordering can improve the performance
[Diagram: noisy speech -> first stage enhancement -> second stage enhancement -> clean speech]

Open Problems-7

Informative features extraction

•To use spectral dynamics in speech/speaker recog. •Special type of speech features are required

Mixed signal with its spectrogram

• How to parameterize the speech signal to represent speech dynamics
• WT and HS based spectral analysis can be studied

Open Problems-8

Source based audio indexing

• Useful in multimedia applications and moving audio source separation
[Figure: separation of moving sources; audio sources s1, s2, s3 at different azimuth angles (0 to 180 degrees), at 1.5 m]

• Several new methods could be used for indexing  AdaBoost, Tree-ICA, conditional random fields

Open Problems-9

Time-series prediction with EMD

• Subject: financial and environmental time series
• Conventional methods use a Kalman filter (for smoothing) and an AR model for prediction

Non-stationary time-series

•EMD can be used as smoothing filter to enhance the prediction accuracy

Open Problems-10

Heart-rate analysis with ECG data using EMD

Different parts of ECG signal

•ECG Variability analysis at different frequency region •Analysis of instantaneous ECG condition •Abnormality analysis of heart-rate using EMD based spectral modeling

The End

Questions/Suggestions Please