Darmstadt University of Technology

Digital Signal Processing Abdelhak M. Zoubir Signal Processing Group Institute of Telecommunications Darmstadt University of Technology Merckstr. 25, 64283 Darmstadt, Germany e-mail: [email protected] URL: http://www.spg.tu-darmstadt.de

Manuscript draft for Digital Signal Processing

© Abdelhak Zoubir 2006

Last revised: October 2006.

Contents

Preface  xi

1 Discrete-Time Signals and Systems  1
  1.1 Signals  1
    1.1.1 The unit sample sequence  2
    1.1.2 The unit step sequence  3
    1.1.3 Exponential and sinusoidal sequences  4
    1.1.4 Periodic Sequences  6
  1.2 Systems  6
    1.2.1 The unit sample response of linear systems  8
  1.3 Convolution  9
  1.4 Stability and Causality  9
    1.4.1 Stability  10
    1.4.2 Causality  11

2 Digital Processing of Continuous-Time Signals  13
  2.1 Periodic Sampling  13
  2.2 Reconstruction of Band-Limited Signals  15
  2.3 Discrete-time Processing of Continuous-Time Signals  15

3 Filter Design  19
  3.1 Introduction  19
  3.2 Digital Filters  20
  3.3 Linear Phase Filters  21
    3.3.1 Generalised Linear Phase for FIR Filters  22
  3.4 Filter Specification  27
  3.5 Properties of IIR Filters  28
    3.5.1 Stability  30
    3.5.2 Linear phase  31

4 Finite Impulse Response Filter Design  33
  4.1 Introduction  33
  4.2 Design of FIR Filters by Windowing  33
  4.3 Importance of the Window Shape and Length  35
  4.4 Kaiser Window Filter Design  40
  4.5 Optimal Filter Design  45
  4.6 Implementation of FIR Filters  48

5 Design of Infinite Impulse Response Filters  53
  5.1 Introduction  53
  5.2 Impulse Invariance Method  54
  5.3 Bilinear Transformation  57
  5.4 Implementation of IIR Filters  60
    5.4.1 Direct Form I  60
    5.4.2 Direct Form II  61
    5.4.3 Cascade form  62
  5.5 A Comparison of FIR and IIR Digital Filters  63

6 Introduction to Digital Spectral Analysis  65
  6.1 Why Spectral Analysis?  65
  6.2 Applications  65

7 Random Variables and Stochastic Processes  69
  7.1 Random Variables  69
    7.1.1 Distribution Functions  69
    7.1.2 Properties  69
    7.1.3 Uniform Distribution  70
    7.1.4 Exponential Distribution  70
    7.1.5 Normal Distribution  71
    7.1.6 χ² Distribution  72
    7.1.7 Multiple Random Variables  73
    7.1.8 Properties  74
    7.1.9 Operations on Random Variables  77
    7.1.10 Expectation  77
    7.1.11 Expected Value of a Function of Random Variables  78
    7.1.12 The Variance  79
    7.1.13 Transformation of a Random Variable  80
    7.1.14 The Characteristic Function  81
  7.2 Covariance  83
  7.3 Stochastic Processes (Random Processes)  84
    7.3.1 Stationarity and Ergodicity  86
    7.3.2 Second-Order Moment Functions and Wide-Sense Stationarity  88
  7.4 Power Spectral Density  93
    7.4.1 Definition of Power Spectral Density  93
    7.4.2 Relationship between the PSD and SOMF: Wiener-Khintchine Theorem  95
    7.4.3 Properties of the PSD  96
    7.4.4 Cross-Power Spectral Density  97
  7.5 Definition of the Spectrum  99
  7.6 Random Signals and Linear Systems  100
    7.6.1 Convergence of Random Variables  100
    7.6.2 Linear filtering  101
    7.6.3 Linear filtered process plus noise  106

8 The Finite Fourier Transform  111
  8.1 Statistical Properties of the Finite Fourier Transform  111
    8.1.1 Preliminaries  111

9 Spectrum Estimation  115
  9.1 Use of bandpass filters  115

10 Non-Parametric Spectrum Estimation  119
  10.1 Consistency of Spectral Estimates  119
  10.2 The Periodogram  120
    10.2.1 Mean of the Periodogram  120
    10.2.2 Variance of the Periodogram  122
  10.3 Averaging Periodograms - Bartlett Estimator  124
  10.4 Windowing Periodograms - Welch Estimator  126
  10.5 Smoothing the Periodogram  127
  10.6 A General Class of Spectral Estimates  130
  10.7 The Log-Spectrum  133
  10.8 Estimation of the Spectrum using the Sample Covariance  133
    10.8.1 Blackman-Tukey Method  134
  10.9 Cross-Spectrum Estimation  135
    10.9.1 The Cross-Periodogram  136

11 Parametric Spectrum Estimation  139
  11.1 Spectrum Estimation using an Autoregressive Process  139
  11.2 Spectrum Estimation using a Moving Average Process  141
  11.3 Model order selection  143

12 Discrete Kalman Filter  145
  12.1 State Estimation  145
  12.2 Derivation of Kalman Filter Equations  145
  12.3 Extended Kalman Filter  149
  .1 Discrete-Time Fourier Transform  152
List of Figures

1.1 Example of a continuous-time signal.  1
1.2 Example of a discrete-time signal.  2
1.3 Example of a sampled continuous-time signal.  2
1.4 The sequence p(n).  3
1.5 Discrete Time Signal.  3
1.6 Exponential Sequence with 0 < α < 1.  4
1.7 Exponential Sequence with −1 < α < 0.  5
1.8 Sinusoidal Sequence.  5
1.9 Real and Imaginary parts of x(n).  6
1.10 Representation of a discrete-time system.  7
1.11 Linear Shift Invariant System.  8
1.12 Transversal Filter input and output for q = 3.  10
1.13 Convolution operation.  10

2.1 Continuous-Time to digital converter.  14
2.2 Effect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB, (d) Spectrum of the sampled signal with Ωs < 2ΩB.  16
2.3 Discrete-time processing of continuous-time signals.  17
2.4 Frequency Spectra.  17
2.5 Continuous-Time Filter.  17

3.1 A basic system for discrete-time filtering of a continuous-time signal.  20
3.2 Linear and time-invariant filtering of s(n).  21
3.3 Examples of the four types of linear phase FIR filters.  25
3.4 Frequency response of type I filter of Example 3.3.1. Magnitude (left). Phase (right).  26
3.5 Frequency response of type II filter of Example 3.3.2. Magnitude (left). Phase (right).  26
3.6 Frequency response of type III filter of Example 3.3.3. Magnitude (left). Phase (right).  26
3.7 Frequency response of type IV filter of Example 3.3.4. Magnitude (left). Phase (right).  27
3.8 Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.  28
3.9 Specifications of a low-pass filter.  29
3.10 Low-pass filter with transition band.  29
3.11 Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Band-pass. (d) Band-stop.  30

4.1 The sequence h(n).  35
4.2 Magnitude of the Fourier transform of a rectangular window (N = 8).  36
4.3 Commonly used windows for FIR filter design, N = 200.  37
4.4 Fourier transform (magnitude) of rectangular window (N = 51).  38
4.5 Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).  38
4.6 Fourier transform (magnitude) of Hanning window (N = 51).  38
4.7 Fourier transform (magnitude) of Hamming window (N = 51).  39
4.8 Fourier transform (magnitude) of Blackman window (N = 51).  39
4.9 Fourier transform of a Hamming window with window length of N = 101.  39
4.10 Fourier transform of a Hamming window with window length of N = 201.  40
4.11 (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).  41
4.12 Response functions. (a) Impulse response. (b) Log magnitude.  43
4.13 Kaiser window estimate. (a) Impulse response. (b) Log magnitude.  44
4.14 Filter Specification.  46
4.15 Flowchart of Parks-McClellan algorithm.  50
4.16 Signal flow graph for a delay element.  51
4.17 Signal flow graph representing the convolution sum.  51
4.18 Efficient signal flow graph for the realisation of a type I FIR filter.  51

5.1 Low-pass filter specifications.  55
5.2 Magnitude transfer function Hc(jΩ).  56
5.3 Filter specifications for Example 5.2.2.  56
5.4 Mapping of the s-plane onto the z-plane using the bilinear transformation.  58
5.5 Mapping of the continuous-time frequency axis onto the unit circle by bilinear transformation (Td = 1).  58
5.6 Signal flow graph of direct form I structure for an (N − 1)th-order system.  61
5.7 Signal flow graph of direct form II structure for an (N − 1)th-order system.  62
5.8 Cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.  63

6.1 A Michelson Interferometer.  66

7.1 Probability function.  71
7.2 χ²ν distribution with ν = 1, 2, 3, 5, 10.  73
7.3 The quadrant D of two arbitrary real numbers x and y.  74
7.4 Bivariate probability distribution function.  75
7.5 Properties for n = 2.  76
7.6 Probability that (X1, X2) is in the rectangle D3.  76
7.7 Voltage waveforms emitted from a noise source.  85
7.8 Measurement of SOMF.  90
7.9 A linear filter.  101
7.10 Interaction of two linear systems.  104
7.11 Interaction of two linear systems.  105
7.12 Cascade of L linear systems.  106
7.13 Cascade of two linear systems.  106
7.14 A linear filter plus noise.  106

8.1 Transfer function of an ideal bandpass filter.  114

9.1  115
9.2 Transfer function of an ideal band-pass filter.  116

10.1 The window function (1/N)|ΔN(e^{jω})|² for N = 20, N = 10 and N = 5.  122
10.2 The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted line). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.  123
10.3 The periodogram as an estimate of the spectrum using 4 non-overlapping segments. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.  125
10.4 Function Am(e^{jω}) for N = 15 and m = 0, 1, 3, 5.  129
10.5 The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to bottom right). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.  130
10.6 Function ASW(e^{jω}) for N = 15 and m = 4 with weights Wk in the shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.  132

11.1 Linear and time-invariant filtering of white noise.  139
11.2 Filter model for a moving average process.  141
11.3 Estimate of an MA process with N = 100 and q = 5.  143
11.4 True and estimated AR(6) spectrum. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.  144
11.5 Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.  144

12.1 Relationship between system state and measurements assumed for Kalman filter.  146

List of Tables

3.1 The four classes of FIR filters and their associated properties  24
4.1 Suitability of a given class of filter for the four FIR types  33
4.2 Properties of different windows.  37

Preface

This draft is provided as an aid for students studying the subject Digital Signal Processing. It covers the material presented during lectures. Useful references on this subject include:

1. Oppenheim, A.V. & Schafer, R.W., Discrete-Time Signal Processing, Prentice-Hall, 1989.
2. Proakis, J.G. & Manolakis, D.G., Digital Signal Processing: Principles, Algorithms, and Applications, Maxwell Macmillan Editions, 1992.
3. Mitra, S.K., Digital Signal Processing: A Computer-Based Approach, Prentice-Hall, 1998.
4. Diniz, P., da Silva, E. & Netto, S., Digital Signal Processing: System Analysis and Design, Cambridge University Press, 2002.
5. Porat, B., A Course on Digital Signal Processing, J. Wiley & Sons Inc., 1997.
6. Kammeyer, K.D. & Kroschel, K., Digitale Signalverarbeitung, Teubner Studienbücher, 1998.
7. Hänsler, E., Statistische Signale, Springer Verlag, 3. Auflage, 2001.
8. Böhme, J.F., Stochastische Signale, Teubner Studienbücher, 1998.
9. Brillinger, D.R., Time Series: Data Analysis and Theory, Holden-Day Inc., 1981.
10. Priestley, M.B., Spectral Analysis and Time Series, Academic Press, 2001.

Chapter 1

Discrete-Time Signals and Systems

1.1 Signals

Most signals can in practice be classified into three broad groups:

1. Analog or continuous-time signals are continuous in both time and amplitude. Examples: speech, image, seismic and radar signals. An example of a continuous-time signal is shown in Figure 1.1.

[Figure 1.1: Example of a continuous-time signal.]

2. Discrete-time signals are discrete in time and continuous in amplitude. An example of such a signal is shown in Figure 1.2. A common way to generate discrete-time signals is by sampling continuous-time signals. There exist, however, signals which are intrinsically discrete in time, such as the average monthly rainfall.

[Figure 1.2: Example of a discrete-time signal.]

3. Digital signals are discrete in both time and amplitude. Digital signals may be generated through the amplitude quantisation of discrete-time signals. The signals used by computers are digital, being discrete in both time and amplitude. The development of digital signal processing concepts based on digital signals requires a detailed treatment of amplitude quantisation, which is extremely difficult and tedious. In addition, many useful insights are lost in such a treatment because of the mathematical complexity. For this reason, digital signal processing concepts have been developed based on discrete-time and continuous-time signals. Experience shows that the theories developed based on continuous-time or discrete-time signals are often applicable to digital signals.

A discrete-time signal (sequence) will be denoted by functions whose arguments are integers. For example, x(n) represents a sequence which is defined for all integer values of n. If n is not an integer, x(n) is not zero; rather, it is undefined. The notation x(n) refers either to the discrete-time function x or to the value of the function x at a specific n. Figure 1.3 gives an example of a sampled signal with sampling period T. Herein x(n) = x(t)|_{t=nT} = x(nT), with n integer and T > 0.

[Figure 1.3: Example of a sampled continuous-time signal.]

There are some sequences which play an important role in digital signal processing. We discuss these next.

1.1.1 The unit sample sequence

The unit sample sequence is denoted by δ(n) and is defined as

    δ(n) = { 1,  n = 0,
           { 0,  otherwise.

The sequence δ(n) may be regarded as an analog to the impulse function used in continuous-time system analysis. An important aspect of the impulse sequence is that any sequence can be represented as a linear combination of delayed impulses. For example, the sequence p(n) in Figure 1.4 can be expressed as

    p(n) = a_{−1} δ(n + 1) + a_1 δ(n − 1) + a_3 δ(n − 3).

[Figure 1.4: The sequence p(n).]

In general,

    x(n) = Σ_{k=−∞}^{∞} x(k) δ(n − k),

as shown in Figure 1.5.

[Figure 1.5: Discrete-time signal decomposed into scaled, delayed unit samples.]
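The impulse decomposition above is easy to verify numerically. The following is a minimal sketch (added here for illustration, not part of the manuscript; the index range and amplitudes a_{−1}, a_1, a_3 are arbitrary example values) that builds the sequence p(n) of Figure 1.4 from delayed unit samples:

```python
import numpy as np

def delta(n):
    """Unit sample sequence: 1 where n == 0, else 0 (n may be an array)."""
    return np.where(n == 0, 1.0, 0.0)

n = np.arange(-5, 8)                  # index range (arbitrary choice)
a_m1, a_1, a_3 = 0.8, -0.5, 1.2       # example amplitudes (arbitrary)

# p(n) = a_{-1} delta(n+1) + a_1 delta(n-1) + a_3 delta(n-3)
p = a_m1 * delta(n + 1) + a_1 * delta(n - 1) + a_3 * delta(n - 3)

for ni, pi in zip(n, p):
    print(f"p({ni:+d}) = {pi:+.1f}")
```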

1.1.2 The unit step sequence

The unit step sequence, denoted by u(n), is defined as

    u(n) = { 1,  n ≥ 0,
           { 0,  otherwise.

The sequence u(n) is related to δ(n) by

    u(n) = Σ_{k=−∞}^{n} δ(k),

or alternatively

    u(n) = Σ_{k=0}^{∞} δ(n − k).

Conversely, the impulse sequence can be expressed as the first backward difference of the unit step sequence, i.e.,

    δ(n) = u(n) − u(n − 1) = Σ_{k=−∞}^{n} δ(k) − Σ_{k=−∞}^{n−1} δ(k).

1.1.3 Exponential and sinusoidal sequences

Exponential and sinusoidal sequences play an important role in digital signal processing. The general form of an exponential sequence is

    x(n) = A · α^n,                                        (1.1)

where A and α may be complex numbers. For real A > 0 and real 0 < α < 1, x(n) is a decreasing sequence similar in appearance to Figure 1.6.

[Figure 1.6: Exponential Sequence with 0 < α < 1.]

For −1 < α < 0, the sequence values alternate in sign but again decrease in magnitude with increasing n, as shown in Figure 1.7. If |α| > 1, then x(n) grows in magnitude as n increases.

[Figure 1.7: Exponential Sequence with −1 < α < 0.]

A sinusoidal sequence has the general form x(n) = A · cos(ω0 n + φ) with A and φ real. A sinusoidal sequence with φ = 3π/2 is shown in Figure 1.8.

[Figure 1.8: Sinusoidal Sequence.]

Now let A and α be complex, so that A = |A|e^{jφ} and α = |α|e^{jω0}. Then

    x(n) = A α^n = |A|e^{jφ} |α|^n e^{jnω0}
         = |A| · |α|^n e^{j(ω0 n + φ)}
         = |A| · |α|^n [cos(ω0 n + φ) + j sin(ω0 n + φ)].

A plot of the real part of x(n) = A · α^n is given in Figure 1.9 for |α| > 1. In this case we set ω0 = π and φ = 0, i.e.,

    ℜ{x(n)} = |A| · |α|^n cos(πn) = |A| · |α|^n (−1)^n.

The imaginary part is zero for all n.

[Figure 1.9: Real and Imaginary parts of x(n).]

For |α| = 1, x(n) is referred to as the complex exponential sequence and has the form

    x(n) = |A| · e^{j(ω0 n + φ)} = |A| [cos(ω0 n + φ) + j sin(ω0 n + φ)],

i.e., the real and imaginary parts of e^{jω0 n} vary sinusoidally with n.

Note. In the continuous-time case, the quantity ω0 is called the frequency and has the dimension radians per second. In the discrete-time case, n is a dimensionless integer, so the dimension of ω0 must be radians. To maintain a closer analogy with the continuous-time case, we can specify the units of ω0 to be radians per sample and the units of n to be samples. Following this rationale we can define the period of a complex exponential sequence as 2π/ω0, which has the dimension samples.

1.1.4 Periodic Sequences

A sequence x(n) is called periodic with period N if x(n) satisfies the condition

    x(n) = x(n + N)    for all n,

where N is a positive integer. For example, the sequence cos(πn) is periodic with period 2k, k ∈ Z, but the sequence cos(n) is not periodic. Note that a sinusoidal discrete-time sequence A cos(ω0 n) is periodic with period N if ω0 N = 2πk, k ∈ Z. A similar statement holds for the complex exponential sequence C · e^{jω0 n}, i.e., periodicity with period N requires

    e^{jω0 (n+N)} = e^{jω0 n},

which is true only if

    ω0 N = 2πk,  k ∈ Z.                                    (1.2)

Consequently, complex exponential and sinusoidal sequences are not necessarily periodic with period 2π/ω0 and, depending on the value of ω0, may not be periodic at all. For example, for ω0 = 3π/4, the smallest value of N for which Equation (1.2) can be satisfied with k an integer is N = 8 (corresponding to k = 3). For ω0 = 1, there are no integer values of N or k for which Equation (1.2) is satisfied. When we combine the condition of Equation (1.2) with the observation that ω0 and ω0 + 2πr, r ∈ Z, are indistinguishable frequencies, it is clear that there are only N distinguishable frequencies for which the corresponding sequences are periodic with period N. One set of such frequencies is ωk = 2πk/N, k = 0, 1, . . . , N − 1. These properties of complex exponential and sinusoidal sequences are basic to both the theory and the design of computational algorithms for discrete-time Fourier analysis.
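Condition (1.2) can be checked numerically. The sketch below (an illustration added here, not from the text; the search bound and tolerance are arbitrary) looks for the smallest period N with ω0 N = 2πk for integer k, reproducing N = 8 for ω0 = 3π/4 and finding no period for ω0 = 1:

```python
import numpy as np

def smallest_period(omega0, n_max=10000):
    """Smallest N with omega0*N = 2*pi*k for some integer k, or None."""
    for N in range(1, n_max + 1):
        k = omega0 * N / (2 * np.pi)
        if abs(k - round(k)) < 1e-9:   # k is (numerically) an integer
            return N
    return None

print(smallest_period(3 * np.pi / 4))  # 8  (corresponding to k = 3)
print(smallest_period(1.0))            # None: cos(n) is not periodic
```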

1.2 Systems

A system T that maps an input x(n) to an output y(n) is represented by

    y(n) = T[x(n)].

[Figure 1.10: Representation of a discrete-time system.]

Example 1.2.1 The ideal delay system is defined by y(n) = x(n − n_d), n_d ∈ Z⁺. The system shifts the input to the right by n_d samples, corresponding to a time delay.

This definition of a system is very broad. Without some restrictions, we cannot completely characterise a system from its output for a certain set of inputs. Two types of restrictions greatly simplify the characterisation and analysis of systems: linearity and shift invariance (LSI). Many systems can be approximated in practice by a linear and shift-invariant system.

A system T is said to be linear if

    T[a x1(n) + b x2(n)] = a y1(n) + b y2(n)

holds, where T[x1(n)] = y1(n), T[x2(n)] = y2(n), and a and b are any scalar constants. The property T[a x(n)] = a T[x(n)] is called homogeneity, and T[x1(n) + x2(n)] = T[x1(n)] + T[x2(n)] is called additivity. Additivity and homogeneity are collectively known as the Principle of Superposition.

A system is said to be shift invariant (SI) if

    T[x(n − n0)] = y(n − n0),

where y(n) = T[x(n)] and n0 is any integer.

Example 1.2.2 A system described by the relation

    y(n) = Σ_{k=−∞}^{n} x(k)

is called an accumulator. This system is both linear and shift invariant.

Example 1.2.3 Consider a system described by y(n) = T[x(n)] = x(n) · g(n), where g(n) is an arbitrary sequence. This system is shown in Figure 1.11. Linearity of the system is easily verified since

    T[a x1(n) + b x2(n)] = g(n)[a x1(n) + b x2(n)] = a g(n) x1(n) + b g(n) x2(n) = a T[x1(n)] + b T[x2(n)].

[Figure 1.11: Linear Shift Invariant System.]

However, the system is shift variant because

    T[x(n − n0)] = x(n − n0) g(n)

does not equal

    y(n − n0) = x(n − n0) g(n − n0).

Example 1.2.4 Consider a system described by y(n) = T[x(n)] = x²(n). Although this system is shift-invariant, it is non-linear. The system is non-linear because

    T[x1(n) + x2(n)] = [x1(n) + x2(n)]²,

which is different from

    y1(n) + y2(n) = x1²(n) + x2²(n).

The system is shift-invariant because T[x(n − n0)] = x²(n − n0) and y(n − n0) = x²(n − n0).

1.2.1 The unit sample response of linear systems

Consider a linear system T. Using the relationship

    x(n) = Σ_{k=−∞}^{∞} x(k) δ(n − k)

and the linearity property, the output y(n) of a linear system T for an input x(n) is given by

    y(n) = T[x(n)]
         = T[ Σ_{k=−∞}^{∞} x(k) δ(n − k) ]
         = T[· · · + x(−1)δ(n + 1) + x(0)δ(n) + x(1)δ(n − 1) + · · · ]
         = Σ_{k=−∞}^{∞} x(k) T[δ(n − k)].

This result indicates that a linear system can be completely characterised by the response of the system to a unit sample sequence δ(n) and its delays δ(n − k). Moreover, if the system is shift-invariant, we have

    h(n − n0) = T[δ(n − n0)],

where h(n) = T[δ(n)] is the unit sample response of the system T to an input δ(n).

1.3 Convolution

The input-output relation of a linear and shift-invariant (LSI) system T is given by

    y(n) = T[x(n)] = Σ_{k=−∞}^{∞} x(k) h(n − k).

This equation shows that knowledge of h(n) completely characterises the system, i.e., knowing the unit sample response h(n) of an LSI system T allows us to completely determine the output y(n) for a given input x(n). This operation is referred to as convolution and is denoted by the convolution operator ⋆. For an LSI system, we have

    y(n) = x(n) ⋆ h(n) = Σ_{k=−∞}^{∞} x(k) h(n − k).

Remark 1: The unit sample response h(n) loses its significance for a nonlinear or shift-variant system.

Remark 2: The choice of δ(n) as an input in characterising an LSI system is the simplest, both conceptually and in practice.

Example 1.3.1 A linear transversal filter is given by

    y(n) = Σ_{k=0}^{q} x(n − k) a(k) = a(0)x(n) + a(1)x(n − 1) + . . . + a(q)x(n − q).

If we take a(0) = a(1) = . . . = a(q) = 1/(q + 1), the filter output sequence is an average of q + 1 samples of the input sequence around the nth sample. A transversal filter input and output for q = 3 are shown in Figure 1.12.

Example 1.3.2 To compute the relationship between the input and the output of a transversal filter, h(−k) is first plotted against k; h(−k) is simply h(k) reflected or "flipped" around k = 0, as shown in Figure 1.13. Replacing k by k − n, where n is a fixed integer, leads to a shift in the origin of the sequence h(−k) to k = n. y(n) is then obtained by the sum Σ_k x(k) h(n − k). See Figure 1.13.
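The convolution sum of a finite-length LSI system is directly computable. The sketch below (added here for illustration, not from the manuscript; np.convolve evaluates the finite-support convolution sum, and the input values are arbitrary) applies the averaging transversal filter of Example 1.3.1 with q = 3:

```python
import numpy as np

q = 3
a = np.full(q + 1, 1.0 / (q + 1))     # a(0) = ... = a(q) = 1/(q+1)
x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # example input (arbitrary)

# y(n) = sum_k a(k) x(n-k); 'full' returns all len(x)+len(a)-1 output samples
y = np.convolve(x, a, mode="full")
print(y)
```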

[Figure 1.12: Transversal Filter input and output for q = 3.]

[Figure 1.13: Convolution operation.]

1.4 Stability and Causality

To implement an LSI system in practice (realisability), we have to impose two additional constraints: stability and causality.

1.4.1 Stability

A system is considered stable in the bounded-input bounded-output (BIBO) sense if and only if a bounded input always leads to a bounded output. A necessary and sufficient condition for an LSI system to be stable is that the unit sample response h(n) is absolutely summable, i.e.,

    Σ_{n=−∞}^{∞} |h(n)| < ∞.

Such a sequence h(n) is sometimes referred to as a stable sequence. The proof is quite simple. First let x(n) be bounded by M, i.e., |x(n)| ≤ M < ∞ for all n. Then, using convolution,

    |y(n)| = | Σ_{k=−∞}^{∞} h(k) x(n − k) |
           ≤ Σ_{k=−∞}^{∞} |h(k)| |x(n − k)|
           ≤ M Σ_{k=−∞}^{∞} |h(k)|.

So given a bounded input |x(n)| ≤ M, the output y(n) is bounded if and only if h(n) is absolutely summable.

1.4.2 Causality

Causality is a fundamental law of physics stating that information cannot travel from the future to the past. In the context of systems, we say that a system is causal if and only if the current output y(n) does not depend on any future values of the input, such as x(n + 1), x(n + 2), . . .. Causality holds for any practical system regardless of linearity or shift invariance. A necessary and sufficient condition for an LSI system to be causal is that

    h(n) = 0  for n < 0.

The proof follows from convolution,

    y(n) = Σ_{k=−∞}^{∞} x(n − k) h(k).

To obtain y at index n, causality implies that we cannot assume knowledge of x at index n − k for n − k > n, or simply, for k < 0. Setting h(k) = 0 for k < 0 ensures this.
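The absolute-summability test of Section 1.4.1 can be illustrated numerically. A minimal sketch (added here, not in the original; the truncation length is an arbitrary choice) computes partial sums of Σ|h(n)| for h(n) = α^n u(n):

```python
import numpy as np

def abs_sum(alpha, n_terms=500):
    """Partial sum of sum_{n>=0} |h(n)| for h(n) = alpha**n * u(n)."""
    n = np.arange(n_terms)
    return np.sum(np.abs(alpha) ** n)

print(abs_sum(0.5))  # ~2.0 = 1/(1 - 0.5): absolutely summable, BIBO stable
print(abs_sum(1.1))  # grows without bound as n_terms increases: unstable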


Chapter 2

Digital Processing of Continuous-Time Signals

Discrete-time signals arise in many ways, but they most commonly occur in representations of continuous-time signals. This is partly due to the fact that the processing of continuous-time signals is often carried out on discrete-time sequences obtained by sampling continuous-time signals. It is remarkable that under reasonable constraints a continuous-time signal can be adequately represented by a sequence. In this chapter we discuss the process of periodic sampling.

2.1 Periodic Sampling

Let xc(t) be a continuous-time signal and let its Fourier transform be denoted by Xc(jΩ),

    Xc(jΩ) = ∫_{−∞}^{+∞} xc(t) e^{−jΩt} dt.

The signal xc(t) is obtained from Xc(jΩ) by

    xc(t) = (1/2π) ∫_{−∞}^{+∞} Xc(jΩ) e^{+jΩt} dΩ.

The typical method of obtaining a discrete-time representation of a continuous-time signal is through periodic sampling, wherein a sequence of samples x(n) is obtained from the continuous-time signal xc(t) according to

    x(n) = xc(t)|_{t=nT} = xc(nT),    n = 0, ±1, ±2, . . . ,  T > 0.

Herein, T is the sampling period; its reciprocal, fs = 1/T, is the sampling frequency in samples per second, and Ωs = 2πfs is the sampling frequency in rad/sec. This equation is the input-output relationship of an ideal A/D converter. It is generally not invertible, since many continuous-time signals can produce the same output sequence of samples. This ambiguity can, however, be removed by restricting the class of input signals to the sampler.

It is convenient to represent the sampling process in two stages, as shown in Figure 2.1. This consists of an impulse train modulator followed by conversion of the impulse train into a discrete sequence. The system G in Figure 2.1 represents a system that converts an impulse train into a discrete-time sequence.

[Figure 2.1: Continuous-Time to digital converter.]

The modulation signal p(t) is a periodic impulse train,

    p(t) = Σ_{n=−∞}^{∞} δ(t − nT),

where δ(t) is the Dirac delta function. Consequently, we have

    xs(t) = xc(t) · p(t) = xc(t) · Σ_{n=−∞}^{∞} δ(t − nT) = Σ_{n=−∞}^{∞} xc(nT) δ(t − nT) = Σ_{n=−∞}^{∞} x(n) δ(t − nT).

The Fourier transform of xs(t), Xs(jΩ), is obtained by convolving Xc(jΩ) and P(jΩ). The Fourier transform of a periodic impulse train is a periodic impulse train, i.e.,

    P(jΩ) = (2π/T) Σ_{k=−∞}^{∞} δ(Ω − 2πk/T) = (2π/T) Σ_{k=−∞}^{∞} δ(Ω − kΩs).

Since

    Xs(jΩ) = (1/2π) Xc(jΩ) ∗ P(jΩ),

it follows that

    Xs(jΩ) = (1/T) Σ_{k=−∞}^{∞} Xc(j(Ω − 2πk/T)) = (1/T) Σ_{k=−∞}^{∞} Xc(jΩ − jkΩs).

The same result can be obtained by noting that p(t) is periodic, so it can be represented as a Fourier series with coefficients 1/T, i.e.,

    p(t) = (1/T) Σ_{n=−∞}^{∞} e^{jnΩs t}.

Thus

    xs(t) = xc(t) p(t) = (1/T) Σ_{n=−∞}^{∞} xc(t) e^{jnΩs t}.

Taking the Fourier transform of xs(t) leads to

    Xs(jΩ) = (1/T) Σ_{k=−∞}^{∞} Xc(j(Ω − kΩs)).


Figure 2.2 depicts the frequency-domain representation of impulse train sampling. Figure 2.2(a) represents the Fourier transform of a band-limited signal whose highest non-zero frequency component in Xc(jΩ) is at ΩB. Figure 2.2(b) shows the periodic impulse train P(jΩ), and Figure 2.2(c) shows Xs(jΩ), the result of convolving Xc(jΩ) with P(jΩ). From Figure 2.2(c) it is evident that when Ωs − ΩB > ΩB, i.e. Ωs > 2ΩB, the replicas of Xc(jΩ) do not overlap, and therefore xc(t) can be recovered from xs(t) with an ideal low-pass filter whose cut-off frequency is at least ΩB but does not exceed Ωs − ΩB. In Figure 2.2(d), however, the replicas of Xc(jΩ) overlap, and as a result the original signal cannot be recovered by ideal low-pass filtering. The reconstructed signal is then related to the original continuous-time signal through a distortion referred to as aliasing. If xc(t) is a band-limited signal with no components above ΩB rad/sec, the sampling frequency has to be equal to or greater than 2ΩB rad/sec. This is known as the Nyquist criterion, and 2ΩB is known as the Nyquist rate.
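Aliasing is easy to demonstrate numerically. In the sketch below (an added illustration, not from the text; the frequencies are arbitrary example values), a 7 Hz tone sampled at 10 Hz, below its Nyquist rate of 14 Hz, produces exactly the same samples as a 3 Hz tone, since the 7 Hz component folds to fs − 7 = 3 Hz:

```python
import numpy as np

fs = 10.0                        # sampling frequency in Hz (below 2 * 7 Hz)
n = np.arange(20)
t = n / fs

x7 = np.cos(2 * np.pi * 7 * t)   # 7 Hz tone, undersampled
x3 = np.cos(2 * np.pi * 3 * t)   # its 3 Hz alias

print(np.allclose(x7, x3))       # True: the two tones are indistinguishable
```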

2.2 Reconstruction of Band-Limited Signals

If x(n) is the input to an ideal low-pass filter with frequency response Hr(jΩ) and impulse response hr(t), then the output of the filter will be

    xr(t) = Σ_{n=−∞}^{∞} x(n) hr(t − nT),

where T is the sampling interval. The reconstruction filter commonly has a gain of T and a cut-off frequency of Ωc = Ωs/2 = π/T. This choice is appropriate for any relationship between Ωs and ΩB that avoids aliasing, i.e., so long as Ωs > 2ΩB. The impulse response of such a filter is given by

    hr(t) = sin(πt/T) / (πt/T).

Consequently, the relation between xr(t) and x(n) is given by

    xr(t) = Σ_{n=−∞}^{∞} x(n) · sin[π(t − nT)/T] / [π(t − nT)/T],

with xc(nT) = x(n).
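The reconstruction formula can be evaluated directly by truncating the infinite sum. The sketch below (added here as an illustration; np.sinc(u) computes sin(πu)/(πu), and the signal, sampling interval, and truncation length are arbitrary assumptions) interpolates a band-limited tone at an off-grid time instant:

```python
import numpy as np

T = 0.1                                 # sampling interval (arbitrary)
n = np.arange(-200, 201)                # many samples approximate the infinite sum
f0 = 2.0                                # tone frequency, below 1/(2T) = 5 Hz

x = np.cos(2 * np.pi * f0 * n * T)      # samples x(n) = xc(nT)

def x_r(t):
    """xr(t) = sum_n x(n) sinc((t - nT)/T), truncated to the samples above."""
    return np.sum(x * np.sinc((t - n * T) / T))

t0 = 0.123                              # off-grid evaluation point
print(x_r(t0), np.cos(2 * np.pi * f0 * t0))   # nearly equal
```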

2.3 Discrete-time Processing of Continuous-Time Signals

A continuous-time signal is processed by digital signal processing techniques using analog-to-digital (A/D) and digital-to-analog (D/A) converters before and after the processing. This concept is illustrated in Figure 2.3. The low-pass filter limits the bandwidth of the continuous-time signal to reduce the effect of aliasing. The spectral characteristics (magnitude) of the low-pass filter are shown in Figure 2.5.

Example 2.3.1 Assume that we want to restore the signal sc(t) from yc(t) = sc(t) + nc(t), where nc(t) is background noise. The spectral characteristics of these signals are shown in Figure 2.4.

[Figure 2.2: Effect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB, (d) Spectrum of the sampled signal with Ωs < 2ΩB.]

We can perform the processing for this problem using either continuous-time or digital techniques. Using a continuous-time approach, we would recover the signal sc(t) by filtering yc(t) with a continuous-time low-pass filter. Alternatively, we could adopt a digital approach, as shown in Figure 2.3. We would first low-pass filter yc(t) to prevent aliasing, and then convert the continuous-time signal into a discrete-time signal using an A/D converter.

2.3. DISCRETE-TIME PROCESSING OF CONTINUOUS-TIME SIGNALS. Analog

Analog

(or Continuous

Digital

- Low pass

-Time) Signal

-

A/D

Processed

- Signal

Filter

17

-

-

D/A

Processing

Continuous-Time Signal

Figure 2.3: Discrete-time processing of continuous-time signals. digitally on the discrete-time signal. The processed signal is then converted back to continuous form using an D/A converter. Nc (jΩ) 6

[Figure 2.4: Frequency Spectra of Sc(jΩ), Nc(jΩ) and Yc(jΩ).]

[Figure 2.5: Continuous-Time Filter: magnitude response H(jΩ) with cut-off ΩB.]

The advantages of processing signals digitally as compared with an analog approach are:

1. Flexibility: if we want to change a continuous-time filter because of a change in signal and noise characteristics, we have to change the hardware components. Using the digital approach, we only need to modify the software.

2. Better control of accuracy requirements. Tolerances in continuous-time circuit components make it extremely difficult for the system designer to control the accuracy of the system.

3. The signals can be stored without deterioration or loss of signal quality.

4. Lower cost of the digital implementation.

Chapter 3

Filter Design

3.1 Introduction

Filters are an important class of linear time-invariant systems. Their objective is to modify the input signal. In practice filters are frequency selective, meaning they pass certain frequency components of the input signal and totally reject others. The objective of this chapter and the following ones is to introduce the concepts of frequency-selective filters and the most popular filter design techniques.

The design of filters (continuous-time or digital) involves the following three steps:

1. Filter specification. This step consists of setting the required characteristics of the filter. For example, to restore a signal which has been degraded by background noise, the filter characteristics will depend on the spectral characteristics of both the signal and the background noise. Typically (but not always) specifications are given in the frequency domain.

2. Filter design. Determine the unit sample response h(n) of the filter, or its system function H(z) and frequency response H(e^{jω}).

3. Filter implementation. Implement a discrete-time system with the given h(n) or H(z), or one which approaches it.

These three steps are closely related. For instance, it does not make much sense to specify a filter that cannot be designed or implemented. We shall therefore consider a particular class of digital filters which possess the following practical properties:

1. h(n) is real.

2. h(n) is causal, i.e., h(n) = 0 for n < 0.

3. h(n) is stable, i.e., Σ_{n=0}^{∞} |h(n)| < ∞.

In a practical setting, the desired filter is often implemented digitally on a chip and is used to filter a signal derived from a continuous-time signal by means of periodic sampling followed by analog-to-digital conversion. For this reason, we refer to digital filters as discrete-time filters, even though the underlying design techniques relate to the discrete-time nature of the signals and systems.

[Figure 3.1: A basic system for discrete-time filtering of a continuous-time signal: A/D converter, discrete-time system, D/A converter, with sampling period T.]

A basic system for discrete-time filtering of continuous-time signals is given in Figure 3.1. Assuming the input xc(t) is band-limited and the sampling frequency is high enough to avoid aliasing, the overall system behaves as a linear time-invariant system with frequency response

    Heff(jΩ) = { H(e^{jΩT}),  |Ω| < π/T,
               { 0,           |Ω| > π/T.

In such cases, it is straightforward to convert from specifications on the effective continuous-time filter to specifications on the discrete-time filter through the relation ω = ΩT. That is, H(e^{jω}) is specified over one period by

    H(e^{jω}) = Heff(jω/T),    |ω| < π.

This concept will be treated in detail in Chapter 5.

3.2 Digital Filters

Throughout the text we will use the terms system and filter interchangeably. Linear time-invariant discrete-time systems can be characterised by linear constant-coefficient difference equations with initial rest conditions of the type

    Σ_{k=0}^{M} a_k y(n − k) = Σ_{r=0}^{N−1} b_r x(n − r),                    (3.1)

where x(n) and y(n), n = 0, ±1, . . ., are the input and output of the system and max(N − 1, M) is the order. The unit sample response of the system may be of either finite duration or infinite duration. It is useful to distinguish between these two classes, as the properties of the digital processing techniques for each differ. If the unit sample response is of finite duration, we have a finite impulse response (FIR) filter; if the unit sample response is of infinite duration, we have an infinite impulse response (IIR) filter. If M = 0, so that

    y(n) = (1/a_0) Σ_{r=0}^{N−1} b_r x(n − r),                    (3.2)

then the system corresponds to an FIR filter. In fact, comparison with the convolution sum

    y(n) = Σ_{k=−∞}^{∞} h(k) x(n − k)

shows that Equation (3.2) is identical to the convolution sum, hence it follows that

    h(n) = { b_n / a_0,  n = 0, . . . , N − 1,
           { 0,          otherwise.

An FIR filter can always be described by a difference equation of the form (3.1) with M = 0. M must be greater than zero for an IIR filter.
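The equivalence between the difference-equation form (3.2) and the convolution sum can be checked directly. A minimal sketch (added here for illustration; the coefficients and input are arbitrary example values) compares the two computations:

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 2.0, 1.0])   # example b_r coefficients (arbitrary)
a0 = 2.0
h = b / a0                                 # h(n) = b_n / a0, n = 0, ..., N-1

x = np.array([1.0, 0.0, -1.0, 2.0, 0.5, 0.0, 0.0])

# Difference equation: y(n) = (1/a0) sum_r b_r x(n-r), zero initial conditions
y_de = np.zeros(len(x))
for nn in range(len(x)):
    for r in range(len(b)):
        if nn - r >= 0:
            y_de[nn] += b[r] * x[nn - r] / a0

# Convolution sum with h(n), truncated to the input length
y_conv = np.convolve(x, h)[:len(x)]
print(np.allclose(y_de, y_conv))           # True
```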

3.3 Linear Phase Filters

A digital filter with unit sample response h(n) is said to have linear phase if H(e^{jω}) can be expressed as

    H(e^{jω}) = HM(e^{jω}) e^{−jωα},    |ω| < π,

where HM(e^{jω}) is a real function of ω and α is a constant. The significance of a linear phase filter lies in the result that if a discrete-time signal is filtered by a linear shift-invariant (LSI) system whose transfer function has magnitude HM(e^{jω}) = 1 and linear phase (see Figure 3.2), then the spectrum of the output is

    S(e^{jω}) e^{−jωα} ↔ s(n − α) = y(n).

[Figure 3.2: Linear and time-invariant filtering of s(n).]

Simply put, a linear phase filter preserves the shape of the signal. Should the filter have the property that |H(e^{jω})| = 1 but not have linear phase, then the shape of the signal will be distorted.

We generalise the notion of linear phase to

    H(e^{jω}) = HM(e^{jω}) e^{−jωα} e^{jβ} = HM(e^{jω}) e^{−jφ(ω)},    φ(ω) = −β + ωα,

where HM(e^{jω}) is a real function of ω, and α and β are constants. The case β = 0 is referred to as linear phase, as opposed to generalised linear phase, where we have

    H(e^{jω}) = HM(e^{jω}) cos(β − ωα) + j HM(e^{jω}) sin(β − ωα)
              = Σ_{n=−∞}^{∞} h(n) e^{−jωn}
              = Σ_{n=−∞}^{∞} h(n) cos(ωn) − j Σ_{n=−∞}^{∞} h(n) sin(ωn),    h(n) real,

leading to

    tan(β − ωα) = sin(β − ωα) / cos(β − ωα) = − [Σ_{n=−∞}^{∞} h(n) sin(ωn)] / [Σ_{n=−∞}^{∞} h(n) cos(ωn)]

and

    Σ_{n=−∞}^{∞} h(n) sin(ω(n − α) + β) = 0.                    (3.3)

To have φ(ω) = −β + ωα, (3.3) must hold for all ω; it is a necessary, but not sufficient, condition on h(n), α and β. For example, one can show that

    β = 0 or π,    2α = N − 1,    h(2α − n) = h(n)                    (3.4)

satisfies (3.3). Alternatively,

    β = π/2 or 3π/2,    2α = N − 1,    h(2α − n) = −h(n)                    (3.5)

also satisfies (3.3). To have a causal system we require

    Σ_{n=0}^{∞} h(n) sin(ω(n − α) + β) = 0

to hold for all ω. Causality and the previous conditions (3.4) and (3.5) imply that h(n) = 0 for n < 0 and n > N − 1, i.e., causal FIR filters have generalised linear phase if they have impulse response length N and satisfy either h(2α − n) = h(n) or h(2α − n) = −h(n). Specifically, if

    h(n) = { h(N − 1 − n),  0 ≤ n ≤ N − 1,
           { 0,             otherwise,

then

    H(e^{jω}) = He(e^{jω}) e^{−jω(N−1)/2},

where He(e^{jω}) is a real, even, periodic function of ω. Similarly, if

    h(n) = { −h(N − 1 − n),  0 ≤ n ≤ N − 1,
           { 0,              otherwise,

then

    H(e^{jω}) = j HO(e^{jω}) e^{−jω(N−1)/2} = HO(e^{jω}) e^{−jω(N−1)/2 + jπ/2},

where HO(e^{jω}) is a real, odd, periodic function of ω. The above two conditions on h(n) are sufficient, but not necessary, to guarantee a causal system with generalised linear phase.

3.3.1 Generalised Linear Phase for FIR Filters

We shall restrict our discussion to linear phase FIR filters. We classify FIR filters into four types, depending on whether h(n) is symmetric (h(n) = h(N − 1 − n)) or anti-symmetric (h(n) = −h(N − 1 − n)) about the point of symmetry or anti-symmetry, and whether the filter length N is odd or even. The four types of FIR filters and their associated properties are discussed below and summarised in Table 3.1. Figure 3.3 shows examples of the four types of FIR filters.

Type I FIR linear phase filter. A type I filter has a symmetric impulse response

    h(n) = h(N − 1 − n),    0 ≤ n ≤ N − 1,

with N an odd integer. The corresponding frequency response is

    H(e^{jω}) = Σ_{n=0}^{N−1} h(n) e^{−jωn} = e^{−jω(N−1)/2} Σ_{k=0}^{(N−1)/2} a(k) cos(ωk),

where a(0) = h((N−1)/2) and a(k) = 2h((N−1)/2 − k), k = 1, 2, . . . , (N−1)/2.

Type II FIR linear phase filter. A type II filter has a symmetric impulse response with N an even integer. The corresponding frequency response is

    H(e^{jω}) = e^{−jω(N−1)/2} Σ_{k=1}^{N/2} b(k) cos[ω(k − 1/2)],

where b(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.

Type III FIR linear phase filter. A type III filter has an antisymmetric impulse response

    h(n) = −h(N − 1 − n),    0 ≤ n ≤ N − 1,

with N an odd integer. The corresponding frequency response is

    H(e^{jω}) = j e^{−jω(N−1)/2} Σ_{k=1}^{(N−1)/2} c(k) sin(ωk),

where c(k) = 2h((N − 1)/2 − k), k = 1, 2, . . . , (N − 1)/2.

Type IV FIR linear phase filter. A type IV filter has an antisymmetric impulse response with N an even integer. The corresponding frequency response is

    H(e^{jω}) = j e^{−jω(N−1)/2} Σ_{k=1}^{N/2} d(k) sin[ω(k − 1/2)],

where d(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.
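The generalised linear phase of these types can be checked numerically: removing the factor e^{−jω(N−1)/2} (and, for the antisymmetric types, the factor j) should leave a purely real function of ω. The sketch below (added for illustration, not from the text; the example impulse responses are arbitrary) performs this check for a type I and a type III filter:

```python
import numpy as np

def freq_resp(h, w):
    """H(e^{jw}) = sum_n h(n) e^{-jwn} for a finite-length h."""
    n = np.arange(len(h))
    return np.array([np.sum(h * np.exp(-1j * wi * n)) for wi in w])

w = np.linspace(0, np.pi, 64)

h1 = np.array([1.0, 2.0, 3.0, 2.0, 1.0])     # type I:   symmetric, N odd
h3 = np.array([1.0, 2.0, 0.0, -2.0, -1.0])   # type III: antisymmetric, N odd

for h, factor, name in [(h1, 1.0, "type I"), (h3, -1j, "type III")]:
    alpha = (len(h) - 1) / 2
    # HM should be real: H(e^{jw}) * e^{+jw*alpha} (* -j for type III)
    HM = freq_resp(h, w) * np.exp(1j * w * alpha) * factor
    print(name, "imaginary part ~ 0:", np.allclose(HM.imag, 0, atol=1e-12))
```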

Table 3.1: The four classes of FIR filters and their associated properties

Type | Symmetry of h(n)                | N    | α       | β          | Form of HM(e^{jω})                    | Constraints on HM(e^{jω}) · e^{jβ}
I    | h(n) = h(N − 1 − n), symmetric   | odd  | (N−1)/2 | 0 or π     | Σ_{n=0}^{(N−1)/2} a(n) cos(ωn)        | real
II   | h(n) = h(N − 1 − n), symmetric   | even | (N−1)/2 | 0 or π     | Σ_{n=1}^{N/2} b(n) cos(ω(n − 1/2))    | real, HM(e^{jπ}) = 0
III  | h(n) = −h(N − 1 − n), anti-symm. | odd  | (N−1)/2 | π/2 or 3π/2 | Σ_{n=1}^{(N−1)/2} c(n) sin(ωn)       | pure imaginary, HM(e^{j0}) = HM(e^{jπ}) = 0
IV   | h(n) = −h(N − 1 − n), anti-symm. | even | (N−1)/2 | π/2 or 3π/2 | Σ_{n=1}^{N/2} d(n) sin(ω(n − 1/2))   | pure imaginary, HM(e^{j0}) = 0

Remarks

1. A filter will have no phase distortion if H(e^{jω}) is real and even, since h(n) will then be real and even.

2. Non-causal FIR filters can be made causal by an appropriate time shift so that the first non-zero value occurs at n = 0. The point of symmetry (N − 1)/2 then either coincides with a sample (N odd) or lies midway between two samples (N even).

Example 3.3.1 Example of a type I FIR filter. Given a filter with unit sample response

    h(n) = { 1,  0 ≤ n ≤ 4,
           { 0,  otherwise,

we can evaluate the frequency response to be

    H(e^{jω}) = Σ_{n=0}^{4} e^{−jωn} = (1 − e^{−j5ω}) / (1 − e^{−jω}) = e^{−j2ω} sin(5ω/2) / sin(ω/2).

The magnitude and the phase of the frequency response of the filter are depicted in Figure 3.4.

Example 3.3.2 Example of a type II FIR filter. Consider an extension, by one sample, of the impulse response of the filter in Example 3.3.1. Then the frequency response of the system is given by

    H(e^{jω}) = Σ_{n=0}^{5} e^{−jωn} = e^{−jω5/2} sin(3ω) / sin(ω/2),

whose magnitude and phase are depicted in Figure 3.5.

[Figure 3.3: Examples of the four types of linear phase FIR filters.]

Example 3.3.3 Example of a type III FIR filter. Consider a filter with unit impulse response

    h(n) = δ(n) − δ(n − 2),

where δ(n) is Kronecker's delta, defined as

    δ(n) = { 1,  n = 0,
           { 0,  n ≠ 0.

We can then evaluate the frequency response as

    H(e^{jω}) = 1 − e^{−j2ω} = j[2 sin(ω)] e^{−jω} = 2 e^{−j(ω − π/2)} sin(ω).

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.6.

[Figure 3.4: Frequency response of the type I filter of Example 3.3.1. Magnitude (left). Phase (right).]

[Figure 3.5: Frequency response of the type II filter of Example 3.3.2. Magnitude (left). Phase (right).]

[Figure 3.6: Frequency response of the type III filter of Example 3.3.3. Magnitude (left). Phase (right).]

Example 3.3.4 Example of a type IV FIR filter. Consider a filter with unit impulse response h(n) = δ(n) − δ(n − 1). We can then evaluate the frequency response as

    H(e^{jω}) = 1 − e^{−jω} = j[2 sin(ω/2)] e^{−jω/2} = 2 e^{−j(ω/2 − π/2)} sin(ω/2).

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.7.

[Figure 3.7: Frequency response of the type IV filter of Example 3.3.4. Magnitude (left). Phase (right).]
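The closed forms in Examples 3.3.1 to 3.3.4 can be validated against a direct evaluation of the defining sum. A minimal sketch (added here, not from the text; the frequency grid avoids ω = 0 where the closed form is a 0/0 limit) checks Example 3.3.1:

```python
import numpy as np

w = np.linspace(1e-3, np.pi, 50)     # avoid w = 0 (0/0 in the closed form)

# Defining sum: H(e^{jw}) = sum_{n=0}^{4} e^{-jwn}
H_sum = sum(np.exp(-1j * w * n) for n in range(5))

# Closed form of Example 3.3.1: e^{-j2w} sin(5w/2) / sin(w/2)
H_cf = np.exp(-2j * w) * np.sin(5 * w / 2) / np.sin(w / 2)

print(np.allclose(H_sum, H_cf))      # True
```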

3.4 Filter Specification

Like continuous-time filters, digital filters are specified in the frequency domain. Since H(e^{jω}) = H(e^{j(ω+2π)}) for all ω, H(e^{jω}) is completely specified for 0 ≤ ω < 2π. In addition, since h(n) is real, we have H(e^{jω}) = H(e^{−jω})*. This implies that specifying H(e^{jω}) for 0 ≤ ω < π completely specifies H(e^{jω}) for all ω. To uniquely specify the complex frequency response H(e^{jω}), we need to specify both the magnitude and the phase of H(e^{jω}). However, since we are dealing with linear phase FIR filters, we need only specify the magnitude. The ideal magnitude responses are given in Figure 3.8 for the low-pass, high-pass, band-pass and band-stop filters.

Example 3.4.1 Specification of a low-pass filter. The ideal low-pass filter is composed of a pass-band and a stop-band region. In practice we also have a transition band, as a sharp transition cannot be achieved. Ideally, |H(e^{jω})| is unity in the passband and zero in the stopband. In practice, we supply error tolerances that require

    1 − α1 ≤ |H(e^{jω})| ≤ 1 + α1  for 0 ≤ ω ≤ ωp,

and

    |H(e^{jω})| < α2  for ωs ≤ ω ≤ π,

where α1 and α2 are called the passband tolerance and stopband tolerance, respectively. A low-pass filter is then completely specified by the four parameters α1, α2, ωp and ωs. Figures 3.9 and 3.10 illustrate the concept of tolerance bands in filter design.

[Figure 3.8: Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.]

A similar arrangement holds for high-pass, band-stop and band-pass filters (see Figure 3.11).

3.5 Properties of IIR Filters

As discussed, an IIR filter has a unit sample response of infinite duration. In general, an IIR filter with arbitrary unit sample response h(n) cannot be realised, since computing each output sample would require an unbounded number of arithmetic operations. Therefore we restrict ourselves to the class of filters which:

1. have a unit sample response h(n) that is real, causal, and stable;

2. possess a rational z-transform with a0 = 1, i.e.,

    H(z) = [ Σ_{k=0}^{N−1} b_k z^{−k} ] / [ 1 − Σ_{k=1}^{M} a_k z^{−k} ].

[Figure 3.9: Specifications of a low-pass filter.]

[Figure 3.10: Low-pass filter with transition band.]

A causal h(n) with a rational z-transform can be realised by a linear constant-coefficient difference equation (LCCDE) with initial rest conditions (IRC), where the output is computed recursively. Thus IIR filters can be implemented efficiently.

Example 3.5.1 Consider a causal impulse response described by

    h(n) = (1/2)^n · u(n),

with H(z) given by

    H(z) = 1 / (1 − (1/2) z^{−1}).

CHAPTER 3. FILTER DESIGN jω 6 |H(e )|

(a)

1 + α1 ............................................. .............................. 1 − α1 ............... α2

........................................................................................................................................................................ ωp

1 + α1

ωs

jω 6 |H(e )|

π

-

ω (b)

........................................................................................................................................................................ ........................................................................................................................................................................

1 − α1 α2 ............................................. ωs 1 + α1

6 |H(ejω )|

1 − α1

-

ωp

π

(c) ............................................. .............................................

α2 .............................................

........................................................................................................................

ωs1 ωp1

ωp2 ωs2

jω 6 |H(e )|

π

-

ω (d)

........................................................................................................................ ........................................................................................................................

1 + α1 ............................................. .............................. 1 − α1 ............... α2

ω

............................................. ωp1 ωs1

ωs2 ωp2

-

π

ω

Figure 3.11: Tolerance scheme for digital filter design. (a) Low-pass. (b) High pass. (c) Bandpass. (d) Stopband. This can be realised by

1 y(n) = y(n − 1) + x(n) 2 assuming IRC. Note that although h(n) is of infinite extent, the number of arithmetic operations required per output is only one multiplication and one addition using a LCCDE formulation.

3.5.1

Stability

An FIR is stable if h(n) is finite for all n, therefore stability is not a problem in practical design or implementation. With an IIR filter, all the poles of H(z) must lie inside the unit circle to ensure filter stability.

3.5. PROPERTIES OF IIR FILTERS

3.5.2

31

Linear phase

It is simple to achieve linear phase with FIR filters, to the point that we require all FIR filters to possess this property. It is much more difficult to control the phase of an IIR filter. As a result, we specify only the magnitude response for the IIR filter and accept the resulting phase characteristics.

32

CHAPTER 3. FILTER DESIGN

Chapter 4

Finite Impulse Response Filter Design 4.1

Introduction

The filter design problem is that of determining h(n), H(ejω ) or H(z) to meet the design specification. The first step is to choose the type of filter from the previously discussed four types of FIR filters. Some types of filters may not be suitable for a given design specification. For example, from Table 3.1 a Type II filter is not suitable for a high-pass filter design as it will always have H(ejω ) = 0 at ω = π since cos π(n − 1/2) = 0. Table 4.1 shows what type of filters are not suitable for a given class of filter. Table 4.1: Suitability of a given class of filter for the four FIR types Type I II III IV

Low-pass

Not suitable Not suitable

High-pass

Band-pass

Not suitable Not suitable

Band stop Not suitable Not suitable Not suitable

The two standard approaches for FIR filter design are: 1. The window method. 2. The optimal filter design method. We now discuss each of these approaches in turn.

4.2

Design of FIR Filters by Windowing

Many idealised systems are defined by piecewise-constant or piecewise-functional frequency responses with discontinuities at the boundaries between bands. As a result, they have non causal impulse responses that are infinitely long. The most straightforward approach for obtaining a causal FIR approximation to such systems is to truncate the ideal response. 33

34

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN Let Hd (ejω ) be the desired frequency response of the filter. Then Hd (ejω ) =

∞ X

hd (n)e−jωn ,

n=−∞

where hd (n) is the corresponding unit sample response sequence given by Z π 1 hd (n) = Hd (ejω )ejωn dω . 2π −π The simplest way to obtain a causal FIR filter from hd (n) is to define a new system with unit sample response  hd (n), 0 ≤ n ≤ N − 1 , h(n) = 0, elsewhere , which can be re-written as h(n) = hd (n) · w(n) , and therefore

1 H(e ) = 2π jω

H(ejω )

Z

π

−π

Hd (ejθ ) · W (ej(ω−θ) )dθ .

Note that represents a smeared version of Hd (ejω ). If hd (n) is symmetric (or antisymmetric) with respect to a certain point of symmetry (or anti-symmetry) and w(n) is also symmetric with respect to the same point of symmetry, then hd (n) · w(n) is the unit sample response of a linear phase filter. Example 4.2.1 Low-pass filter design with specifications α1 ,α2 , ωp , ωs . We choose a Type I filter for the design. Since the desired frequency response Hd (ejω ) should be consistent with the type of filter used, from Table 3.1, Hd (ejω ) is given by  −jωα e , 0 ≤ |ω| ≤ ωc , jω Hd (e ) = 0, ωc ≤ |ω| ≤ π where α = Hd (ejω ) is

N −1 2 ,

ωc =

ωp +ωs 2 ,

and N is the filter length (odd). The inverse Fourier transform of hd (n) =

sin ωc (n − α) π(n − α)

If we apply a rectangular window to hd (n), the resulting filter impulse response h(n) is ( sin ωc (n−α) , for 0 ≤ n ≤ N − 1 , π(n−α) h(n) = 0 , otherwise . The sequence h(n) is shown in Figure 4.1 for N = 7 and ωc = π/2. Note that this filter may or may not meet the design specification.

4.3. IMPORTANCE OF THE WINDOW SHAPE AND LENGTH

35

6 h(n)

0.5 . 0.318

0.318

-

n −0.106

-0.106

Figure 4.1: The sequence h(n).

4.3

Importance of the Window Shape and Length

In the time-domain we have h(n) = hd (n) · w(n) , which corresponds in the frequency domain to H(ejω ) = Hd (ejω ) ⋆ W (ejω ) Z π 1 Hd (ejθ )W (ej(ω−θ) )dθ = 2π −π

(4.1)

If we do not truncate at all so that w(n) = 1 for all n, W (ejω ) is a periodic impulse train with period 2π, and therefore H(ejω ) = Hd (ejω ). This interpretation suggests that if w(n) is chosen so that W (ejω ) is concentrated in a narrow band of frequencies around ω = 0, then H(ejω ) approximates Hd (ejω ) very closely except where Hd (ejω ) changes very abruptly. Thus, the choice of window is governed by the desire to have w(n) as short as possible in duration so as to minimise computation in the implementation of the filter while having W (ejω ) approximate an impulse; that is, we want W (ejω ) to be highly concentrated in frequency so that the convolution (4.1) faithfully reproduces the desired frequency response. These are conflicting requirements. Take, for example, the case of the rectangular window, where W (ejω ) =

N −1 X n=0

e−jωn =

1 − e−jωN sin(ωN/2) = e−jω(N −1)/2 . −jω 1−e sin(ω/2)

The magnitude of the function sin(ωN/2)/ sin(ω/2) is shown in Figure 4.2 for the case N = 8. Note that W (ejω ) for the rectangular window has a generalised linear phase. As N increases, the width of the “main-lobe” decreases. The main-lobe is usually defined as the region between the first zero crossing on either side of the origin. For the rectangular window, the main-lobe width is ∆ω = 4π/N . However, for the rectangular window, the side lobes are large and, in fact, as N increases, the peak amplitudes of the main-lobe and the side-lobes grow in a manner such that the area under each lobe is a constant even though the width of each lobe decreases with N . Consequently, as W (ej(ω−θ) ) “slides by” a discontinuity of Hd (ejω ) with increasing ω, the integral of W (ej(ω−θ) )Hd (ejω ) will oscillate as each side-lobe of W (ej(ω−θ) ) moves past the discontinuity.

36

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN 10 9

Mainlobe

8

Magnitude

7 6 5 4

Peak sidelobe

3 2 1 0

−2

0

2

4

6

8

w rad/s

Figure 4.2: Magnitude of the Fourier transform of a rectangular window (N = 8). We can modify this effect by tapering the window smoothly to zero at each end so that the height of the side-lobes are diminished; however this is achieved at the expense of a wider main-lobe and thus a wider transition at the discontinuity. Five common windows are used in practice: the rectangular, Bartlett, Hanning, Hamming and Blackman window. The formula for these windows are given below. Rectangular w(n) = Bartlett

Hanning



1, 0 ≤ n ≤ N − 1 , 0, elsewhere.

  

2n N −1 , 0≤n≤ , N −1 2 w(n) =  2 − 2n , N − 1 ≤ n ≤ N − 1 .  N −1 2   1 2πn w(n) = 1 − cos( ) , 0 ≤ n ≤ N − 1. 2 N −1

Hamming w(n) = 0.54 − 0.46 cos



2πn N −1



, 0 ≤ n ≤ N − 1.

Blackman w(n) = 0.42 − 0.5 cos



2πn N −1



+ 0.08 cos



4πn N −1



, 0 ≤ n ≤ N − 1.

Figure 4.3 compares these window functions for N = 200. A good choice of the window corresponds to choosing a window with small main-lobe width and small side-lobe amplitude. Clearly, the shape of a window affects both main-lobe and side-lobe behavior. Figures 4.4, 4.5, 4.6, 4.7 and 4.8 show these windows in the frequency domain for N = 51, using a 256-point FFT. For comparison, Figures 4.9 and 4.10 show |W (ejω )| of the Hamming

4.3. IMPORTANCE OF THE WINDOW SHAPE AND LENGTH

37

1.2

1 Rectangular 0.8

w(n)

Bartlett 0.6

Hamming

0.4 Blackman 0.2 Hanning 0 0

50

100 n

150

200

Figure 4.3: Commonly used windows for FIR filter design N = 200. window for N = 101 and 201, respectively. This shows that main-lobe behavior is affected primarily by window length. The effect of the shape and size of the window can be summarised as follows: 1. Window shape controls side-lobe as well as main-lobe behavior. 2. Window size controls main-lobe behavior only. Since side-lobe behavior and therefore passband and stop-band tolerance is affected only by the shape of the window, the passband and stop-band requirements dictate which window is chosen. The window size is then determined from the transition width requirements. Table 4.2 (column 1 and 2) contains information useful in choosing the shape and length of the window. Table 4.2: Properties of different windows.

Window Type Rectangular Bartlett Hanning Hamming Blackman

Peak Amplitude of Side-lobe (dB) (relative) -13 -25 -31 -41 -57

Example 4.3.1 Filter design problem.

Approximate Width (rad) of main-lobe ∆ωm 4π/N 8π/(N − 1) 8π/(N − 1) 8π/(N − 1) 12π/(N − 1)

Peak Approximation Error (dB) 20 log10 (min(α1 , α2 )) -21 -25 -44 -53 -74

38

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

0

-10

20 log|W|

-20

-30

-40

-50

-60 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.4: Fourier transform (magnitude) of rectangular window (N = 51).

0 -20 -40

20 log|W|

-60 -80 -100 -120 -140 -160 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.5: Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).

0

-20

20 log|W|

-40

-60

-80

-100

-120

-140 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.6: Fourier transform (magnitude) of Hanning window (N = 51).

4.3. IMPORTANCE OF THE WINDOW SHAPE AND LENGTH

0 -10 -20

20 log|W|

-30 -40 -50 -60 -70 -80 -90 -100 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.7: Fourier transform (magnitude) of Hamming window (N = 51)

0

-20

20 log|W|

-40

-60

-80

-100

-120

-140 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.8: Fourier transform (magnitude) of Blackman window (N = 51).

0

-20

20 log|W|

-40

-60

-80

-100

-120 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.9: Fourier transform of a Hamming window with window length of N = 101.

39

40

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN 0

-20

20 log|W|

-40

-60

-80

-100

-120 0

0.5

1

1.5 2 w rad/s

2.5

3

3.5

Figure 4.10: Fourier transform of a Hamming window with window length of N = 201. We have to design a low-pass filter with a required stop-band attenuation of −50dB. Which filter is most appropriate? The third column of Table 4.2 contains information on the best stop-band (or passband) attenuation we can expect from a given window shape. Therefore, we can see that the rectangular, Bartlett and Hanning windows will not meet the specifications. The Hamming window may just meet the requirements, but the Blackman window will clearly be the most appropriate. From the second column of Table 4.2 the length of the window required can be derived. The distance between the peak ripples on either side of discontinuity is approximately equal to the main-lobe width ∆ωm . The transition width ∆ω = ωs − ωp is therefore somewhat less than the main-lobe width. For example, for the Blackman window, N can be determined from the relationship 12π/(N − 1) ≃ ∆ω. Therefore N ≈ 12π ∆ω + 1 points.

4.4

Kaiser Window Filter Design

We can quantify the trade-off between main-lobe width and side-lobe area by seeking the window function that is maximally concentrated around ω = 0 in the frequency domain. Kaiser proposed the window function   I0 [β(1 − [(n − α)/α]2 )1/2 ] , 0 ≤ n ≤ N − 1, w(n) = I0 (β)  0, elsewhere , where α = N 2−1 , and I0 (·) is the zeroth-order modified Bessel function of the first kind. In contrast to the previously discussed windows, this one has two parameters; the length N , and a shape parameter β. Thus, by varying N and β we can trade side-lobe amplitude for mainlobe width. Figure 4.11 shows the trade-off that can be achieved. If the window is tapered more, the side-lobes of the Fourier transform become smaller, but the main-lobe becomes wider. Alternatively, we can hold β constant and increase N , causing the main-lobe to decrease in width without affecting the amplitude of the side-lobes.

4.4. KAISER WINDOW FILTER DESIGN

41

1.2

1

Amplitude

0.8

0.6

0.4

0.2

0 0

2

4

6

8

10 Samples

12

14

16

18

20

(a) 0 −10 −20

20 log |W|

−30 −40 −50 −60 −70 −80 −90 −100 0

0.5

1

1.5 w rad/s

2

2.5

3

2

2.5

3

(b) 0 −10 −20

20 log |W|

−30 −40 −50 −60 −70 −80 −90 −100 0

0.5

1

1.5 w rad/s

(c) Figure 4.11: (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).

42

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

Kaiser obtained formula that permitted the filter designer to estimate the values of N and β which were needed to meet a given filter specification. Consider a frequency selective filter with the specifications shown in Figure 3.11(a). Kaiser found that over a usefully wide range of conditions α1 is determined by the choice of β. Given α1 , the passband cutoff frequency ωp of the low-pass filter is defined to be the highest frequency such that |H(ejω )| ≥ 1 − α1 . The stop-band cutoff frequency ωs is defined to be the lowest frequency such that |H(ejω )| ≤ α2 . The transition region width is therefore ∆ω = ωs − ωp for the low-pass filter approximation. With A being the negative peak approximation error (pae) A = −20 log10 min (α1 , α2 ) , Kaiser determined empirically that to achieve a specified value   0.1102(A − 8.7), 0.5842(A − 21)0.4 + 0.07886(A − 21), β=  0,

of A, we required that A > 50, 21 ≤ A ≤ 50, A < 21.

To achieve prescribed values of A and ∆ω, N must satisfy   A−8 N= +1 . 2.285∆ω With these formula no iterations or trial is required.

Example 4.4.1 Given the specifications of a low-pass filter ωp = 0.4π, ωs = 0.6π, α1 = α2 = 0.001 and ∆ω = 0.2π, find the unit sample response using Kaiser’s method. We find A = −20 log10 α1 = 60 and β = 5.635, N = 38 . Then, h(n) =

(

sin ωc (n−α) π(n−α)

0,

·

I0 [β(1−[(n−α)/α]2 )1/2 ] , I0 (β)

0 ≤ n ≤ N − 1, elsewhere

where α = N 2−1 = 37 2 = 18.5 and ωc = (ωs + ωp ) /2. Since N is an even integer, the resulting linear phase system would be of type II. Figure 4.12 shows h(n) and H(ejω ). Example 4.4.2 Design of a high-pass filter. In the following, we consider a high-pass filter with ωs = 0.35π ωp = 0.5π α1 = 0.021 Again using the formulae proposed by Kaiser, we get β = 2.6 and N = 25. Figure 4.13 shows h(n) and H(ejω ). The resulting peak error is 0.0213 > α1 = α2 = 0.021 as specified. Thus we increase N by 26. In this case the filter would be of type II and is not appropriate for designing a high-pass. Therefore we use N = 27, which exceeds the required specifications.

4.4. KAISER WINDOW FILTER DESIGN

43

0.6

0.5

0.4

Amplitude

0.3

0.2

0.1

0

−0.1

−0.2 0

5

10

15 20 25 Sample number (n)

30

35

40

(a) 20

0

20 log |H|

−20

−40

−60

−80

−100

−120 0

0.5

1

1.5 w rad/s

2

2.5

3

(b) Figure 4.12: Response functions. (a) Impulse response. (b) Log magnitude. The design of FIR filters by windowing is straightforward to apply and it is quite general even though it has a number of limitations. However, we often wish to design a filter that is the “best” that can be achieved for a given value of N . We describe two methods that attempt to improve upon the windowing method by introducing a new criterion of optimality.

44

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

0.7 0.6 0.5 0.4 Amplitude

0.3 0.2 0.1 0 −0.1 −0.2 −0.3 −0.4 0

5

10

15 Sample number (n)

20

25

30

(a) 20

0

20 log |H|

−20

−40

−60

−80

−100 0

0.5

1

1.5 w rad/s

2

2.5

3

(b) Figure 4.13: Kaiser window estimate. (a) Impulse response. (b) Log magnitude.

4.5. OPTIMAL FILTER DESIGN.

4.5

45

Optimal Filter Design.

As seen, the design of a filter with the window method is straightforward and easy to handle but it is not optimal in the sense of minimizing the maximum error. Through the Fourier approximation, any filter design which uses windowing attempts to minimize the integrated error Z π 1 2 Hd (ejω ) − H(ejω ) 2 dω. ǫ = 2π −π

In many applications we get a better result by minimizing the maximum error. Using the windowing method and Fourier approximation we always get the Gibbs phenomenon which implies that the maximum error will be at discontinuities of Hd (ejω ). Better results can be achieved if the approximation error is spaced equally over the whole frequency band. In the following section only type I linear phase FIR filters are taken into account but the implementation of other linear phase FIR filters is straightforward. Take a non-causal FIR filter with the impulse response h(n) = h(−n), the frequency response is



H(e ) =

L X

h(n)e−jωn

n=−L

= h(0) +

L X

2h(n) cos(ωn)

n=1

=

=

L X

n=0 L X

n=0

=

L X

a(n) cos(ωn)

a(n)

"

n X

k

βkn cos(ω)

k=0

#

a ˇ(n) cos(ω)n

n=0

which is real and even. The filter specifications for Hd (ejω ) are shown in Fig. 4.14 To design the filter the Parks-McClellan algorithm formulates the problem as one of polynomial approximation which can be solved by the Chebyshev approximation over disjoint sets of frequencies (F = {0 ≤ ω ≤ ωp , ωs ≤ ω ≤ π}). The approximation error function, E(ω), is defined as   E(ω)=W ˆ (ω) Hd (ejω ) − H(ejω ) .

(4.2)

E(ω) measures the error between the optimum and attained frequency response, which is weighted by W (ω) where  α1 : 0 ≤ ω ≤ ωp α2 W (ω) = . 1 : ωs ≤ ω ≤ π

46

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

1 + α1 1 1 − α1

α2 −α2

ωp

ωs

Figure 4.14: Filter Specification The optimal H(ejω ) minimizes the maximum weighted error which means we have to search for an H(ejω ) which fulfills       min max |E(ω)| = min max W (ω) Hd (ejω ) − H(ejω ) . a ˇ(n) : 0≤n≤L

ω∈F

a ˇ(n) : 0≤n≤L

ω∈F

Theorem 4.5.1 (Alternation Theorem) With a closed disjoint subset F on the real axis a necessary and sufficient condition for jω

H(e ) =

L X

a ˇ(n) cos(ω)n

n=0

to be the unique, best weighted Chebyshev approximation of Hd (ejω ) in F is that the error function E(ω) has at least L + 2 alternating frequencies in F . This means that there must exist at least L + 2 frequencies ωi in F such that ω0 < ω1 < · · · < ωL+1 with E(ωi ) = −E(ωi+1 ) and |E(ωi )| = max |E(ω)|

i = 0, . . . , L + 1.

ω∈F

The maximum of the alternating frequencies can be found from the derivative of E(ω). If we calculate the derivative of E(ω) and take into account that Hd (ejω ) is constant in F , we get for the maximal frequencies dE(ω) dω

  d  W (ω) Hd (ejω ) − H(ejω ) dω dH(ejω ) W (ω) = − dω L X nˇ a(n) cos(ω)n−1 = 0. = W (ω) sin(ω)

=

(4.3)

n=1

This means that there are at most L + 1 maxima belonging to the polynomial structure and to the frequencies at ω = 0 and ω = π. Two more maxima will occur at ω = ωp and ω = ωs

4.5. OPTIMAL FILTER DESIGN.

47

because of the discontinuity and the inability to approximate this. As it can be seen in Eq. (4.3), the error function E(ω) and the approximation function H(ejω ) possess maxima at the same frequencies ωi . Thus from Theorem 4.5.1 we can write the error E(ω) at the ωi , i = 0 . . . L + 1, as   E(ωi ) = W (ωi ) Hd (ejωi ) − H(ejωi ) = (−1)i ρ

i = 0...L + 1

and can express the ideal frequency response as

L

jωi

Hd (e

jωi

) = H(e

(−1)i ρ (−1)i ρ X a(n) cos(ωi n) + )+ = W (ωi ) W (ωi ) n=0

or 

1 cos(ω0 ) · · ·  . .. ..  .. . .  1 cos(ωL+1 ) · · ·

cos(Lω0 ) .. . cos(LωL+1 )

1 W (ω0 )

.. .

(−1)L+1 W (ωL+1 )

    

   a0 Hd (ejω0 ) ..    .. .  = . . aL  jω L+1 Hd (e ) ρ

To solve these equations for the parameter an , we use an algorithm called ”Remez” which was developed by Rabiner et. al. in 1975. After an initial guess of L + 2 extremal frequencies of H(ejω ) we can compute ρ according to ρ=

γ0 Hd (ejω0 ) + . . . + γL+1 Hd (ejωL+1 ) γ0 W (ejω0 )

+ ... +

(−1)L+1 γL+1 W (ejωL+1 )

(4.4)

with L+1 Y

γk =

n=0, n6=k

1 . cos(ωk ) − cos(ωn )

Since E(ω) has the same extremal frequencies as H(ejω ) we can write i ˆ jωi ) = Hd (ejωi ) − (−1) ρ . H(e W (ωi )

Using Lagrange interpolation it follows that jω

H(e ) =

PL

βk =

L Y

with

βk ˆ jωi ) k=0 H(e cos(ω)−cos(ωk ) PL βk k=0 cos(ω)−cos(ωk )

n=0, n6=k

1 . cos(ωk ) − cos(ωn )

With H(ejω ) it is possible to calculate the error E(ω) according to Eq. (4.2). After calculating E(ω), L + 2 extremal frequencies belonging to the largest alternating peaks can be selected and the algorithm can start again by using Eq. (4.4). The Algorithm is summarised in Fig. 4.15.

48

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

By using the Remez algorithm we are able to find the δ that defines the upper bound of |E(ω)|, which is the optimum solution for the Chebyshev approximation. After the algorithm converges the filter coefficients h(n) can be calculated by taking L + 1 frequencies from H(ejω ) with L

h(n) =

2πkn 1 X H(eωn )ej L+1 . L+1

n=0

Property of Optimal Filters The error between the desired and attained filter frequency response achieve a minimum and maximum consecutively at a finite number of frequencies in the frequency range of interest (passband and stopband regions) called alternation frequencies. The minimum number of alternating frequencies for a Type I optimum filter is N 2+3 (with L = N 2−1 ). Because of this characteristic, optimal filters are called equi-ripple filters. Remarks 1. Designing an optimal filter requires much more computation than the windowing method. 2. The length of the filter designed by the window method is 20% to 50% longer than the length of the optimal filter. c 3. Optimal filter design is the most widely used FIR filter design method. In MATLAB the function that performs optimum filter design is called ”firpm”.

4.6

Implementation of FIR Filters

Here we discuss the implementation of a discrete time system with unit sample response h(n), where h(n) may be a filter. The input-output relationship is given by the convolution sum y(n) =

∞ X

k=−∞

h(k)x(n − k) =

N −1 X k=0

h(k)x(n − k) ,

i.e., the output y(n) represents a sum of weighted and delayed versions of the input x(n). The signal flow graph for a single delay element is shown in Figure 4.16. The convolution sum can be represented as in Figure 4.17. The computational efficiency of the algorithm in Figure 4.17 may be improved by exploiting the symmetry (or anti-symmetry) of h(n). For type I Filters, we have h(n) = h(N − 1 − n) and

4.6. IMPLEMENTATION OF FIR FILTERS

49

N is odd. This leads to y(n) =

N −1 X k=0

h(k) · x(n − k)

N −1 −1 2

X

=

k=0 N −1 X

+

h(k) · x(n − k) + h

k= N 2−1 +1 N −1 −1 2

y(n) =

X k=0

+ h





N −1 2

   N −1 x n− 2

h(k) · x(n − k)

h(k) · [x(n − k) + x(n − (N − 1 − k))]

N −1 2

   N −1 x n− 2

This realisation is shown by the flow graph in Figure 4.18. This method reduces the number of multiplications by 50% without affecting the number of additions or storage elements required.

50

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

Input filter parameters

? Initial guess of (L + 2) extremal frequencies ? Calculate the optimum ρ on extremal set ? Interpolate through (L + 1) points to obtain H(ejω ) ? Calculate error E(ω) and find local maxima where |E(ω)| ≥ ρ ?

More than (L + 2)

Yes

extrema?

No Changed

-

Retain (L + 2) largest extrema

 ?

Check whether the extremal points changed Unchanged ? Best approximation

Figure 4.15: Flowchart of Parks-McClellan algorithm.

4.6. IMPLEMENTATION OF FIR FILTERS

51

- z −1

x(n)

- y(n)

Figure 4.16: Signal flow graph for a delay element.

x(n) -

?h(0)

z −1

-

-

-

z −1

?h(1)

-

-

?h(2)

-

z −1

? h(N − 1) y(n)

-

Figure 4.17: Signal Flow graph representing the convolution sum.

-

x(n)

- z −1 

-

- z −1



-

- z −1



 ?

]

] 

?h(0)

y(n)



z −1 

h(1) ? 

] 

]

z −1 

?h(2)



z −1 

h( N 2−3 ) ?



Figure 4.18: Efficient signal flow graph for the realisation of a type I FIR filter.

h( N 2−1 ) ?

52

CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN

Chapter 5

Design of Infinite Impulse Response Filters 5.1

Introduction

When designing IIR filters one must determine a rational H(z) which meets a given magnitude specification. The two different approaches are : 1. Design a digital filter from an analog or continuous-time filter. 2. Design the filter directly. Only the first approach is considered since it is much more useful. It consists of the following four steps: Step 1 : Specification of the digital filter. Step 2 : Translation to a continuous-time filter specification. Step 3 : Determination of the continuous-time filter system function Hc (s). Step 4 : Transformation of Hc (s) to a digital filter system function H(z). Remarks 1. Step 2 depends on Step 4 since translation of the digital filter specification to a continuoustime filter specification depends on how the continuous-time filter is transformed back to the digital domain. 2. It is desirable for Step 4 to always transform a causal and stable Hc (s) to a causal and stable H(z), and also that the transformation results in a digital filter with a rational H(z) when the continuous-time filter has a rational Hc (s). 3. The continuous-time filter frequency response Hc (jΩ) = Hc (s)|s=jΩ should map the digital filter frequency response H(ejω ) = H(z = ejω ). Typically Butterworth, Chebyshev and elliptic filters are used as continuous-time prototypes. The impulse invariance method and bilinear transformation are two approaches that satisfy all the above properties, and are often used in designing IIR filters. We discuss their implementation in the following sections. 53

54

CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS

5.2

Impulse Invariance Method

In the impulse invariance method, we obtain the unit sample response of the digital filter by sampling the impulse response of the continuous-time filter: 1. From Hc (s) and Hc (jΩ), we determine hc (t). 2. The sequence h(n) is obtained by h(n) = Td · hc (nTd ), where Td is the design sampling interval which does not need to be the same as the sampling period T associated with A/D and D/A conversion. 3. H(z) is obtained from h(n). Hc (s)

H(z)

hc (t)

h(n)

Example 5.2.1 A continuous-time filter is described by Hc (s) =

1 . s − sc

The inverse Laplace transform is hc (t) = esc t · u(t) . Therefore, h(n) = hc (t)|t=nTd = esc n · u(n) assuming Td = 1, which leads to H(z) =

1 1 − esc z −1

In general, to transform Hc (s) to H(z), Hc (s) is first expressed as a linear combination of 1 system functions of the form (s−s , which is then transformed to give H(z), i.e. perform a c) partial fraction expansion and transform each component. If Hc (s) has multiple poles at s = sc , Hc (s) would have a component of the form (s−s1 c )p with p > 1 . Its z-transform can be obtained by following the same steps as the above example. In previous chapters, we have seen that   ∞ X ω 2π jω Hc j H(e ) = +j k . (5.1) Td Td k=−∞

So, H(ejω ) is not simply related to Hc (jΩ) because of aliasing. If there were no aliasing effect,   ω jω for − π ≤ ω ≤ π , (5.2) H(e ) = Hc j Td and the shape of H(ejω ) would be the same as Hc (jΩ). Note that in practice Hc (jΩ) is not band limited and therefore aliasing is present which involves some difficulty in the design. In practice, we suppose that the relation H(ejω ) = Hc (j Tωd ) holds over [−π, π) and we carry out

5.2. IMPULSE INVARIANCE METHOD

55

the remaining steps, instead of using (5.1). Note that the discrete-time system frequency ω is related to the continuous-time system frequency Ω by ω = ΩTd . Because of the approximation, the resulting filter may not satisfy the digital filter specifications and some iterations may be needed. Also due to aliasing, it is difficult to design band stop or high-pass filters that have no significant aliasing. In the impulse invariance design procedure, we transform the continuous-time filter specifications through the use of Eq. (5.2) to the discrete-time filter specifications. We illustrate this in the following example. Example 5.2.2 Assume that we have the low-pass filter specifications as shown in Figure 5.1 with α1 = 0.10875, α2 = 0.17783, ωp = 0.2π, Td = 1 and ωs = 0.3π. In this case the passband is between 1 − α1 and 1 rather than between 1 − α1 and 1 + α1 . Historically, most continuousjω 6|H(e )|

1 .............................. 1 − α1 ..............................

.................................

α2

-

ωp

ωs

π

ω

Figure 5.1: Low-pass filter specifications. time filter approximation methods were developed for the design of passive systems (having gain less than 1). Therefore, the design specifications for continuous time filters have typically been formulated so that the passband is less than or equal to 1. We will design a filter by applying impulse invariance to an appropriate Butterworth continuoustime filter. A Butterworth low-pass filter is defined by the property that the magnitude-squared function is 1 |Hc (jΩ)|2 = 1 + (Ω/Ωc )2N where N is an integer corresponding to the order of the filter. The magnitude response of a Butterworth filter of N th order is maximally flat in the passband, i.e., the first (2N − 1) derivatives of the magnitude-squared function are zero at Ω = 0. An example of this characteristic is shown in Figure 5.2. The first step is to transform the discrete-time specifications to continuous-time filter specifications. Assuming that the aliasing involved in the transformation from Hc (jΩ) to H(ejω ) will be negligible, we obtain the specification on Hc (jΩ) by applying the relation Ω = ω/Td to obtain the continuous-time filter specifications shown in Figure 5.3. The Butterworth filter should have the following specifications, 1 − α1 = 0.89125 ≤ |Hc (jΩ)| ≤ 1 , 0 ≤ |Ω| ≤ 0.2π , and |Hc (jΩ)| ≤ 0.17783 = α2 , 0.3π ≤ |Ω| ≤ π .

56

CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS 1 0.9 0.8 0.7

Magnitude

0.6 N=2 0.5 N=4

0.4 0.3

N=8

0.2 0.1 0 0

1

2

3 Frequency

4

5

6

Figure 5.2: Magnitude transfer function Hc (jΩ). 6|Hc (jΩ)|

1 1 − α1

α2 0

ωp Td

ωs Td

π Td

-



Figure 5.3: Filter specifications for Example 5.2.2. Because |Hc (jΩ)| is a monotonic function of Ω, the specifications will be satisfied if |Hc (j0.2π)| ≥ 0.89125 and |Hc (j0.3π)| ≤ 0.17783 Now, we determine N and Ωc , which are parameters of |Hc (jΩ)|2 =

1 , 1 + (Ω/Ωc )2N

to meet the specifications. Thus, we have 1+



0.2π Ωc

2N

=



1 0.89125

2

5.3. BILINEAR TRANSFORMATION and 1+



0.3π Ωc

57 2N

=



1 0.17783

2

which leads to N = 5.8858 and Ωc = 0.70474. Rounding N up to 6, we find Ωc = 0.7032. With these values, the passband specification will be met exactly and the stopband specifications will be exceeded, reducing the potential of aliasing. With N = 6, the 12 poles of Hc (s)Hc (−s) = 1/[1 + (s/jΩc )2N ] are uniformly distributed around a circle of radius ωc = 0.7032. Consequently, the poles of Hc (s) are the 3 pole pairs in the left half of the s-plane with coordinates s1 = −0.182 ± j0.679

s2 = −0.497 ± j0.497 s3 = −0.679 ± j0.182 so that Hc (s) =

0.12093 . (s2 + 0.364s + 0.4942)(s2 + 0.994s + 0.494)(s2 + 1.358s + 0.4542)

If we express Hc (s) as a partial fraction expansion, we transform

1 (s−sk )

into

1 (1−esk z −1 )

and get

H(z) =

0.2871 − 0.4466z −1 −2.1428 + 1.1455z −1 1.8557 − 0.6303z −1 + + . 1 − 1.2971z −1 + 0.6949z −2 1 − 1.0691z −1 + 0.3699z −2 1 − 0.9972z −1 + 0.2570z −2

5.3

Bilinear Transformation

The other main approach to the design problem is the bilinear transformation. With this approach, H(z) is obtained directly from the continuous-time frequency response Hc (s) by replacing s by   2 1 − z −1 s= Td 1 + z −1

so that

H(z) = Hc



2 Td



1 − z −1 1 + z −1



.

A sample on the unit circle, z = ejω , corresponds to the point s=

2 2 1 − e−jω = j tan(ω/2). · −jω Td 1 + e Td

The bilinear transformation transforms the imaginary axis of the s-plane to the unit circle as shown in Figure 5.4. The inverse transform is given by z=

2 Td 2 Td

+s −s

=

1 + σTd /2 + jΩTd /2 1 + (Td /2)s = . 1 − (Td /2)s 1 − σTd /2 − jΩTd /2

The region ℜ{s} < 0 is transformed to |z| < 1, i.e. the left-hand half plane ℜ{s} < 0 of the s-plane is mapped inside the unit circle. This means the bilinear transformation will preserve the stability of systems.

58

CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS 6jΩ

s-plane

z-plane

6ℑ

Image of s = jΩ (unit circle) -

-

σ





Image of left half-plane

Figure 5.4: Mapping of the s-plane onto the z-plane using the bilinear transformation In the definition of the transformation, T2d is a scaling factor. It controls the deformation of the frequency axis when we use the bilinear transformation to obtain a digital transfer function H(z) from the continuous-time transfer function Hc (s). From s = j T2d tan(ω/2) = σ + jΩ, it follows that 2 Ω= tan(ω/2) and σ = 0 . Td For low frequencies, the deformation between Ω and ω is very small,i.e. for small ω, we can assume Ω ≈ ω2 . This was the criterion for choosing 2/Td as a scaling factor. This is shown in Figure 5.5. The choice of another criterion will yield another scaling factor. 4

3

Discrete−time frequency

2

1

0

−1

−2

−3

−4 −20

−15

−10

−5 0 5 Continuous−time frequency

10

15

20

Figure 5.5: Mapping of the continuous-time frequency axis onto the unit circle by bilinear transformation (Td = 1).

Remarks 1. This transformation is the most used in practice. It calculates digital filters from known continuous-time filter responses. 2. By taking into account the frequency deformation, Ω=

ω 2 tan , Td 2

5.3. BILINEAR TRANSFORMATION

59

the detailed shape of Hc (jΩ) is not preserved. 3. The delay time is also modified τd = τc [1 + ((ωc Td )/2)2 ] . If the continuous-time filter has a constant delay time, the resulting digital filter will not have this property. 4. To design a digital filter using this method one must first transform the discrete time filter specifications to the continuous time specifications, design a continuous-time filter with these new requirements and then apply the bilinear transformation. Example 5.3.1 Consider the discrete-time filter specifications of Example 5.2.2, where we illustrated the impulse invariance technique for the design of a discrete-time filter. In carrying out the design using the bilinear transformation, the critical frequencies of the discrete-time filter must be prewarped to the corresponding continuous-time frequencies using Ω = (2/Td ) tan(ω/2) so that the frequency distortion inherent in bilinear transformation will map them back to the correct discrete-time critical frequencies. For this specific filter, with |Hc (jΩ)| representing the magnitude response function of the continuous-time filter, we require that   2 0.2π 0.89125 ≤ |Hc (jΩ)| ≤ 1, 0 ≤ |Ω| ≤ tan , Td 2 2 tan Td

|Hc (jΩ)| ≤ 0.17783,



0.3π 2



≤ |Ω| ≤ ∞.

For convenience we choose Td = 1. Also, as with Example 5.2.2, since a continuous-time Butterworth filter has monotonic magnitude response, we can equivalently require that |Hc (j2 tan(0.1π))| ≥ 0.89125 and |Hc (j2 tan(0.15π))| ≤ 0.17783. The form of the magnitude-squared function for the Butterworth filter is 1 . 1 + (Ω/Ωc )2N

|Hc (jΩ)|2 =

Solving for N and Ωc with the equality sign we obtain 1+ and 1+





2 tan(0.1π) Ωc

2N

2 tan(0.15π) Ωc

2N

=



=



1 0.89125

2

(5.3)

2

(5.4)

1 0.17783

and solving for N gives N=

1 1 )2 − 1)/(( 0.89125 )2 − 1)] log[(( 0.17783 = 5.30466. 2 log[tan(0.15π)/ tan(0.1π)]

60

CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS

Since N must be an integer, we choose N = 6. Substituting this N into Eq. ( 5.4) we obtain Ωc = 0.76622. For this value of Ωc , the passband specifications are exceeded and stop band specifications are met exactly. This is reasonable for the bilinear transformation since we do not have to be concerned with aliasing. That is, with proper prewarping, we can be certain that the resulting discrete-time filter will meet the specifications exactly at the desired stopband edge. In the s-plane, the 12 poles of the magnitude-squared function are uniformly distributed in angle on a circle of radius 0.7662. The system function of the continuous-time filter obtained by selecting the left half-plane poles is Hc (s) =

(s2

+ 0.3966s +

0.5871)(s2

0.20238 . + 1.0836s + 0.5871)(s2 + 1.4802s + 0.5871)

We then obtain the system function H(z) for the discrete-time filter by applying the bilinear transformation to Hc (s) with Td = 1, H(z) =

0.0007378(1 + z −1 )6 . (1 − 1.268z −1 + 0.7051z −2 )(1 − 1.0106z −1 + 0.3583z −2 )(1 − 0.9044z −1 + 0.2155z −2 )

Since the bilinear transformation maps the entire jΩ-axis of the s-plane onto the unit circle, the magnitude response of the discrete-time filter falls off much more rapidly than the original continuous-time filter, i.e. the entire imaginary axis of the s-plane is compressed onto the unit circle. Hence, the behaviour of H(ejω ) at ω = π corresponds to the behaviour of Hc (jΩ) at Ω = ∞. Therefore, the discrete-time filter will have a sixth-order zero at z = −1 since the continuous-time Butterworth filter has a sixth-order zero at s = ∞. Note that as with impulse invariance, the parameter Td is of no consequence in the design procedure since we assume that the design problem always begins with discrete-time filters specifications. When these specifications are mapped to continuous-time specifications and then the continuous-time filter is mapped back to a discrete-time filter, the effect of Td is cancelled. Although we retained the parameter Td in our discussion, in specific problems any convenient value of Td can be chosen.

5.4

Implementation of IIR Filters

An IIR filter has an infinite extent and cannot be implemented by direct convolution in practice. For this reason, we require the IIR filter to have a rational z-transform, as we have seen that this can be realised by a difference equation.

5.4.1

Direct Form I

The difference equation that describes the system is given by y(n) =

M X k=1

ak y(n − k) +

N −1 X r=0

br x(n − r)

This can be realized PN −1in a straightforward manner PM as depicted in Figure 5.6 which represents the pair v(n) = r=0 br x(n − r) and y(n) = k=1 ak y(n − k) + v(n) for (N − 1) = M . This

5.4. IMPLEMENTATION OF IIR FILTERS

61

representation can be viewed as an implementation of H(z) through the decomposition       1   H(z) = H2 (z)H1 (z) =   M   X 1 − ak z −k 

N −1 X

br z

−r

r=0

!

.

k=1

Herein we have represented the delay element - z −1

x(n)

-

b0

-

z −1 ? x(n − 1)

x(n − N + 2)

z −1

-

by

v(n)

-

6

-

-

a1

b1



6

6

a2

bN −2

aN −2

-





z −1 ? x(n − N + 1)

y(n − 1) z ?

b2

-

-

aN −1 

−1

y(n − 2)

y(n − N + 2) z ?

bN −1

y(n)

? z −1

6

-

z −1 ? x(n − 2)

-

−1

y(n − N + 1)

Figure 5.6: Signal flow graph of direct form I structure for an (N − 1)th-order system.

5.4.2

Direct Form II

The previous system can also be viewed as a cascade of two systems whose system functions are given by H1 (z) and H2 (z). Since H1 (z)H2 (z) = H(z) = H2 (z)H1 (z) , we can change the order of H1 (z) and H2 (z) without affecting the overall system function. This is because the delay elements are redundant and can be combined. We then obtain the direct form II structure. The advantage of the second form lies in the reduction in the number of storage elements required, while the number of arithmetic operations required is constant. The flow graph of direct form II structure is depicted in Figure 5.7

62

CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS w(n) b0 x(n) y(n) ? z −1

6

a1

6

b1



-

z ?

6

−1

a2

6

b2



-

aN −2

bN −2



-

z ?

−1

aN −1

bN −1



-

Figure 5.7: Signal flow graph of direct form II structure for an N − 1th-order system.

5.4.3

Cascade form

The rational system function

H(z) =

N −1 X

r=0 M X

1− can be expressed in the general form,

br z −r

H(z) = A

ak z −k

k=1

Y

Hk (z)

k

where Hk (z) is given by : Hk (z) =

1 + β1k z −1 + β2k z −2 1 − α1k z −1 − α2k z −2

The constants A, β1k , β2k , α1k and α2k are real, since h(n) is real. The previous equations have shown that H(z) can be represented by a cascade of second order sections Hk (z). The cascade form is less sensitive to coefficient quantisation. If we change the branch transmittance in the two direct forms, the poles and zeroes of the system function will be affected. In the cascade form, however, the change in one transmittance branch will affect the poles or zeroes only in the second order section where the branch transmittance is affected. Figure 5.8 gives the cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.

5.5. A COMPARISON OF FIR AND IIR DIGITAL FILTERS w1 (n) y1 (n) w2 (n) y2 (n) w3 (n) β01 β02 β03 - R - R - R x(n) 6 α11 

α21



? z −1

β11

-

? z −1

β21

-

6

6 α12 

α22



? z −1

β12

-

? z −1

β22

-

6

6 α13 

α23



? z −1

β13

-

63 y3 (n) -

y(n)

6

? z −1

β23

-

Figure 5.8: Cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.

5.5

A Comparison of FIR and IIR Digital Filters

After having discussed a wide range of methods for the design of both infinite and finite duration impulse response filters several questions naturally arise: What type of system is best, IIR or FIR? Which methods yields the best results? The answer is neither. We have discussed a wide range of methods for both IIR and FIR filter because no single type of filter nor single design method is best for all circumstances. The choice between an FIR filter and an IIR filter depends upon the relative weight that one attaches to the advantages and disadvantages of each type of filter. For example, IIR filters have the advantage that a variety of frequency selective filters can be designed using closed-form design formulae. That is, once the problem has been defined in terms of appropriate specifications for a given kind of filter (e.g., Butterworth, Chebyshev, or elliptic), then the coefficients (or poles and zeros) of the desired digital filter are obtained by straightforward substitution into a set of design equations. This kind of simplicity in the design procedure is attractive if only a few filters are to be designed or if limited computational facilities are available. In the case of FIR filters, closed-form design equations do not exist. While the window method can be applied in a straightforward manner, some iteration may be necessary to meet a prescribed specification. Most of the other FIR design methods are iterative procedures which require relatively powerful computational facilities for their implementation. In contrast, it is often possible to design frequency selective IIR digital filters using only a hand calculator and tables of continuous-time filter design parameters. A drawback of IIR filters is that the closedform IIR designs are preliminary limited to low-pass, band-pass, and high-pass filters, etc. Furthermore, these designs generally disregard the phase response of the filter. For example, with a relatively simple computational procedure we may obtain excellent amplitude response characteristics with an elliptic low-pass filter while the phase response will be very nonlinear (especially at the band edge). In contrast, FIR filters can have precise linear phase. Also, the window method and most algorithmic methods afford the possibility of approximating more arbitrary frequency response characteristics with little more difficulty than is encountered in the design of low-pass filters. Also, it appears that the design problem for FIR filters is much more under control than the IIR design problem because there is an optimality theorem for FIR filters that is meaningful in a wide range of practical situations. Finally, there are questions of economics in implementing a digital filter. Concerns of this type are usually measured in terms of hardware complexity or computational speed. Both these factors are strongly related to the filter order required to meet a given specification. Putting

64

CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS

aside phase considerations, a given amplitude response specification will, in general, be met most efficiently with an IIR filter. However, in many cases, the linear phase available with an FIR filter may be well worth the extra cost. Thus a multitude of tradeoffs must be considered in designing a digital filter. Clearly, the final choice will most often be made in terms of engineering judgement on such questions as the formulation of specifications, the method of implementation, and computational facilities available for carrying out the design.

Chapter 6

Introduction to Digital Spectral Analysis 6.1

Why Spectral Analysis?

There are mainly three principal reasons for spectral analysis. 1. To provide useful descriptive statistics. 2. As a diagnostic tool to indicate which further analysis method might be relevant. 3. To check postulated theoretical models.

6.2

Applications

We encounter many areas where spectral analysis is needed. A few examples are given below. Physics. In spectroscopy we wish to investigate the distribution of power in a radiation field as a function of frequency (wavelength). Consider, for example, the situation shown in Figure 6.1 where a Michelson interferometer is used to measure the spectrum of a light source. Let S(t) be a zero-mean stochastic stationary process which may be complex. It can be seen that X(t) = A (S(t − t0 ) + S(t − t0 − τ )) and V (t) = |X(t)|2 = A2 |S(t − t0 ) + S(t − t0 − τ )|2 .

Using the notation t˜ for t − t0 , we get   V (t) = A2 S(t˜)S(t˜)∗ + S(t˜)S(t˜ − τ )∗ + S(t˜ − τ )S(t˜)∗ + S(t˜ − τ )S(t˜ − τ )∗ .

After low-pass filtering, we obtain (ideally)

w(τ ) = E [V (t)] = A2 [cSS (0) + cSS (τ ) + cSS (−τ ) + cSS (0)] = A2 [2cSS (0) + 2ℜ{cSS (τ )}] Performing a Fourier transform leads to W (ejω ) = F{w(τ )} = A2 (2cSS (0)δ(ejω ) + CSS (ejω ) + CSS (e−jω )) 65

66

CHAPTER 6. INTRODUCTION TO DIGITAL SPECTRAL ANALYSIS

11111 00000 00000 11111 00000 11111 2

Fixed mirror

Beam splitter

S(t)

Moveable mirror

1 1

X(t)

111 000 d=0

AS(t − t0 )

AS(t − t0 − τ )

τ = 2d/c Detector V (t) Low-pass filter w(τ ) Fourier transform W (ejω )

Figure 6.1: A Michelson Interferometer.

Thus, the output of a Michelson interferometer is proportional to the spectrum of the light source. One of the objectives in spectroscopic interferometry is to get a good estimate of the spectrum of S(t) without moving the mirror a large d. Electrical Engineering. Measurement of the power in various frequency bands of some electromagnetic signal of interest. In radar there is the problem of signal detection and interpretation. Acoustics. The power spectrum plays the role of a descriptive statistic. We may use the spectrum to calculate the direction of arrival (DOA) of an incoming wave or to characterise the source. Geophysics. Earthquakes give rise to a number of different waves such as pressure waves, shear waves and surface waves. These waves are received by an array of sensors. Even for a

6.2. APPLICATIONS

67

100km distance between the earthquake and the array, these types of waves can arrive at almost the same time at the array. These waves must be characterised. Mechanical Engineering. Detection of abnormal combustion in a spark ignition engine. The power spectrum is a descriptive statistic to test if a combustion cycle is a knocking one. Medicine. EEG (electroencephalogram) and ECG (electrocardiogram) analysis. It is well known that the adult brain exhibits several major modes of oscillation, the level of brain activity in these various frequency bands may be used as a diagnostic tool.

68

CHAPTER 6. INTRODUCTION TO DIGITAL SPECTRAL ANALYSIS

Chapter 7

Random Variables and Stochastic Processes 7.1

Random Variables

A random variable is a number X(ζ) assigned to every outcome ζ ∈ S of an experiment. This could be the gain in a game of chance, the voltage of a random source, or the cost of a random component. Given ζ ∈ S, the mapping ζ → X(ζ) ∈ R is a random variable, if for all x ∈ R the set {ζ : X(ζ) ≤ x} is also an event {X ≤ x} := {ζ : X(ζ) ≤ x} with probability P ({X ≤ x}) := P ({ζ : X(ζ) ≤ x})

7.1.1

Distribution Functions

The mapping x ∈ R → FX (x) = P ({X ≤ x}) is called the probability distribution function of the random variable X.

7.1.2

Properties

The probability distribution function has the following properties 0 ≤ FX (x) ≤ 1

1.

FX (∞) = 1 FX (−∞) = 0 2. FX (x) is continuous from the right i.e., lim FX (x + ǫ) = FX (x)

ǫ→0

3. If x1 < x2 then FX (x1 ) ≤ FX (x2 ), i.e., FX (x) is a non-decreasing function of x P ({x1 < X ≤ x2 }) = FX (x2 ) − FX (x1 ) ≥ 0 69

70

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES 4. The probability density function is defined as: fX (x) :=

dFX (x) dx

and if it exists has the property fX (x) ≥ 0.

To find FX (x), we integrate fX (x), i.e.,

P ({X ≤ x}) = FX (x) = and because FX (∞) = 1,

Z

Further,



Z

x

fX (u)du −∞

fX (x)dx = 1.

−∞

P ({x1 < X ≤ x2 }) = FX (x2 ) − FX (x1 ) =

Z

x2

fX (x) dx x1

5. For a continuous variable X, P(x1 < X < x2 ) = P(x1 < X ≤ x2 ) because P({x}) = P ((−∞, x]) − P ((−∞, x)) = FX (x) − FX (x − 0) = 0

7.1.3

Uniform Distribution

X is said to be uniformly distributed on [a, b] if it admits the probability density function 1 [u(x − a) − u(x − b)] , a < b b−a and probability distribution function Z x 1 [(x − a)u(x − a) − (x − b)u(x − b)] fX (u)du = FX (x) = b − a −∞ fX (x) =

Both FX (x) and fX (x) are displayed in Figure 7.1.

7.1.4

Exponential Distribution

Let X be exponentially distributed with parameter λ. Then, fX (x) = λe−λx u(x), and where

λ>0

  FX (x) = 1 − e−λx u(x), u(x) =



1, x ≥ 0 0, x < 0.

The exponential distribution is useful in many applications in engineering, for example, to describe the life-time X of a transistor. The breakdown rate of a transistor is defined by lim

∆→0+

and is given by

1 P {x < X ≤ x + ∆} ∆ P {x < X}

fX (x) = λ = constant 1 − FX (x)

7.1. RANDOM VARIABLES

71 6fx (x) 1 b−a

-

a

b

x

61

-

a

b

x

Figure 7.1: Probability function.

7.1.5

Normal Distribution

 X is normally distributed with mean µ and variance σ 2 X ∼ N (µ, σ 2 ) if 1 2 1 fX (x) = √ e− 2σ2 (x−µ) σ 2 > 0, −∞ < µ < ∞ σ 2π

A closed-form expression for FX (x) does not exist. For µ = 0, σ 2 = 1, X is said to be standard normally distributed, and its distribution function is denoted by ΦX (x), i.e., we have   x−µ FX (x) = ΦX σ R∞ Example 7.1.1 Show that for the normal distribution −∞ fX (x) dx = 1. Z ∞ Z ∞ 1 2 1 √ e− 2σ2 (x−µ) dx fX (x) dx = −∞ σ 2π −∞ If we let s =

Because

x−µ σ ,

− 12 x2

√1 e 2π

then

Z



fX (x)dx =

Z



−∞

−∞

> 0, we calculate Z

∞ −∞

1 2 1 √ e− 2 x dx 2π

2

= = = =

Z

Z



Z0 ∞ 0

p x2 + y 2 and ϕ = arctan

y x .

Z



−∞ −∞ 2π Z ∞

Z0 ∞

= 1 where r =

1 2 1 √ e− 2 s ds 2π

0

1 − 1 (x2 +y2 ) e 2 dx dy 2π 1 − 1 r2 e 2 r dr dϕ 2π

1 2

e− 2 r r dr e−s ds

72

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Example 7.1.2 A random variable X ∼ N (1000, 50). Find the probability that X is between 900 and 1050. We have P({900 ≤ X ≤ 1050}) = FX (1050) − FX (900) Numerical values of ΦX (x) over a range of x are shown in Table 1. Using Table 1, we conclude that P({900 ≤ X ≤ 1050}) = ΦX (1) + ΦX (2) − 1 = 0.819. since ΦX (−x) = 1 − ΦX (x). Often we define the following as the error-function. Note that erf(x) = ΦX (x) − there are other definitions in the literature. Z x 1 1 2 e−x /2 du = ΦX (x) − erf(x) = √ 2 2π 0

1 2

though

Table 1: Numerical values for erf(x) function.

7.1.6

x 0.05 0.10 0.15 0.20 0.25

erf(x) 0.01994 0.03983 0.05962 0.07926 0.09871

x 0.80 0.85 0.90 0.95 1.00

erf(x) 0.28814 0.30234 0.31594 0.32894 0.34134

x 1.55 1.60 1.65 1.70 1.75

erf(x) 0.43943 0.44520 0.45053 0.45543 0.45994

x 2.30 2.35 2.40 2.45 2.50

erf(x) 0.48928 0.49061 0.49180 0.49286 0.49379

0.30 0.35 0.40 0.45 0.50

0.11791 0.13683 0.15542 0.17364 0.19146

1.05 1.10 1.15 1.20 1.25

0.35314 0.36433 0.37493 0.38493 0.39495

1.80 1.85 1.90 1.95 2.00

0.46407 0.46784 0.47128 0.47441 0.47726

2.55 2.60 2.65 2.70 2.75

0.49461 0.49534 0.49597 0.49653 0.49702

0.55 0.60 0.65 0.70 0.75

0.20884 0.22575 0.24215 0.25804 0.27337

1.30 1.35 1.40 1.45 1.50

0.40320 0.41149 0.41924 0.42647 0.43319

2.05 2.10 2.15 2.20 2.25

0.47982 0.48214 0.48422 0.48610 0.48778

2.80 2.85 2.90 2.95 3.00

0.49744 0.49781 0.48813 0.49841 0.49865

χ2 Distribution

P Let X = νk=1 Yk2 be a random variable where Yk ∼ N (0, 1). X is then χ2 distributed with ν degrees of freedom. The probability density function of a χ2ν random variable is displayed in Figure 7.2 for ν = 1, 2, 3, 5 and 10. The probability density function as ν

x x 2 −1 fX (x) = ν ν e− 2 u(x) , 2 2 Γ( 2 )

where Γ(x) is the gamma function, Γ(x) =

Z

0



ξ x−1 e−ξ dξ .

7.1. RANDOM VARIABLES

73

ν=1 0.5

0.4

χ2ν

ν=2 0.3

ν=3 0.2

ν=5 ν = 10

0.1

0

0

1

2

3

4

5

6

7

8

9

10

x Figure 7.2: χ2ν distribution with ν = 1, 2, 3, 5, 10 The mean of X is E [X] = ν and the variance is 2 σX = 2ν .

7.1.7

Multiple Random Variables

The probability of two events A = {X1 ≤ x1 } and B = {X2 ≤ x2 } have already been defined as functions of x and y, respectively called probability distribution functions FX1 (x1 ) = P({X1 ≤ x1 })

FX2 (x2 ) = P({X2 ≤ x2 })

We must introduce a new concept to include the probability of the joint event {X1 ≤ x1 , X2 ≤ x2 } = {(X1 , X2 ) ∈ D}, where x1 and x2 are two arbitrary real numbers and D is the quadrant shown in Figure 7.3. The joint distribution FX1 X2 (x1 , x2 ) of two random variables X1 and X2 is the probability of the event {X1 ≤ x1 , X2 ≤ x2 }, FX1 X2 (x1 , x2 ) = P ({X1 ≤ x1 , X2 ≤ x2 }) . For n random variables X1 , . . . , Xn the joint distribution function is denoted by FX1 X2 ...Xn (x1 , . . . , xn ) = P ({X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn })

74

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES X2 6

x2 D

- X1

x1

Figure 7.3: The quadrant D of two arbitrary real numbers x and y.

7.1.8

Properties

The multivariate probability distribution function has the following properties. 1) lim

x1 →∞,...,xn →∞

FX1 X2 ...Xn (x1 , . . . , xn ) = F (∞, ∞, . . . , ∞) = 1

lim FX1 X2 ...Xn (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn )

xi →−∞

= Fx1 ,x2 ,...,xn (x1 , . . . , xi−1 , −∞, xi+1 , . . . , xn ) = 0

2) F is continuous from the right in each component xi when the remaining components are fixed. 3) F is a non decreasing function of each component xi when the remaining components are fixed. 4) For the nth difference ∆ba F =

X

(−1)[n−

Pn

i=1

ki ]

k∈{0,1}n

1 n )≥0 F (bk11 a1−k , . . . , bknn a1−k n 1

where a = (a1 , . . . , an )T , b = (b1 , . . . , bn )T with ai < bi (i = 1, . . . , n) where T denotes transpose. For n = 1, ∆ba F is equivalent to F (b) − F (a), and for n = 2, ∆ba F is equivalent to F (b1 , b2 ) − F (b1 , a2 ) − F (a1 , b2 ) + F (a1 , a2 ). To show that (1)–(3) are not sufficient to define a probability, one should discuss the function FX1 X2 (x1 , x2 ), defined as:  0, x1 + x2 < 0, FX1 X2 (x1 , x2 ) = 1, elsewhere. and is depicted in Figure 7.4. Suppose that in a combined experiment the result of an experiment i depends on the first to (i − 1)th experiments. With the density function fX1 (x1 ) for the first experiment one constructs the density function of the total experiment fX1 ...Xn (x1 , . . . , xn ) = fX1 (x1 ) = fX1 (x1 )

n Y

i=2 n Y i=2

fXi (xi |x1 , . . . , xi−1 ) fXi (xi |X1 = x1 , . . . , Xi−1 = xi−1 )

7.1. RANDOM VARIABLES

75

x2 6

FX1 ,X2 (x1 , x2 ) = 1 - x1

FX1 ,X2 (x1 , x2 ) = 0

Figure 7.4: Bivariate probability distribution function. If the ith experiment is independent of the previous ones, then fXi (xi |x1 , . . . , xi−1 ) = fXi (xi ) and fX1 ...Xn (x1 , . . . , xn ) =

n Y

fXi (xi )

n Y

FXi (xi )

i=1

or equivalently FX1 ...Xn (x1 , . . . , xn ) =

i=1

which means that the events {X1 ≤ x1 }, . . . , {Xn ≤ xn } are independent. For the special case for n = 2, we have the following properties: 1) FX1 X2 (∞, ∞) = P(X1 ≤ ∞, X2 ≤ ∞) = 1

FX1 X2 (−∞, x2 ) = P(X1 ≤ −∞, X2 ≤ x2 ) = 0 = FX1 X2 (x1 , −∞) because {X1 = −∞, X2 ≤ x2 } ⊂ {X1 = −∞}, {X1 ≤ x1 , X2 = −∞} ⊂ {X2 = −∞}. 2) lim FX1 X2 (x1 + ǫ, x2 ) = lim FX1 X2 (x1 , x2 + ǫ) = FX1 X2 (x1 , x2 ).

ǫ→0

ǫ→0

3) The event {x11 < X1 ≤ x12 , X2 ≤ x2 } consists of all the points (X1 , X2 ) in D1 , and the event {X1 ≤ x1 , x21 < X2 ≤ x22 } consists of all the points in D2 . From Figure 7.5, we see that {X1 ≤ x12 , X2 ≤ x2 } = {X1 ≤ x11 , X2 ≤ x2 } ∪ {x11 < X1 ≤ x12 , X2 ≤ x2 }. The two summands are mutually exclusive, hence P({X1 ≤ x12 , X2 ≤ x2 }) = P({X1 ≤ x11 , X2 ≤ x2 }) + P({x11 < X1 ≤ x12 , X2 ≤ x2 }) and therefore, P({x11 < X1 ≤ x12 , X2 ≤ x2 }) = FX1 X2 (x12 , x2 ) − FX1 X2 (x11 , x2 ).

76

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES X2 6

x22 D2

x2 x21 x1 x11

x12

- X1

D1

Figure 7.5: Properties for n=2 Similarly, P({X1 < x1 , x21 < X2 ≤ x22 }) = FX1 X2 (x1 , x22 ) − FX1 X2 (x1 , x21 ) 4) P({x11 < X1 ≤ x12 , x21 < X2 ≤ x2 2}) =

FX1 X2 (x12 , x22 ) − FX1 X2 (x11 , x22 ) − FX1 X2 (x12 , x21 ) + FX1 X2 (x11 , x21 ) > 0

This is a probability that (X1 , X2 ) is in the rectangle D3 (see Figure 7.6). X2 6

x22 D3

x21 x11

x12

-X1

Figure 7.6: Probability that (X1 , X2 ) is in the rectangle D3 . Example 7.1.3 The bivariate normal distribution fX1 X1 (x1 , x2 ) =

1

p 1 − ρ2 "    ## x2 − µX2 2 1 (x1 − µX1 )(x2 − µX2 ) x1 − µX1 2 × exp − − 2ρ + 2(1 − ρ2 ) σ X1 σ X2 σ X2 σ X2 2πσX1 σX2 "

2 > 0 and σ 2 > 0 the correwhere µX1 and µX2 are respectively the means of X1 and X2 , σX X2 1 2 ) and N (µ , σ 2 ) sponding variances, and −1 < ρ < 1 is the correlation coefficient. N (µX1 , σX X2 X2 1 are the marginal distributions.

7.1. RANDOM VARIABLES

77 σ

2 The conditional density function fX2 (x2 |x1 ) of X2 when X1 = x1 is N (µX2 + ρ σX (x1 − X 2

2 (1 − ρ2 )). If X and X are independent then ρ = 0, and clearly f µX2 ), σX 1 2 X1 X2 (x1 , x2 ) = 2 fX1 (x1 )fX2 (x2 ).

7.1.9

Operations on Random Variables

The concept of a random variable was previously introduced as a means of providing a symmetric definition of events defined on a sample space. It formed a mathematical model for describing characteristics of some real, physical world random variables, which are mostly based on a single concept – expectation.

7.1.10

Expectation

The mathematical expectation of a random variable X, E [X] which may be read “the expected value of X” or “the mean value of X” is defined as Z ∞ x · fX (x)dx E [X] := −∞

where fX (x) is the probability density function of X. If X happens to be discrete with N possible values xi having probabilities P(xi ) of occurrence, then fX (x) =

N X i=1

E [X] =

N X

P(xi ) · δ(x − xi ) xi P(xi )

i=1

Example 7.1.4 i) Normal Distribution: If X ∼ N (µ, σ 2 ), then   1 1 2 2 fX (x) = √ · exp − (x − µ) /σ 2 σ 2π E [X] =

Z



xfX (x)dx Z ∞ 1 1 √ x exp{− (x − µ)2 /σ 2 }dx 2 σ 2π −∞ Z ∞ 1 1 √ (σu + µ) exp{− u2 }σdu 2 σ 2π −∞   Z ∞ Z ∞ 1 2 1 1 2 √ exp{− u }du u · exp{− u }du + µ σ 2 2 2π −∞ −∞ √ i 1 h √ σ · 0 + µ 2π 2π µ −∞

= = = = =

78

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

ii) Uniform distribution: If X is uniformly distributed on the interval [a, b], then fX (x) = where rect[a,b] =

E [X] = = = =

Z

1 · rect[a,b] (x) b−a 

1, a ≤ x ≤ b 0, else.



1 xfX (x)dx = b − a −∞ 1 1 2 (b − a2 ) b−a2 (b − a)(b + a) 2(b − a) a+b 2

Z

b

xdx a

which can be verified graphically. iii) Exponential Distribution: If X is exponentially distributed, then   x−a 1 u(x − a) fX (x) = exp − b b E [X] = = = = E [X] = Note:

Z



x · fX (x)dx   x−a 1 u(x − a)dx x · exp − b b −∞   Z 1 ∞ x−a x · exp − dx b a b Z h xi hai 1 ∞ x · exp − dx exp b a b b a+b −∞ Z ∞

If a random variable’s density is symmetrical about a line x = a, then E [X] = a, i.e. E [X] = a, if fX (x + a) = fX (−x + a)

7.1.11

Expected Value of a Function of Random Variables

Suppose we are interested in the mean of the random variable Y = g(X) Z ∞ y · fY (y)dx E [Y ] = −∞

and we are given fX (x). Then, E [Y ] = E [g(X)] =

Z

∞ −∞

g(x) · fX (x)dx

7.1. RANDOM VARIABLES

79

Example 7.1.5 The average power in a 1 Ω resistor can be found as Z ∞  2 v 2 fV (v)dv E V = −∞

where V and fV (v) are respectively the random voltage and its probability density function. In particular, if we apply the transformation g(·) to the random variable X, defined as g(X) = X n , n = 0, 1, 2, . . . , the expectation of g(X) is known as the nth order moment of X and denoted by µn : Z ∞ xn fX (x)dx . µn := E [X n ] = −∞

It is also of importance to use central moments of X around the mean. These are defined as µ′n := E [(X − E [X])n ] , n = 0, 1, 2 . . . Z Z ∞ (x − E [X])n · fX (x)dx = (x − µ)n fX (x)dx −∞

Clearly the first order central moment is zero.

7.1.12

The Variance

The variance of a random variable X is by definition Z ∞ 2 σ := (x − E [X])2 · fX (x)dx −∞

The positive constant σ, is called the standard deviation of X. From the definition it follows that σ 2 is the mean of the random variable (X − E [X])2 . Thus,     σ 2 = E (X − E [X])2 = E X 2 − 2X · (E [X]) + (E [X])2   = E X 2 − 2E [XE [X]] + (E [X])2   = E X 2 − 2 (E [X])2 + (E [X])2   σ 2 = E X 2 − (E [X])2   = E X 2 − µ2 Example 7.1.6 We will now find the variance of three common distributions. i) Normal distribution: Let X ∼ N (µ, σ 2 ). The probability density function of X is   1 1 (x − µ)2 fX (x) = √ exp − . 2 σ2 σ 2π

80

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES Clearly

Z

  √ 1 (x − µ)2 exp − 2π dx = σ 2 σ2 −∞ ∞

Differentiating with respect to σ, we obtain   Z ∞ √ 1 (x − µ)2 (x − µ)2 2π dx = exp − σ3 2 σ2 −∞ √ Multiplying both sides by σ 2 / 2π, we conclude that     E (X − µ)2 = E (X − E [X])2 = σ 2 .

ii) Uniform distribution:

    E [(X − E [X])2 = E X 2 − (E [X])2   Z ∞ a+b 2 2 x fX (x)dx − = 2 −∞  2 Z b 2 x a+b = dx − 2 a b−a 2 (b − a) . = 12 iii) Exponential distribution: fX (x) = λ exp{−λx}u(x), x > 0, λ > 0 E [X] = σ2 =

7.1.13

1 λ

1 λ2

Transformation of a Random Variable

In practice, one may wish to transform one random variable X into a new random variable Y by means of a transformation Y = g(X), where the probability density function fX (x) or distribution function of X is known. The problem is to find the probability density function of fY (y) or distribution function FY (y) of Y . Let us assume that g(·) is continuous and differentiable at all values of x for which fX (x) 6= 0. Furthermore, we assume that g(·) is monotone for which the inverse g −1 (·) exists. Then: FY (y) = P ({Y ≤ y}) = P ({g(X) ≤ y}) = P {X ≤ g

−1



(y)} =

Z

g −1 (y)

fX (x)dx −∞

holds. The density function of Y = g(X) is obtained via differentiation which leads to  dg −1 (y) −1 fY (y) = fX g (y) · dy

7.1. RANDOM VARIABLES

81

Example 7.1.7 Let Y = aX + b. To find the probability density function of Y knowing FX (x) or fX (x), we calculate FY (y) = P ({Y ≤ y}) = P ({aX + b ≤ y})   y−b = P X≤ , a>0 a   y−b = FX , a and FY (y) = 1 − FX



y−b a



, a<0

By differentiating FY (y), we obtain the probability density function   y−b 1 · fX fY (y) = |a| a Example 7.1.8 Let Y = X2 To find the probability density function of Y knowing FX (x) or fX (x), we calculate   0, y<0 2 √ √ FY (y) = P {X ≤ y} = P ({− y ≤ X ≤ y}) , y ≥ 0 √ √ FY (y) = FX ( y) − FX (− y) By differentiating FY (y), we obtain the probability density function   0, y<0 √ √ fY (y) = fX ( y) +√fX (− y) , y>0  2 y

FY (y) is non differentiable at y = 0. We may choose arbitrarily fY (y) = 0 at y = 0.

7.1.14

The Characteristic Function

The characteristic function (cf) of a random variable X is defined as follows:   ΦX (ejω ) := E ejωX Z ∞ fX (x)ejωx dx. =

(7.1) (7.2)

−∞

This can be interpreted as the expectation of a function g(X) = ejωX , or as the Fourier transform (with reversed sign of the exponent) of the probability density function (pdf) of X. Considering Equation (7.2) as a Fourier transform, we can readily apply all the properties of Fourier transforms to the characteristic function. Most importantly, the inverse relationship of Equation (7.2) is given by Z ∞ 1 fX (x) = ΦX (ω)e−jωx dω. 2π −∞

Using Schwarz’s inequality in (7.2) shows that |ΦX (ejω )| ≤ 1 and thus it always exists.

82

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Example 7.1.9 We shall derive the cf of a Gaussian distributed random variable. First, we consider Z ∼ N (0, 1). Z ΦZ (ω) = fZ (z)ejωz dz Z ∞ 1 2 1 = √ e− 2 z ejωz dz 2π −∞ Z ∞ 1 1 2 1 2 = √ e− 2 (z−jω) + 2 (jω) dz 2π −∞ Z − 21 ω 2 = e fZ (z − jω)dz 1

= e− 2 ω

2

Now we consider the more general case of X = σZ + µ ∼ N (µ, σ 2 ): h i ΦX (ω) = E ejω(σZ+µ) = ejωµ ΦZ (σω) 1

= ejωµ− 2 σ

2 ω2

The cf is also known as the moment-generating function as it may be used to determine nth order moment of a random variable. This can be seen from the following derivation. Consider the power series expansion of an exponential term for x < ∞: x

e =

∞ X xn i=0

n!

.

Substituting this into Equation (7.2) yields   ZZ (jωx)2 + · · · dx ΦX (ω) = fX (x) 1 + jωx + 2! (jω)2  2  = 1 + jωE [X] + E X + ··· 2!

(7.3)

Differentiating the above expression n times and evaluating at the origin gives  n d = j n E [X n ] ΦX (ω) dω ω=0

which means that the nth order moment of X can be determined according to  n 1 d n E [X ] = n . ΦX (ω) j dω ω=0

This is known as the moment theorem. When the power series expansion of Equation (7.3) converges the characteristic function and hence the pdf of X are completely determined by the moments of X.

7.2. COVARIANCE

7.2

83

Covariance

If the statistical properties of the two random variables X1 and X2 are described by their joint probability density function fX1 X2 (x1 , x2 ), then we have Z ∞Z ∞ g(x1 , x2 ) · fX1 X2 (x1 , x2 )dx1 dx2 E [g(X1 , X2 )] = For g(X1 , X2 ) = X1n X2k h i Z n k µn,k := E X1 X2 =



−∞

−∞

−∞

Z

xn1 xk2 fX1 X2 (x1 , x2 )dx1 dx2 , n, k = 0, 1, 2, . . .

∞ −∞

  are called the joint moments of X1 and X2 . Clearly µ0k = E X2k , while µn0 = E [X1n ]. The second order moment E [X1 X2 ] of X1 and X2 we denote1 by rX1 X2 , Z ∞Z ∞ x1 x2 fX1 X2 (x1 , x2 )dx1 dx2 rX1 X2 = E [X1 X2 ] = −∞

−∞

If rX1 X2 = E [X1 X2 ] = E [X1 ] · E [X2 ], then X1 and X2 are known to be uncorrelated. If rX1 X2 = 0, then X1 and X2 are orthogonal. The moments µ′nk , k = 0, 1, 2, . . . are known to be joint central moments and defined by h i µ′nk = E (X1 − E [X1 ])n (X2 − E [X2 ])k ) Z ∞Z ∞ (x1 − E [X1 ])n (x2 − E [X2 ])k fX1 X2 (x1 , x2 )dx1 dx2 , n, k = 0, 1, 2, . . . = −∞

−∞

The second order central moments 2 µ′20 = σX 1 2 µ′02 = σX 2

are the variances of X1 and X2 , respectively. The joint moment µ′11 is called covariance of X1 and X2 and denoted by cX1 X2 c X1 X2

= E [(X1 − E [X1 ])(X2 − E [X2 ])] Z ∞ (x1 − E [X1 ])(x2 − E [Y2 ]) · fX1 X2 (x1 , x2 )dx1 dx2 = −∞

Clearly cX1 X2 = rX1 X2 − E [X1 ] E [X2 ] . If two random variables are independent, i.e. fX1 X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ), then cX1 X2 = 0. The converse however isp not always true (except for the Gaussian case). The normalized secondc ′ order moment ρ = µ11 / µ′20 µ′02 = σXX1σXX2 is known as the correlation coefficient of X1 and X2 . 1 2 It can be shown that −1 ≤ ρ ≤ 1 We will show that two uncorrelated random variables are not necessarily independent. 1

In the engineering literature rX1 X2 is often called correlation. We reserve this term to describe the normalized covariance as it is common in the statistical literature.

84

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Example 7.2.1 U is uniformly distributed on [−π, π) X1 = cos U, X2 = sin U X1 and X2 are not independent because X12 + X22 = 1 cXY

= E [(X1 − E [X1 ])(X2 − E [X2 ])] = E [X1 Y1 ] = E [cos U sin U ] = 12 E [sin 2U ] = 0

Thus X1 and X2 are uncorrelated but not independent. Example 7.2.2 Let Xi , i = 1, . . . , N be random variables and αi , i = 1, . . . , N be scalar values. We calculate the mean and the variance of Y =

N X

αi Xi

i=1

given the means and variances (and covariances) of Xi . E [Y ] =

N X

αi E [Xi ]

i=1

σY2

N N X  X  2 αi αj cXi Xj = E (Y − E [Y ]) = i=1 j=1

For the special case cXi Xj = 0, i 6= j

σY2 =

N X

2 . αi2 σX i

i=1

7.3

Stochastic Processes (Random Processes)

Random processes naturally arise in many engineering applications. For example, the voltage waveform of a noise source or the bit stream of a binary communications signal are continuoustime random processes. The monthly rainfall values or the daily stock market index are examples of discrete-time random processes. However, random sequences are most commonly encountered when sampling continuous-time processes, as discussed in the preceding chapter. The remainder of the manuscript focuses solely on discrete-time random processes. However the results presented herein apply in a similar fashion to the continuous- time case. Definition 7.3.1 A real random process is an indexed set of real functions of time that has certain statistical properties. We characterise a real random process by a set of real functions and associated probability description. In general, xi (n) = X(n, ζi ), n ∈ Z, ζi ∈ S, denotes the waveform that is obtained when the event ζi of the process occurs. xi (n) is called a sample function of the process (see Figure 7.7). The set of all possible sample functions {X(n, ζi )} is called the ensemble and defines the

7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES)

85

.. .

xi−1 (n) = X(n, ζi−1 )

n

xi (n) = X(n, ζi )

n

xi+1 (n) = X(n, ζi+1 )

n .. .

n1 X1 = X(n1 ) = X(n1 , ζ)

n2 X2 = X(n2 ) = X(n2 , ζ)

Figure 7.7: Voltage waveforms emitted from a noise source. random process X(n) that describes a noise source. A random process can be seen as a function of two variables, time n and elementary event ζ. When n is fixed, X(n, ζ) is simply a random variable. When ζ is fixed, X(n, ζ) = x(n) is a function of time known as “sample function” or realisation. If n = n1 then X(n1 , ζ) = X(n1 ) = X1 is a random variable. If the noise source has a Gaussian distribution, then any of the random variables would be described by fXi (xi ) = q

1 2πσi2

·e



(xi −µi )2 2σ 2 i

where Xi = X(ni , ζ). Also, all joint distributions are multivariate Gaussian. At each point in time, the value of the waveform is a random variable described by its probability density function at that time n, fX (x; n). The probability that the process X(n) will have a value in the range [a, b] at n is given by P({a ≤ X(n) ≤ b}) =

Z

b

fX (x; n)dx a

86

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

The first moments of X(n) are µ1 (n) = E [X(n)]

=

  µ2 (n) = E X(n)2 = µl (n)

.. .

.. .

  = E X(n)l

=

Z



x fX (x; n)dx

Z−∞ ∞

x2 fX (x; n)dx

Z

xl fX (x; n)dx,

−∞ ∞

−∞

where µ1 (n) is the mean of X(n), and µ2 (n)−µ1 (n)2 is the variance of X(n). Because in general, fX (x; n) depends on n, the moments will depend on time. The probability that a process X(n) lies in the interval [a, b] at time n1 and in the interval [c, d] at time n2 is given by the joint density function fX1 X2 (x1 , x2 ; n1 , n2 ), Z bZ d fX1 X2 (x1 , x2 ; n1 , n2 )dx1 dx2 . P ({a ≤ X1 ≤ b, c ≤ X2 ≤ d}) = a

c

A complete description of a random process requires knowledge of all such joint densities for all possible point locations and orders fX1 X2 ...XN (x1 , . . . , xN ; n1 , . . . , nN ). • A continuous random process consists of a random process with associated continuously distributed random variables, X(n), n ∈ Z. For example, noise in communication circuits at a given time; the voltage measured can have any value. • A discrete random process consists of a random process with associated discrete random variables, e.g. the output of an ideal limiter.

7.3.1

Stationarity and Ergodicity

Definition 7.3.2 A random process X(n), n ∈ Z is said to be stationary to the order N if, for any n1 , n2 , . . . nN , and x1 , x2 , . . . , xN , and any n0 FX (x1 , . . . , xN ; n1 , . . . , nN ) = P ({X(n1 ) ≤ x1 , X(n2 ) ≤ x2 , . . . , X(nN ) ≤ xN }) = P({X(n1 + n0 ) ≤ x1 , X2 (n2 + n0 ) ≤ x2 , . . . , X(nN + n0 ) ≤ xN })

= FX (x1 , . . . , xN ; n1 + n0 , . . . , nN + n0 ) holds or equivalently:

fX (x1 , x2 , . . . , xN ; n1 , . . . , nN ) = fX (x1 , x2 , . . . , xN ; n1 + n0 , . . . , nN + n0 ) . The process is said to be strictly stationary if it is stationary to the infinite order. If a process is stationary, it can be translated in time without changing its statistical description. Ergodicity is a topic dealing with the relationship between statistical averages and time averages. Suppose that, for example, we wish to determine the mean µ1 (n) of a process X(n). For this purpose, we observe a large number of samples X(n, ζi ) and we use their ensemble average as the estimate of µ1 (n) = E [X(n)] L

µ ˆ1 (n) =

1X X(n, ζi ) L i=1

7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES)

87

Suppose, however, that we have access only to a single sample x(n) = X(n, ζ) of X(n) for −N ≤ n ≤ N . Can we use then the time average N X 1 X(n, ζ) X(n) = lim N →∞ 2N + 1 n=−N

as the estimate of µ1 (n)? This is, of course, impossible if µ1 (n) depends on n. However, if µ1 (n) = µ1 is a constant, then under the general conditions, X(n) approaches µ1 . Definition 7.3.3 A random process is said to be ergodic if all time averages of any sample function are equal to the corresponding ensemble averages (expectation). Example 7.3.1 DC and RMS values are defined in terms of time averages, but if the process is ergodic, they may be evaluated by use of ensemble averages. The DC value of X(n) is: Xdc = E [X(n)] p Xrms = E [X(n)2 ].

If X(n) is ergodic, the time average is equal to the ensemble average, that is Xdc

Z ∞ N X 1 = lim xfX (x; n)dx = µX X(n) = E [X(n)] = N →∞ 2N + 1 −∞ n=−N

Similarly, we have for the rms value: v u Z ∞ N q X u p 1 2 2 t 2 2 x2 fX (x; n)dx Xrms = lim X(n) = E [X(n) ] = σX + µX = N →∞ 2N + 1 −∞ n=−N

2 is the variance of X(n). where σX

Ergodicity and Stationarity If a process is ergodic, all time and ensemble averages are interchangeable. The time average is by definition independent of time. As it equals the ensemble averages (such as moments), the latter are also independent of time. Thus, an ergodic process must be stationary. But not all stationary processes are ergodic. Example 7.3.2 An Ergodic Random Process Let the random process be given by: X(n) = A cos(ω0 n + Φ), where A and ω0 are constants and Φ is a random variable that is uniformly distributed over [0, 2π). First we evaluate some ensemble averages (expectation). Z ∞ x(ϕ) fΦ (ϕ) dϕ E [X(n)] = =

Z

−∞ 2π

A cos(ω0 n + ϕ) 0

= 0.

1 dϕ 2π

88

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Furthermore,   E X(n)2 =

Z



A2 cos(ω0 n + ϕ)2

0

1 dϕ 2π

A2 = 2 In this example, the time parameter n disappeared when the ensemble averages were evaluated. The process is stationary up to the second order. Now, evaluate the corresponding time averages using a sample function of the random process. X(n, ζ1 ) = A cos ω0 n that occurs when Φ = 0. The zero phase Φ = 0 corresponds to one of the events. The time average for any of the sample functions can be calculated by letting Φ be any value between 0 and 2π. The time averages are: N X 1 • First moment: X(n) = lim A cos(ω0 n + ϕ) = 0 N →∞ 2N + 1 n=−N

• Second moment:

X(n)2

N X A2 1 [A cos(ω0 n + ϕ)]2 = = lim N →∞ 2N + 1 2 n=−N

In this example, ϕ disappears when the time average is calculated. We see that the time average is equal to the ensemble average for the first and second moments. We have not yet proved that the process is ergodic because we have to evaluate all the orders of moments and averages. In general, it is difficult to prove that a process is ergodic. In this manuscript we will focus on stationary processes. Example 7.3.3 Suppose that the amplitude in the previous example is now random. Let us assume thatA and Φ are independent. In this case, the process is stationary but not ergodic because if E A2 = σ 2 , then      σ2  E X(n)2 = E A2 · E cos(ω0 n + Φ)2 = 2

but

N X 1 1 X(n)2 = A2 lim T →∞ 2N + 1 2 n=−N

Summary: • An ergodic process has to be stationary. • However, if a process is stationary, it may or may not be ergodic.

7.3.2

Second-Order Moment Functions and Wide-Sense Stationarity

Definition 7.3.4 The second-order moment function (SOMF) of a real stationary process X(n) is defined as: rXX (n1 , n2 ) = E [X(n1 )X(n2 )] Z ∞ Z ∞ x1 x2 fX1 X2 (x1 , x2 ; n1 , n2 ) dx1 dx2 = −∞

where X1 = X(n1 ) and X2 = X(n2 ).

−∞

7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES)

89

If the process is stationary to the second order, the SOMF is only a function of the time difference κ = |n2 − n1 |, and rXX (κ) = E [X(n + κ)X(n)] Definition 7.3.5 A random process is said to be wide-sense stationary if: 1. E [X(n)] is a constant. 2. rXX (n1 , n2 ) = rXX (κ), where κ = |n2 − n1 | A process that is stationary to order 2 or greater is certainly wide-sense stationary. However, a finite order stationary process is not necessarily stationary. Properties of the SOMF   2 + µ2 is the second order moment 1. rXX (0) = E X(n)2 = σX X 2. rXX (κ) = rXX (−κ)

3. rXX (0) ≥ |rXX (κ)|, |κ| > 0 Proofs for 1 and 2 follow from the definition. The third property holds because   E (X(n + κ) ± X(n))2 ≥ 0     E X(n + κ)2 ± 2E [X(n + κ)X(n)] + E X(n)2 ≥ 0 rXX (0) ± 2 rXX (κ) + rXX (0) ≥ 0.

Definition 7.3.6 The “cross-SOMF” for two real processes X(n) and Y (n) is: rXY (n1 , n2 ) = E [X(n1 )Y (n2 )] If X(n) and Y (n) are jointly stationary, the cross-SOMF becomes: rXY (n1 , n2 ) = rXY (n1 − n2 ) = rXY (κ) where κ = n1 − n2 . Properties of the Cross-SOMF of a Jointly Stationary Real Process 1. rXY (−κ) = rY X (κ) p 2. |rXY (κ)| ≤ rXX (0)rY Y (0).

This holds because

h i E (X(n + κ) − aY (n))2 ≥ 0 ∀a ∈ R     E X(n + κ)2 + a2 E Y (n)2 − 2aE [X(n + κ)Y (n)] ≥ 0 ∀a ∈ R rXX (0) + a2 rY Y (0) − 2arXY (κ) ≥ 0

This quadratic form in a is valid if the roots are not real (except as a double real root). Therefore rXY (κ)2 − rXX (0)rY Y (0) ≤ 0

90

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES p rXY (κ)2 ≤ rXX (0)rY Y (0) ⇒ |rXY (κ)| ≤ rXX (0)rY Y (0)

Sometimes

r (κ) p XY rXX (0)rY Y (0)

is denoted by ρXY (κ). 3.

|rXY (κ)| ≤

1 [rXX (0) + rY Y (0)] 2

Property 2 constitutes a tighter bound than that of property 3, because the geometric mean of two positive numbers cannot exceed the arithmetic mean, that is p

rXX (0)rY Y (0) ≤

1 [rXX (0) + rY Y (0)] 2

• Two random processes X(n) and Y (n) are said to be uncorrelated if: rXY (κ) = E [X(n + κ)] E [Y (n)] = µX µY ∀κ ∈ Z • Two random processes X(n) and Y (n) are orthogonal if: rXY (κ) = 0, ∀κ ∈ Z • If X(n) = Y (n), the cross-SOMF becomes the SOMF. • If the random processes X(n) and Y (n) are jointly ergodic, the time average may be used to replace the ensemble average. If X(n) and Y (n) are jointly ergodic

N X 1 rXY (κ) = E [X(n + κ)Y (n)] = E [X(n)Y (n − κ)] ≡ lim X(n) Y (n − κ) N →∞ 2N + 1 △

n=−N

(7.4)

In this case, cross-SOMFs and auto-SOMFs may be measured by using a system composed of a delay element, a multiplier and an accumulator. This is illustrated in Figure 7.8, where the block z −κ is a delay line of order κ. X(n)

Y (n)

-

z −κ

-

?

×

Y (n − κ)

-

N X 1 2N + 1

n=−N

Figure 7.8: Measurement of SOMF.

-

rXY (κ)

7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES)

91

Central Second-Order Moment Function (Covariance Function) The central second-order moment functionis called the covariance function and is defined by cXX (n + κ, n) = E [(X(n + κ) − E [X(n + κ)])(X(n) − E [X(n)])] which can be re-written in the form cXX (n + κ, n) = rXX (n + κ, n) − E [X(n + κ)] E [X(n)] The covariance function for two processes X(n) and Y (n) is defined by cXY (n + κ, n) = E [(X(n + κ) − E [X(n + κ)])(Y (n) − E [Y (n)])] or equivalently cXY (n + κ, n) = rXY (n + κ, n) − E [X(n + κ)] E [Y (n)] For processes that are at least jointly wide sense stationary, we have cXX (κ) = rXX (κ) − (E [X(n)])2 and cXY (κ) = rXY (κ) − E [X(n)] E [Y (n)] The variance of a random process is obtained from cXX (κ) by setting κ = 0. If the process is stationary h i 2 σX = E (X(n) − E [X(n)])2 = rXX (0) − (E [X(n)])2 = rXX (0) − µ2X

For two random processes, if cY X (n, n + κ) = 0, then they are called uncorrelated, which is equivalent to rXY (n + κ, n) = E [X(n + κ)] E [Y (n)]. Complex Random Processes △

Definition 7.3.7 A complex random process is Z(n) = X(n) + j Y (n) where X(n) and Y (n) are real random processes. A complex process is stationary if X(n) and Y (n) are jointly stationary, i.e. fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 , . . . , n2N ) = fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 + n0 , . . . , n2N + n0 )

for all x1 , . . . , xN , y1 , . . . , yN and all n1 , . . . , n2N , n0 and N . The mean of a complex random process is defined as: E [Z(n)] = E [X(n)] + jE [Y (n)] Definition 7.3.8 The SOMF for a complex random process is: rZZ (n1 , n2 ) = E [Z(n1 )Z(n2 )∗ ]

92

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Definition 7.3.9 The complex random process is stationary in the wide sense if E [Z(n)] is a complex constant and rZZ (n1 , n2 ) = rZZ (κ) where κ = |n2 − n1 |. The SOMF of a wide-sense stationary complex process has the symmetry property: rZZ (−κ) = rZZ (κ)∗ as it can be easily seen from the definition. Definition 7.3.10 The cross-SOMF for two complex random processes Z1 (n) and Z2 (n) is: rZ1 Z2 (n1 , n2 ) = E [Z1 (n1 )Z2 (n2 )∗ ] If the complex random processes are jointly wide-sense stationary, the cross-SOMF becomes: rZ1 Z2 (n1 , n2 ) = rZ1 Z2 (κ) where κ = |n2 − n1 | The covariance function of a complex random process is defined as cZZ (n + κ, n) = E [(Z(n + κ) − E [Z(n + κ)]) (Z(n) − E [Z(n)])∗ ] and is a function of κ only if Z(n) is wide-sense stationary. The cross-covariance function of Z1 (n) and Z2 (n) is given as: cZ1 Z2 (n + κ, n) = E [(Z1 (n + κ) − E [Z1 (n + κ)]) (Z2 (n) − E [Z2 (n)])∗ ] Z1 (n) and Z2 (n) are called uncorrelated if cZ1 Z2 (n + κ, n) = 0. They are orthogonal if rZ1 Z2 (n + κ, n) = 0. Example 7.3.4 A complex process Z(n) is comprised of a sum of M complex signals Z(n) =

M X

Am ej(ω0 n+Φm )

m=1

Here ω0 is constant, Am is a zero-mean random amplitude of the mth signal and Φm is its random phase. We assume all variables Am and Φm for m = 1, 2, . . . , M to be statistically independent and the random phases uniformly distributed on [0, 2π). Find the mean and SOMF. Is the process wide sense stationary ?

E [Z(n)] =

=

M X

m=1 M X

m=1

  E [Am ] ejω0 n E ejΦm E [Am ] ejω0 n {E [cos(Φm )] + jE [sin(Φm )]}

= 0

rZZ (n + κ, n) = E [Z(n + κ)Z(n)∗ ] # " M M X X −jω0 n −jΦl jω0 (n+κ) jΦm Al e e · Am e e = E m=1

=

M M X X

m=1 l=1

l=1

h i E [Am Al ] · ejω0 κ · E ej(Φm −Φl )

7.4. POWER SPECTRAL DENSITY Since

93

h i  1 m=l j(Φm −Φl ) E e = 0 m= 6 l

The former holds because for m 6= l,       E ejΦm −Φl = E ejΦm E e−jΦl = 0

Thus if m = l,

rZZ (n + κ, n) =

M X

m=1

7.4 7.4.1

M X    2  jω0 κ jω0 κ E A2m E Am e =e m=1

Power Spectral Density Definition of Power Spectral Density

Let X(n, ζi ), n ∈ Z, represent a sample function of a real random process X(n, ζ) ≡ X(n). We define a truncated version of this sample function as follows:  X(n, ζi ), |n| ≤ M, M ∈ Z+ xN (n) = 0, elsewhere where N = 2M + 1 is the length of the truncation window. As long as M is finite, we presume that xN (n) will satisfy M X |xN (n)| < ∞ n=−M

Then the Fourier transform of xN (n) becomes: XN (ejω ) =

∞ X

xN (n) e−jωn

n=−∞

=

M X

X(n, ζi ) e−jωn

n=−M

The normalised energy in the time interval [−M, M ] is EN =

M X

xN (n)2 =

n=−M

Z

π

−π

XN (ejω ) 2 dω 2π

where Parseval’s theorem has been used to obtain the second integral. By dividing the expression EN by N we obtain the average power PN in xN (n) over the interval [−M, M ]: PN

Z π M X |XN (ejω )|2 dω 1 2 xN (n) = = 2M + 1 −π 2M + 1 2π

(7.5)

n=−M

From Equation (7.5), we see that the average power PN , in the truncated ensemble member xN (n), can be obtained by the integration of |XN (ejω )|2 /N . To obtain the average power for the entire sample function, we must consider the limit as M → ∞. PN provides a measure of power for a single sample function. If we wish to determine the average power of the random process,

94

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

we must also take the average (expected value) over the entire ensemble of sample functions. The average power of the process X(n, ζ) is defined as  Z π  M X   E |XN (ejω , ζ)|2 dω 1 2 PXX = lim E X(n, ζ) = lim M →∞ 2M + 1 M →∞ −π 2M + 1 2π n=−M

Remark 1: The average power PXX in a random process X(n) is given by the time average of its second moment.     Remark 2: If X(n) is wide-sense stationary, E X(n)2 is a constant and PXX = E X(n)2 . PXX can be obtained by a frequency domain integration. If we define the power spectral density (PSD) of X(n) by h 2 i E XN (ejω , ζ) SXX (ejω , ζ) = lim M →∞ 2M + 1 where M X XN (ejω , ζ) = X(n, ζ) e−jωn n=−M

then the normalised power is given by Z PXX =

π

SXX (ejω )

−π

dω = rXX (0) 2π

Example 7.4.1 Consider the random process X(n) = A · cos(ω0 n + Φ) where A and ω0 are real constants and Φ is a random variable uniformly distributed on [0, π/2). We shall find PXX for X(n) using   M X E X(n)2 PXX = lim . M →∞ 2M + 1 n=−M

The mean-squared value of X(n) is h i   E X(n)2 = E A2 cos (ω0 n + Φ)2   2 A2 A + cos (2ω0 n + 2Φ) = E 2 2 Z 2 A2 A2 π/2 cos (2ω0 n + 2ϕ) dϕ + = 2 2 0 π   A2 A2 sin (2ω0 n + 2ϕ) π/2 2 = + 2 2 2 π 0   2 2 A A sin (2ω0 n + π) − sin (2ω0 n) 2 = + 2 2 2 π 2 2 A A 2 −2 sin (2ω0 n) = + 2 2 π 2 A2 A2 − sin (2ω0 n) = 2 π

7.4. POWER SPECTRAL DENSITY

95

  X(n) is non stationary because E X(n)2 is a function of time. The time average of the above expression is  M  2 X 1 A A2 A2 lim − sin(2ω0 n) = M →∞ 2M + 1 2 π 2 n=−M

Using PXX = lim

Z

π

M →∞ −π

yields PXX =

A2 2

  E XN (ejω )2 dω 2M + 1 2π

which agrees with the above, which should be shown by the reader as an exercise.

The results derived for the PSD can be obtained similarly for two processes X(n) and Y (n), say. The cross-power density spectrum is then given by   jω )Y (ejω )∗ E X (e N N SXY (ejω ) = lim M →∞ 2M + 1 and thus the total average cross power is PXY =

7.4.2

Z

π

SXY (ejω )

−π

dω 2π

Relationship between the PSD and SOMF: Wiener-Khintchine Theorem

Theorem 7.4.1 Wiener-Khintchine Theorem When X(n) is a wide-sense stationary process, the PSD can be obtained from the Fourier Transform of the SOMF: SXX (ejω ) = F {rXX (κ)} = and conversely,  rXX (κ) = F −1 SXX (ejω ) =

∞ X

rXX (κ) e−jωκ

κ=−∞

Z

π

SXX (ejω ) ejωκ

−π

dω 2π

Consequence: there are two distinctly different methods that may be used to evaluate the PSD of a random process: 1. The PSD is obtained directly from the definition h 2 i E XN (ejω ) SXX (ejω ) = lim M →∞ 2M + 1 where XN (ejω ) =

M X

X(n) e−jωn

n=−M

2. The PSD is obtained by evaluating the Fourier transform of rXX (κ), where rXX (κ) has to be obtained first.

96

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

7.4.3

Properties of the PSD

The PSD has a number of important properties: 1. The PSD is always real, even if X(n) is complex jω ∗

SXX (e )

∞ X

=

κ=−∞ ∞ X

=

κ=−∞ ∞ X

=

rXX (κ)∗ e+jωκ rXX (−κ)e+jωκ ′

rXX (κ′ )e−jωκ

κ=−∞

= SXX (ejω ) 2. SXX (ejω ) ≥ 0, even if X(n) is complex. 3. When X(n) is real, SXX (e−jω ) = SXX (ejω ). Rπ 4. −π SXX (ejω ) dω 2π = PXX is the total normalised average power.

Example 7.4.2

X(n) = A sin(ω0 n + Φ) where A, ω0 are constants and Φ is uniformly distributed on [0, 2π). The mean of X(n) is given by Z 2π 1 E [X(n)] = A sin(ω0 n + ϕ) dϕ = 0 2π 0 and its SOMF is A2 cos(ω0 κ) rXX (κ) = E [X(n + κ)X(n)] = 2 By taking the Fourier transform of rXX (κ) we obtain the PSD A2 π {η(ω − ω0 ) + η(ω + ω0 )} 2

SXX (ejω ) = F{rXX (κ)} = where

∞ X

η(ω) =

δ(ω + k2π)

k=−∞

Example 7.4.3 Suppose that the random variables Ai are uncorrelated with zero mean and variance σi2 , i = 1, . . . , P . Define Z(n) =

P X

Ai exp{jωi n}.

i=1

It follows that

rZZ (κ) =

P X

σi2 exp{jωi κ}

i=1

and jω

SZZ (e ) =

P X i=1

σi2 η(ω − ωi )

7.4. POWER SPECTRAL DENSITY

97

Example 7.4.4 Suppose now that the random variables Ai and Bi , i = 1, . . . , P are pairwise uncorrelated with zero mean and common variance     E A2i = E Bi2 = σi2 , i = 1, . . . , P. Define

X(n) =

P X

[Ai cos(ωi n) + Bi sin(ωi n)] .

i=1

We find

rXX (κ) =

P X

σi2 cos(ωi κ)

i=1

and therefore SXX = π

P X i=1

σi2 [η(ω − ωi ) + η(ω + ωi )]

Definition 7.4.1 A random process X(n) is said to be a “white noise process” if the PSD is constant over all frequencies: N0 SXX (ejω ) = 2 where N0 is a positive constant. The SOMF for a white process is obtained by taking the inverse Fourier transform of SXX (ejω ). rXX (κ) =

N0 δ(κ) 2

where δ(κ) is the Kronecker delta function given by  1, κ=0 δ(κ) = 0, κ= 6 0

7.4.4

Cross-Power Spectral Density

Suppose we have observations xN (n) and yN (n) of a random process X(n) and Y (n), respectively. Let XN (ejω ) and YN (ejω ) be their respective Fourier transforms, i.e. XN (ejω ) =

M X

xN (n)e−jωn

n=−M

and YN (ejω ) =

M X

yN (n)e−jωn

n=−M

The cross-spectral density function is given as   E XN (ejω )YN (ejω )∗ SXY (e ) = lim M →∞ 2M + 1 jω

98

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Substituting XN (ejω ) and YN (ejω ): M X 1 SXY (e ) = lim M →∞ 2M + 1 jω

M X

rXY (n1 , n2 )e−jω(n1 −n2 )

n1 =−M n2 =−M

Taking the inverse Fourier transform of both sides Z

π



SXY (e )e

−π

jωκ dω



=

Z

M X 1 −π M →∞ 2M + 1 π

lim

M X

rXY (n1 , n2 )e−jω(n1 −n2 ) ejωκ

n1 =−M n2 =−M

1 = lim M →∞ 2M + 1

M X

M X

rXY (n1 , n2 )

M X

n1 =−M n2 =−M

π

ejω(n2 +κ−n1 )

−π

n1 =−M n2 =−M

M X 1 = lim M →∞ 2M + 1

Z

dω 2π

dω 2π

rXY (n1 , n2 )δ(n2 + κ − n1 )

M X 1 rXY (n2 + κ, n2 ) = lim M →∞ 2M + 1

(7.6)

n2 =−M

provided −M ≤ n2 + κ ≤ M so that the delta function is in the range of the summation. This condition can be relaxed in the limit as M → ∞. The above result shows that the time average of the SOMF and the PSD form a Fourier transform pair. Clearly, if X(n) and Y (n) are jointly stationary then this result also implies jω

SXY (e ) =

∞ X

rXY (κ)e−jωκ

κ=−∞

Definition 7.4.2 For two jointly stationary processes X(n) and Y (n) the cross-power spectral density is given by ∞ X rXY (κ)e−jωκ SXY (ejω ) = F{rXY (κ)} = κ=−∞

and is in general complex. Properties:

1. SXY (ejω ) = SY X (ejω )∗ . In addition, if X(n) and Y (n) are real, then SXY (ejω ) = SY X (e−jω ).   2. Re SXY (ejω ) and Re SY X (ejω ) are even functions of ω if X(n) and Y (n) are real.   3. Im SXY (ejω ) and Im SY X (ejω ) are odd functions of ω if X(n) and Y (n) are real. 4. SXY (ejω ) = 0 and SY X (ejω ) = 0 if X(n) and Y (n) are orthogonal.

7.5. DEFINITION OF THE SPECTRUM

99

Property 1 results from the fact that: ∞ X



SY X (e ) = = = =

rY X (κ)e−jωκ

κ=−∞ ∞ X

κ=−∞ ∞ X

"

rXY (−κ)∗ e−jωκ

rXY (κ)e κ=−∞ SXY (ejω )∗

−jωκ

#∗

Example 7.4.5 Let the SOMF function of two processes X(n) and Y (n) be rXY (n + κ, n) =

AB [sin(ω0 κ) + cos(ω0 [2n + κ])] 2

where A, B and ω0 are constants. We find M M X X 1 1 AB AB rXY (n + κ, n) = sin(ω0 κ) + lim cos(ω0 [2n + κ]) M →∞ 2M + 1 2 2 M →∞ 2M + 1 n=−M n=−M | {z }

lim

=0

so using the result of Equation (7.6)

 AB sin(ω0 κ) SXY (e ) = F 2 π AB = [η(ω − ω0 ) − η(ω + ω0 )] j 2 



7.5

Definition of the Spectrum

The PSD of a stationary random process has been defined as the Fourier transform of the SOMF. We now define the ‘spectrum’ of a stationary random process X(n) as the Fourier transform of the covariance function: CXX (ejω ) :=

∞ X

cXX (n)e−jωn

n=−∞

P If n |cXX (n)| < ∞, CXX (ejω ) exists and is bounded and uniformly continuous. The inversion is given by Z π 1 CXX (ejω )ejωn dω cXX (n) = 2π −π CXX (ejω ) is real, 2π periodic and non-negative. The cross-spectrum of two jointly stationary random processes X(n) and Y (n) is defined by jω

CXY (e ) :=

∞ X

n=−∞

cXY (n)e−jωn

100

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

with the property CXY (ejω ) = CY X (ejω )∗ The inversion is given by cXY (n) =

1 2π

If X(n) and Y (n) are real,

Z

π

CXY (ejω )ejωn dω

−π

CXX (ejω ) = CXX (e−jω ) CXY (ejω ) = CXY (e−jω )∗ = CY X (e−jω ) = CY X (ejω )∗ The spectrum of a real process is completely specified on the interval [0, π].

7.6

Random Signals and Linear Systems

7.6.1

Convergence of Random Variables

In the following we consider definitions of convergence of a sequence of random variables Xk , k = 0, 1, 2, . . . . They are given in order of power. 1. Convergence with probability one or almost surely convergence A variable Xk converges to X with probability 1 if   P lim |Xk − X| = 0 = 1 k→∞

2. Convergence in the Mean Square Sense A variable Xk converges to X in the MSS if   lim E |Xk − X|2 = 0 k→∞

3. Convergence in Probability Xk converges in probability to X if ∀ ǫ > 0

lim P (|Xk − X| > ǫ) = 0

k→∞

4. Convergence in Distribution Xk converges in distribution to X (or Xk is asymptotically FX distributed) if lim FXk (x) = FX (x)

k→∞

for all continuous points of FX . The convergences 1 to 4 are not equally weighted. i) Convergence with probability 1 implies convergence in probability. ii) Convergence with probability 1 implies convergence in the MSS, provided second order moments exist. iii) Convergence in the MSS implies convergence in probability. iv) Convergence in probability implies convergence in distribution. With the concepts of convergence one may define continuity, differentiability and so on.

7.6. RANDOM SIGNALS AND LINEAR SYSTEMS

7.6.2

101

Linear filtering

We are given two stationary processes X(n) and Y (n) P and a linear time-invariant (LTI) system with impulse response h(n), where the filter is stable ( |h(n)| < ∞) as in figure 7.9. Then for such a system X X h(k)X(n − k) = h(n − k)X(k) Y (n) = k

exists with probability one i.e. for N, M → ∞.

k

PM

k=−N

h(k)X(n − k) converges with probability one to Y (n)

Y(n)

h(n)

X(n)

Figure 7.9: A linear filter. If X(n) is stationary and E [|X(n)|] < ∞, then Y (n) is stationary. If X(n) is white noise, 2 δ(n − k), with σ 2 the power of the noise and δ(n) the i.e. E [X(n)] = 0, E [X(n)X(k)] = σX X Kronecker delta function  1, n = 0 δ(n) = 0, else Y (n) is called a linear process. If we do not require the filter to be stable but where X H(ejω ) = h(n)e−jωn

R

|H(ejω )|2 dω < ∞,

n

P is the transfer function of the system, and if for X(n), |cXX (n)| < ∞ then X X Y (n) = h(k)X(n − k) = h(n − k)X(k) k

k

exists in the MSS, i.e, M X

k=−N

h(k)X(n − k)

converges in the MSS for N, M → ∞ and Y (n) is stationary in the wide sense with X h(k)E [X(n − k)] = µX H(ej0 ) µY , E [Y (n)] = k

102

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Example 7.6.1 Hilbert Transforms   −j, 0 < ω < π 0, ω = 0, −π H(ejω ) =  j, −π < ω < 0  0, n=0 h(n) = (1 − cos(πn))/(πn), else R The filter is not stable, but |H(ejω )|2 dω < ∞. Y (n) is stationary in the wide sense. For example if X(n) = a cos(ωo n + φ), then Y (n) = a sin(ωo n + φ). Output covariance function and spectrum We now determine the relationship between the covariance function and the spectrum, at the input and at the output. First, let us define X X h(k)E [X(n − k)] h(k)X(n − k) − Y˜ (n) , Y (n) − µY = k

k

=

X k

h(k) (X(n − k) − µX ) =

X k

˜ − k) h(k)X(n

˜ where X(n) , X(n) − µX . In the general case, the system impulse response and the input process may be complex. The covariance function of the output is determined according to h i cY Y (κ) = E Y˜ (n + κ)Y˜ (n)∗ !∗ # ! " X X ˜ − l) ˜ + κ − k) h(l)X(n h(k)X(n = E k

=

XX k

=

l

XX k

l

l

h i ˜ + κ − k)X(n ˜ − l)∗ h(k)h(l) E X(n ∗

h(k)h(l)∗ cXX (κ − k + l)

Taking the Fourier transform of both sides leads to CY Y (ejω ) = |H(ejω )|2 CXX (ejω ) and applying the inverse Fourier transform formula yields2 Z π dω |H(ejω )|2 CXX (ejω )ejωκ cY Y (κ) = 2π −π Example 7.6.2 FIR filter Let X(n) be white noise with power σ 2 and Y (n) = X(n) − X(n − 1). 2

The integral exists because CXX (ejω ) is bounded and

R

|H(ejω )|2 dω < ∞.

(7.7)

7.6. RANDOM SIGNALS AND LINEAR SYSTEMS

103

From the above, one can see that  n=0  1, −1, n = 1 h(n) =  0, else

Taking the Fourier transform leads to H(ejω ) =

∞ X

n=−∞

h(n)e−jωn = 1 − e−jω

and thus, CY Y (ejω ) = |H(ejω )|2 CXX (ejω ) = |1 − e−jω |2 σ 2 = |2je−jω/2 sin(ω/2)|2 σ 2 = 4 sin(ω/2)2 σ 2

Cross-covariance function and cross-spectrum The cross-covariance function of the output is determined according to h i ˜ ∗ cY X (κ) = E Y˜ (n + κ)X(n) ! # " X ∗ ˜ + κ − k) X(n) ˜ h(k)X(n = E k

=

X k

=

X k

h i ˜ + κ − k)X(n) ˜ ∗ h(k)E X(n h(k)cXX (κ − k)

Taking the Fourier transform of both sides leads to CY X (ejω ) = H(ejω )CXX (ejω ) and applying the inverse Fourier transform formula yields Z π dω H(ejω )CXX (ejω )ejωκ cY X (κ) = . 2π −π Example 7.6.3 Consider a system as depicted in Figure 7.9 where X(n) is a stationary, zero2 . The impulse response of the system is given by mean white noise process with variance σX   n  n  1 1 2 1 + h(n) = − u(n) 3 2 3 2 where u(n) is the unit step function u(n) =



1, n ≥ 0 0, n < 0

and we wish to determine CY Y (ejω ) and CY X (ejω ).

104

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES The system transfer function is given by H(ejω ) = F{h(n)} = =

1 1+

1 −jω 4e

1/3 2/3 1 −jω + 1 − 4e 1 + 12 e−jω

− 81 e−j2ω

The spectrum and cross-spectrum are given respectively by CY Y (ejω ) = |H(ejω )|2 CXX (ejω ) 2 σX = 1 + 1 e−jω − 1 e−j2ω 2 4

8

CY X (ejω ) = H(ejω )CXX (ejω ) 2 σX = 1 + 41 e−jω − 81 e−j2ω Interaction of Two Linear Systems

The previous result may be generalised to obtain the cross-covariance or cross-spectrum of two linear systems. Let X1 (n) and Y1 (n) be the input and output processes of the first system, with impulse response h1 (n), and X2 (n) and Y2 (n) be the input and output processes of the second system with impulse response h2 (n) (see Figure 7.10). Suppose that X1 (n) and X2 (n) are widesense stationary and the systems are time-invariant and linear. Then the output cross-covariance function is: cY1 Y2 (κ) =

XX k

l

h1 (k)h2 (l)∗ cX1 X2 (κ − k + l)

(7.8)

It can be seen from (7.8) that cY1 Y2 (κ) = h1 (κ) ⋆ h2 (−κ)∗ ⋆ cX1 X2 (κ) where ⋆ denotes convolution. Taking the Fourier transform leads to CY1 Y2 (ejω ) = H1 (ejω )H2 (ejω )∗ CX1 X2 (ejω )

X1 (n)

- h1 (n), H1 (ejω )

- Y1 (n) =

cX1 X2 (κ)

P

κ h1 (κ)X1 (n

CX1 X2 (ejω ) X2 (n)

(7.9)

− κ)

cY1 Y2 (κ)

CY1 Y2 (ejω ) - h2 (n), H2 (ejω )

- Y2 (n) =

P

κ h2 (κ)X2 (n

Figure 7.10: Interaction of two linear systems

− κ)

7.6. RANDOM SIGNALS AND LINEAR SYSTEMS

105

Example 7.6.4 Consider the system shown in Figure 7.11 where Z(n) is stationary, zero-mean white noise with variance σZ2 . Let h(n) be the same system as discussed in Example 7.6.3 and g(n) is given by 1 1 g(n) = δ(n) + δ(n − 1) − δ(n − 2) 2 2 where δ(n) is the Kronecker delta function. We wish to find the cross-spectrum CXY (ejω ). X G(ejω ) = g(n)e−jωn n

1 1 = 1 + e−jω − e−j2ω 2 2

CXY (ejω ) = H(ejω )G(ejω )∗ CZZ (ejω ) # " 1 + 21 e−jω − 21 e−j2ω 2 = 1 −j2ω σZ 1 −jω − 8e 1 + 4e

h(n)

X(n)

g(n)

Y (n)

Z(n)

Figure 7.11: Interaction of two linear systems.

Cascade of linear systems We consider a cascade of L linear systems in serial connection, as illustrated by Figure 7.12. From linear system theory, it is known that a cascade of linear systems can be equivalently represented by a system whose transfer function H(ejω ) is the product of the L individual system functions H(ejω ) = H1 (ejω )H2 (ejω ) · · · HL (ejω ). When a random process is the input to such a system, the spectrum of the output and cross spectrum of the output with the input are given respectively by 2 L L Y Y Hi (ejω ) 2 CY Y (ejω ) = CXX (ejω ) Hi (ejω ) = CXX (ejω ) i=1

CY X (ejω ) = CXX (ejω )

L Y

i=1

Hi (ejω )

i=1

Example 7.6.5 We consider again the system shown in Figure 7.11 where we wish to find the cross spectrum CXY (ejω ). Instead of using Equation (7.9) with X1 (n) = X2 (n) = Z(n) as done

106

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES H1 (ejω )

X(n)

H2 (ejω )

HL (ejω )

···

Y (n)

Figure 7.12: Cascade of L linear systems.

1 G(ejω )

Y (n)

H(ejω )

X(n)

Figure 7.13: Cascade of two linear systems. in Example 7.6.4, we reformulate the problem as a cascade of two systems as shown in Figure 7.13. Remembering that CY Y (ejω ) = |G(ejω )|2 CZZ (ejω ) (from Figure 7.11 and Equation (7.7)) we obtain CXY (ejω ) = =

H(ejω ) CY Y (ejω ) G(ejω ) H(ejω ) |G(ejω )|2 CZZ (ejω ) G(ejω )

= H(ejω )G(ejω )∗ CZZ (ejω )

7.6.3

Linear filtered process plus noise

Consider the system shown in Figure 7.14. It is assumed that X(n) is stationary with E [X(n)] = 0, V (n) stationary with E [V (n)] = 0 and cV X (n) = 0 (X(n) and V (n) uncorrelated). It is also assumed that h(n), X(n) and V (n) are real.

V(n)

X(n)

h(n)

+

Figure 7.14: A linear filter plus noise.

Y(n)

7.6. RANDOM SIGNALS AND LINEAR SYSTEMS

107

The expected value of the output is derived below. Y (n) = E [Y (n)] =

∞ X

k=−∞ ∞ X

k=−∞

h(k)X(n − k) + V (n) h(k)E [X(n − k)] + E [V (n)] = 0

Output covariance function and spectrum The covariance function of the output is determined according to cY Y (κ) = E [Y (n + κ)Y (n)] !# ! " X X h(l)X(n − l) + V (n) h(k)X(n + κ − k) + V (n + κ) = E l

k

=

XX k

+

X k

+

X l

l

h(k)h(l)E [X(n + κ − k)X(n − l)]

h(k)E [X(n + κ − k)V (n)]

h(l)E [V (n + κ)X(n − l)]

+ E [V (n + κ)V (n)] XX h(k)h(l)cXX (κ − k + l) + cV V (κ) = k

l

Applying the Fourier transform to both sides of the above expression yields CY Y (ejω ) = F{cY Y (κ)} = |H(ejω )|2 CXX (ejω ) + CV V (ejω ) Cross-covariance function and cross-spectrum The cross-covariance function of the output is determined according to cY X (κ) = E [Y (n + κ)X(n)] ! # " X h(k)X(n + κ − k) + V (n + κ) X(n) = E k

=

X k

cY X (κ) =

X k

h(k)E [X(n + κ − k)X(n)]

h(k)cXX (κ − k)

Applying the Fourier transform to both sides of the above expression yields CY X (ejω ) = H(ejω )CXX (ejω ) Note that when the additive noise at the output, V (n) is uncorrelated with the input process X(n), it has no effect on the cross-spectrum of the input and output.

108

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Example 7.6.6 A complex-valued process X(n) is comprised of a sum of K complex signals K X

X(n) =

Ak ej(ω0 n+φk )

k=1

where ω0 is a constant and Ak is a zero-mean random amplitude of the kth signal with variance σk2 , k = 1, 2, . . . , K. The phase φk is uniformly distributed on [−π, π). The random variables Ak and φk are statistically pairwise independent for all k = 1, . . . , K. Z(n) is the input to a linear, stable time-invariant system with known impulse response h(n). The output of the system is buried in zero-mean noise V (n) of known covariance cV V (n), independent of Z(n) (refer to Figure 7.14). We wish to determine the cross-spectrum CY X (ejω ) and the auto-spectrum CY Y (ejω ). We first calculate the covariance function of the input process X(n) i hP K j(ω0 n+φk ) E [X(n)] = E k=1 Ak e and since Ak and φk are independent for all k = 1, . . . , K =

K X k=1

h i E [Ak ] E ej(ω0 n+φk ) = 0

cXX (n + κ, n) = E [X(n + κ)X(n)∗ ] "K # K X X = E Ak ej(ω0 (n+κ)+φk ) Al e−j(ω0 n+φl ) =

k=1 K K XX k=1 l=1

Since

h

E e

l=1

h i E [Ak Al ] ejω0 κ E ej(φk −φl )

j(φk −φl )

cXX (n + κ, n) =

i

=



1, 0,

k=l k 6= l

K X   E A2k ejω0 κ k=1

= ejω0 κ

K X

σk2 = cXX (κ)

k=1

CY X (ejω ) = H(ejω )CXX (ejω ) = H(ejω )F {cXX (κ)} "K # X X = H(ejω ) δ(ω + ω0 + 2πl) σk2 2π k=1

=

"

K X k=1

σk2

#

X l

l

H(ω − ω0 + 2πl)δ(ω − ω0 + 2πl)

7.6. RANDOM SIGNALS AND LINEAR SYSTEMS CY Y (ejω ) = |H(ejω )|2 CXX (ejω ) + CV V (ejω ) # "K X X |H(ω − ω0 + 2πl)|2 δ(ω − ω0 + 2πl) + CV V (ejω ) σk2 = k=1

l

109

110

CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES

Chapter 8

The Finite Fourier Transform Let X(0), . . . , X(n−1) be a model for the observations x(0), . . . , x(N −1) of a stationary random process X(n), n = 0, ±1, ±2, . . . . The function XN (ejω ) =

N −1 X

X(n)e−jωn ,

−∞ < ω < ∞

n=0

is called the finite Fourier transform. Properties The finite Fourier transform has some properties which are listed below. 1. XN (ej(ω+2π) ) = XN (ejω ) 2. XN (e−jω ) = XN (ejω )∗ ∀ X(n) ∈ R 3. The finite Fourier transform of aX(n) + bY (n) is aXN (ejω ) + bYN (ejω ), where a and b are constants. P P 4. If y(n) = m h(m)x(n − m), n (1 + |n|)|h(n)| < ∞ and |X(n)| < M , then there exists a constant K > 0 such that |YN (ejω ) − H(ejω )XN (ejω )| < K ∀ ω. Rπ 5. X(n) = (1/2π) −π XN (ejω )ejωn dω, n = 0, 1, . . . , N − 1, and 0 otherwise. The finite Fourier transform can be taken at discrete frequencies ωk = 2πk/N X XN (2πk/N ) = X(n)e−j2πkn/N k = 0, 1, . . . , N − 1 ,

which can be computed by means of the Fast Fourier Transform (FFT).

8.1 8.1.1

Statistical Properties of the Finite Fourier Transform Preliminaries

Real normal distribution: X = (X(0), . . . , X(N − 1))T is said to be normally distributed (X ∼ NN (µ, Σ)) if 1

fX (X) = (2π)−N/2 |Σ|−1/2 e− 2 (X−µ) 111

T Σ−1 (X−µ)

112

CHAPTER 8. THE FINITE FOURIER TRANSFORM

    given that µ = E (X(0), . . . , X(N − 1))T , Σ = (Cov [Xi , Xk ])i,k=0,...,N −1 = E (X − µ)(X − µ)T and |Σ| > 0. Complex normal distribution: A vector X is called complex normally distributed with mean µ and covariance Σ (X ∼ NNC (µ, Σ)) if the 2N vector of real valued random variables   ℜX ℑX is a normal vector with mean 

ℜµ ℑµ



and covariance matrix 1 2



ℜΣ −ℑΣ ℑΣ ℜΣ



One should note that Σ is Hermitian and so (ℑΣ)T = −ℑΣ, and   E (X − µ)(X − µ)T = 0 If Σ is nonsingular, then the probability density function is c fX (x) = π −N |Σ|−1 e−(x−µ)

H

Σ−1 (x−µ)

2 . Then Cov [ℜX, ℑX] = 0, i.e. For N = 1, Σ = E [(X − µ)(X − µ)∗ ] is real and identical to σX 2 2 ℜX and ℑX are independent N1 (ℜX, σX /2) and N1 (ℑX, σX /2) distributed respectively.

Asymptotic Normal Distribution We say that a sequence X(n), n = 1, 2, . . . of N dimensional random vectors is asymptotically  −1  NNC (µ(n) , Σ(n) ) distributed if the distribution function of G(n) X(n) − µ(n) converges to the distribution NNC (0, I) as n → ∞. Herein G(n) is a matrix with G(n) [G(n) ]H = Σ(n) . In the following, we assume that X(n) is real and stationary with finite moments of all orders and absolutely summable cumulants, that is, the statistical dependence between X(n) and X(m) decreases sufficiently fast as |n − m| increases, for example, if X(n) is a linear process. Then we have the following properties for the finite Fourier transform. 1. (a) For large N , XN (ejω ) is distributed as  C  N1 (0, N CXX (ejω )), ω 6= kπ N1 (0, N CXX (ejπ )), ω = (2k + 1)π XN (ejω ) ∼  N1 (N µX , N CXX (0)), ω = 2kπ where k is an integer.

(b) For pairwise distinct frequencies ωi , i = 1, . . . , M with 0 ≤ ωi ≤ π, the XN (ejωi ) are asymptotically independent variables. This is also valid when the ωi are approximated by 2πki /N → ωi as N → ∞.

This states that the discrete Fourier transformation provides N independent variables at frequencies ωi which are normally distributed with mean 0 or N µX and variance N CXX (ejωi ).

8.1. STATISTICAL PROPERTIES OF THE FINITE FOURIER TRANSFORM

113

2. RIf we use a window w(t), t ∈ R with w(t) = 0 for t < 0 and t > 1 that is bounded and 1 2 0 w(t) dt < ∞, then for ωi , i = 0, . . . , N − 1 XW,N (ejωi ) =

N −1 X

w(n/N )X(n)e−jωi n ,

n=0

−∞ < ω < ∞

are asymptotically independently distributed as  R1 C 2 jω  ωi 6= kπ  N1 (0, NR 0 w(t) dt CXX (e i )), 1 jωi 2 jπ XN (e ) ∼ N1 (0, N 0 w(t) dt CXX (e )), ωi = (2k + 1)π   N (N µ R 1 w(t) dt, N R 1 w(t)2 dt C (0)), ω = 2kπ i 1 XX X 0 0

The above is also valid for ωi ≃ 2πki /N . This states that for large N , the Fourier transform of windowed stationary processes are distributedR similarly as non-windowed processes, except that the variance is Rmultiplied by 1 1 the factor 0 w(t)2 dt and the mean at ωi = 2kπ is multiplied by the factor 0 w(t) dt.

3. If N = M L and X(0), . . . , X(N − 1) are divided into L segments each of length M , then XW,M (ejω , l) =

M −1 X

w(m/M )X(lM + m)e−jω(lM +m)

m=0

for l = 0, . . . , L − 1 are asymptotically independently distributed as  R1 2 C jω  ωi 6= kπ  N1 (0, M CXX (e iR) 0 w(t) dt), 1 jωi jπ 2 XW,M (e , l) ∼ N1 (0, M CXX (e ) 0 w(t) dt), ωi = (2k + 1)π   N (M µ R 1 w(t) dt, M C (0) R 1 w(t)2 dt), ω = 2kπ i 1 XX X 0 0

This states that finite Fourier transforms of data segments of size M each at ωRi are inde1 pendently normally distributed random variable with variance M · CXX (ejωi ) 0 w(t)2 dt.

4. If X(n) is Gaussian then the distributions are exact.

5. Under stronger conditions with respect to the cumulants, one can show that the Fourier √ transform has a rate of growth at most of order N log N . If X(n) is bounded by a constant, M say, then sup |XN (ejω )| ≤ M N ω

giving growth rate of order N . P P∞ 6. If Y (n) = ∞ k=−∞ h(k)X(n − k), E [X(n)] = 0 and n=−∞ (1 + |n|)|h(n)| < ∞, then with probability 1  p log N , YN (ejω ) = H(ejω )XN (ejω ) + O ! r log N jω jω jω YW,N (e ) = H(e )XW,N (e ) + O N as N → ∞

(8.1) (8.2)

114

CHAPTER 8. THE FINITE FOURIER TRANSFORM

Interpretation To 1.(a): Take a broad-band noise with a normally distributed amplitude and pass it through an ideal narrow bandpass filter,  1, |ω ± ωo | < π/N jω H(e ) = . 0, else

H(ejω ) 2π/Ν

−ω0

ω

ω0

Figure 8.1: Transfer function of an ideal bandpass filter. If X(n) is filtered, then Y (n) = Y (n) ≈

P

h(m)X(n − m). For large N

1 1 XN (2πs/N )e2πsn/N + XN (−2πs/N )e−2πsn/N N N

where 2πs/N ≈ ωo , also Y (n) is approximately N1 (0, N1 CXX (ejωo )) distributed. To 1.(b): jω ) H(e 0

Y(n) −ω0

ω0

ω

X(n)

Y(n) and U(n) are approximately independent

jω ) H(e 1

U(n) −ω1

ω1

non−overlapping frequency ranges

ω

Chapter 9

Spectrum Estimation 9.1

Use of bandpass filters

X(n) is a stationary stochastic process with covariance function cXX (n) and spectrum CXX (ejω ). Assume E [X(n)] = 0. If we transform X(n) by Y (n) = X(n)2 then   E [Y (n)] = E X(n)2 = cXX (0) =

Z

π

CXX (ejω )

−π

dω 2π

This motivates the following construction to estimate the spectrum. The bandpass filter has the transfer function  0 HB , |ω ± ω0 | ≤ B2 for |ω| ≤ π jω HB (e ) = 0, else , 0 will be specified later and ω is the center frequency of the filter. The low-pass filter where HB 0 is an FIR filter with impulse response  1 N , n = 0, 1, 2, . . . , N − 1 hl (n) = 0, otherwise ,

X(n)

V(n)

U(n) Low−pass filter

Band−pass filter

Figure 9.1:

115

Y(n)

116

CHAPTER 9. SPECTRUM ESTIMATION

HΒ (ω)

HΒ0

ω0−Β/2 ω0 ω0+Β/2

−ω0

Figure 9.2: Transfer function of an ideal band-pass filter

used to calculate the mean. We have jω

HL (e ) =

sin( ωN N −1 2 ) e−ω 2 N sin(ω/2)

The spectrum of V (n) is given by CV V (ejω ) = |HB (ejω )|2 CXX (ejω ) The output process is given by Y (n) =

N −1 X m=0

N −1 1 1 X V (n − m)2 U (n − m) = N N m=0

The mean of Y (n) is given by µY

= = = = ≈

N −1  1 X  E [Y (n)] = E V (n − m)2 = cV V (0) N m=0 Z π dω CV V (ejω ) 2π Z−π π dω |HB (ejω )|2 CXX (ejω ) 2π −π Z ω0 +B/2 dω 0 2 CXX (ejω ) 2(HB ) 2π ω0 −B/2 0 2B (HB ) CXX (ejω0 ) π

0 = 1, µ If we choose HB frequency band Y = E [Y (n)] measures the power of X(n) in thep 0 jω 0 ω − ω0 < B/2. The spectral value CXX (e ) will be measured if HB = π/B, provided B ≪ ω0 and CXX (ejω ) is continuous at ω = ω0 . Under the assumption that X(n) is Gaussian, one can show that

σY2 ≈

2π CXX (ejω0 )2 . BN

9.1. USE OF BANDPASS FILTERS

117

0 so that µ = C jω0 ). However, the power of the To measure CXX (ejω0 ), we choose HB Y XX (e 2 jω output process is proportional to CXX (e 0 ) . To measure CXX (ejω ) in a frequency band, we would need a filter bank with a cascade of squarers and integrators. In acoustics, one uses such a circuit, where the center frequencies are taken to be ωk+1 = (2ωk )1/3 , k = 1, . . . , K − 1. This is the principle of a spectral analyser, especially when it is used for analysing analog signals. Such a spectral analyser yields the spectrum at frequencies ω1 , . . . , ωk simultaneously. In measurement technology, one uses simpler systems that work with a band-pass having variable center frequency. The signal of the system output gives an estimate of the spectrum in a certain frequency band. The accuracy of the measurement depends on the slowness of changing the center frequency and the smoothness of the spectrum of X(n).

118

CHAPTER 9. SPECTRUM ESTIMATION

Chapter 10

Non-Parametric Spectrum Estimation 10.1

Consistency of Spectral Estimates

An estimate CˆXX (ejω ) of the spectrum CXX (ejω ) is a random variable. If we could design an ideal estimator for the spectrum CXX (ejω ), the pdf fCˆXX (ejω ) (α) would be a delta function at i i h h CXX (ejω ), i.e. the mean E CˆXX (ejω ) would be CXX (ejω ) and the variance Var CˆXX (ejω ) would be zero. Because it is not possible to create an ideal estimator, we try to reach this goal asymptotically as the number N of observed samples X(n), n = 0, 1, . . . , N −1 tends to infinity. We require that CˆXX (ejω ) is a consistent estimator for CXX (ejω ), i.e., CˆXX (ejω ) → CXX (ejω ) in probability as N → ∞. A sufficient condition for consistency is that the mean square error (MSE) of CˆXX (ejω ) converges to zero as N → ∞ (see 7.6.1), i.e.,  2    jω jω jω ˆ ˆ →0 as N → ∞ . M SE CXX (e ) = E CXX (e ) − CXX (e ) Equivalently, we may say that a sufficient condition for CˆXX (ejω ) to be consistent is that i h as N → ∞ Var CˆXX (ejω ) → 0

and

i  h  Bias CˆXX (ejω ) = E CˆXX (ejω ) − CXX (ejω )

→0

as

N →∞

because    2  jω jω jω ˆ ˆ = E CXX (e ) − CXX (e ) M SE CXX (e ) i h i h = E CˆXX (ejω )2 − 2E CˆXX (ejω ) · CXX (ejω ) + CXX (ejω )2 i2 i2  h  h + E CˆXX (ejω ) − E CˆXX (ejω ) i2 i2  h h i  h + E CˆXX (ejω ) − CXX (ejω ) = E CˆXX (ejω )2 − E CˆXX (ejω ) i2 i h  h . = Var CˆXX (ejω ) + Bias CˆXX (ejω ) 119

120

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

i h If E CˆXX (ejω ) 6= CXX (ejω ) the estimator has a bias which results in a systematic error of the i h method. On the other hand if Var CˆXX (ejω ) 6= 0, we get a random error in the estimator, i.e., the estimator varies from data set to data set.

10.2

The Periodogram

Suppose X(n), n = 0, ±1, . . . is P stationary with mean µX and cXX (k) = Cov [X(n + k), X(n)], −jωk , for −π < ω ≤ π is uniformly continuous i.e. the spectrum CXX (ejω ) = ∞ k=−∞ cXX (k)e and bounded. As seen in Chapter 8, the finite Fourier transform satisfies the following:  C  N1 (0, N CXX (ejω )), ω 6= kπ jω N1 (0, N CXX (ejπ )), ω = (2k + 1)π XN (e ) ∼  N1 (N µX , N CXX (0)), ω = 2kπ

N (ejω ) as an estimator for the spectrum C jω which suggests 1 the so called Periodogram IXX XX (e ), i.e., N IXX (ejω ) =

=

1 |XN (ejω )|2 N 2 N −1 1 X X(n)e−jωn . N

(10.1)

n=0

N (ejω ) has the same symmetry, non-negativity and periodicity properties as C jω Note that IXX XX (e ). The periodogram was introduced by Schuster (1898) as a tool for the identification of hidden periodicities. Indeed, in the case of

X(n) =

J X

aj cos(ωj n + φj ),

j=1

n = 0, . . . , N − 1

N (ejω ) has peaks at frequencies ω = ±ω (mod 2π), j = 1, . . . , J, provided N is large. IXX j

10.2.1

Mean of the Periodogram

The mean of the periodogram can be calculated as follows: ` ´˜ ˆ ` ´ = N CXX (ejω ). From the asymptotic distribution of XN ejω we note that Var XN ejω i h h i ˛ ˛ ´˜ ´˛ ` ` ˆ ´˛ ` 2 2 1 Var XN ejω = N1 E ˛XN ejω ˛ = E N1 ˛XN ejω ˛ . N 1

Thus

10.2. THE PERIODOGRAM

 N  E IXX (ω) = =

=

=

=

= =

121

  1 · E XN (ejω ) · XN (ejω )∗ N # "N −1 N −1 X X 1 X(m)∗ ejωm X(n)e−jωn ·E N 1 N 1 N 1 N 1 N

n=0 N −1 N −1 X X

(10.2)

m=0

E [X(n)X(m)∗ ] e−jω(n−m)

n=0 m=0 −1 N −1 N X X

(cXX (n − m) + µ2X )e−jω(n−m)

n=0 m=0 N −1 N −1 X X n=0 m=0 N −1 N −1 X X

cXX (n − m)e−jω(n−m) + −jω(n−m)

e

Z

π

1 N jω 2 2 |∆ (e )| µX N

CXX (ejλ )ejλ(n−m)

−π

1 dλ + |∆N (ejω )|2 µ2X 2π N

Zn=0 m=0 1 π dλ 1 |∆N (ej(ω−λ) )|2 CXX (ejλ ) + |∆N (ejω )|2 µ2X N −π 2π N

where ∆N (ejω ) =

N −1 X n=0

e−jωn = e−jω

N −1 2

·

sin( ωN 2 ) ω sin( 2 )

is the Fourier transform of a rectangular window. Thus, for a zero mean stationary process N (ejω ), E I N (ejω ) is equal to the spectrum X(n), the expected value of the periodogram IXX XX jω CXX (e ) convolved with the magnitude squared of the Fourier transform of the rectangular win N  P dow, |∆N (ejω )|2 . This suggests that P limN →∞ E IXX (ejω ) = CXX (ejω )+µ2X ·2π k δ(ω +2πk), because limN →∞ N1 |∆N (ejω )|2 = 2π k δ(ω + 2πk), as illustrated in Fig. 10.1.

Therefore the periodogram estimate is asymptotically unbiased for ω 6= 2πk, k ∈ Z. We may find an asymptotic unbiased estimator for CXX (ejω ) ∀ ω by subtracting the sample mean of the P jω ) = 1 ∞ N −jωn 2 (e data, before computing the periodogram, i.e., IX−µ (X(n) − µ )e X n=−∞ ,X−µ N X X is asymptotically unbiased.

In (10.2) we calculated the mean of the periodogram of a rectangular windowed data sequence X(n), n = 0, . . . , N −1. Advantages of using different window types are stated in the filter design theory part (chapter 4). We now consider periodograms of sequences X(n), n = 0, . . . , N − 1, multiplied by a real-valued window (also taper) wN (n) of length N , having a Fourier transform WN (ejω ). Then,  N  jω E IW = X,W X (e )

N −1 N −1 1 X X wN (n)wN (n)E [X(n)X(m)∗ ] e−jω(n−m) N

=

1 N

=

1 N

n=0 m=0 N −1 N −1 X X

wN (n)wN (n)cXX (n − m)e−jω(n−m) +

(10.3)

1 |WN (ejω )|2 µ2X N

m=0 Zn=0 π dλ 1 |WN (ej(ω−λ) )|2 CXX (ejλ ) + |WN (ejω )|2 µ2X 2π N −π

122

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

20 1 N jω 2 N |∆ (e )|

N = 20

Window function

15

10

N = 10

5

N =5

0 −1

−0.5

0 Frequency

Figure 10.1: The window function

1 N jω 2 N |∆ (e )|

ω 2π

0.5

1

for N = 20, N = 10 and N = 5.

Comparing this result with the one for rectangular windowed data, we observe the following: If CXX (ejα ) has a significant peak at α in the neighborhood of ω, the expected value of N (ejω ) and I N jω jω IXX W X,W X (e ) can differ substantially from CXX (e ). The advantage of employing a taper is now apparent. It can be considered as a means to reducing the effect of neighboring peaks.

10.2.2

Variance of the Periodogram

To find the variance of the periodogram we consider the covariance function of the periodogram. For simplicity, let us assume that X(n) is a real white Gaussian process with variance σ 2 and absolute summable covariance function cXX (n) and ω, λ 6= 2kπ with k ∈ Z. Then, h i h i i  N  h N N N N N Cov IXX (ejω ), IXX (ejλ ) = E IXX (ejω )IXX (ejλ ) − E IXX (ejω ) E IXX (ejλ )  1  N j(ω+λ) 2 N j(ω−λ) 2 = |∆ (e )| + |∆ (e )| σ4 . N2 Using the characteristic function, this can be proven to equal

E [X1 X2 X3 X4 ] = E [X1 X2 ] E [X3 X4 ] + E [X1 X3 ] E [X2 X4 ] + E [X1 X4 ] E [X2 X3 ] .

10.2. THE PERIODOGRAM

123

If the process X(n) is non-white and non-Gaussian the covariance function can be found in [Brillinger 1981] as h i N N Cov IXX (ejω ), IXX (ejλ ) ≈ (10.4)   1 CXX (ejω )2 · 2 |∆N (ej(ω+λ) )|2 + |∆N (ej(ω−λ) )|2 , N

which reduces to the variance of the periodogram to

 N   1 Var IXX (ejω ) ≈ CXX (ejω )2 · 2 |∆N (ej2ω )|2 + N 2 . N

N (ejω ) will tend to remain at the This suggests that no matter how large N is, the variance of IXX jω 2 level CXX (e ) . Because the variance does not tend to zero for N → ∞, according to Section 10.1, the periodogram is not a consistent estimator. If an estimator with a variance smaller than this is desired, it cannot be obtained by simply increasing the sample length N .

The properties of the finite Fourier transform suggest that periodogram ordinates asymptotically (for large N ) are independent random variates, which explains the irregularity of the periodogram in Figure 10.2. From the asymptotic distribution of the finite Fourier transform, 16

14

12

10

8

6

4

2

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Figure 10.2: The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted ω while the ordinate shows the estimate. line). The abscissa displays the frequency 2π we can deduce the asymptotic distribution of periodogram ordinates as ( CXX (ejω ) 2 χ2 , ω 6= πk N jω 2 k∈Z . IXX (e ) ∼ CXX (ejω )χ21 , ω = πk

124

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

10.3

Averaging Periodograms - Bartlett Estimator

One way to reduce the variance of an estimator is to average estimators obtained from independent measurements. If we assume that the covariance function cXX (k) of the process X(n) is finite or can be viewed as finite with a small error and the length of cXX (k) is much smaller than the length of the data stretch N , then we can divide the process X(n) of length N into L parts with length M each, where M ≪ N . With this assumption we can calculate L nearly independent M (ejω , l) l = 0, . . . , L − 1 of sequences periodograms IXX Xl (n) = X(n + l · M ) ,

n = 0, . . . , N − 1

The periodograms are written as 1 M IXX (ejω , l) = M and the estimator as

2 M −1 X Xl (n)e−jωn

(10.5)

n=0

L−1 1 X M jω B jω ˆ CXX (e ) = IXX (e , l) . L l=0

This method of estimating the spectral density function CXX (ejω ) is called Bartlett method due to its invention by Bartlett in 1953. B (ejω ) can be easily obtained as The mean of CˆXX

" L−1 # i h X 1 B jω M jω E CˆXX (e ) = E IXX (e , l) L l=0 L−1 X

 M jω  1 E IXX (e , l) L l=0  M jω  = E IXX (e ) =

which is the same as the mean of the periodogram of the process X(n) for n = 0, . . . , M − 1. From Section 10.2.1, it can be seen that Bartlett’s estimator is an unbiased estimator of the spectrum for M → ∞ because the periodogram is asymptotically unbiased. To check the performance of the estimator we also have to derive the variance of the estimator.

10.3. AVERAGING PERIODOGRAMS - BARTLETT ESTIMATOR

125

16

14

12

10

8

6

4

2

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Figure 10.3: The periodogram as an estimate of the spectrum using 4 non-overlapping segments. ω while the ordinate shows the estimate. The abscissa displays the frequency 2π

The variance of Bartlett’s estimator can be calculated as " # L−1 i h X X L−1 1 B M M (ejω ) = E Var CˆXX IXX (ejω , k)IXX (ejω , m) − · L2 =

= = =

1 · L2

k=0 m=0 L−1 X   M E IXX (ejω , p)2 p=0

+

1 · L2

L−1 X L−1 X

k=0 m=0 m6=k

#!2 L−1 1 X M jω IXX (e , l) E · L "

l=0

 M jω   M jω  E IXX (e , k) E IXX (e , m)

 M jω 2 − E IXX (e )

L−1  M jω 2 1 X  M jω 2  L · (L − 1)  M jω 2 E IXX (e , p) + · E IXX (e ) − E IXX (e ) 2 2 L L p=0  M jω 2 i 1 h  M jω 2  · E IXX (e ) − E IXX (e ) L  M jω  1 · Var IXX (e ) . (10.6) L

Note that we can proceed similarly as above with a windowed periodogram. We would then consider L−1 1 X M B jω IW X,W X (ejω , l) CˆW (e ) = X,W X L l=0

which has similar properties as the above. The variance of the Bartlett estimator is equivalent to the variance of the periodogram diB (ejω ) decreases vided by the number of data segments L. So the variance of the estimator CˆXX

126

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

with an increasing L. With this result and according to the distribution of the periodogram, Bartlett’s estimator is asymptotically ( CXX (ejω ) 2 χ2L , ω 6= πk B jω 2L ˆ CXX (e ) ∼ k∈Z . CXX (ejω ) 2 χL , ω = πk L distributed. This is due to the fact that the sum of L independent χ22 distributed variables is χ22L distributed.

10.4

Windowing Periodograms - Welch Estimator

Another method of averaging periodograms was developed by Welch in 1967. Welch derived a modified version of the Bartlett method called Welch’s method, which allows the data segments to overlap. Additionally, the segments are multiplied by a window function which affects the bias of the estimator. For each periodogram l, l = 0, . . . , L − 1 the data sequence is now Xl (n) = X(n + l · D) , where l · D is the starting point of the lth sequence. According to this, the periodograms are defined similarly to Equation (10.5), except for a multiplication by the window wM (n), i.e., 1 M jω IW X,W X (e , l) = M Welch defined his estimator as W CˆXX (ejω ) =

2 M −1 X Xl (n) · wM (n) · e−jωn . n=0

L−1 1 X M IW X,W X (ejω , l) L·A l=0

with a normalization factor A to get an unbiased estimation. If we set A = 1 and wM (n) as a rectangular window of length M and D = M , the Welch method is equivalent to the Bartlett method. Thus, the Bartlett method is a special case of the Welch method. To calculate the normalization factor A we have to derive the mean of the estimator. # " L−1 i h X 1 M jω W IW · (ejω ) = E E CˆXX X,W X (e , l) L·A = = From (10.3) we get E

h

i

W CˆXX (ejω )

1 · L·A

l=0 L−1 X l=0

 M  jω E IW X,W X (e , l)

 1  M E IW X,W X (ejω ) A

  Z π 1 1 1 j(ω−λ) 2 jλ 2 jω 2 |WM (e )| CXX (e )d λ + µX · |WM (e )| . · = · A M 2π −π

10.5. SMOOTHING THE PERIODOGRAM

127

If we assume that M is large, the energy of WM (ejω ) is concentrated around ω = 0. Thus, we can state that approximately   Z π i h 1 1 1 W jω jω jλ 2 2 jω 2 ˆ E CXX (e ) ≈ · · CXX (e ) |WM (e )| d λ + µX · |WM (e )| . A M 2π −π Using Parseval’s theorem, 1 2π

Z

π

−π

jλ 2

|WM (e )| dλ =

M −1 X n=0

|wM (n)|2 ,

we rewrite the above as E

h

i

W CˆXX (ejω )

# " M −1 X 1 1 2 2 jω 2 jω ≈ · |wM (n)| + µX · |WM (e )| . · CXX (e ) A M n=0

To get an asymptotically unbiased estimator for ω 6= 2kπ, k ∈ Z, A has to be chosen as A=

M −1 1 X |wM (n)|2 . · M n=0

With this and M → ∞, the Welch estimator is unbiased for ω 6= 2kπ. For M → ∞ the variance of the Welch method for non-overlapping data segments with D = M is asymptotically the same as for the Bartlett method (unwindowed data) i h W (ejω ) Var CˆXX



M →∞, D=M

i h 1 B (ejω ) ≈ CXX (ejω )2 . Var CˆXX L

In the case of 50% overlap (D = M 2 ) Welch calculated the asymptotic variance of the estimator to be i h 9 1 W (ejω ) ≈ Var CˆXX · · CXX (ejω )2 , M 8 L M →∞, D= 2 which is higher than the variance of Bartlett’s estimator. The advantage of using Welch’s estimator with overlap is that one can use more segments for the same amount of data. This reduces the variance of the estimate while having a bias which is comparable to the one obtained by the Bartlett estimator. For example, using a 50% overlap leads to doubling the number of segments, thus a reduction of the variance by the factor 2. Additionally, the shape of the window only affects the bias of the estimate, not the variance. So even for non-overlapping segments, the selection of suitable window functions can result in an estimator of reduced bias. These advantages makes Welch’s estimator one of the most used non-parametric spectrum estimators. c The method is also implemented in the ”pwelch” function of MATLAB.

10.5

Smoothing the Periodogram

The asymptotic independence of the finite Fourier transform at discrete frequencies (see propN (ejω1 ), . . . , I N (ejωL ) for erties 1b in section 8.1) suggests that the periodogram ordinates IXX XX 0 ≤ ω1 < ω2 < . . . < ωL ≤ π are asymptotically independent. These frequencies may be discrete

128

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

frequencies ωk = 2πk N , k ∈ Z in the neighborhood of a value of ω at which we wish to estimate CXX (ejω ). This motivates the estimate S CˆXX (ejω ) =

m   2π X 1 N j N [k(ω)+k] IXX e 2m + 1 k=−m

jω where 2π N ·k(ω) is the nearest frequency to ω at which the spectrum CXX (e ) is to be estimated. Given the above property, we can expect to reduce the variance of the estimation by averagjω N (ejω ) ∼ CXX (e ) χ2 ing over adjacent periodogram ordinates. While for ω 6= kπ, k ∈ Z IXX 2 2 jω

N (ejω ) for ω = πk is not CXX (e ) χ2 distributed but C jω 2 distributed, we note that IXX XX (e )χ1 2 2 distributed. Therefore we exclude these values in the smoothing, which means

k(ω) + k 6= l · N,

l ∈ Z.

Because the spectrum CXX (ejω ) of a real process X(n) is of even symmetry and periodic in 2π N (ejω ) for 0 ≤ ω < π. This results it is natural to calculate only values of the periodogram IXX in the estimation of CXX (ejω ) for ω = 2πl, l ∈ Z, by " −1 # m X X 2πk 2πk 1 N N S IXX (ej (ω+ N ) ) IXX (ej (ω+ N ) ) + CˆXX (ejω ) = 2m =

1 m

k=−m m X 2πk N IXX (ej (ω+ N ) ) k=1

k=1

[ω = 2lπ] or [ω = (2l + 1)π and N even]

which is the same as for the estimation of ω = (2l + 1)π when N is even. For the estimation at frequency ω = (2l + 1)π when N is odd we use S CˆXX (ejω ) =

m i π 2kπ π 2kπ 1 Xh N N IXX (ej (ω− N + N ) ) + IXX (ej (ω+ N − N ) ) 2m k=1

=

m π 2kπ 1 X N IXX (ej (ω− N + N ) ) m

ω = (2l + 1)π and N odd

k=1

which implies that the variance of the estimator at frequencies ω = lπ, l ∈ Z will be higher than at other frequencies. In the sequel, we will only consider frequencies ω 6= lπ, but the method is similar to frequencies ω = lπ. The smoothing method is nothing else but sampling the periodogram at discrete frequencies with the 2π N spacing and averaging. Thus, the method can be easily implemented using the FFT which makes it computational efficient. The mean of the smoothed periodogram is given by i h S (ejω ) = E CˆXX

m i h  2π X 1 N E IXX ej N (k(ω)+k) . 2m + 1 k=−m

In the following calculation we will assume that

2πk(ω) N

= ω. Using the formula for the mean of

10.5. SMOOTHING THE PERIODOGRAM

129

the periodogram (10.2), we obtain Z m i h   dλ X 2πk 1 1 π S jω ˆ |∆N (ej(ω+ N −λ) )|2 CXX ejλ E CXX (e ) = 2m + 1 N −π 2π

(10.7)

k=−m

1 + |∆N (ejω )|2 µ2X N Z π m   dλ 2πk 1 X 1 1 |∆N (ej(ω+ N −λ) )|2 CXX ejλ · · + |∆N (ejω )|2 µ2X = 2π N −π 2m + 1 N k=−m | {z } Am (ej(ω−λ) )

=

Z

π



  dλ 1 Am ej(ω−λ) CXX ejλ + |∆N (ejω )|2 µ2X . 2π N −π 

Examining the shape of Am (ejω ) in Figure 10.4 for a fixed value of N , and different values of m, we can describe Am (ejω ) as having approximately a rectangular shape. A comparison of 15

5

Am (ejω )|N =15, m=1

Am (ejω )|N =15, m=0

4.5 4

3.5

10

3

2.5

5

2

1.5 1

0.5

0 0

0.2

0.4

ω π

0.6

0.8

0 0

1

2.5

0.2

0.4

ω π

0.6

0.8

1

0.2

0.4

ω π

0.6

0.8

1

1.4

Am (ejω )|N =15, m=5

Am (ejω )|N =15, m=3

1.2 2

1.5

1

0.8 0.6

1

0.4

0.5

0.2 0 0

0.2

0.4

ω π

0.6

0.8

1

0 0

Figure 10.4: Function Am (ejω ) for N = 15 and m = 0, 1, 3, 5. the mean of the periodogram (10.2) and the mean of the smoothed periodogram (10.7) suggests S (ejω ) will most often be greater than the bias of I N (ejω ) as the integral that the bias of CˆXX XX extends over a greater range. Both will be the same if the spectrum CXX (ejω ) is flat, however S (ejω ) is unbiased as N → ∞. CˆXX

130

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION S (ejω ) is similar to the calculation in (10.6) and results The calculation of the variance for CˆXX

in

i h S (ejω ) = Var CˆXX

 N  1 Var IXX (ejω ) . 2m + 1 S (ejω ) is plotted for different This can be confirmed from Figure 10.5 where the estimate CˆXX values of m. 14

10

12 8

10

6

8 6

4

4 2

2 0

0.1

0.2

0.3

0.4

0.5

0

0.1

0.2

0.3

0.4

0.5

0

0.1

0.2

0.3

0.4

0.5

10 8

8

6

6 4

4

2

2

0

0.1

0.2

0.3

0.4

0.5

Figure 10.5: The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to ω while the ordinate shows the estimate. bottom right). The abscissa displays the frequency 2π With this result and the asymptotic behavior of the periodogram ordinates, we find the S (ejω ) as asymptotic distribution of CˆXX  (ejω ) 2  CXX 4m+2 χ4m+2 , ω 6= πk S jω CˆXX (e ) ∼ k∈Z .  CXX (ejω ) 2 χ , ω = πk 2m 2m

10.6

A General Class of Spectral Estimates

In section 10.5 we considered an estimator which utilised the independence property of the periodogram ordinates. The smoothed periodogram weighted equally all periodogram ordinates

10.6. A GENERAL CLASS OF SPECTRAL ESTIMATES

131

near ω, which, for a nearly constant spectrum is reasonable. However, if the spectrum CXX (ejω ) varies, this method creates a larger bias. In this case it is more appropriate to weigh periodogram ordinates near ω more than those further away. Analogously to Section 10.5, we construct the estimator as m  2π  1 X N SW jω ˆ Wk IXX ej N [k(ω)+k] CXX (e ) = A

k(ω) + k 6= l · N ω 6= l · π

k=−m

l ∈ Z,

(10.8)

where A is a constant to be chosen for an unbiased estimator. The case where ω = lπ can be easily derived from Section 10.5. For further investigations of the estimator in (10.8) we will only consider the case where ω 6= l · π, l ∈ Z. SW (ejω ) can be calculated with the assumption that The mean of the estimator CˆXX

by

2πk(ω) N



m i h h  i 2πk 1 X SW jω N ˆ E CXX (e ) = Wk E IXX ej [ω+ N ] . A k=−m

SW (ejω ) we set To get an asymptotically unbiased estimator for CˆXX

A=

m X

Wk

k=−m

 N  because E IXX (ejω ) = CXX (ejω ) as N → ∞. The mean of the estimator can be derived as m   dλ 2πk 1 X Wk |∆N (ej(ω+ N −λ) )|2 CXX ejλ 2π −π AN k=−m | {z }

Z i SW jω ˆ E CXX (e ) = h

π

ASW (ej(ω−λ) )

1 + |∆N (ejω )|2 µ2X ZN   dλ π 1 = ASW (ej(ω−λ) )CXX ejλ + |∆N (ejω )|2 µ2X 2π N −π where jω

ASW e



m 2πk 1 X Wk |∆N (ej(ω+ N ) )|2 . = Pm · l=−m Wl N

1

k=−m

This expression is very similar to the function Am (ejω ) from Equation (10.7). Indeed if we set 1 ∀ k, it follows automatically that ASW (ejω ) = Am (ejω ), i.e. Am (ejω ) is a special case Wk = 2m+1 of ASW (ejω ). However with the weights Wk we have more control over the shape of ASW (ejω ) which also leads to a better control of the bias of the estimator itself. The effect of different weightings on the shape of ASW (ejω ) can be seen in Figure 10.6.

132

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION 1.8

3

1.6 2.5 1.4

ASW (ejω )

ASW (ejω )

1.2 1

2

1.5

0.8 0.6

1

0.4 0.5 0.2 0 0

0.2

0.4

ω π

0.6

0.8

0 0

1

3.5

3.5

3

3

ASW (ejω )

4

ASW (ejω )

4

2.5

0.4

ω π

0.6

0.8

1

0.2

0.4

ω π

0.6

0.8

1

2.5

2

1.5

2

1.5

1

1

0.5

0.5

0 0

0.2

0.2

0.4

ω π

0.6

0.8

0 0

1

Figure 10.6: Function ASW (ejω ) for N = 15 and m = 4 with weights Wk in shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.

SW (ejω ) is given by The variance of the estimator CˆXX

i2 i h i  h h SW jω 2 SW jω SW jω (e ) − E CˆXX (e ) (e ) = E CˆXX Var CˆXX =

m m h  2π  i  2π 1 X X N j N [k(ω)+k] N j N [k(ω)+l] W W E I e I e k l XX XX A2

− =

1 A2

i2  h  2π N ej N [k(ω)] Wq Wµ E IXX

q=−m µ=−m  m 1 X 2 N W E IXX A2 p=−m p

− =

k=−m l=−m m m X X

1 A2

m X

m X



[k(ω)+p] j 2π N

e

2 

m m i2  h  2π 1 X X N + 2 Wk Wl E IXX ej N [k(ω)] A k=−m l=−m

 h  2π i2 N ej N [k(ω)] Wq Wµ E IXX

q=−m µ=−m m h 1 X 2 N W Var IXX A2 p=−m p





ej N [k(ω)]

i

.

l6=k

10.7. THE LOG-SPECTRUM With

m X

k=−m

Wk2

=

133

m  X

k=−m

Wk −

Pm

i=−m Wi

2m + 1



+

Pm

k=−m Wk

2m + 1

2

≥0

(10.9)

SW (ejω ). From we are able to calculate the weighting coefficients Wk for the lowest variance of CˆXX (10.9) it is easy to see that if Pm Wi A Wk = i=−m = , k = 0, ±1, . . . , ±m , 2m + 1 2m + 1 SW (ejω ). Because this result is nothing we reach the lowest possible variance for the estimator CˆXX S (ejω ) has the largest other than the smoothed periodogram (see section 10.5) we see that CˆXX bias for a fixed N but the lowest variance of the general class of spectrum estimators covered in this section.

10.7

The Log-Spectrum

Instead of estimating the spectrum directly, it is possible to estimate 10 log CXX (ejω ) from 10 log CˆXX (ejω ). The log-function can be used as a variance stabilizing technique [Priestley 2001]. We then find h i  E 10 log CˆXX ejω ≈ 10 log CXX ejω h i (10 log e)2 Var 10 log CˆXX ejω ≈ C

where C is a constant that controls the variance. If we use Bartlett’s estimator, for example, C would be equal to L.

The above equations show that the variance is approximately a constant and does not depend on CXX (ejω ).

10.8

Estimation of the Spectrum using the Sample Covariance

The covariance function cXX (n) is defined in Chapter 7 as cXX (n) = E [(X(k + n) − µX )(X(k) − µX )∗ ] . Given X(0), . . . , X(N − 1) we can estimate cXX (k) by    P   1 N −1−|n| ¯ X(k) − X ¯ ∗, 0 ≤ n ≤ N −1 X(k + |n|) − X N cˆXX (n) = k=0  0, otherwise

¯ = where X

1 N

NP −1

(10.10)

X(k) is an estimate of the mean µX = E [X(k)]. As it can be seen from

k=0

(10.10) the estimator of the covariance function cˆXX (n) is only defined for n ≥ 0. It is obvious to calculate estimates for n < 0 via the symmetry property of the covariance function cXX (n) = cXX (−n)∗ .

134

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

P Equation (10.10) is a computationally efficient implementation for cˆXX (k) = N1 ∞ k=−∞ (X(k + ¯ ¯ ∗ if X(n) is zero outside the interval n = 0, . . . , N − 1. It can be proven that n) − X)(X(k) − X) cˆXX (n) is an asymptotically consistent estimator of the covariance function cXX (n). Motivated by the fact that the spectrum CXX (ejω ) is the Fourier transform of the covariance function cXX (n) it is intuitive to suggest an estimator of CXX (ejω ) as CˆXX (ejω ) =

∞ X

cˆXX (n)e−jωn

n=−∞

=

=

1 N 1 N

N −1 X

∞ X

¯ ¯ ∗ · e−jωn [X(l + n) − X][X(l) − X]

n=−(N −1) l=−∞ N −1 X

∞ X

¯ ¯ ∗ · e−jωn [X(n − l) − X][X(−l) − X]

n=−(N −1) l=−∞

which is equivalent to N jω CˆXX (ejω ) = IX− ¯ ¯ (e ) X,X− X

(10.11)

and should be shown by the reader as an exercise. The derivation in (10.11) shows that taking the Fourier transform of the sample covariance function is equivalent to computing the periodogram as in (10.1). This is therefore not a consistent method for estimating the spectrum.

10.8.1

Blackman-Tukey Method

In 1958 Blackman and Tukey developed a spectral density estimator, here referred to the Blackman-Tukey method, which is based on the sample covariance function. The basic idea of the method is to estimate the sample covariance function cˆXX (k) and window it with a function w2M −1 (k) of the following form  w2M −1 (−k) , for |k| ≤ M − 1 with M ≪ N w2M −1 (k) = . 0, otherwise This condition ensures that the Fourier transform of the window is real-valued and that the BT (ejω ) will not be complex. We then get estimator CˆXX BT CˆXX (ejω ) =

M −1 X

n=−(M −1)

cˆXX (n) · w2M −1 (n) · e−jωn .

Using (10.11) this is equivalent to BT CˆXX (ejω )

=



−π

  N jλ j(ω−λ) d λ IX− (e ) · W e 2M −1 ¯ ¯ X,X− X 2π

(10.12)

N jω which is the convolution of the Periodogram IX− ¯ ¯ (e ) with the Fourier transform of the X,X− X window w2M −1 (n). From (10.12) we get another constraint for the window

W2M −1 (ejω ) ≥ 0

∀ω ,

10.9. CROSS-SPECTRUM ESTIMATION

135

BT (ejω ) is non-negative for all ω. to ensure the estimate CˆXX BT (ejω ) can be easily calculated as The mean of the estimator CˆXX i h i h BT N jω (ejω ) = E IX− (e )

⋆ W2M −1 (ejω ) E CˆXX ¯ ¯ X,X−X

which, according to Section 10.2.1, is given by i h BT (ejω ) = E CˆXX =

1 N jω 2 |∆ (e )| ⋆ W2M −1 (ejω ) ⋆ CXX (ejω ) N Zπ Z∞ 1 N jϑ 2 1 1 |∆ (e )| · W2M −1 (ej(ω−θ−ϑ) )d ϑd θ . CXX (ejθ ) 2π 2π N −π

−∞

N jω 2 ∆ (e ) = 2π P δ(ω + 2πk), we find k Z ∞     1 1 N jϑ 2 lim |∆ (e )| · W2M −1 ej(ω−θ−ϑ) d ϑ = W2M −1 ej(ω−θ) N →∞ 2π −∞ N

Since limN →∞

1 N

which reduces the term for the mean of the estimator to a simple convolution Z π   i h 1 BT jω ˆ CXX (ejθ ) · W2M −1 ej(ω−θ) d θ . E CXX (e ) = 2π −π

(10.13)

Therefore the Blackman-Tukey estimator is unbiased for M → ∞. The variance of the estimator is given from Eq. (10.12) as Z i h 1 π BT (ejω ) ≈ CXX (ejω )2 · Var CˆXX W2M −1 (ejω )2 d ω N −π ≈ CXX (ejω )2 ·

1 N

M −1 X

n=−(M −1)

w2M −1 (n)2 .

Based on this approximation the variance of the Blackman Tukey method tends to zero if the window function w2M −1 (n) has finite energy.

10.9

Cross-Spectrum Estimation

If X(n) and Y (n) are jointly stationary and real, the cross-covariance function is defined as cY X (n) = E [(Y (n + k) − E [Y (n)])(X(k) − E [X(k)])] cY X (n) = E [Y (n + k)X(k)] − E [Y (n)] E [X(n)]

The cross-spectrum is defined by jω

CY X (e ) =

∞ X

n=−∞

cY X (n)e−jωn = CXY (ejω )∗ = CXY (e−jω ) .

136

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

Given X(0), . . . , X(N − 1), Y (0), . . . , Y (N − 1), the objective is to estimate the cross-spectrum CY X (ejω ) for −π < ω < π. The finite Fourier transforms are given by YN (ejω ) = XN (ejω ) =

N −1 X

n=0 N −1 X

Y (n)e−jωn X(n)e−jωn

n=0

XN (ejω ) and YN (ejω ) are for large N normally distributed random variables as discussed above. At distinct frequencies 0 < ω1 < ω2 < . . . < ωN < π the random variables XN (ejω ) and YN (ejω ) are independent. Considering L data stretches of the series X(0), . . . , X(N − 1) and Y (0), . . . , Y (N − 1), so that M L = N , jω

YN (e , l) = XN (ejω , l) =

M −1 X

m=0 M −1 X m=0

Y ([l − 1]M + m) e−jωm

l = 1, . . . , L and

X ([l − 1]M + m) e−jωm

l = 1, . . . , L

are for large M stochastically independent for 0 < ω ≤ π. Defining V(ejω ) = ℜXN (ejω ), ℜYN (ejω ), ℑXN (ejω ), ℑYN (ejω )

T

, we have

 CXX (ejω ) ℜCY X (ejω ) 0 ℑCY X (ejω )    N  ℜCY X (ejω ) CY Y (ejω ) −ℑCY X (ejω ) 0 .  E V(ejω )V(ejω )H = jω jω jω  0 ℑCY X (e ) CXX (e ) ℜCY X (e )  2 −ℑCY X (ejω ) 0 ℜCY X (ejω ) CY Y (ejω ) 

10.9.1

The Cross-Periodogram

The cross-periodogram is defined as IYNX (ejω ) =

1 YN (ejω )XN (ejω )∗ N

It can be shown that

Also, we have

  E IYNX (ejω ) = CY X (ejω )

and

  E YN (ejω )XN (ejω ) = 0   Var IYNX (ejω ) = CY Y (ejω )CXX (ejω )

10.9. CROSS-SPECTRUM ESTIMATION

137

Because CY X (ejω ) 2 ≤ CY Y (ejω )CXX (ejω )

2 The variance of IYNX (ejω ) is never smaller than CY X (ejω ) for any N .

To construct estimates for CY X (ejω ) with lower variance, we use the properties given above. We calculate L

1X 1 CˆY X (ejω ) = YM (ejω , l)XM (ejω , l)∗ L M l=1

and we have (for large M ) i h E CˆY X (ejω ) ≈ CY X (ejω ) i h 1 Var CˆY X (ejω ) ≈ CY Y (ejω )CXX (ejω ) L

Also we can smooth the cross-periodogram over neighboring frequencies and get similar results.

138

CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION

Chapter 11

Parametric Spectrum Estimation 11.1

Spectrum Estimation using an Autoregressive Process

Z(n)

X(n)

h(n)

Figure 11.1: Linear and time-invariant filtering of white noise Consider the system of Figure 11.1 with output X(n) such that X(n) +

p X k=1

ak X(n − k) = Z(n)

where E [Z(n)] = 0, E [Z(n)Z(k)] = σZ2 δ(n−k) and ap 6= 0. It is assumed that the filter with unit impulse response h(n) is stable. The process X(n) is then called an autoregressive process of order p or in short notation AR(p). The recursive filter possesses the following transfer function H(ejω ) =

1 1+

Pp

−jωk k=1 ak e

Because we have assumed the filter to be stable, the roots of the characteristic equation zp +

p X

ak z p−k = 0,

z = ejω

k=1

lie in the unit disc, i.e. |zl | < 1 for all l roots of the above equation. The spectrum is given as CXX (ejω ) = |H(ejω )|2 CZZ (ejω ) thus, CXX (ejω ) =

|1 +

σZ2 −jωk |2 k=1 ak e

Pp

139

140

CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION

The objective is to estimate CXX (ejω ) based on observations of the process X(n) for n = 0, . . . , N − 1. Multiplying X(n) +

p X k=1

ak X(n − k) = Z(n)

by X(n − l) (from the right), l ≥ 0 and taking expectation, leads to E [X(n)X(n − l)] + Thus, cXX (l) +

p X k=1

p X k=1

ak E [X(n − k)X(n − l)] = E [Z(n)X(n − l)]

ak E [X(n − k)X(n − l)] = E [Z(n)X(n − l)]

Assuming the recursive filter to be causal, i.e. h(n) = 0 for n < 0 leads to the relationship X(n) =

∞ X k=0

h(k)Z(n − k)

= h(0)Z(n) +

∞ X

h(k)Z(n − k)

∞ X

h(k)Z(n − l − k)

k=1

Thus, shifting X(n) by l gives X(n − l) = h(0)Z(n − l) +

k=1

Taking the expected value of the product Z(n)X(n − l) leads to E [Z(n)X(n − l)] = h(0)cZZ (l) +

∞ X

h(k)cZZ (l + k)

k=1

|

{z

=0

E [Z(n)X(n − l)] = h(0)cZZ (l) = h(0)σZ2 δ(l)

}

Assuming h(0) = 1, leads to the so called Yule-Walker equations  2 p X σZ , l = 0 ak cXX (l − k) = cXX (l) + 0, l = 1, 2, . . . k=1

The difference equation in cXX (l) has a similar form to the system describing difference equation in X(n). Having observed X(n), one can estimate cXX (n). Using cXX (0), . . . , cXX (p), one would get a unique solution for a1 , . . . , ap . The solution is unique if the inverse matrix with the elements cXX (l − k) exists, which is guaranteed if the filter is stable. Given estimates a ˆ1 , . . . , a ˆp of a1 , . . . , ap , one can use the Yule-Walker equations for l = 0 to get σ ˆZ2 . Then it is straightforward to find an estimate for CXX (ejω ) given by CˆXX (ejω ) =

|1 +

σ ˆZ2 ˆk e−jωk |2 k=1 a

Pp

11.2. SPECTRUM ESTIMATION USING A MOVING AVERAGE PROCESS

11.2

141

Spectrum Estimation using a Moving Average Process

Consider a transversal filter with impulse response h(n), input Z(n) and output X(n) ∈ R as shown in Fig. 11.2: X(n) =

q X k=0

h(k)Z(n − k) = Z(n) +

q X k=1

h(k)Z(n − k) .

If Z(n) is a white noise process, i.e., E [Z(n)] = 0 and cZZ (k) = σZ2 δ(k), then X(n) is a Moving Average (MA) Process of order q, MA(q). Z(n) z −1

z −1

z −1

h(1)

h(2)

h(q) X(n)

Figure 11.2: Filter model for a moving average process The spectrum of X(n) can be calculated as 2 CXX (ejω ) = H(ejω ) σZ2 .

The covariance function of X(n) is

cXX (n) = E [X(n + k)X(k)] ! " q X h(r)Z(n + k − r) = E =

q X

r=0 q X

r=0 m=0

cXX (n) =

 

m=0

!#

h(m)Z(k − m)

h(r)h(m)σZ2 δ(n + m − r)

σZ2



q X

q−|n| P m=0

h(m)h(m + |n|), 0 ≤ |n| ≤ q 0,

.

else

If we have cXX (0), . . . , cXX (q), we would need to solve non-linear equations in h(0), . . . , h(q) and σZ2 , which makes it impossible to find a unique solution. Let b(m) = σZ h(m) for m = 0, . . . , q, then cXX (n) =

 q−|n|  P 

m=0

b(m)b(m + |n|), 0 ≤ |n| ≤ q 0,

else

.

142

CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION

Let cˆXX (0), . . . , cˆXX (q) be available (consistent) estimates for cXX (0), . . . , cXX (q). Then cˆXX (n) =

q−n X

ˆb(m)ˆb(m + n),

n = 0, . . . , q .

m=0

The estimates ˆb(0), . . . , ˆb(q) are consistent, because of the consistency of cˆXX (k), k = 0, . . . , q. ˆ ˆ ˆ ˆ Then h(1) = b(1) , . . . , h(q) = b(q) , σ ˆ 2 = ˆb(0)2 . ˆb(0)

Z

ˆb(0)

The implementation of the solution is as follows: With ˆ B(z) = CˆXX (z) =

q X

ˆb(n)z −n

n=0 q X

cˆXX (n)z −n

n=−q

ˆ B ˆ (1/z) = B(z) 1. Calculate P (z) = z q CˆXX (z) Q ˆ ˆ 2. Assign all roots zi of P (z) inside the unit circle to B(z), i.e. B(z) = ˆb(0) qi=1 (1 − zi z −1 ) with ˆb(0) > 0 and |zi | ≤ 1. ˆ 3. Calculate h(m) =

ˆb(m) ˆb(0)

for m = 1, . . . , q

As an example

q X ˆb(1) ˆ = h(1) =− zi . ˆb(0) i=1

To find

σ ˆZ2

= ˆb20 we use cˆXX (n) =

q−k X

ˆb(m)ˆb(m + n) ,

n = 0, 1, . . . , q

m=0

and find for n = 0 ˆb(0)2 = as an estimate for the power σZ2 of Z(n).

cˆXX (0) Pq 2 ˆ 1 + m=1 h(m)

An alternative to estimate the spectrum CXX (ejω ) of X(n) is CˆXX (ejω ) =

q X

n=−q

cˆXX (n)e−jωn = cˆXX (0) + 2

q X

cˆXX (n)e−jωn .

(11.1)

n=1

Example 11.2.1 Consider a receiver input signal X(n) from a wireless transmission. To find the moving average estimate of the spectrum CXX (ejω ), we first calculate the sample covariance cˆXX (n) and take the z-transform CˆXX (z). By taking P (z) = z N −1 CˆXX (z) we find N − 1 roots ˆ zi , i = 1, . . . , N − 1 inside the unit circle and assign them to B(z). The estimator CˆXX (ejω ) can then be found according to (11.1).

11.3. MODEL ORDER SELECTION

143

12

10

8

6 CˆXX (ejω )

4 CXX (ejω )

2

0 0

0.5

1 ω π

1.5

2

Figure 11.3: Estimate of an MA process with N = 100 and q = 5

11.3

Model order selection

A problem arising with the parametric approach is that of estimating the number of coefficients required, i.e. p or q. Having observed X(0), . . . , X(N − 1) there are several techniques for doing this. Akaike’s information criterion (AIC) suggests minimising the following function with respect to m. 2 +m AIC(m) = log σ ˆZ,m

2 , N

m = 1, . . . , M

The order estimate is the smallest m minimising AIC(m). Another more accurate estimate is given by the minimum description length (MDL) criterion, obtained by taking the smallest m minimising 2 MDL(m) = log σ ˆZ,m +m

log N , N

m = 1, . . . , M

Examples We generate realisations from the N (0, 1) distribution. We filter the data recursively to get 1024 values. The filter has poles z1,2 = 0.7e±j0.4 , z3,4 = 0.75e±j1.3 , z5,6 = 0.9e±j1.9 It is a recursive filter of order 6. The process is an AR(6) process.

144

CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION

True Estimate

10 9 8 7 6 5 4 3 2 1

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Figure 11.4: True and estimated AR(6) spectrum. The abscissa displays the frequency ordinate shows the estimate.

0.9

0.9

0.8

0.8

0.7

0.7

0.6

0.6

AIC

MDL

0.5

0.4

ω 2π

0.5

while the

0.5

0.4

0.3

0.3

0.2

0.2

0.1 0.1 0 0 2

4

6 Order

8

10

2

4

6 Order

8

10

Figure 11.5: Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.

Chapter 12

Discrete Kalman Filter The Kalman filter has become an indispensable tool in modern control and signal processing. Its simplest use is as an estimation tool. However, it is capable of revealing more than just a standard estimate; it provides a complete statistical description of the degree of knowledge available. This can be highly useful in system analysis. In contrast to many other estimation techniques, the Kalman filter operates in the state space domain. This intuitive structure has given it physical motivation in a variety of applications. In addition, its implementation can be computationally efficient. As a result, it has found application in a wide variety of fields; especially in process control and target tracking, but also in many other signal processing topics.

12.1

State Estimation

The purpose of a Kalman filter is to estimate the state of a process. In many (or most) physical applications, this state will consist of multiple random variables. For example, in a remote sensing scenario, the task may be to estimate the position and velocity of a vehicle. In this case information may be available in the form of measurements such as acoustic signals, radar returns or visual images. Once an appropriate model of the system is constructed, it may be possible to derive an algorithm for the estimation of the state, based on a finite set of observations.

12.2

Derivation of Kalman Filter Equations

Consider a multi-dimensional system where the time evolution of the (unobservable) process state can be described by         a11 . . . a1N X1 (n − 1) b11 . . . b1L U1 (n − 1) W1 (n − 1)   X1 (n)    ..    ..   .. ..   .. .. .. .. =  ...  + .  +  . . . . . . . XN (n) aN 1 . . . aN N XN (n − 1) bN 1 . . . bN L UL (n − 1) WN (n − 1) and in vector notation

X(n) = AX(n − 1) + BU (n − 1) + W (n − 1) where • X(n) ∈ ℜN ×1 is the process state at time n 145

(12.1)

146

CHAPTER 12. DISCRETE KALMAN FILTER

• A is the state transition matrix • U (n) ∈ ℜL×1 is the control input • B governs the effect of the control input on the state • W (n) ∈ ℜN ×1 is the process noise. The measurements of the process are determined by        Z1 (n) h11 . . . h1N X1 (n) V1 (n)  ..   . ..   ..  +  ..  ..  .  =  .. . .  .   .  ZM (n)

hM 1 . . . hM N

XN (n)

VM (n)

Z(n) = HX(n) + V (n)

(12.2)

where • Z(n) ∈ ℜM ×1 is the measurement • H maps the process state to a measurement • V (n) ∈ ℜM ×1 is the measurement noise The relationships between the system components can be seen in Figure 12.1. W (n)

U (n)

B

V (n)

δK (n − 1)

X(n)

H

Z(n)

A Figure 12.1: Relationship between system state and measurements assumed for Kalman filter. In addition, we assume that the noise quantities W (n) and V (n) are independent to each other and to U (n), white and Gaussian distributed, W (n) ∼ N (0, Q) V (n) ∼ N (0, R)

.

The matrices A, B and H are assumed known from the physics of the system. The control input U (n) is the intervention or “steering” of the process by an outside source (e.g. a human operator). It is assumed that U (n) is known precisely. Therefore, its effect on the process, through (12.1) is also known. Now, in order to derive the Kalman filter equations we need to consider what is the “best” ˆ ˆ − 1), the preestimate X(n) of the new state X(n) based on the previous state estimate X(n vious control input U (n − 1) and the current observation Z(n). This estimate of X(n) may be calculated in two steps:

12.2. DERIVATION OF KALMAN FILTER EQUATIONS

147

1. Predict forward in time from previous estimate ˆ − (n) = AX(n ˆ − 1) + BU (n − 1) X

.

(12.3)

From (12.1) it can be seen that the new state is a linear function of the previous state with the addition of the (known) control input and (unknown) zero-mean noise. To predict the new state it is reasonable to use the previous state estimate in place of the true state and to ignore the additive noise component since its expected value is zero. The quantity ˆ − (n) is referred to a the a priori state estimate since it is determined from the knowledge X available before time instant n. It is a projection of the previous state estimate forward in time. 2. Predict the new measurement value ˆ − (n) Zˆ − (n) = H X

(12.4)

and correct the state estimate based on the residual between predicted and actual measurements ˆ ˆ − (n) + K(n)(Z(n) − Zˆ − (n)) X(n) = X

.

(12.5)

The predicted measurement, Zˆ − (n), is calculated from the a priori state estimate. It can be seen that (12.4) is similar to (12.2) without the zero-mean measurement noise and using the previous state estimate rather than the true, unknown value. Since the true measurement, Z(n), is available, the measurement residual Z(n) − Zˆ − (n) is a measure of how correct the state estimate is – a small error is more likely when the estimate is good, whereas a poor estimate will be more likely to produce a large difference between predicted and actual measurements. The state estimate is now corrected by the weighted residual K(n)(Z(n) − Zˆ − (n)).

The N × M matrix K(n) is referred to as the Kalman gain. It controls how the measurement residual affects the state estimate. In order to optimise the choice of K(n), a criterion must be specified. A logical choice would be to minimise the mean squared error between the a posteriori ˆ estimate X(n) and the true state vector X(n), i.e. h i 2 ˆ min E kX(n) − X(n)k = min tr {P (n)} (12.6) K(n)

where with

K(n)

  P (n) = E E(n)E(n)T ˆ E(n) = X(n) − X(n)

(12.7) (12.8)

is the (state) estimate error – i.e. the difference between the true state vector and our a posteriori state estimate. To find the K(n) that minimises (12.6), start by writing the a posteriori  error in terms  of ˆ − (n), and its covariance, P − (n) = E E − (n)E − (n)T the a priori error, E − (n) = X(n) − X ˆ E(n) = X(n) − X(n)   ˆ ˆ − (n) −X = E − (n) − X(n)   ˆ − (n) = E − (n) − K(n) Z(n) − H X   ˆ − (n)) + V (n) = E − (n) − K(n) H(X(n) − X  = E − (n) − K(n) HE − (n) + V (n)

(12.9)

148

CHAPTER 12. DISCRETE KALMAN FILTER

Now substitute into (12.7)   P (n) = E E(n)E(n)T h   T i = E E − (n) − K(n) HE − (n) + V (n) E − (n) − K(n) HE − (n) + V (n)      = E E − (n)E − (n)T − E K(n) HE − (n) + V (n) E − (n)T . . . h i T −E E − (n) HE − (n) + V (n) K(n)T . . . h i  T +E K(n) HE − (n) + V (n) HE − (n) + V (n) K(n)T

Differentiate1 tr {P (n)} with respect to K(n) and set to 0

0 = 0 − P − (n)H T − P − (n)H T + 2K(n) HP − (n)H T + R  P − (n)H T = K(n) HP − (n)H T + R −1 K(n) = P − (n)H T HP − (n)H T + R .

 (12.10)

It is easy to see that as R → 0, i.e. the measurement noise becomes less influential , K(n) → H −1 . Thus, the residual between actual and predicted measurements is more trusted and is given a heavy weighting in correcting the state estimate through (12.5). Alternatively, if it is known that the a priori error covariance approaches 0, i.e. P − (n) → 0, or if R is large, then K(n) → 0. This means that the residual is given less weight in updating the state estimate – the a priori estimate is given priority. Recall that X(n) = AX(n − 1) + BU (n − 1) + W (n − 1) ˆ − (n) = AX(n ˆ − 1) + BU (n − 1) X then   P − (n) = E E − (n)E − (n)T   T  − − ˆ ˆ = E X(n) − X (n) X(n) − X (n)   T  ˆ − 1)) + W (n − 1) A(X(n − 1) − X(n ˆ − 1)) + W (n − 1) = E A(X(n − 1) − X(n   T    ˆ ˆ AT + E W (n − 1)W (n − 1)T = AE X(n − 1) − X(n − 1) X(n − 1) − X(n − 1)     = AE E(n − 1)E(n − 1)T AT + E W (n − 1)W (n − 1)T = AP (n − 1)AT + Q

1

(12.11)

The following lemmas are used in the differentiation: ∂tr {AX} ∂X ˘ ¯ ∂tr AX T ∂X ∂XAX T ∂X

=

AT

=

A

=

X(A + AT )

12.3. EXTENDED KALMAN FILTER

149

Starting from (12.9), E(n) = E − (n) − K(n)(HE − (n) + V (n)) = (I − K(n)H)E − (n) − K(n)V (n)

so h  T i   E E(n)E(n)T = E (I − K(n)H)E − (n) − K(n)V (n) (I − K(n)H)E − (n) − K(n)V (n) h i P (n) = E (I − K(n)H) E − (n)E − (n)T (I − K(n)H)T + K(n)V (n)V (n)T K(n)T = (I − K(n)H) P − (n) (I − K(n)H)T + K(n)RK(n)T

= P − (n) − K(n)HP − (n) − P − (n)H T K(n)T + K(n)HP − (n)H T K(n)T + K(n)RK(n)T

= (I − K(n)H) P − (n) − P − (n)H T K(n)T + K(n)(HP − (n)H T + R)K(n)T

= (I − K(n)H) P − (n) − P − (n)H T K(n)T + P − (n)H T K(n)T = (I − K(n)H) P − (n)

Note that we have used the solution for K(n) in (12.10) to simplify the above equation. It is now possible to draw all the equations derived above together to complete the description of the Kalman filter: 1. Predict the next state and covariance using the previous estimate. ˆ − (n) = AX(n ˆ − 1) + BU (n − 1) (a) X

(b) P − (n) = AP (n − 1)AT + Q

2. Correct the predictions using the measurement. (a) K(n) = P − (n)H T (HP − (n)H + R)−1 ˆ ˆ − (n) + K(n)(Z(n) − Zˆ − (n)) (b) X(n) =X (c) P (n) = (I − K(n)H)P − (n)

The distinction between the two main steps is quite clear. The prediction is based on the a priori information available, i.e. the estimates from the previous time instance which was based on all measurements up to, but not including, the current time instance n. While the correction is made using the current measurement Z(n) to yield an a posteriori estimate. In fact, the suitability of the Kalman filter for computer implementation is, in large part, due to this feature. The procedure is recursive – it is run after each measurement is made, meaning intermediate estimates are always available. This is in contrast to some techniques that must process the entire measurement sequence to finally yield an estimate. It also compresses all the information from previous measurements into the current state estimate and its corresponding covariance matrix estimate – there is no need to retain a record of previous measurements. This can result in significant memory savings.

12.3

Extended Kalman Filter

The Kalman filter has very strict assumptions on the system model. In particular it uses linear stochastic difference equations to describe the state transition and the state-to-measurement mapping. This linearity assumption is violated by many or, by strict interpretation, all real-life

(12.12)

150

CHAPTER 12. DISCRETE KALMAN FILTER

applications. In these cases it becomes necessary to use the Extended Kalman Filter (EKF). As the name suggests, this is a generalisation or extension to the standard Kalman filter framework whereby the equations are “linearised” around the estimates in a way analogous to the Taylor series. Let the state transition now be defined as: (c.f. (12.1)) X(n) = f (X(n − 1), U (n − 1), W (n − 1))

(12.13)

and the measurement is found by: (c.f. (12.2)) Z(n) = h(X(n), V (n)).

(12.14)

As for the Kalman filter case, the noise processes, W (n − 1) and V (n − 1), are unobservable ˆ − 1), is available. However, the control input, and only an estimate of the previous state, X(n U (n − 1) is known. Therefore, estimates of the state and measurement vectors can be found by ˆ − (n) = f (X(n ˆ − 1), U (n − 1), 0) X ˆ − (n), 0) Zˆ − (n) = h(X .

(12.15) (12.16)

This means that (12.13) and (12.14) can be linearised around these initial estimates X(n) = f (X(n − 1), U (n − 1), W (n − 1)) ˆ − (n) + A(n)(X(n − 1) − X(n ˆ − 1)) + C(n)W (n − 1) ≈ X and ˆ − (n)) + D(n)V (n) Z(n) ≈ Zˆ − (n) + H(n)(X(n) − X where A(n), C(n), H(n) and D(n) are Jacobian matrices of the partial derivatives of f and h, i.e. A[i,j] = W[i,j] = H[i,j] = V[i,j] =

∂f[i] ˆ − 1), U (k − 1), 0) (X(n ∂x[j] ∂f[i] ˆ − 1), U (k − 1), 0) (X(n ∂w[j] ∂h[i] ˆ − 1), 0) (X(n ∂x[j] ∂h[i] ˆ − 1), 0) (X(n . ∂v[j]

Hopefully the interpretation of these equations is clear. The nonlinear state and measurement equations are linearised about an initial, a priori estimate. The four Jacobian matrices describe the rate of change of the nonlinear functions at these estimates. Due to the non-linearities, the random variables in the EKF cannot be guaranteed to be Gaussian. Further, no general optimality criterion can be associated with the resulting estimate. One way to interpret the EKF is that it is simply an adaptation of the Kalman filter to the nonlinear case through the use of a first order Taylor series expansion about the a priori estimate. By using the equations above and following the methodology in deriving the Kalman filter, the complete algorithm for implementing an EKF is:

12.3. EXTENDED KALMAN FILTER 1. Predict the next state and covariance using the previous estimate. ˆ − (n) = f (X(n ˆ − 1), U (n − 1), 0) (a) X

(b) P − (n) = A(n)P (n − 1)A(n)T + C(n)Q(n − 1)C(n)T 2. Correct the predictions using the measurement. (a) K(n) = P − (n)H(n)T (H(n)P − (n)H(n)T + V RV T )−1 ˆ ˆ − (n) + K(n)(Z(n) − h(X ˆ − (n), 0)) (b) X(n) =X (c) P (n) = (I − KH(n))P − (n)

151

152

CHAPTER 12. DISCRETE KALMAN FILTER

Appendix A

Discrete-Time Fourier Transform Sequence ax(n) + by(n) x(n − nd ) (nd integer) e−jω0 n x(n) x(−n) nx(n) x(n) ⋆ y(n) x(n)y(n) −jnx(n) Symmetry properties: x(n)∗ x(n) real x(n) real

Fourier Transform aX(ejω ) + bY (ejω ) e−jωnd X(ejω ) X(ej(ω+ω0 ) ) X(e−jω ) X(ejω )∗ if x(n) real dX(ejω ) j dω X(eZjω )Y (ejω ) π 1 X(ejθ )Y (ej(ω−θ )dθ 2π −π dX(ejω ) dω

X(e−jω )∗ X(ejω ) = X(e−jω )∗ XR (ejω ) and|X(ejω )| are even XI (ejω ) and argX(ejω ) are odd

x(−n)∗ X(ejω )∗ x(n) real and even X(ejω ) real and even x(n) real and odd X(ejω ) pure imaginary and odd Parseval’s Theorem Z π ∞ X 1 2 2 |X(ejω )| dω |x(n)| = 2π −π n=−∞ Z π ∞ X 1 ∗ x(n) · y(n) = X(ejω ) · Y (ejω )∗ dω 2π −π n=−∞

153

154

APPENDIX A. DISCRETE-TIME FOURIER TRANSFORM

Sequence δ(n) δ(n − n0 ) 1, (−∞ < n < ∞) an u(n),

(|a| < 1)

u(n) (n + 1)an u(n) (|a| < 1) rn sin(ωp (n + 1)) u(n) (|r| < 1) sin(ωp ) sin(ωc n) πn  1, 0 ≤ n ≤ M x(n) = 0, otherwise ejω0 m cos(ω0 n + φ)

Fourier Transform 1 e−jωn0 ∞ X 2πδ(ω + 2πk) k=−∞

1 1 − ae−jω ∞ X 1 πδ(ω + 2πk) + 1 − e−jω k=−∞ 1 (1 − ae−jω )2 1 −jω ) + r 2 e−j2ω 1 − 2r cos(ω  pe 1, |ω| < ωc X(ejω ) = 0, ωc < |ω| ≤ π sin[ω(M + 1)/2] −jωM/2 e sin(ω/2) ∞ X 2πδ(ω − ω0 + 2πk) k=−∞ ∞ X

π

[ejφ δ(ω − ω0 + 2πk) + e−jφ δ(ω + ω0 + 2πk)

k=−∞

Related Documents