Fast Fourier Transform : The Sequential Algorithm Pradosh K Roy Oil and Natural Gas Corporation Ltd., Calcutta 700088
[email protected]
The ubiquitous Transform , named after Baron Jean Baptiste Joseph Fourier[1768-1830][An Egyptologist , Mathematician , Professor École Polytechnique, Administrator of Greenoble, Scientific Adviser to Napoleon's army in its invasion of Egypt ..etc.] [1], providing an analytical expression for an arbitrary function is the mathematical foundation to the study of Linear Systems and Information Theory , apart from its application in Electrical Engineering and Electrical Communications. ‘Despite its strong bonds with Electrical Engineering , Fourier Analysis has nevertheless become indispensable’ in Quantum Physics , Biomedicine , Oceanography , Civil Engineering , Mechanical Vibrations , Signal Processing , Computational Chemistry , Error Correcting Codes , Image Compression .. etc.
Joseph Fourier[1768-1830]
[Fourier Analysis the Saw Tooth Signal]
[Sample Fourier Transform ]
From the algorithmic point of view Fourier decomposition of a digital signal was a major challenge before 1965 , when one of the most famous scientific paper authored by Cooley and Tukey [2] was published on Fast Fourier Transforms [FFT]. However , a paper by the Göttingen mathematician Karl Friedrich Gauss in 1805 , on interpolation , contained essentially the same idea ! [3]. A less general , but still important version of Fourier Transform , used for efficient computation of Hadamard-Walsh Transform was published in 1932 by the statistician Yates. Another important predecessor is the work of Danielson and Lanczos in the context of X-Ray crystallography in 1942 [4]. In spite of these early discoveries , the algorithm gained any notice until Cooley-Tukey’s work. The FFT algorithm is basically a strategy to reduce ‘an ostensibly O(n2) chore to an O(nlg2n) frolic ‘ and is enlisted amongst the Top10 algorithms which had greatest influence in the development of science and engineering in the 20th century [5]. The impact and future of FFT, brilliantly summarized by Rockmore [6] , is mainly because it made working in the frequency domain computationally feasible as working in the temporal or spatial domain, though its prominence might also have contributed to the deceleration of other areas of research. FFTs of several Giga points , which do not fit in the main memory of most machines is required in projects like MAP (Microwave Anisotropy Project) and LIGO. These so called out-of-core FFTs are an active area of research in the 21st century. Similarly approximate FFTs and non-uniform FFTs required e.g. in MRI and Quantum FFTs [ FFTs for Quantum Computers ] are also being pursued in leading laboratories around the globe.
1
A comparison of n2 and nlg2n for various values of n emphasizing the gain in speed of an FFT algorithm [7] is graphically shown in the followings. Comparison n**2 & nlgn
2 4 8 16 32 64 128 256 512 1024 2048
n
nlgn
4 16 64 256 1024 4096 16384 65536 162144 1048576 4194303
2 8 24 64 160 384 896 1024 4608 10240 22528
2
n /nlgn
180000 160000
2.0 2.0 2.7 4.0 6.4 10.7 18.3 32.0 56.9 102.4 186.2
140000 n
120000 n2, nlgn
n
2
n**2
100000
nlgn
80000 60000 40000 20000 0
n
Table 1
Fig 1
The Discrete Fourier Transform [DFT] of a sequence x(n) that is periodic with period N , so that x(n) = x(n+kN) for any integer value k , is defined as [6] : N-1 X(k) = ∑x(n) [WN ] kn , k =0,1,…..,N-1 : WN = e –j(2π/N) , (1) 0 and the Inverse Discrete Fourier Transform (IDFT) is N-1 x(n) = (1/N) ∑X(k) [WN ] -kn , n =0,1,…..,N-1 : WN = e –j(2π/N) is the Nth root of unity (2) 0 DFT can be conveniently expressed as the Matrix Vector Multiplication [X(k)] = [KN][x(n)] The Matrix [KN] is the kernel of the Transformation. [X(k)] = [KN][x(n)]
(3)
Or;
X(0) X(1) X(2) X(3) X(4) X(5) … X(n-2) X(n-1)
1 1 1 1 1 1 .. .. 1
1 WN WN2 WN3 WN4 WN5 … WNN-2 WNN-1
1 WN 2 WN 4 … … … … … …
1 WN3 WN6 … … … … … …
1 .. … … … … … … …
… … … … … … … … …
1 WNN-2 WN2(N-2)
1 WNN-1 WN2(N-1)
x(0) x(1) x(2) x(3) x(4) x(5) … x(n-2) x(n-1)
2
It is obvious that the DFT requires 2N2 evaluation of trigonometric functions , 4N2 real multiplications , 4N(N-1) real additions and a number of indexing and addressing operations . Thus the computational complexity of DFT is O(N2). Hence , any efficient procedure that reduce the number of multiplications and additions are of considerable interest[6]. The fundamental principle that FFT algorithms are based upon , is that of decomposing the computation of DFT of a sequence of length N into successively smaller DFTs , [In the present example it is decomposition of a sequence of length N into successively smaller DFTs of length N/2 , the so called Radix 2 algorithm] , exploiting the symmetry and periodicity of WN viz. WN 2=WN/2 and WNN =1 . Let’s start with the 8 point DFT of a sequence x(n),n=0,1,2,3,4,5,6,7 defined as : [X(k),k=0,1,2,3,…7] = [K8][x(n),n=0,1,2,3…7]
X(0) X(1) X(2) X(3) X(4) X(5) X(6) X(7)
=
1 1 1 1 1 1 1 1
1 W8 W82 W83 W84 W85 W86 W87
1 W82 W84 W6 W8 W10 W12 W14
1 W83 W6 W9 W12 W15 W18 W21
1 W84 W8 W12 W16 W20 W24 W28
1 W85 W10 W15 W18 W25 W30 W35
1 W86 W12 W18 W24 W30 W36 W42
1 W87 W14 W21 W28 W35 W42 W49
x(0) x(1) x(2) x(3) x(4) x(5) x(6) x(7)
W8 is the 8th root of unity. The roots are :
[1 W8
W82 W83
W84
W85
W86
W87 ]
We can decompose the input sequence x(n) into successively smaller by rearranging the even/odd indexes of the input sequence. This is called the Decimation –in-Time[DIT] algorithm and involves rearranging the columns of the kernel K8 and rows the vector x(n),n=0,1,2,3…7 in the following manner , without affecting the output : X(0) x(0) 1 1 1 1 1 1 1 1 2 4 6 3 5 7 1 W8 W8 W8 W 8 W8 W8 W8 X(1) x(2) 4 8 12 2 6 10 14 1 W W W W W W W 8 8 8 8 8 8 8 x(4) X(2) 1 W86 W812 W818 W83 W89 W815 W821 x(6) X(3) [A] [B] ----= X(4) 1 W88 W816 W824 W84 W812 W820 W832 x(1) 1 W810 W820 W830 W85 W815 W825 W835 X(5) 1 W812 W824 W836 W86 W818 W830 W842 x(3) X(6) 14 28 42 1 W8 W8 W8 W87 W821 W835 W849 x(5) X(7) x(7) [C] [D]
This Matrix-Vector Multiplication, hence, could be rewritten as : [X(k)]k=0,1,2,3 [X(k)]k=4,5,6,7
= =
[A][x(even)] + [B][x(odd)] [C][x(even)] + [D][x(odd)]
3
x(0) x(2) x(4) x(6)
Where x(even) =
and
x(1) x(3) x(5) x(7)
x(odd) =
Using Symmetry and Periodicity of W8 readily be equated with a 4 Point 1 1 K4 = 1 1
viz. W82 = W4 and W88 =1 , matrices A and C could DFT kernel : 1 1 1 W4 W42 W43 W42 W44 W46 W43 W46 W49
Moreover , if the rows of B and D are divided by W8n , n = 0,1,2,3,4,5,6,7 respectively , we get the 4 Point DFT kernel K4. Thus , if G(k)k=0,1,2,3 and H(k) k=0,1,2,3 are the 4 Point DFTs of the even and odd indexed input sequences i.e. , G(0) G(1) G(2) G(3)
and ,
Then ,
1 1 1 1
=
H(0) H(1) H(2) H(3)
1 1 1 1
=
X(0) = X(1) = X(2) = X(3) = X(4) = X(5) = X(6) = X(7) =
G(0) G(1) G(2) G(3) G(0) G(1) G(2) G(3)
1 W4 W42 W43
1 W42 W44 W46
1 W43 W46 W49
x(0) x(2) x(4) x(6)
1 W4 W42 W43
1 W42 W44 W46
1 W43 W46 W49
x(1) x(3)
+ + + + + + + +
W80 W81 W82 W83 W84 W84 W86 W87
x(5) x(7)
H(0) H(1) H(2) H(3) H(0) H(1) H(2) H(3)
Thus two 4 Point DFTs of the even indexed points viz. G(k) )k=0,1,2,3 and odd indexed points Viz. H(k) )k=0,1,2,3 produces an eight-point DFT of the input sequence. The 4 Point DFTs could further be decomposed into Two 2 Point DFTs in the same manner.
4
The kernel of 2 Point DFT is
.
K2
=
which is equal to
1
1
1
W2
1
1
1
-1
Using the data flow notation of Oppenheim & Schafer [6] , the 2 Point DFT of x(0) and x(4) is graphically represented by : x(0) + x(4)
x(0) – x(4)
This topology is referred to as the Basic Butterfly [6]. Thus : G(0) = [x(0)+x(4)]+W40[x(2)+ x(6)] G(1) = [x(0)-x(4)] +W41 [x(2) - x(6)] G(2) = [x(0)+x(4)]+W42[x(2)+ x(6)] G(3) = [x(0)-x(4)]+W43 [x(2) - x(6)] and H(0) = [x(1)+x(5)]+W40[x(3)+ x(7)] H(1) = [x(1)-x(5)]+W41 [x(3) - x(7)] H(2) = [x(1)+x(5)]+W42[x(3)+ x(7)] H(3) = [x(1)-x(5)]+W43 [x(3) - x(7)]
Finally , we get the 8 point DFT in terms of 2 point DFTs of the input sequence x(n) , rearranged as [x(0),x(4)],[x(2),x(6)],[x(1),x(5)],[x(3),x(5)] : X(0) = X(1) = X(2) = X(3) = X(4) = X(5) = X(6) = X(7) =
[x(0)+x(4)]+W40[x(2)+ x(6)] [x(0)- x(4)]+W41[x(2) - x(6)] [x(0)+ x(4)]+W42[x(2) + x(6)] [x(0) - x(4)]+W43 [x(2) - x(6)] [x(0)+ x(4)]+W40[x(2) + x(6)] [x(0) - x(4)]+W41[x(2) - x(6)] [x(0)+ x(4)]+W42[x(2) + x(6)] [x(0) - x(4)]+W43[x(2) - x(6)]
+ + + + + + + +
W80 {[x(1)+x(5)]+W40[x(3)+ x(7)]} W81 {[x(1)-x(5)]+W41[x(3)- x(7)]} W82 {[x(1)+x(5)]+W42[x(3)+ x(7)]} W83 {[x(1)-x(5)]+W43[x(3)- x(7)]} W84 {[x(1)+x(5)]+W40[x(3)+ x(7)]} W85 {[x(1)-x(5)]+W41[x(3)- x(7)]} W86 {[x(1)+x(5)]+W42[x(3)+ x(7)]} W87 {[x(1)-x(5)]+W43[x(3)- x(7)]}
5
Which may also be rewritten , because of the identities W82 = W4 , W80 = 1 , as : The complete flow graph for the computation of the 8 point DFT is shown in the following figure X(0) = X(1) = X(2) = X(3) = X(4) = X(5) = X(6) = X(7) =
[x(0)+x(4)]+ W80[x(2)+ x(6)] [x(0)- x(4)]+ W82[x(2) - x(6)] [x(0)+ x(4)]+ W84[x(2) + x(6)] [x(0) - x(4)]+ W86[x(2) - x(6)] [x(0)+ x(4)]+ W80[x(2) + x(6)] [x(0) - x(4)]+ W82[x(2) - x(6)] [x(0)+ x(4)]+ W84[x(2) + x(6)] [x(0) - x(4)]+ W86[x(2) - x(6)]
+ + + + + + + +
W80 {[x(1)+x(5)]+W80[x(3)+ x(7)]} W81 {[x(1)-x(5)]+W82[x(3)- x(7)]} W82 {[x(1)+x(5)]+W84[x(3)+ x(7)]} W83 {[x(1)-x(5)]+W86[x(3)- x(7)]} W84 {[x(1)+x(5)]+W80[x(3)+ x(7)]} W85 {[x(1)-x(5)]+W82[x(3)- x(7)]} W86 {[x(1)+x(5)]+W84[x(3)+ x(7)]} W87 {[x(1)-x(5)]+W86[x(3)- x(7)]}
:
INPUT
OUTPUT
This Radix-2 FFT Decimation-in-Time is a classic example of the Divide and Conquer Algorithms familiar to the computer scientists. If T(n) is the time taken for computing the DFT , we get the following recurrence relation for T(n) in terms of T(n/2) , T(n/4) , T(n/8) [8] , T(n)
≤ 2T(n/2) + cn , ≤ 2[2T(n/4) + cn/2] + cn = 4T(n/4) + 2cn ≤ 4{2T(n/8) + cn/4] + 2cn = 8T(n/8) + 3cn
…… The generalized inequality is T(n) ≤ 2kT(n/2k) + kcn Setting k=lg2n , T(n) ≤ nT(1) + cnlg2n i.e. T(n) = O(nlg2n) , because T(1) =1 . The best known algorithm for FFT [ Split-Radix Algorithm] due to Johnson and Frigo [8] has T(n) = 34/9 [nlgn] –2lgn –2/9(-1)lgn lgn + 16/27 (-1) lgn +8 for n>1 which supercedes Yavne’s Split Radix FLOP count 4nlgn –6n + 8 for n>1.
6
The DIT algorithm requires that the input data must be stored in a non-sequential , bit reversed order. If (n2n1n0) is the binary representation of the index of the sequence x(n) , then the sequence value is stored in position X(n0n1n2). Formally separation of even and odd numbered data can be carried out by examining the LSB in the index n. The necessity for a bit reversed ordering of the sequence x(n) is seen to result from the manner in which the DFT computation is decomposed into successively smaller DFT computations. For the eight point DFT the input and bit reversed sequence is shown in the following Table :
DEC
BIN
0 1 2 3 4 5 6 7
000 001 010 011 100 101 110 111
BIT Rev 000 100 010 110 001 101 011 111
DEC 0 4 2 6 1 3 5 7
The Radix-2 algorithm DIT algorithm is simple to implement and it is often advantageous to deal with such sequences [6]. Alternatively , we can consider dividing the output sequence X(k) into smaller and smaller , in the same manner. The class of algorithms based on this procedure is known as Decimation-in-Frequency [DIF] algorithms. It can be proved that DIT and DIF algorithms are mathematically equivalent[7]. In short, DFT is a linear transform that maps N regularly samples points from a signal onto an equal number of points representing the frequency spectrum of the signal. Due to its wide scientific and engineering applications , there has been a lot of interest in implementing FFT on parallel computers. For performing FFT on a parallel computer , there are two formulations of the basic algorithm viz. (i) Binary Exchange Algorithm and (ii) Transpose Algorithm . Depending on the size of the input N , the number of processors p and the memory or network bandwidth , one of these may run faster than the other [12]. Suggested Readings. [1] Bell, E T , 1953, Men of Mathematics , vol.1, Penguin Books, London, p.200-224 [2].Cooley,J.W.,Tukey,J.W.,1965,An Algorithm for Machine Calculation of Complex FourierSeries,Math.Comp.,19,p.297-301, [3].Cooley,J.W.,1987,The Rediscovery of the Fast Fourier Transform Algorithm, Microchimica Acta III ,p.33-45.[4].Danielson,G.C.,Lanczos,C.,1942,Some Improvement in the practical Fourier Analysis and their Application to X-Ray Scattering from Liquids, Jour. Franklin Inst.,233, p.365-380 ,p.432452.[ 5].Cipra , Barry A., The Best of the 20th Century : Editors name Top10 Algorithms , SIAM New,vol.33,No.4,[6].Rockmore, Daniel, 2001, http://www.cs.dartmouth.edu/~rockmore/cse-fft.pdf,The FFT: An Algorithm the Whole Family Can Use," Computing in Science and Engineering, vol. 2, no. 1, pp. 60-64, January/February [7].Oppenheim,A.V.,Schafer,R.W.,1975,Digital Signal Processing ,Prentice Hall Inc.,NJ,p.282-336 [8].Rabiner,L.R.,Gold,B.,1975,Theory and Application of Digital Signal Processing , Prentice Hall Inc.,NJ,p.356-43 [9]Dasgupta,S.,Papadimitriou,Vazirani,U.V.,2006,Algorithms,p.83,http://www.cs.berkeley.edu/~vazirani/algorithms/al l.pdf [10].Johnson, Steven G., Frigo ,M. ,2007, A modified Split Radix FFT with fewer Arithmetic Operations , IEEE Trans. Signal Processing , vol.55,no.1,p.111-119 [11]. http://cnx.org.content/m17374/latest [12].Grama,Ananth; Gupta,Anshul;Karypis,George;Kumar,Vipin,2003,Introduction to Parallel computing , 2 nd Edition, Addision Wesley.
7
Annexure I
Fig. 2 Entries of 16 point DFT kernel , where each complex value is shown as a vector . In the 1st row k=0 and the vector rotates in increments of 0 , in the 2nd row k=1 and the vector rotates in increments of 2π/N. [The Transform and Data Compression Handbook Ed. K. R. Rao and P.C. Yip, Boca Raton, ©CRC Press LLC, 2001]
Fig 3. Entries of 2048x2048 DFT Kernel [Real Part] Simulated in MATLAB® R-2009a configured on a 64 Bit Intel®CoreTM i5 3210-M @2.5GHz Processor. ©Pradosh K.Roy 2013
8