Turbo Codes
Seminar ‘04
Introduction: Increasing demand for information exchange is a characteristic of modern civilisation. The transfer of information from the source to its destination has to be done in such a way that the quality of the received information is as close as possible to the quality of the transmitted information. A typical communication system may be represented by the block diagram shown in Figure 1-1.
The information to be transmitted can be machine generated (e.g., images, computer data) or human generated (e.g., speech). Regardless of its source, the information must be translated into a set of signals optimized for the channel over which we want to send it. The first step is to eliminate the redundant part in order to maximize the information transmission rate. This is achieved by the source encoder block in Figure 1-1. In order to ensure the secrecy of the transmitted information, an encryption scheme must be used. The data must also be protected against perturbations introduced by the communication channel, which could lead to misinterpretation of the transmitted message at the receiving end. This protection can be achieved through error control strategies: forward error correction (FEC), i.e., error correcting codes that are able to correct errors at the receiving end, or automatic repeat request (ARQ) systems. The modulator block generates a signal suitable for the transmission channel. In the traditional approach, the demodulator block in Figure 1-1 makes a "hard" decision for each received symbol and passes it to the error control decoder block. In the case of a two-level modulation scheme, this is equivalent to deciding which of two logical values, say -1 and +1, was transmitted. No information is passed on about how reliable the hard decision is. For example, when a +1 is output by the demodulator, it is impossible to say whether the value at the input to the demodulator block was 0.2, 0.99, or 1.56. The information concerning the confidence in the demodulated output is therefore lost with a "hard" decision demodulator.
Channel Capacity
The capacity of a channel, first introduced more than 50 years ago by Claude Shannon, is the theoretical maximum data rate that can be supported by the channel with vanishing error probability. In this discussion, we restrict our attention to the additive white Gaussian noise (AWGN) channel, whose output is

    y = x + z                                                          (1)

Here, x is the modulated symbol, modelled as a random process with zero mean and variance Es (Es is the energy per symbol). For the specific case of antipodal signalling, x = ±√Es. z is a sample from an additive white Gaussian noise process with zero mean and variance N0/2. The capacity of the AWGN channel is given by

    C = (1/2) log2(1 + 2 Es/N0)                                        (2)

bits per channel use.
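As a quick illustration of equation (2), the following Python sketch (not part of the original report; it simply evaluates the capacity expression given above) prints the AWGN capacity for a few symbol signal-to-noise ratios.

    # Sketch: evaluate C = 0.5*log2(1 + 2*Es/N0) for a few Es/N0 values given in dB.
    import math

    def awgn_capacity(es_over_n0_db):
        es_over_n0 = 10 ** (es_over_n0_db / 10)        # convert dB to a linear ratio
        return 0.5 * math.log2(1 + 2 * es_over_n0)     # bits per channel use

    for snr_db in (-5, 0, 5, 10):
        print("Es/N0 = %3d dB -> C = %.3f bits per channel use" % (snr_db, awgn_capacity(snr_db)))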
Signalling at rates close to capacity is achieved in practice by error correction coding. An error correction code maps data sequences of k bits to code words of n symbols. Because n > k, the code word contains structured redundancy. The code rate, r = k/n, is a measure of the spectral efficiency of the code. In order to achieve reliable communications, the code rate cannot exceed the channel capacity (r < C). The minimum theoretical signal-to-noise ratio Eb/N0 required to achieve arbitrarily reliable communications can be found by rearranging equation (2), using Es = r Eb:

    Eb/N0 > (2^(2r) - 1) / (2r)                                        (3)

This is the minimum Eb/N0 required for an arbitrarily distributed input x. Equation (3) can be satisfied with equality only if the input is Gaussian distributed. If the input is antipodal rather than Gaussian, a slightly higher Eb/N0 is required. Plots of the minimum Eb/N0 required to achieve channel capacity as a function of code rate r are given in Figure 1 for both Gaussian distributed inputs and antipodal inputs.
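The bound in equation (3) is easy to evaluate numerically. The short sketch below (illustrative only, using the rearranged form above) prints the minimum Eb/N0 in decibels for a few code rates; for r = 1/2 it gives 0 dB, and as r approaches zero it approaches the well-known -1.59 dB limit.

    # Sketch: minimum Eb/N0 (in dB) versus code rate r for a Gaussian-distributed input.
    import math

    def min_ebno_db(r):
        return 10 * math.log10((2 ** (2 * r) - 1) / (2 * r))

    for r in (0.01, 1/3, 1/2, 2/3, 1.0):
        print("r = %.3f -> minimum Eb/N0 = %+.2f dB" % (r, min_ebno_db(r)))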
Shannon’s proof of the Channel Coding Theorem used a random coding argument. He showed that if one selects a rate r < C code at random, then the bit error probability approaches zero as the block length n of the code approaches infinity. However, random codes are not practically feasible. In order to be able to encode and decode with reasonable complexity, codes must possess some sort of structure. Unfortunately, structured codes perform considerably worse than random codes. This is the basis of the coding paradox:
"Almost all codes are good, except those we can think of." (J. Wolfowitz)
As will become apparent in the following sections, the reason that turbo codes perform so well is that they attack the coding paradox head on. On the one hand, they have structure and therefore can be efficiently encoded and decoded. But on the other hand they appear to be random and thus emulate the good performance of random codes.
Convolutional Codes
Before the introduction of turbo codes, power-efficient communications were achieved by either a strong convolutional code or the serial concatenation of a convolutional code with a block code. A convolutional code adds redundancy to a continuous stream of input data by using a linear shift register. For a rate r = k/n convolutional encoder, each set of n output symbols is a linear combination of the current set of k input bits and the m bits stored in the shift register. The total number of bits that each output depends on is called the constraint length, denoted by K. An example convolutional encoder is shown in Figure 2. The rate of this code is r = 1/2 and the constraint length is K = 3. A convolutional code can be made systematic without affecting its minimum distance properties by feeding back one of the outputs to the input. Such a code is called a Recursive Systematic Convolutional (RSC) code, and it is the basic building block for turbo codes. An example RSC encoder derived from the non-systematic encoder of Figure 2 is shown in Figure 3.
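To make the structure of these encoders concrete, here is a minimal Python sketch of a rate-1/2, K = 3 encoder pair. Since Figures 2 and 3 are not reproduced here, the sketch assumes the common (7, 5) octal generator polynomials; the exact tap positions of the figures may differ.

    # Sketch of a rate-1/2, K = 3 non-systematic convolutional encoder and its
    # recursive systematic (RSC) counterpart, assuming generators 7 and 5 (octal).

    def conv_encode(bits, g1=0b111, g2=0b101):
        """Non-systematic encoder: two output bits per input bit."""
        state = 0                                      # the two memory bits
        out = []
        for b in bits:
            reg = (b << 2) | state                     # [current bit, previous two bits]
            out.append(bin(reg & g1).count("1") % 2)   # output from generator g1
            out.append(bin(reg & g2).count("1") % 2)   # output from generator g2
            state = (reg >> 1) & 0b11                  # shift the register
        return out

    def rsc_encode(bits):
        """Recursive systematic encoder (feedback 1+D+D^2, feed-forward 1+D^2):
        one output is the data bit itself, the other is a parity bit."""
        s1 = s2 = 0
        out = []
        for b in bits:
            a = b ^ s1 ^ s2                            # feedback bit
            p = a ^ s2                                 # parity bit
            out.extend([b, p])                         # systematic bit, then parity bit
            s1, s2 = a, s1                             # shift the register
        return out

    print(conv_encode([1, 0, 1, 1]))                   # the coded stream is twice as long as the input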
The decoder for a convolutional code finds the most probable sequence of data bits d given the received sequence y:

    d̂ = argmax_d P(d | y)                                              (4)

where y is the set of code symbols x observed through noise. Equation (4) can be solved using the Viterbi algorithm, in which case the solution is termed the "maximum likelihood" or ML solution. The complexity of the Viterbi algorithm when used to decode a convolutional code is O(2^K).
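For completeness, the following sketch implements a hard-decision Viterbi decoder for the same assumed (7, 5) code used in the encoder sketch above, with the Hamming distance as the branch metric. It is illustrative only; practical decoders operate on soft values.

    # Sketch: hard-decision Viterbi decoding of the rate-1/2, K = 3 (7, 5) code.
    def viterbi_decode(rx, g1=0b111, g2=0b101):
        n_states = 4
        INF = float("inf")
        metric = [0.0] + [INF] * (n_states - 1)     # assume the encoder starts in state 0
        paths = [[] for _ in range(n_states)]       # surviving input sequences
        for i in range(0, len(rx), 2):
            r = rx[i:i + 2]                         # the pair of received code bits
            new_metric = [INF] * n_states
            new_paths = [None] * n_states
            for state in range(n_states):
                if metric[state] == INF:
                    continue
                for b in (0, 1):                    # hypothesise each possible input bit
                    reg = (b << 2) | state
                    expected = [bin(reg & g1).count("1") % 2,
                                bin(reg & g2).count("1") % 2]
                    branch = sum(e != x for e, x in zip(expected, r))  # Hamming distance
                    nxt = (reg >> 1) & 0b11
                    m = metric[state] + branch
                    if m < new_metric[nxt]:         # keep only the best path into each state
                        new_metric[nxt] = m
                        new_paths[nxt] = paths[state] + [b]
            metric, paths = new_metric, new_paths
        best = min(range(n_states), key=lambda s: metric[s])
        return paths[best]

    data = [1, 0, 1, 1, 0, 0]                       # the last two zeros flush the encoder
    rx = conv_encode(data)                          # encoder from the sketch above
    rx[3] ^= 1                                      # flip one bit to simulate channel noise
    print(viterbi_decode(rx) == data)               # True: the single error is corrected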
Shannon’s Solution
In a landmark 1948 paper, Shannon showed that with the right error-correction codes, data could be transmitted at speeds up to channel capacity, virtually free from errors, and with surprisingly low transmitting power. Before Shannon’s work, engineers thought that to reduce communication errors it was necessary to increase transmission power or to send the same message repeatedly.
The channel coding theorem specifies the channel capacity C as a fundamental limit on the rate at which reliable, error-free transmission of messages can take place over a discrete memoryless channel. Shannon basically showed that it was not necessary to waste so much energy and time if you had the right coding schemes. After his discovery, the field of coding theory thrived, and researchers developed fairly good codes. But still, before turbo codes, even the best codes usually required more than twice the transmitting power that Shannon’s law said was necessary to reach a certain level of reliability, a huge waste of energy. The gap between the practical and the ideal, measured in decibels (a logarithmic ratio between the signal level and the noise level), was about 3.5 dB. To chip away at it, engineers needed more elaborate codes.
That goal persisted for more than four decades, until Berrou and Glavieux made their discovery in the early 1990s. When they introduced turbo codes in 1993, they showed it was possible to get within an astonishing 0.5 dB of the Shannon limit, for a bit-error rate of one in 100,000. Today, turbo codes are still chipping away at even that small gap.
The solution to overcome the noise that plagues all communication channels, according to Shannon’s seminal paper, was to divide the data into strings of bits and add to each string a set of extra bits, called parity bits, that would help identify and correct errors at the receiving end. The resulting group of bits, the data bits plus the parity bits, is called a codeword, and typically it represents a block of characters, a few image pixels, a sample of voice, or some other piece of data.
Shannon showed that with the right collection of codewords, with the right code in other words, it was possible to attain the channel capacity. But then, which code could do it? "Shannon left unanswered the question of inventing codes," says David Forney, a professor of electrical engineering at MIT in Cambridge, Massachusetts. Shannon proved mathematically that coding was the means to reach capacity, but he did not show exactly how to construct these capacity-approaching codes. His work, nevertheless, contained valuable clues.
Shannon thought of codewords as points in space. For example, the codeword 011 can be considered a point in a three-dimensional space with coordinates x = 0, y = 1 and z = 1. Codewords with more than three bits are points in hyperspace. Noise can alter a codeword's bits, and therefore its coordinates, displacing the point in space. If two points are close to each other and one is affected by noise, that point might fall exactly onto the other, resulting in a decoding error. Therefore, the larger the differences between codewords, that is, the farther apart they are, the more difficult it is for noise to cause errors.
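In this picture, the "distance" between codewords is the Hamming distance, the number of bit positions in which they differ; a tiny sketch:

    # Sketch: Hamming distance between two codewords of equal length.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    print(hamming("011", "000"))    # 2: noise must flip two bits to turn one word into the other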
To achieve capacity, Shannon demonstrated that we should randomly choose infinitely long codewords. In other words, returning to his spatial analogy, if you could make the codewords both random and as long as you wanted, you could put the points arbitrarily far from each other in space. There would be essentially no possibility of one point erroneously falling on another. Unfortunately, such long, random codes are not practical, because such a code would be extremely slow to use: you would transmit many, many bits for just one codeword. Still, the random nature of a good code would turn out to be critical for turbo codes.
Coding experts put aside Shannon’s ideal random codes and concentrated on developing practical codes that could be implemented in real systems. They soon began to develop good codes by cleverly choosing parity bits that constrained codewords to certain values, making these codewords unlikely to be confused with other ones.
For example, suppose we have an eight-bit codeword (seven data bits plus one parity bit). Suppose we further insist that all codewords have an even number of 1s, setting that extra parity bit to 1 if necessary to fulfil the requirement. Now, if any of the eight bits is altered by noise, including the parity bit itself, the receiver knows that there was an error, because the parity check will fail: there would be an odd number of 1s.
This basic scheme can detect an error, but it cannot correct it: you do not know which bit was flipped. To correct errors, you need more parity bits. Coding experts have come up with numerous and ever more sophisticated ways of generating parity bits. Block codes, Hamming codes, Reed-Solomon codes, and convolutional codes are widely used to achieve low error rates.
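A minimal sketch of the single even-parity-bit scheme just described (illustrative only): it detects any single flipped bit but cannot say which bit it was.

    # Sketch: one even-parity bit appended to seven data bits.
    def add_parity(data7):
        return data7 + [sum(data7) % 2]         # make the total number of 1s even

    def parity_ok(word8):
        return sum(word8) % 2 == 0              # an odd count of 1s signals an error

    word = add_parity([1, 0, 1, 1, 0, 0, 1])
    print(parity_ok(word))                      # True: no error detected
    word[3] ^= 1                                # a single bit flipped by noise
    print(parity_ok(word))                      # False: the error is detected but not located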
Nevertheless, a computational-complexity problem hounded coding specialists and plagued all these codes. The complexity problem emerges as you figure the cost of a code in terms of the amount of computation required to decode your data. The closer you get to Shannon’s limit, the more complicated this process becomes, because you need more parity bits and the codewords get longer and longer.
For codewords with just 3 bits, for instance, there are a total of only 8 codewords. To approach capacity, however, one might need codewords with, say, 1000 bits, and the decoder would then need to search through an unimaginably large collection of 2^1000 codewords.
Turbo codes solved the complexity problem by splitting it into more manageable components. Instead of a single encoder at the transmitter and a single decoder at the receiver, turbo codes use two encoders at one end and two decoders at the other.
Researchers had realized in the late 1960s that passing data through two encoders in series could improve the error-resistance capability of a transmission; for such a combination of encoders, the whole is more than the sum of the parts. Turbo codes employ two encoders working synergistically, not in series but in parallel.
The turbo process starts with three copies of the data block to be transmitted. The first copy goes into one of the encoders, where a convolutional code takes the data bits and computes parity bits from them. The second copy goes to the second encoder, which contains an identical convolutional code. This second encoder receives not the original string of bits, but rather a string with the bits in another order, scrambled by a system called an interleaver. The encoder then reads these scrambled data bits and computes parity bits from them. Finally, the transmitter takes the third copy of the original data and sends it, along with the two strings of parity bits, over the channel.
The rearranging of bits in the interleaver is the key step in the whole process. Basically, this permutation brings more diversity to the codewords; in the spatial analogy, it pushes the points farther apart in space. "The role of the permutation is to introduce some random behaviour in the code," says Berrou. In other words, the interleaver adds a random character to the transmitted information, much as Shannon’s random codes would do.
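The sketch below (a generic pseudo-random interleaver, not the one used in any particular standard) shows the idea: a fixed, randomly generated permutation known to both transmitter and receiver, together with its inverse.

    # Sketch: pseudo-random block interleaver and its inverse.
    import random

    def make_interleaver(length, seed=42):
        rng = random.Random(seed)               # fixed seed so both ends use the same permutation
        a = list(range(length))
        rng.shuffle(a)
        return a

    def interleave(bits, a):
        out = [0] * len(bits)
        for i, b in enumerate(bits):
            out[a[i]] = b                       # the bit in position i moves to position a[i]
        return out

    def deinterleave(bits, a):
        return [bits[a[i]] for i in range(len(bits))]

    a = make_interleaver(8)
    d = [1, 0, 1, 1, 0, 0, 1, 0]
    print(deinterleave(interleave(d, a), a) == d)   # True: the permutation is invertible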
It would seem that turbo codes, like any other code with a huge number of codewords, should also hit the wall of computational complexity; in fact, turbo codes usually work with codewords of around a thousand bits, a fairly unwieldy number. They avoid the wall because turbo codes use two component decoders that work together to bypass the complexity problem.
The role of each decoder is to take the data, which might have been corrupted by noise along the channel, and decide which is the more likely value, 0 or 1, for each individual bit. In a sense, deciding about the value of each bit is like having to guess whether or not it is raining outside.
Each turbo decoder also relies on "clues" that help it guess whether the received bit is a 0 or a 1. First, it inspects the analog signal level of the received bits. Many decoding schemes transform the received signal into either a 0 or a 1, thereby throwing away valuable information, because the analog signal has fluctuations that can tell us more about each bit. A turbo decoder instead transforms the signal into integers that measure how confident we can be that a bit is a 0 or a 1. In addition, the decoder looks at its parity bits, which tell it whether the received data seems intact or has errors.
The result of this analysis is essentially an informed guess for each bit. "What turbo codes do internally is to come up with bit decisions along with reliabilities that the bit decisions are correct," says David Garrett, a researcher in the wireless research laboratory at Bell Labs, part of Lucent Technologies, Murray Hill, NJ. These bit reliabilities are expressed as numbers, log-likelihood ratios, that can vary, for instance, between -7 and +7. A ratio of +7 means the decoder is almost completely sure the bit is a 1; a -5 means the decoder thinks the bit is a 0 but is not totally convinced (real systems usually use larger intervals, such as -127 to +127).
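The "confidence" numbers described here are log-likelihood ratios. For antipodal (+/-A) signalling in Gaussian noise of variance sigma^2, the channel LLR of a received sample y is 2*A*y/sigma^2, a standard result assumed here rather than taken from the report:

    # Sketch: channel log-likelihood ratio of a received antipodal (BPSK) sample.
    def channel_llr(y, amplitude=1.0, sigma2=0.5):
        # positive values favour +A (a transmitted 1), negative values favour -A (a 0)
        return 2 * amplitude * y / sigma2

    for y in (0.99, 0.2, -1.3):
        print("received %+.2f -> LLR %+.2f" % (y, channel_llr(y)))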
Even though the signal level and parity checks are helpful clues, they are not enough. A single decoder still cannot always make correct decisions on the transmitted bits and will often come up with a wrong string of bits: the decoder is lost in a universe of codewords, and the codeword it chooses as the decoded data is not always the right one. That is why a decoder alone cannot do the job.
But it turns out that the reliability information of one decoder is useful to the other and vice versa, because the two strings of parity bits refer to the very same data; it is just that the bits are arranged in a different order. So the two decoders are trying to solve the same problem, but looking at it from different perspectives.
The two decoders, then, can exchange reliability information in an iterative way to improve their own decoding. All they have to do, before swapping reliability strings, is arrange the strings' contents in the order each decoder needs, so that a bit that was strongly detected as a 1 in one decoder, for example, influences the other decoder's opinion on the corresponding bit. Each turbo decoder thus not only has its own "opinion", it also has an "external opinion" to help it come up with a decision about each bit. At the heart of turbo coding is this iterative process, in which each component decoder takes advantage of the work of the other at the previous decoding step. After a certain number of iterations, typically 4 to 10, both decoders begin to agree on all bits. That means the decoders are no longer lost in the universe of codewords; they have overcome the complexity barrier. Let us now consider each step of encoding and decoding turbo codes.
Turbo Codes: Encoding
A turbo code is the parallel concatenation of two RSC codes separated by an interleaver. An example turbo encoder is shown in Figure 4. Here, the two encoders are identical rate-1/2 RSC encoders. The upper encoder receives the data directly, while the lower encoder receives the data after it has been interleaved by a permutation function "a". The interleaver "a" is in general a pseudo-random interleaver; that is, it maps bits in position i to position a(i) according to a prescribed, but randomly generated, rule. The interleaver operates in a block-wise fashion, interleaving L bits at a time, and thus turbo codes are actually block codes. Since both encoders are systematic and receive the same set of data (although in permuted order), only one of the systematic outputs needs to be sent. By convention, the systematic output of the top encoder is sent while the systematic output of the lower encoder is not transmitted. However, the parity outputs of both encoders are transmitted. The overall code rate of a turbo code composed from the parallel concatenation of two rate-1/2 systematic codes is r = 1/3. This code rate can be made higher by puncturing. The code rate of a turbo code is typically increased to r = 1/2 by transmitting only the odd-indexed parity bits from the upper encoder and the even-indexed parity bits from the lower encoder (along with all the systematic bits from the top encoder).
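Putting the pieces together, here is a structural sketch of the parallel concatenation just described. It reuses the illustrative rsc_encode and interleave helpers from the earlier sketches, and the puncturing pattern (alternating parity bits from the two encoders) is one plausible reading of the convention described above.

    # Sketch: parallel concatenation of two RSC encoders with puncturing to rate 1/2.
    def turbo_encode(data, a):
        rsc1 = rsc_encode(data)                        # [sys, par, sys, par, ...]
        rsc2 = rsc_encode(interleave(data, a))         # same code, interleaved data
        systematic = rsc1[0::2]                        # send only encoder one's systematic bits
        parity1 = rsc1[1::2]
        parity2 = rsc2[1::2]
        out = []
        for k in range(len(data)):
            out.append(systematic[k])
            out.append(parity1[k] if k % 2 == 0 else parity2[k])   # punctured parity
        return out                                     # two output bits per data bit -> rate 1/2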
Turbo Codes: Decoding
As with convolutional codes, an ML solution can be obtained by using Equation (4) and the Viterbi algorithm. However, due to the presence of the interleaver, the complexity of the Viterbi algorithm when used to decode turbo codes is O(2^L), where L is the size of the data frame (and interleaver). Due to the prohibitive complexity of the ML solution to decoding turbo codes, we seek a reduced-complexity, albeit suboptimal, solution. In particular, a good estimate of the data can be found by solving the following system of equations:

    Λ1 = Λ(d | y(0), y(1), z̃2)                                         (5)
    Λ̃2 = Λ(d̃ | ỹ(0), y(2), z̃1)                                         (6)

where y(0) is the observed systematic bits, y(1) is the observed parity bits from encoder one, and y(2) is the observed parity bits from encoder two. A tilde over a variable represents its interleaved value, i.e. ỹ is the interleaved version of y. Here

    Λ(d | y) = ln [ P(d = 1 | y) / P(d = 0 | y) ]

is the a posteriori log-likelihood ratio (LLR) of a data bit given the conditioning observations, and z is the so-called extrinsic information,
which is related to the LLR as follows: the extrinsic information z is the a posteriori LLR Λ with the systematic channel contribution and the a priori contribution (the extrinsic information received from the other decoder) subtracted, i.e. it is the new information generated by a decoder. The system of equations (5) and (6) can be solved iteratively using the structure shown in Figure 5. Here, decoder one determines the solution to (5) and decoder two determines the solution to (6). Each decoder passes information to the other decoder, which in turn refines its estimate of the a posteriori probabilities using the information derived by the other decoder. The final estimate of the data is obtained by hard-limiting the output of one of the decoders (by convention, the second decoder's output is used).
Turbo codes get their name from the feedback structure of Figure 5 and its analogy to a turbo engine. In fact, there is nothing "turbo" about the codes themselves; rather, the turbo effect comes from the decoder implementation.
The a posteriori LLRs of (5) and (6) are computed using a derivation of the symbol-by-symbol maximum a posteriori (MAP) algorithm. Although the MAP algorithm can be used directly to compute the LLRs, it is computationally difficult and sensitive to finite-precision numerical representation. These problems are alleviated by performing the algorithm in the logarithmic domain. The resulting algorithm is termed the log-MAP algorithm. The algorithm consists of two instances of the Viterbi algorithm, one performing a forward recursion and the other performing a backward recursion. Thus the complexity of the log-MAP algorithm is twice that of the Viterbi algorithm.
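The iterative exchange of extrinsic information can be summarised in code. The sketch below is structural only: siso_decode stands in for a log-MAP (BCJR) component decoder, which is not implemented here, and the extrinsic values are formed with the standard decomposition "a posteriori LLR minus systematic channel LLR minus a priori LLR". The interleave and deinterleave helpers are the illustrative ones from the earlier sketch.

    # Sketch: the iterative turbo decoding loop of Figure 5 (component decoder is a placeholder).
    def turbo_decode(llr_sys, llr_par1, llr_par2, a, siso_decode, iterations=8):
        L = len(llr_sys)
        extrinsic2 = [0.0] * L                          # no information from decoder two yet
        for _ in range(iterations):
            # Decoder one: its parity LLRs plus a priori information from decoder two.
            apost1 = siso_decode(llr_sys, llr_par1, extrinsic2)
            extrinsic1 = [apost1[k] - llr_sys[k] - extrinsic2[k] for k in range(L)]

            # Decoder two works in the interleaved order.
            sys_i = interleave(llr_sys, a)
            apr_i = interleave(extrinsic1, a)
            apost2 = siso_decode(sys_i, llr_par2, apr_i)
            extrinsic2 = deinterleave(
                [apost2[k] - sys_i[k] - apr_i[k] for k in range(L)], a)

        # Final hard decision from decoder two's a posteriori LLRs (deinterleaved).
        return [1 if x > 0 else 0 for x in deinterleave(apost2, a)]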
The simulated performance of a turbo code with BPSK modulation is shown in Figure 6. This is the rate r = 1/2 turbo code created by the parallel concatenation of two K = 5 RSC encoders that was originally reported in [1]. The interleaver size of this code is L = 65536 bits. The performance after various numbers of decoder iterations is shown; note that performance improves as the number of decoder iterations increases. After 18 iterations, a bit error rate of 10^-5 is reached at just slightly below 0.7 dB. Comparing this to Figure 1, we see that the performance is within 0.5 dB of capacity when the input is constrained to be antipodal. Also note the presence of an error floor as the signal-to-noise ratio increases.
Turbo Codes: Performance Factors
There are many factors that affect the performance of turbo codes. The most influential parameter is the interleaver size. As the frame/interleaver size increases, performance improves. In Figure 7, the performance of the r = 1/2, K = 5 turbo code is shown for various interleaver sizes. Note that as the interleaver gets smaller, performance degrades. This would suggest selecting the largest possible interleaver size. However, as the interleaver size increases, so does the decoder latency, since the entire code word must be received before decoding can be completed. Thus turbo codes possess an inherent tradeoff between performance and latency. This tradeoff can be exploited for wireless multimedia communication systems.
Another parameter affecting performance is the overall code rate. As with other codes, performance improves as the code rate becomes lower. When code rates higher than 1/3 are used, the particular puncturing pattern that is used affects the performance. As with convolutional codes, the constraint length also influences the performance. However, the impact of the constraint length on performance is weak, and thus only short constraint lengths of K = 3, 4, or 5 are considered for practical turbo codes. The interleaver design plays a significant role in the performance of a turbo code, particularly at higher signal-to-noise ratios. In general, a randomly chosen interleaver design will give good performance, while highly structured interleavers such as the "block interleaver" should be avoided.
The choice of decoding algorithm and the number of decoder iterations also influence performance. As can be seen in Figure 6, performance improves as the number of iterations increases. This improvement follows a law of diminishing returns. Also, the number of iterations required is a function of the interleaver size: bigger interleavers require more iterations. For example, a turbo code with an interleaver size of 16,384 bits only needs about 9 iterations of decoding in practice.
Limitation: The decoding delay, the time it takes to decode the data, is a major drawback of turbo codes. The several iterations required by turbo decoding make the delay unacceptable for real-time voice communications and other applications that require instant data processing, such as hard disk storage and optical transmission. Therefore, for voice transmission, convolutional codes are used, because their decoding delays are smaller than those of turbo codes.
Applications: For systems that can tolerate decoding delays, like deep-space communications, turbo codes have become an attractive option. In fact, last September the European Space Agency (ESA), based in Paris, France, launched SMART-1, the first probe to go into space with data transmission powered by turbo codes. ESA will also use the codes on other missions, such as Rosetta, scheduled for launch early this year to rendezvous with a comet. Digital audio broadcasting, which provides CD-quality radio programs, and satellite links, such as the new Global Area Network of Inmarsat Ltd. in London, are also about to incorporate turbo codes into their systems. Turbo codes are already in use in Japan, where they have been incorporated into third-generation mobile phone systems, known officially as the Universal Mobile Telecommunications System (UMTS).
Conclusions: Turbo codes are a very efficient coding system for applications where the decoding delay requirements are not too stringent. In applications with extreme bit error rate requirements, the overhead due to turbo coding is quite bearable considering the excellent BER offered. Turbo codes provide better noise immunity and lower transmission power requirements, with lower computational complexity than other capacity-approaching schemes, and so represent the best available solution for achieving data rates close to the Shannon limit.
References:
1) M. C. Valenti, "Turbo Codes and Iterative Processing," Virginia Polytechnic Institute and State University, Blacksburg, VA, USA.
2) S. A. Barbulescu, "What a Wonderful Turbo World ...".
3) IEEE Spectrum, March 2004.
4) www.icc2004.org
5) www331.jpl.nasa.gov/public/JPLtcodes.html
6) www.turbo.enst-bretagne.fr/
7) www.itr.unisa.edu.au/~steven/turbo/
8) www2.elen.utah.edu/~s-howard/tutorials/turbotutorial.html
9) www.comelec.enst.fr/turbocodes/
10) www.ee.vt.edu/~yufei/turbo.html