Stochastic LDPC Decoder Design for IEEE 802.15.3c Standard Chih-Lung Chen, Yi-Hsuan Tsai, Yu-Hsiang Lin, Hsie-Chia Chang, and Chen-Yi Lee Department of Electronics Engineering National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu, Taiwan, R.O.C. Tel: +886-3-5712121 ext.54238 email:
[email protected]
Abstract—In this paper, a LDPC decoder based on stochastic computation is presented. The VLSI structures of key components are demonstrated, and a systematic method to design parameters required by stochastic decoding is also discussed. The simulatoin results based on IEEE 802.15.3c standard show that the proposed decoder could achieve performance equal or better than tradditional min-sum decoding algorithm. Index Terms—LDPC, stochastic computing, Wireless LAN.
I. I NTRODUCTION
Main attraction of stochastic computation is the simple computing units, as shown in Fig. 1 [8]. The multiplication of two stochastic sequences is performed by a single twoinputs AND gate, and the JK flip-flop can be used to compute stochastic division according to the following equation: Pdiv = P1 /(P1 + P2 ). The stochastic addition and subtraction should be combined with a scaling operation to ensure the range of [0, 1] for the outcome; as a result, the operation can be implemented using a multiplexer, where RS represents random selection using random number generators. The output probability is Padd = n1 ni=1 Pi .
P1 P2
Pmul
P1
J
P2
K
Pdiv
P1 P2 Pn
CLK
Fig. 1.
Padd
...
Low-density parity-check (LDPC) code is a famous error control code for its near Shannel bound error-correcting performance. After introduced by Gallager in 1963 [1], LDPC code was disregarded until rediscovered by MacKay and Neal [2]. One reason for LDPC codes to have outstanding performance is the belief-propagation (BP) algorithm [3] which is an iterative decoding process on bipartite graph. Further simplified decoding algorithms such as min-sum (MS) algorithm and normalized min-sum (NMS) algorithm have succesfully reduced the decoder complexity by the cost of some performance loss. Benefited by the advance of VLSI technology, high-speed moderate-cost implementation of LDPC decoder becomes possible and draws great research interests. Nowadays many modern communication standards such as IEEE 802.3an [4] and IEEE 802.15.3c [5] have adopted LDPC codes as error-control codes. The characteristic of LDPC codes used in high-throughput systems is high code-rate, and that means high check node degrees in most cases. Since MS or NMS algorithms need to find the minimum value among all incoming messages in check node update, large sorters are required for parallel implementation. Furthermore, large amount messages connecting to one operation unit will cause routing difficulties, leading to lower chip density. In order to solve this problem, some decoding algorithms based on binary message passing are proposed, such as differential decoding with binary message passing (DD-BMP), but still encounter performance loss comparing with soft message passing [6]. Stochastic decoding is another alternative approach to resolve complex computation. Values are represented by a Bernoulli sequence in stochastic computing [7]. The probability of the i-th bit d i in the random sequence being a binary one is P (di ) = N/L, where N is the number of ones and L stands
for sequence length. For example, a 10-bits sequence could represent either the probability of 0.3 when 3 bits equal to 1 or the probability of 0.7 when 7 bits are ones. The information is contained in the statistics of the bit stream; therefore, there is no fixed mapping between probability value and encoded sequence.
RS
Stochastic Computation
Stochastic computation has been successfully applied to several error control codes, either tentative small codes [9], [10] or pratical cases [11], [12]. Unfortunately none of them provides a systematic method to design required parameters of stochastic decoding for real applications; therefore, in this paper we introduce a complete performance analysis of essential parameters by using IEEE 802.15.3c as a experimental platform. The rest of this paper is is organized as follows. Section II introduces the background of stochastic decoding of LDPC codes. In Section III, we present a regular structure of stochastic variable node. Two random number generation techniques are also discussed. Simulation result and analysis are shown in Section IV. Finally, the conclusion is given in Section V.
II. S TOCHASTIC LDPC D ECODER
decoder will decode the attempted codeword according to the status of counter. When the counter is equal to or greater than zero, the decoded bit is ’1’, and vice versa.
The stochastic LDPC decoder architecture is illistrated in Fig. 2. The framework consists of the following: 1) Channel value to probability conversion: Assuming additive white Gaussian noise (AWGN) channel with binary phase shift keying (BPSK) modulation, the LLR value converted from channel value is calculated by equation (1), and LLR value is transformed into probability according to equation (2).
RNG
y1
y2
P to S
Channel Value to Probability Conversion
P to S
Probability to Stochastic Stream Conversion
Vi
Stochastic Variable Node
V2
...
P to S
VN-1
C2 Ci
...
...
C to P
C to P
V1
RNG
(1)
yN
1 P (xi = 1 | yi ) = (2) P (xi = 1 | yi ) + P (xi = 0 | yi ) 1 + eL i
C to P
P to S
Fig. 2.
w
Stochastic LDPC decoder architecture
Comparator
in1
out1
in2
out2
...
Bi in dc-1
...
1
Pi>Ri
Ri
Random Number Generator
...
Pi
Stochastic Check Node
CN-K
VN
...
2) Probability to stochastic stream conversion: For this part, the comparator shown in Fig. 3(a) can be used to convert probabilities into stochastic streams. In this figure, the input probability and random stream are quantized with w-bits. Random stream is generated by random number generators with uniform distribution. Since the output stochastic stream is 1 when P > R, the probability of output bit equal to 1 is 2Pw . 3) Stochastic decoding on factor graph: In stochastic decoding, the 1-bit message passing through the check node and going back to variable node is called one decoding cycle (DC). According to the sum-product algorithm of LDPC decoding, the check node update and varaible node update of two-input cases are carried out by (3) and (4), respectively. Pout = Pin1 (1 − Pin2 ) + Pin2 (1 − Pin1 ) Pin1 Pin2 Pout = Pin1 Pin2 + (1 − Pin1 )(1 − Pin2 )
C to P
...
Pi =
1 P (xi = 0 | yi ) = (4yi ) P (xi = 1 | yi ) N0
P to S
C1
yN-1
Li = ln
C to P
Bipartite Graph
indc
w
outdc-1 outdc
(a) Probability to stochastic stream (b) Stochastic check node with deconverter gree dc update update
in1 in2
1 0
out CLK
in1 in2
1
EM Random number
out
0
CLK
(c) 2-input stochastic variable node (d) 2-input stochastic variable node (VNU) with edge memory (VNUE) Fig. 3.
Key components for stochastic LDPC decoder
(3) (4)
These equations are difficult to realize in probability domain, but can be implemented by simple logic circuitry in stochastic domain. For instance, a 2-input XOR gate performs the function of 2-input check node, and the structure is straightforwardly extended to check node with degree d c , as shown in Fig. 3(b). Similarly, since the division is required to normalize the output messages of 2-input variable node, a JK flip-flop with two additional gates can accomplish this task. Fig. 3(c) exhibits the circuit of 2-input stochastic variable node, where the JK flip-flop is replaced by a normal register with additional circuits. The message exchange process continues until syndrome is zero or maximum DC is reached. 4) Final statistics to digital conversion: In order to gather information from the varaible node, an up/down counter is used to compute the statistical property. If a ’1’ is received from the stochastic stream, the counter increases by one; otherwise, the counter decreases by one. After certain decoding rounds, the
Although the circuitry of variable node is simple, the hold state of JK flip-flop will greatly degrade the performance of stochastic decoding. The hold state occurs when two inputs of JK flip-flop are not equal. In addition, if the varaible nodes in one cycle on bipartite graph are all locked, the iterative process will be ineffective. Currently the techniques successful in solving the latching problem are noise-dependent scaling (NDS) and edge memory (EM) [11], [12]. Since the probability in (2) will approximate to zero or one in high SNR condition, the switching activities of bitstream will become too few to unlock the fixed state. NDS is a method to increase switching activities through scaling LLR by the following equation: 1 (5) Li = (ρN0 )Li = (ρN0 ) (4yi ) = ρ(4yi ) N0 where ρ is a design parameter called the NDS factor and its value needs to be optimized individually for different LDPC codes. Edge memories are composed of M-bit shift registers to store previous unlocked outputs. Fig. 3(d) illustrates the architecture of 2-input variable node with edge memory. As long as the hold state happens, the output of variable node is randomly
selected from the EM. In other words, the output of locked varaible node is based on the statistics of previous successful update. If switching activity of stochastic stream is too few, the unsuccessful update of variable node will remain locked and lower the efficiency of EM. Hence, the EMs must work with NDS technique. III. A RCHITECTURE D ESIGN According to previous discussion, stochastic bitstream converter, check node, variable node and random number generator are four key components. The prior two components are simple and can be extended to higher degree cases; however, the later two components are fewly discussed. In the following we’ll demonstrate the structure of variable node units and how to generate w-bit random number by two techniques. A. Stochastic variable node For irregular LDPC codes, it is necessary to build various structures of stochastic variable nodes with different degrees. According to the code specification of IEEE 802.15.3c standard in Table I, the VNU degrees are varied from 1 to 4. In fact, the VNU degrees may range from 1 to 6 or even larger in other standards. Therefore Fig. 4 shows the structure of varaible node units used in next section. TABLE I PARITY- CHECK MATRIX OF LDPC CODES IN IEEE 802.15.3 C STANDARD Code
CNU degrees
(N, K)
VNU degrees
C1
(672, 336)
1, 2, 3, 4
5, 6, 7, 8
C2
(672, 504)
1, 2, 3, 4
13, 14, 15, 16
C3
(672, 588)
1, 2, 3, 4
29, 30, 31, 32
m
Initial Value
C4
(1440, 1344)
3
45
7
000_0010
6
00_0001
out2
VNUE
out1
in2
VNU
VNUE
out2
in3
VNU
VNUE
in1 in2 in3 in4
VNU
in1
VNU
out3
in3
Bi
Bi : stream from channel value
(a) Degree 2 VNU
(b) Degree 3 VNU
in1 in2 in3 in4 in5 Bi
VNU
VNU
VNU
VNU
VNU
VNU
in1 in4
VNUE
out1
in1 in2
VNUE
out2
VNUE
out3
in3 in4 in5 in6
in3
VNUE
out4
Bi
VNUE
out5
(d) Degree 5 VNU Fig. 4.
VNUE
out2
VNU
VNUE
out3
VNUE
out4
VNU
LFSR m=7
7
100_0000
6
00_1000
VNU
VNU
VNU
VNU
VNU
VNU
1 Bi
Ri
LFSR m=6 LFSR m=7
(a) LFSR structure
Bi
23 bit CA30 8 bits
VNU
VNUE
out1
IN
111
110
101
100
011
010
001
000
VNU
VNUE
out2
OUT
0
0
0
1
1
1
1
0
in4
VNU
VNUE
out3
in3
VNU
VNUE
out4
in6
VNU
VNUE
out5
in5
VNU
VNUE
out6
in1
bitstream generator Pi
8
(c) Degree 4 VNU in2
in2
in4
VNU
8
...
VNUE
out1
VNU
out1
LFSR m=6
... ...
in1
VNUE
in1
VNUE
VNU
... ...
in2
in2
are illustrated for LFSR and CA, respectively, and their performance will be compared in the next section. LFSR is one kind of well-known pseudo random number generators with good statistical properties. The generated sequence is not truly random since it repeats periodically. A mbit LFSR consists of m registers and some XOR gates which provide feedback from specific position with maximal period. Using 8 bit random number as an example, we performed some experiments to find the combination of 8 sets LFSRs with different sizes, trying to obtain uniform distribution of output number between 0 to 255. Fig. 5(a) is the final configuration we use in this paper, which has lower cost but acceptable distribution. The other combinations of 8 sets LFSRs with m larger than 17 would obtain excellent distribution; however, it costs two times more registers. Cellular automata (CA) is composed of a group of cells which have their own specific color. The next state of a cell can be generated by a rule which depends on its neighbors. Considering a 3 neighbors and 2 colors system, all cells can update as the method in Fig. 5(b). By representing white by 0 and black by 1, the colors can be written in binary form as the table. This rule is called Rule-30 since 30 is the binary string 00011110 in the second row of the table. Among different rules, Rule-30 generates complex patterns especially suitable for random number generator. However, the distribution of parallel 8 sets CA with rule-30 (denoted as CA30) is not euqally distributed. To solve this problem, by performing XOR between two CA30 with different number of cells(Fig. 5(b).) a more uniform distribution can be achieved.
8 bit RN 8 bit 8 bit CA30
(b) CA structure Fig. 5.
Random number generators
Bi
(e) Degree 6 VNU
VNU structures for different degrees
B. Random Number Generator Linear feedback shift register (LFSR) and cellular automata (CA) are two well-known random number generators, but they can only generate one random bit at one cycle. In order to generate variant bitwdith random number, two architectures
IV. P ERFORMANCE A NALYSIS The following is the performance simulation of BPSK modulation over AWGN channel. All performance figures are fixed-point results based on the configuration of 8-bit input quantization, 64-bit EM, 6-bit up/down counter and 8-bit RNG. 1) Noise dependent scaling factor: Assuming target bit-error-rate (BER) is 10 −5 , the plot of
BER versus NDS factors could be used to obtain optimal NDS factors. As Fig. 6 recommends, the suitable NDS factors are 0.45, 0.9, 1.3, 1.4 for C1, C2, C3 and C4 LDPC codes, respectively. It is interesting that the range of suitable NDS factors for low SNR is narrow, but there is more selection for high SNR.
C1, NDS = 0.45, CA C1, NDS = 0.45, LFSR C2, NDS = 0.85, CA C2, NDS = 0.90, LFSR C3, NDS = 1.25, CA C3, NDS = 1.30, LFSR C4, NDS = 1.35, CA C4, NDS = 1.40, LFSR
−2
10
BER −2
Stochastic LDPC Decoding, DC = 1K
−1
10
−3
10
Stochastic LDPC decoding using LFSR, DC = 1K
10
(672,336) & (SNR = 3.3) (672,504) & (SNR = 4.0) (672,588) & (SNR = 5.0) (1440,1344) & (SNR = 5.6)
−3
BER
10
−4
10
1
1.5
−4
2
2.5
3
3.5
4
4.5
5
5.5
6
SNR
10
−5
Fig. 8.
10
Performance of different RNG (CA vs LFSR)
−6
10
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
NDS
Fig. 6.
Performance of different NDS factors (LFSR structure)
2) Decoding cycles: Larger maximum DCs will guarantee better performance, but the improvement will saturate. The performance of different DCs using the NDS factor we found previously is shown in Fig. 7. There is no obvious improvement when decoding cycles exceed 1000. In other words, decoding cycle 1000 is efficient to perform stochastic decoding without evident performance loss. 3) Random number generation: After choosing the proper NDS factors and decoding cycles, RNG structure is another important factor. Fig. 8 compares the performance of stochastic decoder with LFSR RNG and CA RNG. Performance of LFSR is better than CA especially in C3. The dash lines are the performance of tradditional MS algorithm for the same LDPC codes. Apparently the performances of CA and LFSR are comparable to MS algorithm, but the stochastic computation units are much simpler.
−3
Stochastic LDPC decoding using LFSR, DC = 1K
10
(672,336) & (SNR = 3.3) & (NDS = 0.45) (672,504) & (SNR = 3.9) & (NDS = 0.9) (672,588) & (SNR = 5.0) & (NDS = 1.3) (1440,1344) & (SNR = 5.6) & (NDS = 1.4) −4
BER
10
−5
10
−6
10
400
Fig. 7.
600
800
1000 1200 1400 Decoding Cycles
1600
1800
2000
Performance of different decoding cycles (LFSR structure)
V. C ONCLUSION In this work, a stochastic LDPC decoder as long as its VLSI structures are presented. Since designing parameters for stochastic LDPC decoder is time consuming through exhaustive simulations, the approach illustrated in this paper is more systematic and may save lots of time in the early design stage. After performing experiments based on IEEE 802.15.3c standard, the proposed stochastic LDPC decoder could achieve performance comparable to the tradditional MS decoding algorithm with less design complexity. R EFERENCES [1] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963. [2] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electron. Lett., vol. 33, no. 6, pp. 457–458, Mar. 1997. [3] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999. [4] Part 3: carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specificaions, IEEE Std. P802.3an-2006, Sept. 2006. [5] Part 15.3: wireless medium access control (MAC) and physical layer (PHY) specifications for high rate wireless personal area networks (WPANs), IEEE Std. P802.15.3c-DF3, 2008. [6] N. Mobini, A. H. Banihashemi, and S. Hemati, “A differential binary message-passing LDPC decoder,” in Proc. IEEE GLOBECOM 2007, Nov. 2007, pp. 1561–1565. [7] B. R. Gaines, “Stochastic computing systems,” Advances in information systems science, vol. 2, pp. 37–172, 1969. [8] B. D. Brown and H. C. Card, “Stochastic neural computation I: computational elements,” IEEE Transactions on Computers, vol. 50, pp. 891– 905, Sept. 2001. [9] W. J. Gross, V. C. Gaudet, and A. Milner, “Stochastic implementation of LDPC decoders,” in Proc. 39th Asilomar Conf. on Signals, Systems, and Computers, Oct. 2005, pp. 713–717. [10] C. Winstead, V. C. Gaudet, A. Rapley, and C. Schlegel, “Stochastic iterative decoders,” in Proc. IEEE Int. Symp. Inf. Theory, Sept. 2005, pp. 1116–1120. [11] S. S. Tehrani, W. J. Gross, and S. Mannor, “Stochastic decoding of LDPC codes,” IEEE Commun. Lett., vol. 10, no. 10, pp. 716–718, Oct. 2006. [12] S. S. Tehrani, S. Mannor, and W. J. Gross, “An area-efficient FPGAbased architecture for fully-parallel stochastic LDPC decoding,” in Proc. IEEE Workshop on Signal Processing Systems (SiPS’07), Oct. 2007, pp. 255–260.