CMOS Logic Circuit Design http://www.rnbs.hiroshima-u.ac.jp/RNBS/ Link(リンク): 研究所教員講義ノート の下 CMOS論理回路設計
Arithmetic Modules (Part 2) • Circuits for Multiplication – Manual Multiplication Process of Positive Binaries – Combinational Multiplier Circuits for Positive Multi-Bit Binaries – Sequential Multiplier Circuits for Positive Multi-Bit Binaries – Handling of the Sign Bit
• Circuits for Division – Manual Division Process of Positive Binaries – Combinational Divider Circuits for Positive Multi-Bit Binaries – Handling of the Sign Bit
• Parallel Arithmetic for Increased Throughput – Matrix Arrangement of Simple Arithmetic Modules – Important Application: Picture Processing Mattausch, CMOS Design, H21/6/12
1
Construction of the Datapath (Arithmetic Part)
Last Lecture
Bit 31 Bit 30
Data-Out
Multiplexer
Divider
Multiplier
Boolean Unit
Shifter
Adder
Register
Data-In
Control
Bit 3 Bit 2 Bit 1 Bit 0
Today’s Lecture
Only digital processing systems for high-speed applications contain specialized multiplier/divider circuits in their datapath. Mattausch, CMOS Design, H21/6/12
2
Circuits for Multiplication -
Manual Multiplication Process of Positive Binaries Combinational Multiplier Circuits for Positive Multi-Bit Binaries Sequential Multiplier Circuits for Positive Multi-Bit Binaries Handling of the Sign Bit
Mattausch, CMOS Design, H21/6/12
3
Manual Multiplication Process (Positive Binaries) 4-Bit Example
General Algorithm for N Bit Start
Multiplicand (X):
1 0 0 0
Multiplier (Y):
• 1 0 0 1
Partial Products (PP) Product (Z)
1 0 0 0 + 0 0 0 0 + 0 0 0 0 + 1 0 0 0
Y0=1
Y0=?
Y0=0
Add multiplicand to product Shift multiplicand left 1 bit Shift multiplier right 1 bit
1 0 0 1 0 0 0 Nth repetition
No
Yes
End
In the manual multiplication process partial products are calculated and added for obtaining the product. Mattausch, CMOS Design, H21/6/12
4
Mathematical Formulation (Positive Binaries) Multiplicand (X, M-bit binary):
X=
M −1
∑ (X ⋅2 ) i
i
i= 0
N −1
Multiplier (Y, N-bit binary):
Y = ∑ (Yi ⋅2 i ) i =0
Product (P,( M+N-1)-bit binary):
Z=
N +M −1
N−1 M −1
⎡ i+j ⎤ X ⋅ Y ⋅2 ( ) ∑ i j ⎢ ⎥⎦ j= 0 ⎣ i = 0
∑ (Z k ⋅ 2 ) = ∑ i
k= 0 N −1
= ∑ PPj j =0
Partial Product (PPj,(j+M)-bit binary):
⎡ M −1 ⎤ PPj = 2 ⋅Yj⋅ ∑ (Xi ⋅ 2i ) ⎢⎣ i= 0 ⎦⎥ j
The manual multiplication process of adding the partial products is used in CMOS digital multiplier circuits. Mattausch, CMOS Design, H21/6/12
5
Array-Multiplier Circuit (4-Bit Binary Example) X3
FA = Full Adder
X1
X0
X3
X2
X1
X0
HA
FA
FA
HA
X3
X2
X1
X0
FA
FA
FA
HA
X3
X2
X1
X0
FA
FA
FA
HA
Z6
Z5
Z4
Z3
HA = Half Adder (Cin=0)
C out,HA = A• B SHA = A• B + A • B Critical Path: 8 Adder Stages
Z7
X2
Y0
Y1
Y2
Y3
Z2
Z1
Z0
The combinational array-multiplier circuit directly imitates the manual adding of partial products. Mattausch, CMOS Design, H21/6/12
6
Carry-Save Multiplier Circuit (4-Bit Example) Definition: Xi Yj
X3Y1
XiYj
X3Y0 X2Y1 X2Y0 X1Y1 X1Y0 X0Y1
HA Critical Path: 6 Adder Stages
X3Y2
X2Y2
FA X3Y3
X1Y3
FA
FA
FA
FA
HA
Z6
Z5
Z4
Merging Adder Z7
X2Y3
X1Y2
FA
HA
X0Y0
HA
X0Y2
FA
X0Y3
FA
Z3
Z2
Z1
Z0
The carry-save multiplier circuit shortens the critical path by transferring the carry bit always to the next partial product. Mattausch, CMOS Design, H21/6/12
7
Wallace-Tree Multiplier Circuit (Principle) Bit-slice in a carry-save multiplier circuit for the input bit of order 25 X5Y0 X4Y1
HA
Same bit-slice in a Wallace-tree multiplier circuit has a shorter critical path by 2 adder stages.
C5,1
X5Y0 X4Y1 X3Y2 X2Y3 X1Y4 X0Y5
X3Y2 C6,1
FA
FA C5,2
X2Y3 C6,2
FA
FA
C6,1 C6,2
C5,1 C5,2
C5,3
FA
X1Y4 C6,3
FA X0Y5
C6,4 C6
C5,3
C6,3 C5,4
FA C6
FA
Z5
Z5
The Wallace-tree moves all XiYj to the first or second fulladder (FA) level. Each FA serves as 3 to 2 compressor circuit. Mattausch, CMOS Design, H21/6/12
8
Wallace-Tree Multiplier Circuit (6•6 Example) 210 29
Merging adder
28
Full-adder :
27
+
26 25 24 23 22 21 20
BitSlice for 25
In a Wallace-tree multiplier circuit each bit slice has a different structure. However, the critical path has the smallest number of full-adder stages.
The Wallace-tree multiplier circuit has an irregular structure for the bit slices so that the layout becomes difficult. Mattausch, CMOS Design, H21/6/12
9
Circuits for Multiplication -
Manual Multiplication Process of Positive Binaries Combinational Multiplier Circuits for Positive Multi-Bit Binaries Sequential Multiplier Circuits for Positive Multi-Bit Binaries Handling of the Sign Bit
Mattausch, CMOS Design, H21/6/12
10
Bit-Serial Multiplier Circuit
N-bit shift register Serial streams least-significant bit first
Clock for Y is (M+1)•CLK Basic multiplier-
circuit elements
Definition: X Y
X Y &
The bit-serial multiplier determines the product of X (M-bit) and Y (N-bit) in (M+1)•(N+1) clock cycles. Mattausch, CMOS Design, H21/6/12
11
Serial-Parallel Multiplier Circuit (Example Y=4-Bit)
Clock
Y3
Clock Cout
Cout
Cout
FA
Clock
Y2
FA
X Multiplier bits in, least-significant bit first
Y1
FA
Y0
Cin
Cin
Cin
Clock
Each coefficient Zk(•2k) of the product is calculated in one clock cycle for all k.
Clock
Z Product bits out, least-significant bit first Clock
Critical Path of N-1 Adder Stages determines the clock cycle. Critical Path:
The serial-parallel multiplier feeds X (M-bit) serial and Y (Nbit) parallel. The product Z is determined in M+N clock cycles. Mattausch, CMOS Design, H21/6/12
12
Pipelined Serial-Parallel Multiplier Circuit (Example Y=4-Bit) Y0
Y1
Y2
Y3
X Clock
Clock Clock Cout
Cout
FA
Cout
FA
Clock
FA
Multiplier bits in, least-significant bit first
Cin
Cin
Cin
Clock
Each coefficient Zk(•2k) of the product is calculated in N-1 clock cycles, but pipelined, for all k.
Clock
Product bits out, leastsignificant bit first Z
Clock
Critical Path of 1 Adder Stage allows a shorter clock cycle. Critical Path:
Again X (M-bit) and Y (N-bit) inputs are fed serial and parallel, respectively. Z is determined pipelined in M+2N clock cycles. Mattausch, CMOS Design, H21/6/12
13
Handling of the Sign Bit in Multiplications The sign bits SX and SY of multiplicand and multiplier determine the sign bit SZ of the product.
The sign bit SZ of the product is only 1 (negative) if SX and SY are different.
X base2 = SX XM −1 XM − 2 o o o X3 X2 X1 X0 Ybase 2 = SYYN −1YN − 2 o o o Y3 Y2 Y1Y0
SZ = SX⊗SY = SX•SY+ SX•SY
SX
SY
0
0
0
0
1
1
1
0
1
1
1
0
The sign bit of the product is simply determined with an EXOR circuit from the sign bits of multiplicand and multiplier. Mattausch, CMOS Design, H21/6/12
14
Circuits for Division -
Manual Division Process of Positive Binaries Combinational Divider Circuits for Positive Multi-Bit Binaries Handling of the Sign Bit
Mattausch, CMOS Design, H21/6/12
15
Manual Division Process (Positive Binaries) Example: 4-Bit Divisor and 7 Bit Dividend
General Algorithm for M-Bit Divisor and N-Bit Dividend Start
Quotient (Q): 1 0 0 1
D’=D•2N-1, Q=0, R=A
Divisor (D): 1000 1 0 0 1 0 1 0 Dividend (A): -1 0 0 0 1 1 1 -1
0 0 1 0 1 0 0 0 0
R=R-D’ yes
R≥0
Q=Q •21+20
Remainder (R): 1 0
no Q=Q •21 R=R+D’
D’=D’ •2-1 R≥D no End
yes
The division process recursively subtracts 2i•D from R (Rinitial=A). The sign of the result determines bit qi of Q. Mattausch, CMOS Design, H21/6/12
16
Basic Unit of a Combinational Divider Circuit The basic unit of a divider circuit has to provide a subtract, a quotient-bit dependent restore and a shift function for the divisor bit. Si
dj dj Ci+1 qk
Si
DC S’i
Ci
=
Ci+1
FA 1
dj
Ci 0
MUX qk S’i
dj
The quotient-bit-dependent restore can be realized with a multiplexer and the shift function can be realized by interconnecting the divisor bit to the next column. Mattausch, CMOS Design, H21/6/12
17
Combinational Array-Divider Circuit (Example for 6-bit dividend and 3-bit divisor) d2
d1
q5 q4
q3
d0
a5
DC
0 a 4
DC
DC
DC
DC
DC
DC
DC
DC
DC
DC
DC
DC
DC
DC
r2
r1
r0
q2 q1
Critical Path:
0
a3 0 a 2
q0
0
a1 0 a 0 0
The critical delay path of array-divider circuits (N-bit dividend, M-bit divisor) contains (N-M/2)•(M-1) full-adder stages. Mattausch, CMOS Design, H21/6/12
18
Handling of the Sign Bit in Divisions The sign bits SA and SD of dividend and divisor determine the sign bits Sq and SR of the quotient and remainder.
The sign bits Sq and SR of the quotient and remainder are only 1 (negative) if SA and SD are different.
A base2 = SA AM −1 AM − 2 o o o A3 A2 A1A 0 Dbase 2 = SD DN −1 DN − 2 o o o D3 D2 D1D0
SA
SD
SQ = SA =SA⊗SD
0
0
0
0
1
1
1
0
1
1
1
0
The sign bits of quotient and remainder are equal. An EXOR circuit of dividend and divisor sign bits determines them. Mattausch, CMOS Design, H21/6/12
19
Parallel Arithmetic for Increased Throughput -
Matrix Arrangement of Simple Arithmetic Modules (Processing Elements) Important Application: Picture Processing
Mattausch, CMOS Design, H21/6/12
20
Parallel Processing with Many Simple Elements
Memory
Input/ Output
Interconnect Unit
Input/Output of PE
Processing Element (PE)
Control
Datapath
Construct simple processing elements, which are optimized for a target application.
High performance applications can be realized with simple application-specific processing elements (PE). Each PE has the principle construction with datapath, control and memory. Mattausch, CMOS Design, H21/6/12
21
Parallel Processing Structures with PEs Global Data-Exchange Bus
Linear structure for parallel processing
PE
PE
PE
PE
PE
Local Data-Exchange between PEs
Matrix structure for parallel processing
PE
PE
PE
PE
PE
PE
PE
PE
PE
PE
PE
PE
PE
PE
PE
The common structure for parallel processing with PEs are the linear structure and the matrix structure. Mattausch, CMOS Design, H21/6/12
22
Real-Time Picture Processing Needs a PE Matrix Picture Segmentation • Natural image is partitioned into meaningful regions • Important initial task for higher level image processing
Intelligent Information Processing
Picture PictureSegmentation Segmentation
Region RegionExtraction Extraction
Object Object Tracking Tracking
・ ・ ・
Object Object Recognition Recognition
Car Car Motion Motion
High performance image processing needs the parallel processing power of a processing matrix with PEs. Mattausch, CMOS Design, H21/6/12
23
Research at RCNS: Video-Picture Segmentation (RCNS: Research Center for Nanodevices and Systems)
P1
P2
P3 Interconnection Registers (Type 1)
Matrix-Processing Network
P4
P5
P6
P7
P8
P9
Processing Element (PE) Interconnection Registers (Type 2)
Color-Picture Segmentation Example
Video-picture segmentation is a hot research example for the necessity of parallel processing with a PE matrix. Mattausch, CMOS Design, H21/6/12
24
Video-Picture Segmentation Test-Chip Design (RCNS: Research Center for Nanodevices and Systems)
Interconnection registers
2.1mm
Cell-Network (10×10pixels) 2.6mm 4.31mm2
Type 2
Type 1 PE
Type 1
207μm Type 2
213μm
z
The CMOS test-chip has a processing matrix for 100 (10×10) pixels (Processing Elements: 100, Interconnection Registers: 121)
z
The area of the processing matrix is 4.31mm2
森本高志 (M2),原田洋明 (M1) の研究成果。 (システムLSIを実現するためのハード設計資産およびソフト設計 資産を対象とする、主要半導体メーカー10社等からの賞。)
Mattausch, CMOS Design, H21/6/12
25