EE382V Fall 2006
VLSI Physical Design Automation Placement (3) Prof. David Pan
[email protected] Office: ACES 5.434
10/18/08
1
Outline • Wire length driven placement • Main methods – Simulated Annealing – Partition-based methods – Analytical methods
• Timing and congestion consideration during placement • Newer trends
2
Timing Cost Critical Path
Delay of the circuit is defined as the longest delay among all possible paths from primary inputs to primary outputs. Interconnection delay becomes more and more important in deep sub-micron regime.
3
Timing Analysis netlist with delay for each gate
PI1
1
4
6
5
PO1
PI2
3
6
6
7
PO2
PI3
1
4 4
5
4
PO3
1
0 1
PI1
arrival times
0 PI2
3
0 PI3
1
7 4
3
1
13 5
6 9
6
6
4 7 4 7
5
15
14
7
4
18
22
18
PO1
PO2
PO3
4
Timing Analysis 1/5
0/4 1
PI1
arrival time/required time
0/0 PI2
3
0/8 1
PI3
slack = required time arrival time
3/3
1/9
1 0
PI2
3
8 PI3
4
6 4 4
1
8
5
6
6 7/15
5
15/15
14/18
7
4
18/22
22/22
18/22
PO1
PO2
PO3
7/13
2 4
0
13/15
9/9
4
4 PI1
7/9
2 5
6 0
6
6
4 8 4 6
5
0
4
7
4
4
0
4
PO1
PO2
PO3
5
Another example with interconnect delay – Same Timing Analysis 22 3 L A T C H
2
5
19
2
4
5
1
2
1
4
5
1 L A T C H
1
3
4
2
6
Timing Driven Placement Approaches • Path-based – Most accurate information – Very slow
• Budgeting – Inaccurate information – Hard to budget – Fast
• Net-based approach – Net-weighting
7
Net-Weighting • Basic approach – For more timing critical nets (i.e., smaller slack), assign higher net weights – Minimize
∑ w ⋅ net _ length(i), i
i
where
1 wi ∝ Si 8
Sensitivity Guided Netweighting for Placement Driven Synthesis
H. Ren, D. Z. Pan and D.S. Kung ISPD-04
10/18/08
9
Figure of Merit (FOM) • FOM is the total slack difference compared to a certain slack threshold for all timing end points. FOM =
t∈Po
∑ (Slk (t ) − Slk )
Slk ( t ) < Slk t
t
• Interpreted as the amount of work left for the physical synthesis engine or to the designers for manual fix. • FOM and WNS (worst negative slack) are the two most important metrics for timing closure in modern physical synthesis • However, FOM was not used to guide placement explicitly 10
Sensitivity Definitions • Net length sensitivity to net weight
S
• Net delay sensitivity to net length
L W
S
=
T L
• Net slack sensitivity to net weight:
S
Slk W
=−
ΔL ΔW
=
ΔT ΔL
ΔT ΔT ΔL T L T =− ⋅ ≡ −S L ⋅ S W ≡ −S W ΔW ΔL ΔW
• FOM sensitivity to net delay
S
FOM T
=
ΔFOM ΔT
• FOM sensitivity to net weight:
S
FOM W
=
ΔFOM ΔFOM ΔT FOM T = ⋅ ≡ ST ⋅ SW ΔW ΔT ΔW 11
Closed-Form Sensitivity • For net length to weight sensitivity, we have
S •
L W
W +Wsin k −2W =− L ⋅ src WsrcWsin k
For delay to wire length sensitivity, we have
ΔT S L = ΔL = rcL + cRd + rCl T
• Use switch-level RC and Elmore delay to illustrate the concept • Good enough during placement • Can be extended to more accurate models 12
FOM to Net Delay Sensitivity • • •
Question: suppose the delay of net i is reduced by a small amount ∆T(i), what is the impact to FOM? Define: K(i) to be the number of timing end points whose slack will change due to ∆T(i) Then, we have the following Theorem
S
FOM T
ΔFOM (i ) ≡ = − K (i ) ΔT (i )
13
K(i) Computation • Topologically sorted order from PO to PI • Only propagate K(i) to the most timing critical input pin (slack, K(i)) pair (-3, 2)
A
(-3, 2)
(-3, 1)
B
(-1.2, 1)
D
C (-0.8, 0)
(-0.8, 0)
(-3, 1)
(-3, 1)
Po1
(-1.2, 1)
Po2
14
Net Weight Generation • Put these sensitivities together and generate new net weight
[
∆W (i ) = β ( Slk t − Slk (i )) S wSlk (i ) + αSWFOM (i ) Worg (i ) W (i ) = Worg (i ) + ∆W (i )
]
Slk (i ) > Slk t Slk (i ) ≤ Slk t
15
Experiments • We compare the placement and physical synthesis results of three different algorithms on 7 industry chips (up to 444k movable objects) from IBM – WL: wire length driven placement with uniform weight – TS: timing driven placement using slack sensitivity – TSF: timing driven placement using both slack and FOM sensitivity
16
Timing after Placement FOM Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 Average
ZW -9134 0 -535 -322 -114 -142 -4
WL -41650 -6966 -13711 -8057 -28527 -20257 -452
TS -26093 -4102 -6468 -4024 -15334 -9417 -248
TSF -25602 -3454 -5595 -3440 -12229 -9536 -131
Improvement TS TSF 48% 49% 41% 50% 55% 62% 52% 60% 46% 57% 54% 53% 46% 72% 49% 58%
TSF -4.254 -1.754 -3.788 -3.605 -2.002 -4.856 -0.432
Improvement TS TSF 63% 44% 37% 38% 30% 27% 55% 58% 34% 45% 0% 12% 37% 54% 37% 40%
WNS Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 Average
ZW -1.702 0.248 -0.55 -0.941 -0.102 -0.508 0.16
WL -6.274 -2.977 -4.997 -7.218 -3.575 -5.47 -1.135
TS -3.392 -1.784 -3.684 -3.736 -2.379 -5.484 -0.66
17
Timing after Physical Synthesis Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7
FOM WL TS -7829 -6086 -2059 -384 -1854 -405 -2537 -1844 -4732 -2726 -1481 -541 -94 -8 Average
Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7
WNS WL TS -0.834 -0.743 -0.705 -0.011 -0.701 -0.139 -2.156 -1.908 -0.472 -0.443 -0.36 -0.293 -0.097 0.182 Average
TSF -5170 -631 -422 -1770 -1819 -266 0
Improvement TS TSF 22% 34% 81% 69% 78% 77% 27% 30% 42% 62% 63% 82% 91% 100% 58% 65%
TSF -0.739 -0.073 -0.19 -1.9 -0.341 -0.351 0.283
Improvement TS TSF 11% 11% 98% 90% 80% 73% 12% 12% 6% 28% 19% 3% 100% 100% 47% 45% 18
Outline • Wire length driven placement • Main methods – Simulated Annealing – Partition-based methods – Analytical methods
• Timing and congestion consideration • Newer trends
19
Congestion Minimization • Traditional placement problem is to minimize interconnection length (wirelength) • A valid placement has to be routable • Congestion is important because it represents routability (lower congestion implies better routability) • There is not yet enough research work on the congestion minimization problem
20
Definition of Congestion Routing demand = 3 Assume routing supply is 1, overflow = 3 - 1 = 2 on this edge.
Overflow on each edge = Routing Demand - Routing Supply (if Routing Demand > Routing Supply) 0 (otherwise)
Overflow =
Σ
overflow
all edges
21
Correlation between Wirelength and Congestion
Total Wirelength = Total Routing Demand 22
Wirelength ≠ Congestion
A congestion minimized placement
A wirelength minimized placement
23
Congestion Map of a Wirelength Minimized Placement Congested Spots
24
Congestion Reduction Postprocessing
Reduce congestion globally by minimizing the traditional wirelength
Post process the wirelength optimized placement using the congestion objective
25
An Effective Congestion Driven Placement Framework
André Rohe University of Bonn, Germany joint work with Ulrich Brenner ISPD 2002 (Best Paper) 10/18/08
26
A dense Placement
• good wirelength • impossible to route 27
Possible Solution
• easy to route • bad wirelength/timing 28
Congestion Driven Placement
• easy to route + good wirelength almost no extra computation efford ! 29
Overall Algorithm: Bonn Place • Partitioning based approach • Solves QP in each level, followed by partitioning • Partitioning is done by quadrisection: circuits are partitioned with minimum movement (Vygen)
30
Methods used for congestion driven placement • Very fast congestion calculation • Inflate circuits in congested regions • Spreading inflated cells
31
Congestion calculation
• Calculate Steiner Tree for each net • Probablitiy estimation for each 2-point connection (similar to Hung & Flynn, Lou et al.)
32
Quality of congestion calculation
congestion estimation 33
Quality of congestion calculation Bonn Global
HDP Global
34
Inflation of circuits (used previously by Hou et al.)
• Initial inflation (based on pin density) • Given a circuit c in Region R, c is inflated by up to 100% • The inflation is based on the congestion in R and the surrounding regions & the pin density in R • Deflation is possible if the circuit is no longer critical.
35
Placement Step 0
36
Placement Step 1
37
Placement Step 2
38
Placement Step 3
39
Placement Step 4
40
Placement Step 5
41
Placement Step 6
42
Placement Step 7
43
Spreading inflated cells
• Repartitioning considers 2x2 windows in placement grid to optimize netlength • Use extra repartitioning step to move cells away from overloaded regions
44
Summary: Algorithm overview •
Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits}
•
Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w)
Repartitioning •
Legalization
45
Algorithm overview •
Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c)
•
Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w)
Repartitioning •
Legalization
46
Algorithm overview •
Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c)
•
Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Compute congestion and update b(c) # update inflation b(c) Quadrisection(w) Repartitioning
•
Legalization
47
Algorithm overview •
Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c)
•
Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Compute congestion and update b(c) # update inflation b(c) Quadrisection(w) Reduce overloaded windows # extra repartitioning steps Repartitioning
•
Legalization
48
Computational Results Standard Chip
CPU
Congestion Driven len
CPU
len
Blow
IBM 1
0:23 h
7.2 m
0:26 h
7.4 m
10.2 %
IBM 2
0:26 h
7.9 m
0:27 h
9.0 m
6.6 %
IBM 3
3:50 h
134 m
4:39 h
142 m
20.1 %
IBM 4
7:08 h
241 m
7:24 h
270 m
20.2 %
IBM 5
16:10 h
375 m
16:37 h
406 m
57.8 %
+8.7 %
+8.5%
Mean
49
Computational Results II Standard Chip
HDP
ov
CPU
Congestion Driven len
HDP
ov
CPU
len
IBM 1
81.7
8374
0:15 h
9m
75.5
0
0:05 h
7.5 m
IBM 2
82.7
7000
0:19 h
11.5 m
75.4
0
0:05 h
10.1 m
IBM 3
88.8
78111
47:36 h
162 m
77.3
0
4:51 h
164 m
IBM 4
82.8
972
7:18 h
324 m
75.2
0
2:48 h
326 m
IBM 5
89.9
14382
70:57 h
512 m
84.2
0
29:48 h
527 m
-73 %
-5.2 %
Mean
-9 %
50
Summary • In this module, we cover two important concepts during placement to consider besides wire length – Timing driven placement, using net-weighting • A new sensitivity based net weighting in ISPD’04 paper
– Congestion minimization (using ISPD’02 as an example) • congestion estimation • Inflate cells in congested region • Spread inflated cells
51