Vlsi Physical Design Automation: Placement (3)

  • Uploaded by: api-3834272
  • 0
  • 0
  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Vlsi Physical Design Automation: Placement (3) as PDF for free.

More details

  • Words: 1,983
  • Pages: 51
EE382V Fall 2006

VLSI Physical Design Automation Placement (3) Prof. David Pan [email protected] Office: ACES 5.434

10/18/08

1

Outline • Wire length driven placement • Main methods – Simulated Annealing – Partition-based methods – Analytical methods

• Timing and congestion consideration during placement • Newer trends

2

Timing Cost Critical Path

Delay of the circuit is defined as the longest delay among all possible paths from primary inputs to primary outputs. Interconnection delay becomes more and more important in deep sub-micron regime.

3

Timing Analysis netlist with delay for each gate

PI1

1

4

6

5

PO1

PI2

3

6

6

7

PO2

PI3

1

4 4

5

4

PO3

1

0 1

PI1

arrival times

0 PI2

3

0 PI3

1

7 4

3

1

13 5

6 9

6

6

4 7 4 7

5

15

14

7

4

18

22

18

PO1

PO2

PO3

4

Timing Analysis 1/5

0/4 1

PI1

arrival time/required time

0/0 PI2

3

0/8 1

PI3

slack = required time arrival time

3/3

1/9

1 0

PI2

3

8 PI3

4

6 4 4

1

8

5

6

6 7/15

5

15/15

14/18

7

4

18/22

22/22

18/22

PO1

PO2

PO3

7/13

2 4

0

13/15

9/9

4

4 PI1

7/9

2 5

6 0

6

6

4 8 4 6

5

0

4

7

4

4

0

4

PO1

PO2

PO3

5

Another example with interconnect delay – Same Timing Analysis 22 3 L A T C H

2

5

19

2

4

5

1

2

1

4

5

1 L A T C H

1

3

4

2

6

Timing Driven Placement Approaches • Path-based – Most accurate information – Very slow

• Budgeting – Inaccurate information – Hard to budget – Fast

• Net-based approach – Net-weighting

7

Net-Weighting • Basic approach – For more timing critical nets (i.e., smaller slack), assign higher net weights – Minimize

∑ w ⋅ net _ length(i), i

i

where

1 wi ∝ Si 8

Sensitivity Guided Netweighting for Placement Driven Synthesis

H. Ren, D. Z. Pan and D.S. Kung ISPD-04

10/18/08

9

Figure of Merit (FOM) • FOM is the total slack difference compared to a certain slack threshold for all timing end points. FOM =

t∈Po

∑ (Slk (t ) − Slk )

Slk ( t ) < Slk t

t

• Interpreted as the amount of work left for the physical synthesis engine or to the designers for manual fix. • FOM and WNS (worst negative slack) are the two most important metrics for timing closure in modern physical synthesis • However, FOM was not used to guide placement explicitly 10

Sensitivity Definitions • Net length sensitivity to net weight

S

• Net delay sensitivity to net length

L W

S

=

T L

• Net slack sensitivity to net weight:

S

Slk W

=−

ΔL ΔW

=

ΔT ΔL

ΔT ΔT ΔL T L T =− ⋅ ≡ −S L ⋅ S W ≡ −S W ΔW ΔL ΔW

• FOM sensitivity to net delay

S

FOM T

=

ΔFOM ΔT

• FOM sensitivity to net weight:

S

FOM W

=

ΔFOM ΔFOM ΔT FOM T = ⋅ ≡ ST ⋅ SW ΔW ΔT ΔW 11

Closed-Form Sensitivity • For net length to weight sensitivity, we have

S •

L W

W +Wsin k −2W =− L ⋅ src WsrcWsin k

For delay to wire length sensitivity, we have

ΔT S L = ΔL = rcL + cRd + rCl T

• Use switch-level RC and Elmore delay to illustrate the concept • Good enough during placement • Can be extended to more accurate models 12

FOM to Net Delay Sensitivity • • •

Question: suppose the delay of net i is reduced by a small amount ∆T(i), what is the impact to FOM? Define: K(i) to be the number of timing end points whose slack will change due to ∆T(i) Then, we have the following Theorem

S

FOM T

ΔFOM (i ) ≡ = − K (i ) ΔT (i )

13

K(i) Computation • Topologically sorted order from PO to PI • Only propagate K(i) to the most timing critical input pin (slack, K(i)) pair (-3, 2)

A

(-3, 2)

(-3, 1)

B

(-1.2, 1)

D

C (-0.8, 0)

(-0.8, 0)

(-3, 1)

(-3, 1)

Po1

(-1.2, 1)

Po2

14

Net Weight Generation • Put these sensitivities together and generate new net weight

[

∆W (i ) = β ( Slk t − Slk (i )) S wSlk (i ) + αSWFOM (i ) Worg (i )  W (i ) =  Worg (i ) + ∆W (i )

]

Slk (i ) > Slk t Slk (i ) ≤ Slk t

15

Experiments • We compare the placement and physical synthesis results of three different algorithms on 7 industry chips (up to 444k movable objects) from IBM – WL: wire length driven placement with uniform weight – TS: timing driven placement using slack sensitivity – TSF: timing driven placement using both slack and FOM sensitivity

16

Timing after Placement FOM Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 Average

ZW -9134 0 -535 -322 -114 -142 -4

WL -41650 -6966 -13711 -8057 -28527 -20257 -452

TS -26093 -4102 -6468 -4024 -15334 -9417 -248

TSF -25602 -3454 -5595 -3440 -12229 -9536 -131

Improvement TS TSF 48% 49% 41% 50% 55% 62% 52% 60% 46% 57% 54% 53% 46% 72% 49% 58%

TSF -4.254 -1.754 -3.788 -3.605 -2.002 -4.856 -0.432

Improvement TS TSF 63% 44% 37% 38% 30% 27% 55% 58% 34% 45% 0% 12% 37% 54% 37% 40%

WNS Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 Average

ZW -1.702 0.248 -0.55 -0.941 -0.102 -0.508 0.16

WL -6.274 -2.977 -4.997 -7.218 -3.575 -5.47 -1.135

TS -3.392 -1.784 -3.684 -3.736 -2.379 -5.484 -0.66

17

Timing after Physical Synthesis Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7

FOM WL TS -7829 -6086 -2059 -384 -1854 -405 -2537 -1844 -4732 -2726 -1481 -541 -94 -8 Average

Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7

WNS WL TS -0.834 -0.743 -0.705 -0.011 -0.701 -0.139 -2.156 -1.908 -0.472 -0.443 -0.36 -0.293 -0.097 0.182 Average

TSF -5170 -631 -422 -1770 -1819 -266 0

Improvement TS TSF 22% 34% 81% 69% 78% 77% 27% 30% 42% 62% 63% 82% 91% 100% 58% 65%

TSF -0.739 -0.073 -0.19 -1.9 -0.341 -0.351 0.283

Improvement TS TSF 11% 11% 98% 90% 80% 73% 12% 12% 6% 28% 19% 3% 100% 100% 47% 45% 18

Outline • Wire length driven placement • Main methods – Simulated Annealing – Partition-based methods – Analytical methods

• Timing and congestion consideration • Newer trends

19

Congestion Minimization • Traditional placement problem is to minimize interconnection length (wirelength) • A valid placement has to be routable • Congestion is important because it represents routability (lower congestion implies better routability) • There is not yet enough research work on the congestion minimization problem

20

Definition of Congestion Routing demand = 3 Assume routing supply is 1, overflow = 3 - 1 = 2 on this edge.

Overflow on each edge = Routing Demand - Routing Supply (if Routing Demand > Routing Supply) 0 (otherwise)

Overflow =

Σ

overflow

all edges

21

Correlation between Wirelength and Congestion

Total Wirelength = Total Routing Demand 22

Wirelength ≠ Congestion

A congestion minimized placement

A wirelength minimized placement

23

Congestion Map of a Wirelength Minimized Placement Congested Spots

24

Congestion Reduction Postprocessing

Reduce congestion globally by minimizing the traditional wirelength

Post process the wirelength optimized placement using the congestion objective

25

An Effective Congestion Driven Placement Framework

André Rohe University of Bonn, Germany joint work with Ulrich Brenner ISPD 2002 (Best Paper) 10/18/08

26

A dense Placement

• good wirelength • impossible to route 27

Possible Solution

• easy to route • bad wirelength/timing 28

Congestion Driven Placement

• easy to route + good wirelength almost no extra computation efford ! 29

Overall Algorithm: Bonn Place • Partitioning based approach • Solves QP in each level, followed by partitioning • Partitioning is done by quadrisection: circuits are partitioned with minimum movement (Vygen)

30

Methods used for congestion driven placement • Very fast congestion calculation • Inflate circuits in congested regions • Spreading inflated cells

31

Congestion calculation

• Calculate Steiner Tree for each net • Probablitiy estimation for each 2-point connection (similar to Hung & Flynn, Lou et al.)

32

Quality of congestion calculation

congestion estimation 33

Quality of congestion calculation Bonn Global

HDP Global

34

Inflation of circuits (used previously by Hou et al.)

• Initial inflation (based on pin density) • Given a circuit c in Region R, c is inflated by up to 100% • The inflation is based on the congestion in R and the surrounding regions & the pin density in R • Deflation is possible if the circuit is no longer critical.

35

Placement Step 0

36

Placement Step 1

37

Placement Step 2

38

Placement Step 3

39

Placement Step 4

40

Placement Step 5

41

Placement Step 6

42

Placement Step 7

43

Spreading inflated cells

• Repartitioning considers 2x2 windows in placement grid to optimize netlength • Use extra repartitioning step to move cells away from overloaded regions

44

Summary: Algorithm overview •

Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits}



Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w)

Repartitioning •

Legalization

45

Algorithm overview •

Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c)



Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w)

Repartitioning •

Legalization

46

Algorithm overview •

Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c)



Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Compute congestion and update b(c) # update inflation b(c) Quadrisection(w) Repartitioning



Legalization

47

Algorithm overview •

Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c)



Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Compute congestion and update b(c) # update inflation b(c) Quadrisection(w) Reduce overloaded windows # extra repartitioning steps Repartitioning



Legalization

48

Computational Results Standard Chip

CPU

Congestion Driven len

CPU

len

Blow

IBM 1

0:23 h

7.2 m

0:26 h

7.4 m

10.2 %

IBM 2

0:26 h

7.9 m

0:27 h

9.0 m

6.6 %

IBM 3

3:50 h

134 m

4:39 h

142 m

20.1 %

IBM 4

7:08 h

241 m

7:24 h

270 m

20.2 %

IBM 5

16:10 h

375 m

16:37 h

406 m

57.8 %

+8.7 %

+8.5%

Mean

49

Computational Results II Standard Chip

HDP

ov

CPU

Congestion Driven len

HDP

ov

CPU

len

IBM 1

81.7

8374

0:15 h

9m

75.5

0

0:05 h

7.5 m

IBM 2

82.7

7000

0:19 h

11.5 m

75.4

0

0:05 h

10.1 m

IBM 3

88.8

78111

47:36 h

162 m

77.3

0

4:51 h

164 m

IBM 4

82.8

972

7:18 h

324 m

75.2

0

2:48 h

326 m

IBM 5

89.9

14382

70:57 h

512 m

84.2

0

29:48 h

527 m

-73 %

-5.2 %

Mean

-9 %

50

Summary • In this module, we cover two important concepts during placement to consider besides wire length – Timing driven placement, using net-weighting • A new sensitivity based net weighting in ISPD’04 paper

– Congestion minimization (using ISPD’02 as an example) • congestion estimation • Inflate cells in congested region • Spread inflated cells

51

Related Documents