Master Thesis Complete.docx

  • May 2020
INTRODUCTION

Performance variation has become a critical concern in physical design in recent years. As VLSI technology scales into the nanometer domain, parameter variation becomes significant. This work reports an efficient static timing analysis flow and presents methods for fixing setup and hold timing violations on various paths. In semiconductor devices, metal interconnect traces link the various parts of the circuit. As process technology shrinks, these metal traces increasingly affect the performance of the design. In deep-submicron and nanometer process technologies, coupling between interconnects induces noise and crosstalk, both of which can limit the operating speed of the design. Although noise and coupling effects are not significant in older technology generations, they play a very important role in nanometer technologies. The physical design process should therefore account for crosstalk and noise, and design verification should include their effects.

Chapter 1: DESIGN FLOW

1.1 VLSI design flow

Figure 1.1

The VLSI design cycle begins with a formal specification of a VLSI chip, proceeds through various phases, and produces a complete packaged chip. The design cycle can be represented by the flow chart shown in Figure 1.1. Our main focus is the physical design phase of the VLSI design cycle. However, to gain an overall perspective, we briefly outline all the phases of the VLSI design cycle.

1.1.1 System Specification

The first phase of the VLSI design process is the specification of the system, which is a high-level description of it. The factors considered in this phase include performance, functionality, and physical dimensions (size of the chip). The fabrication technology and design techniques are also considered. The specification of a system is a compromise between market needs, technology, and cost. The end results are specifications for the size, speed, power, and functionality of the system.

1.1.2 Architectural Design

The basic architecture of the system is designed in this phase. This involves decisions such as RISC (Reduced Instruction Set Computer) vs. CISC (Complex Instruction Set Computer), the number of ALUs and floating-point units, the number of pipelines, and the size of caches, among others. The outcome of architectural design is a Micro-Architectural Specification (MAS). Although the MAS is a textual description, architects can predict the performance, power, and die size of the design based on it.

1.1.3 Behavioral or Functional Design

In this phase, the main functional units of the system are identified, along with the interconnect requirements between them. The area, power, and other parameters of each unit are estimated. The behavioral aspects of the system are considered without implementation-specific information. For example, it may specify that multiplication is required, but not how the multiplication is to be performed. A variety of multiplication hardware may be used, depending on the speed and word-size requirements. The key idea is to specify the behavior of each unit in terms of input, output, and timing, without specifying its internal structure. The outcome of functional design is usually a timing diagram or other relationships between units. This information improves the overall design process and reduces the complexity of subsequent phases. Functional or behavioral design provides fast simulation of the system and allows quick debugging of the complete system. Behavioral design is largely a manual phase with little or no automation support.

1.1.4 Logic Design

In this phase the control flow, word widths, register allocation, arithmetic operations, and logic operations of the design that represent the functional design are derived and tested. This description is called the Register Transfer Level (RTL) description. RTL is expressed in a Hardware Description Language (HDL), such as VHDL or Verilog, and can be used in simulation and verification. It consists of Boolean expressions and timing information. The Boolean expressions are minimized to achieve the smallest logic design that conforms to the functional design. The logic design is simulated and tested to verify its correctness. In some special cases, logic design can be automated using high-level synthesis tools, which produce an RTL description from a behavioral description.

1.1.5 Circuit Design

The purpose of circuit design is to develop a circuit representation based on the logic design. The Boolean expressions are converted into a circuit representation, taking into account the speed and power requirements of the original design. Circuit simulation is used to verify the correctness and timing of each component. The circuit design is usually expressed as a circuit diagram, which shows the circuit elements (cells, macros, gates, transistors) and the interconnections between them. This representation is also called a netlist. In many cases, a netlist can be created automatically from the logic (RTL) description using logic synthesis tools.

1.1.6 Physical Design

In this phase the netlist is converted into a geometric representation, called a layout. The layout is created by converting each logic component (cells, macros, gates, transistors) into a geometric representation (specific shapes in multiple layers) that performs the intended logic function. Connections between components are also expressed as geometric patterns in several layers. The exact details of the layout also depend on the design rules, which are guidelines imposed by the fabrication process and the electrical properties of the fabrication materials. Physical design is a very complex process and is therefore usually broken down into several sub-phases. Various verification and validation checks are performed on the layout during physical design. In many cases, physical design can be fully or partly automated, and the layout can be generated directly from the netlist by layout synthesis tools. Layout synthesis tools are fast, but they carry an area and performance penalty, which limits their use to certain designs. Manual layout, while slow and labor-intensive, achieves better area and performance than synthesized layout. However, this advantage may vanish as ever larger designs challenge the human ability to comprehend them and find globally optimized solutions.

1.1.7 Fabrication

After layout and verification, the design is ready for fabrication. Since layout data was sent to the fabrication facility on a tape, the release of the data is called tape-out. Layout data is fractured into photolithographic masks, one for each layer. Masks identify the regions on the wafer where certain materials need to be deposited, diffused, or removed. Silicon crystals are grown and sliced to produce wafers. The very small dimensions of VLSI devices require that the wafers be polished to near perfection. The fabrication process consists of several steps involving the deposition and diffusion of various materials on the wafer. During each step one mask is used. Several dozen masks may be used to complete the fabrication process. A large wafer is 20 cm (8 inches) in diameter and can be used to produce thousands of chips, depending on the size of the chip. Before the chip is mass-produced, a prototype is made and tested. The industry is rapidly moving towards 30 cm (12 inch) wafers, which allow even more chips per wafer and lead to a lower cost per chip.

1.1.8 Packaging, Testing and Debugging

Finally, the wafer is fabricated and diced into individual chips in a fabrication facility. Each chip is then packaged and tested to ensure that it meets all the design specifications and functions properly. Chips used in Printed Circuit Boards (PCBs) are packaged in a Dual In-line Package (DIP), Pin Grid Array (PGA), Ball Grid Array (BGA), or Quad Flat Package (QFP). Chips used in Multi-Chip Modules (MCMs) are not packaged, since MCMs use bare chips.

1.2 ASIC DESIGN FLOW

Physical design is based on a netlist, which is the result of the synthesis process. Synthesis translates the RTL design, usually coded in VHDL or Verilog HDL, into gate-level descriptions that the downstream tools can read. This netlist contains information on the cells used, their interconnections, area, and other details. Typical synthesis tools are:

  • Cadence RTL Compiler / BuildGates / Physically Knowledgeable Synthesis (PKS)
  • Synopsys Design Compiler

Throughout the synthesis process, constraints are applied to ensure that the design meets the required functionality and specifications. Only after the netlist is verified for functionality and timing is it sent on to the physical design flow.

1.2.1 Partitioning

Partitioning is the process of dividing the chip into small blocks. This is done mainly to separate different functional blocks and to make placement and routing easier. Partitioning can be done in the RTL design phase, when the designer divides the entire design into sub-blocks and then proceeds to design each module. These modules are linked together in the top-level module. This kind of partitioning is commonly referred to as logical partitioning.

1.2.2 Floorplanning

The first step in the physical design flow is floorplanning. Floorplanning is the process of identifying structures that should be placed close together and allocating space for them in a way that meets the sometimes conflicting goals of available space (cost of the chip), required performance, and the desire to have everything close to everything else. Based on the area of the design and the hierarchy, a suitable floorplan is decided upon. Floorplanning takes into account the macros used in the design, memory, the routing possibilities, and the area of the entire design. Floorplanning also decides the IO structure and aspect ratio of the design. A bad floorplan leads to wasted die area and routing congestion. In many design methodologies, area and speed are the subjects of trade-offs. This is because routing resources are limited: the more resources are used, the slower the operation. Optimizing for minimum area allows the design to use fewer resources and places the sections of the design closer together. This leads to shorter interconnect distances, fewer routing resources used, faster end-to-end signal paths, and even faster and more consistent place-and-route times. Done correctly, floorplanning has no negatives.

As a general rule, data-path sections benefit most from floorplanning, while random logic, state machines, and other non-structured logic can safely be left to the placer section of the place-and-route software. Data paths are typically the areas of the design where multiple bits are processed in parallel, with each bit being modified in the same way, perhaps with some influence from adjacent bits. Example structures that make up data paths are adders, registers, counters, and multiplexers.

1.2.3 Placement

Before placement optimization begins, all Wire Load Models (WLMs) are removed. Placement uses RC values from virtual routing (VR) to calculate timing. A virtual route is the shortest Manhattan distance between two pins. VR RCs are more accurate than WLM RCs. Placement is performed in four optimization phases:

1. Pre-placement optimization
2. In-placement optimization
3. Post-placement optimization (PPO) before clock tree synthesis (CTS)
4. PPO after CTS

  • Pre-placement optimization optimizes the netlist before placement. It can downsize cells, and high-fanout nets (HFNs) are collapsed.
  • In-placement optimization re-optimizes the logic based on VR. It can perform cell sizing, gate duplication, cell moving, cell bypassing, area recovery, and buffer insertion. Optimization iterates over setup fixing, incremental timing, and congestion-driven placement.
  • Post-placement optimization before CTS performs netlist optimization with ideal clocks. It can fix setup, hold, and max transition/capacitance violations. It can do placement optimization based on global routing, and it redoes HFN synthesis.
  • Post-placement optimization after CTS optimizes timing with the propagated clock. It tries to preserve clock skew.
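The virtual-route estimate described above can be sketched in a few lines. This is only an illustration: the pin coordinates and the per-micron resistance and capacitance values are hypothetical placeholders, not data from any real technology or from the tool flow used in this work.

```python
def manhattan_length(pin_a, pin_b):
    """Virtual-route length: |dx| + |dy| between two pin locations (in microns)."""
    (xa, ya), (xb, yb) = pin_a, pin_b
    return abs(xa - xb) + abs(ya - yb)


def virtual_route_rc(pin_a, pin_b, r_per_um=0.5, c_per_um=0.2e-15):
    """Return (R in ohms, C in farads) for the virtual route.

    r_per_um and c_per_um are assumed placeholder values.
    """
    length = manhattan_length(pin_a, pin_b)
    return length * r_per_um, length * c_per_um


# Hypothetical driver and load pin locations.
driver_pin = (10.0, 20.0)
load_pin = (40.0, 60.0)

length_um = manhattan_length(driver_pin, load_pin)
resistance, capacitance = virtual_route_rc(driver_pin, load_pin)
print(length_um, resistance, capacitance)
```

Because the estimate depends on actual pin locations, it tracks the placement more closely than a wire load model, which only looks at fanout.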

1.2.4 Clock tree synthesis

The goal of clock tree synthesis (CTS) is to minimize skew and insertion delay. The clock is not propagated before CTS. After CTS, hold slack should improve. The clock tree begins at the clock source defined in the .sdc file and ends at the stop pins of the flops. There are two types of stop pins, known as ignore pins and sync pins. 'Don't touch' circuits and pins in the front end are treated as 'ignore' circuits or pins in the back end. 'Ignore' pins are ignored for timing analysis. If the clock is divided, separate skew analysis is necessary.

  • Global skew achieves zero skew between two synchronous pins without considering the logic relationship.
  • Local skew achieves zero skew between two synchronous pins while considering the logic relationship.
  • If the clock is skewed intentionally to improve setup slack, it is known as useful skew.

Firmness is the term coined in Astro to indicate the relaxation of constraints: the higher the firmness, the tighter the constraints. In clock tree optimization (CTO), the clock can be shielded so that noise is not coupled to other signals, but shielding increases area by 12 to 15%. Since the clock signal is global in nature, the same metal layer used for power routing is used for the clock as well. CTO is achieved by buffer sizing, gate sizing, buffer relocation, level adjustment, and HFN synthesis. We try to improve setup slack in pre-placement, in-placement, and post-placement optimization before CTS, while neglecting hold slack. In post-placement optimization after CTS, hold slack is improved. As a result of CTS, a number of buffers are added; typically, around 1300 buffers are added for 200k gates.

1.2.5 Signal Routing

There are two types of routing in the physical design process: global routing and detailed routing. Global routing allocates routing resources for connections and also does track assignment for a particular net. Detailed routing makes the actual connections. The constraints to be taken care of during routing include DRC, timing, and wire length.

1.2.6 Physical verification

Physical verification checks the correctness of the generated layout design. This includes verifying that the layout:

  • Complies with all technology requirements (Design Rule Checking, DRC)
  • Is consistent with the original netlist (Layout vs. Schematic, LVS)
  • Passes Antenna Rule Checking
  • Passes density verification at the top level. Cleaning density is a very critical step in lower technology nodes.
  • Passes Electrical Rule Checking (ERC)

CHAPTER 2: STATIC TIMING ANALYSIS

2.1 What is meant by Static Timing Analysis?

Static Timing Analysis (STA) is one of several techniques available to verify the timing of a digital design. An alternative approach is timing simulation, which can verify both the functionality and the timing of the complete design. The term timing analysis refers to either of these two methods, static timing analysis or timing simulation. Thus, timing analysis simply means analyzing the design for timing issues. STA is static because the analysis is carried out statically and does not depend on the data values applied at the input ports. This contrasts with simulation-based timing analysis, where a stimulus is applied to the input signals, the resulting behavior is observed and verified, then time is advanced with a new input stimulus, and the new behavior is observed and verified. Given a design with a set of input clock definitions and a definition of the external environment, the purpose of static timing analysis is to verify whether the design can operate at the expected speed, that is, whether it can work at a particular frequency without any timing violations. Figure 1-1 illustrates the functionality of static timing analysis. Examples of timing checks are setup and hold checks. A setup check ensures that the data arrives before the clock edge; similarly, a hold check ensures that the data is held for a minimum amount of time so that it is captured correctly. These checks ensure that the correct data is available and ready for capture and is latched in for the new state.
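As a rough illustration of the two checks, the sketch below computes setup and hold slack for a single flip-flop to flip-flop path clocked by one clock. It uses the common textbook single-cycle formulation, not the output of any particular STA tool, and all numeric values are hypothetical. Here the data arrival time is taken to be the clock-to-Q delay of the launching flip-flop plus the combinational delay.

```python
def setup_slack(period, data_arrival_max, setup_time,
                launch_latency=0.0, capture_latency=0.0, uncertainty=0.0):
    """Setup slack = required time - arrival time, using the slowest data path."""
    required = period + capture_latency - setup_time - uncertainty
    arrival = launch_latency + data_arrival_max
    return required - arrival


def hold_slack(data_arrival_min, hold_time,
               launch_latency=0.0, capture_latency=0.0, uncertainty=0.0):
    """Hold slack = arrival time - required time, using the fastest data path."""
    required = capture_latency + hold_time + uncertainty
    arrival = launch_latency + data_arrival_min
    return arrival - required


# Hypothetical values in nanoseconds.
s = setup_slack(period=10.0, data_arrival_max=7.5, setup_time=0.5)
h = hold_slack(data_arrival_min=0.75, hold_time=0.25)
print("setup slack:", s)
print("hold slack:", h)
```

A negative result from either function corresponds to a timing violation for that check.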

Figure 1-1 Static Timing Analysis

The most important aspect of static timing analysis is that the entire design is analyzed once, and the required timing checks are performed for all possible paths and scenarios of the design. Thus, STA is a complete and exhaustive method for verifying the timing of a design. The design under analysis is typically specified using a hardware description language such as VHDL or Verilog. The external environment, including the clock definitions, is typically specified using SDC or an equivalent format. SDC is a timing constraint specification language. The timing reports are in ASCII form, usually with several columns, each column showing one attribute of the path delay.

2.2 Why STA?

Static timing analysis is a complete and exhaustive verification of all timing checks of a design. Other timing analysis methods such as simulation can verify only the parts of the design that are exercised by the stimulus. Verification through timing simulation is only as exhaustive as the test vectors used. Simulating and verifying all timing conditions of a design with 10-100 million gates is very slow, and the timing still cannot be verified completely. Hence, it is difficult to do exhaustive verification through simulation. Static timing analysis, in contrast, provides a faster way of checking and analyzing the timing paths in a design for timing violations. Given that present-day ASICs may contain 10 to 100 million gates, static timing analysis has become a necessity for exhaustively verifying the timing of the complete design.

The functionality and performance of the device can also be limited by noise. Noise arises from crosstalk between signals, or from noise on the device inputs or the power supply. Noise can limit the frequency of operation of the design and can even cause functional failures. Thus, a design implementation must be verified to be robust, which means that it can withstand the noise without any effect on its rated performance. Verification based on logic simulation cannot handle the effects of noise, crosstalk, and on-chip variations. The analysis methods described here include not only traditional timing analysis but also noise analysis, so that the design is verified with the effects of noise included.

2.3 STA employed at Different Design Phases

At the logical level, STA is carried out using:

i. Ideal interconnect, or interconnect based on a wire load model.
ii. Ideal clocks with estimates for latencies and jitter.

In the physical design phase, STA can be performed using:

i. Interconnect, which can range from estimates based on global routing, to real routes with approximate extraction, to real routes with signoff accuracy.
ii. Clock trees: real clock trees.
iii. Analysis with and without the effects of crosstalk.

2.4 Drawbacks of Static Timing Analysis

While timing and noise analysis do an excellent job of analyzing a design for timing issues under all possible situations, the state of the art still does not allow STA to replace simulation completely. This is because some aspects of timing verification cannot yet be completely captured and verified in STA. Some of the limitations of STA are:

i. Reset sequence: STA cannot check whether all flip-flops are reset into their required logical values after an asynchronous or synchronous reset. The chip may sometimes not come out of the reset state. This is because certain declarations, such as initial values on signals, are not synthesized and can only be verified during simulation.

ii. X-handling: STA techniques deal only with the logical domain of logic-0 and logic-1 (high and low), rise and fall. An unknown value X in the design causes indeterminate values to propagate through the design, which cannot be checked using STA. Even though noise analysis within STA can analyze and propagate glitches through the design, the scope of glitch analysis and propagation is different from the X-handling done in simulation-based timing verification for nanometer designs.

iii. PLL settings: The PLL configurations may not be loaded or set properly.

iv. Asynchronous clock domain crossings: STA cannot check whether the correct clock synchronizers are used in the design. Other tools are needed to ensure that the correct clock synchronizers are present wherever there are asynchronous clock domain crossings.

v. IO interface timing: The IO interface requirements cannot always be specified in terms of STA constraints alone. For example, the designer may choose detailed circuit-level simulation for the DDR interface using SDRAM simulation models. The simulation ensures that the memories can be read from and written to with adequate margin, and that the DLL can be controlled to align the signals wherever necessary.

vi. Interfaces between analog and digital blocks: Since STA does not deal with analog blocks, the verification methodology needs to ensure that the connectivity between these two kinds of blocks is correct.

vii. False paths: Static timing analysis verifies that the timing through each logic path meets all the constraints and flags a violation if it does not meet the required specifications. In many cases, STA may flag a logic path as a failing path even though logic may never be able to propagate through the path. This can happen when the system application never uses such a path or when mutually contradictory conditions are needed to sensitize the failing path. Such paths are known as false paths. The quality of STA results is better when proper timing constraints, including false path and multicycle path constraints, are specified for the design. In most cases, the designer can use inherent knowledge of the design to specify constraints so that the false paths are eliminated during STA.

viii. FIFO pointers out of synchronization: If two finite state machines are expected to be synchronous but are actually out of synchronization, STA cannot detect the problem. During functional simulation, the two finite state machines may always be synchronized and change together in lock-step. However, once delays are considered, one of the finite state machines may be out of synchronization with the other, possibly because one comes out of reset sooner than the other. STA cannot detect such situations.

ix. Clock synchronization logic: If the clock generation logic does not match the clock definition, STA cannot detect the problem. STA assumes that the clock generator will deliver the waveform specified in the clock definition. There could also be a bad optimization performed on the clock generator logic that causes, for example, a large delay to be inserted on one of the paths that was not constrained properly. Alternatively, the added logic may change the duty cycle of the clock. STA cannot detect these conditions.

x. Functional behavior across clock cycles: STA cannot model or simulate functional behavior that changes across clock cycles.

Despite these issues, STA is used to verify the timing of the complete design, with simulation used as a backup to check corner cases and, more simply, to verify the normal functional modes of the design.

Chapter 3: Concepts of Static timing analysis

3.1 Propagation Delay

The propagation delay of a cell is defined with respect to some measurement points on the switching waveforms. Such points are defined using the following four variables:

    # Threshold of input falling edge:
    input_threshold_pct_fall : 50.0;
    # Threshold of input rising edge:
    input_threshold_pct_rise : 50.0;
    # Threshold of output falling edge:
    output_threshold_pct_fall : 50.0;
    # Threshold of output rising edge:
    output_threshold_pct_rise : 50.0;

These variables are part of the command set used to describe a cell library… The threshold specifications are in terms of the percentage of Vdd, the power supply. Typically, a 50% threshold is used for delay measurement in most standard cell libraries. A rising edge is the transition from logic-0 to logic-1. A falling edge is the transition from logic-1 to logic-0. Consider the example inverter cell and the waveforms at its pins shown in Figure 3-1. The propagation delays are represented as:

i. Output fall delay (Tf)
ii. Output rise delay (Tr)

In general, these two values are different. Figure 3-1 shows how these two propagation delays are measured.
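The measurement described above can be sketched numerically. The example below assumes piecewise-linear waveforms given as (time, voltage) points and finds the threshold crossings by linear interpolation; the waveform values and function names are hypothetical and serve only to show how an output fall delay (Tf) of an inverter would be obtained from 50% thresholds.

```python
def crossing_time(waveform, level, rising):
    """First time a piecewise-linear waveform crosses `level` in the given direction."""
    for (t0, v0), (t1, v1) in zip(waveform, waveform[1:]):
        if rising and v0 < level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
        if not rising and v0 > level >= v1:
            return t0 + (v0 - level) * (t1 - t0) / (v0 - v1)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(input_wave, output_wave, vdd,
                      input_pct, output_pct, input_rising, output_rising):
    """Delay between the input and output threshold crossings (percent of Vdd)."""
    t_in = crossing_time(input_wave, vdd * input_pct / 100.0, input_rising)
    t_out = crossing_time(output_wave, vdd * output_pct / 100.0, output_rising)
    return t_out - t_in


# Hypothetical inverter waveforms: (time in ns, voltage in V), Vdd = 1.0 V.
VDD = 1.0
inv_input = [(0.0, 0.0), (1.0, 0.0), (1.2, 1.0), (3.0, 1.0)]     # rising input
inv_output = [(0.0, 1.0), (1.15, 1.0), (1.45, 0.0), (3.0, 0.0)]  # falling output

# Output fall delay uses input_threshold_pct_rise and output_threshold_pct_fall.
tf = propagation_delay(inv_input, inv_output, VDD, 50.0, 50.0,
                       input_rising=True, output_rising=False)
print("Tf (ns):", round(tf, 3))
```

The output rise delay (Tr) would be obtained the same way, with a falling input edge and a rising output edge.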

Figure 3-1 Propagation delay

If we were looking at ideal waveforms, propagation delay would simply be the delay between the two edges. This is shown in Figure 3-2.

Figure 3-2 Propagation delay using ideal waveforms

3.2 Slew of a Waveform

A slew rate is defined as a rate of change. In static timing analysis, the rising or falling waveforms are measured in terms of whether the transition is slow or fast. The slew is typically measured in terms of the transition time, that is, the time it takes for a signal to transition between two specific levels. Note that the transition time is actually the inverse of the slew rate: the larger the transition time, the slower the slew, and vice versa. Figure 2-10 illustrates a typical waveform at the output of a CMOS cell. The waveforms at the ends are asymptotic and it is hard to determine the exact start and end points of the transition. Consequently, the transition time is defined with respect to specific threshold levels. For example, the slew threshold settings can be:

    # Falling edge thresholds:
    slew_lower_threshold_pct_fall : 30.0;
    slew_upper_threshold_pct_fall : 70.0;
    # Rising edge thresholds:
    slew_lower_threshold_pct_rise : 30.0;
    slew_upper_threshold_pct_rise : 70.0;

These values are specified as a percent of Vdd. The threshold settings specify that falling slew is the difference between the times that the falling edge reaches 70% and 30% of Vdd. Similarly, the settings for rise specify that the rise slew is the difference in times that the rising edge reaches 30% and 70% of Vdd. Figure 3-3 shows this pictorially.
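A minimal sketch of this transition-time measurement follows, assuming a monotonic piecewise-linear edge given as (time, voltage) points. The waveforms and helper names are hypothetical; only the 30% and 70% thresholds come from the settings above.

```python
def crossing_time(waveform, level):
    """Time at which a monotonic piecewise-linear edge reaches `level`."""
    for (t0, v0), (t1, v1) in zip(waveform, waveform[1:]):
        if v0 != v1 and min(v0, v1) <= level <= max(v0, v1):
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never reaches the requested level")


def transition_time(waveform, vdd, lower_pct=30.0, upper_pct=70.0):
    """Slew measured as the time between the lower and upper thresholds."""
    t_lower = crossing_time(waveform, vdd * lower_pct / 100.0)
    t_upper = crossing_time(waveform, vdd * upper_pct / 100.0)
    return abs(t_upper - t_lower)


# Hypothetical edges: (time in ns, voltage in V), Vdd = 1.0 V.
rising_edge = [(0.0, 0.0), (1.0, 1.0)]
falling_edge = [(2.0, 1.0), (2.5, 0.0)]

rise_slew = transition_time(rising_edge, 1.0)
fall_slew = transition_time(falling_edge, 1.0)
print("rise slew (ns):", round(rise_slew, 3))
print("fall slew (ns):", round(fall_slew, 3))
```

Taking the absolute difference lets the same function serve both edges: on a falling edge the 70% point is reached before the 30% point.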

Figure 3-3 Transition time for rise and fall

3.3 Skew between Signals

Skew is the difference in timing between two or more signals, which may be data, clock, or both. For example, if a clock tree has 500 end points and has a skew of 50ps, it means that the difference in latency between the longest clock path and the shortest clock path is 50ps. Figure 3-4 shows an example of a clock tree. The beginning point of a clock tree typically is a node where a clock is defined. The end points of a clock tree are typically clock pins of synchronous elements, such as flip-flops. Clock latency is the total time it takes from the clock source to an end point. Clock skew is the difference in arrival times at the end points of the clock tree.
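The definitions of latency and skew can be illustrated with a small sketch. The end-point names and latency values below are hypothetical; the only point is that skew is the difference between the largest and smallest clock latency over the end points.

```python
def clock_skew(latencies_ps):
    """Skew = longest clock latency - shortest clock latency (in ps)."""
    values = list(latencies_ps.values())
    return max(values) - min(values)


# Hypothetical latencies (ps) from the clock source to three flip-flop clock pins.
endpoint_latency_ps = {
    "UFF0/CK": 2100,
    "UFF1/CK": 2150,
    "UFF2/CK": 2120,
}

skew_ps = clock_skew(endpoint_latency_ps)
print("clock skew (ps):", skew_ps)
```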

Figure 3-4 Clock tree, latency and skew

An ideal clock tree is one where the clock source is assumed to have an infinite drive, that is, the clock can drive infinite sources with no delay. In addition, any cells present in the clock tree are assumed to have zero delay. In the early stages of logical design, STA is often performed with ideal clock trees so that the focus of the analysis is on the data paths. In an ideal clock tree, clock skew is 0ps by default. Latency of a clock tree can be explicitly specified using the set_clock_latency command. The following example models the latency of a clock tree:

    set_clock_latency 2.2 [get_clocks BZCLK]
    # Both rise and fall latency is 2.2ns.
    # Use options -rise and -fall if different.

Clock skew for a clock tree can also be implied by explicitly specifying its value using the set_clock_uncertainty command:

    set_clock_uncertainty 0.250 -setup [get_clocks BZCLK]
    set_clock_uncertainty 0.100 -hold [get_clocks BZCLK]

The set_clock_uncertainty command specifies a window within which a clock edge can occur. The uncertainty in the timing of the clock edge accounts for several factors, such as clock period jitter and additional margins used for timing verification. Every real clock source has a finite amount of jitter, a window within which a clock edge can occur. The clock period jitter is determined by the type of clock generator utilized. In reality, there are no ideal clocks: all clocks have a finite amount of jitter, and the clock period jitter should be included when specifying the clock uncertainty. Before the clock tree is implemented, the clock uncertainty must also include the expected clock skew of the implementation. One can specify different clock uncertainties for setup checks and for hold checks. The hold checks do not require the clock jitter to be included in the uncertainty, and thus a smaller value of clock uncertainty is generally specified for hold. Figure 3-5 shows an example of a clock with a setup uncertainty of 250ps. Figure 3-5(b) shows how the uncertainty takes away from the time available for the logic to propagate to the next flip-flop stage. This is equivalent to validating the design to run at a higher frequency.
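The last statement can be made concrete with a short calculation. The sketch below assumes a hypothetical 10 ns clock and a hypothetical split of the 250ps setup uncertainty into jitter, skew, and margin; none of these component values come from the original text.

```python
def clock_uncertainty(jitter_ps, skew_ps, margin_ps, check):
    """Uncertainty budget: hold checks leave out the clock period jitter."""
    if check == "setup":
        return jitter_ps + skew_ps + margin_ps
    if check == "hold":
        return skew_ps + margin_ps
    raise ValueError("check must be 'setup' or 'hold'")


def effective_frequency_mhz(period_ps, setup_uncertainty_ps):
    """Frequency the logic must effectively meet once uncertainty is subtracted."""
    usable_period_ps = period_ps - setup_uncertainty_ps
    return 1e6 / usable_period_ps  # a 1e6 ps period corresponds to 1 MHz


# Assumed split of the budget (ps): 100 jitter + 100 skew + 50 margin.
setup_unc = clock_uncertainty(100, 100, 50, "setup")
hold_unc = clock_uncertainty(100, 100, 50, "hold")

# A hypothetical 10 ns (100 MHz) clock validated with the setup uncertainty.
freq = effective_frequency_mhz(10000, setup_unc)
print(setup_unc, hold_unc, round(freq, 2))
```

With these assumed numbers, the logic has to fit in 9.75 ns instead of 10 ns, which is the same as validating the design at a slightly higher frequency than the nominal 100 MHz.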

Figure 3-5 setup uncertainty

As specified above, the set_clock_uncertainty command can also be used to model any additional margin. For example, a designer may use a 50ps timing margin as additional pessimism during design. This component can be added and included in the set_clock_uncertainty command. In general, before the clock tree is implemented, the set_clock_uncertainty command is used to specify a value that includes clock jitter plus estimated clock skew plus additional pessimism.

    set_clock_latency 2.0 [get_clocks USBCLK]
    set_clock_uncertainty 0.2 [get_clocks USBCLK]
    # The 200ps may be composed of 50ps clock jitter,
    # 100ps clock skew and 50ps additional pessimism.

We shall see later how set_clock_uncertainty influences setup and hold checks. It is best to think of clock uncertainty as an offset to the final slack calculation.

3.4 Timing Arcs and Unateness

Every cell has multiple timing arcs. For example, a combinational logic cell, such as an and, or, nand, nor, or adder cell, has timing arcs from each input to each output of the cell. Sequential cells such as flip-flops have timing arcs from the clock to the outputs and timing constraints for the data pins with respect to the clock. Each timing arc has a timing sense, that is, how the output changes for different types of transitions on the input. The timing arc is positive unate if a rising transition on an input causes the output to rise (or not to change) and a falling transition on an input causes the output to fall (or not to change). For example, the timing arcs for and and or type cells are positive unate. See Figure 3-6(a).

Figure 3-6 timing arcs

A negative unate timing arc is one where a rising transition on an input causes the output to have a falling transition (or not to change) and a falling transition on an input causes the output to have a rising transition (or not to change). For example, the timing arcs for nand and nor type cells are negative unate. See Figure 3-6(b).

In a non-unate timing arc, the output transition cannot be determined solely from the direction of change of an input but also depends upon the state of the other inputs. For example, the timing arcs in an xor cell (exclusive-or) are non-unate. See Figure 3-6(c).

Unateness is important for timing as it specifies how the edges (transitions) can propagate through a cell and how they appear at the output of the cell.

One can take advantage of the non-unateness property of a timing arc, such as when an xor cell is used, to invert the polarity of a clock. See the example in Figure 3-7. If input POLCTRL is a logic-0, the clock DDRCLK on the output of the cell UXOR0 has the same polarity as the input clock MEMCLK. If POLCTRL is a logic-1, the clock on the output of the cell UXOR0 has the opposite polarity to the input clock MEMCLK.
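The unateness definitions above can be checked mechanically for small cells. The following is a minimal Python sketch, not part of any STA tool: the function name `unateness` and the gate helpers are hypothetical, and cells are modeled as plain Boolean functions. For each state of the other inputs it toggles one pin from 0 to 1 and records whether the output follows or opposes it.

```python
from itertools import product

# Hypothetical gate models used only for illustration.
def and2(a, b):
    return a & b

def nand2(a, b):
    return 1 - (a & b)

def xor2(a, b):
    return a ^ b

def unateness(func, n_inputs, pin):
    """Classify the timing arc from input `pin` to the output of `func`."""
    follows = False   # output rises when the pin rises, for some side-input state
    opposes = False   # output falls when the pin rises, for some side-input state
    others = [i for i in range(n_inputs) if i != pin]
    for state in product([0, 1], repeat=len(others)):
        low = [0] * n_inputs
        high = [0] * n_inputs
        for idx, val in zip(others, state):
            low[idx] = high[idx] = val
        low[pin], high[pin] = 0, 1
        out_low, out_high = func(*low), func(*high)
        if out_high > out_low:
            follows = True
        elif out_high < out_low:
            opposes = True
    if follows and opposes:
        return "non-unate"
    if opposes:
        return "negative unate"
    if follows:
        return "positive unate"
    return "no dependence"
```

Under these assumptions, `unateness(and2, 2, 0)` gives "positive unate", `unateness(nand2, 2, 0)` gives "negative unate", and `unateness(xor2, 2, 0)` gives "non-unate". The xor case is what makes the polarity control work: `xor2(clk, 0)` equals `clk`, while `xor2(clk, 1)` is its inverse.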

Figure 3-7 Controlling clock polarity.

2.8 Min and Max Timing Paths

The total delay for the logic to propagate through a logic path is referred to as the path delay. This corresponds to the sum of the delays through the various logic cells and nets along the path. In general, there are multiple paths through which the logic can propagate to the required destination point. The actual path taken depends upon the state of the other inputs along the logic path. An example is illustrated in Figure 3-8. Since there are multiple paths to the destination, the maximum and minimum timing to the destination points can be obtained. The paths corresponding to the maximum timing and minimum timing are referred to as the max path and min path respectively. A max path between two end points is the path with the largest delay (also referred to as the longest path). Similarly, a min path is the path with the smallest delay (also referred to as the shortest path). Note that the longest and shortest refer to the cumulative delay of the path, not to the number of cells in the path.

Figure 3-8 shows an example of a data path between flip-flops. A max path between flip-flops UFF1 and UFF3 is assumed to be the one that goes through the UNAND0, UBUF2, UOR2 and UNAND6 cells. A min path between the flip-flops UFF1 and UFF3 is assumed to be the one that goes through the UOR4 and UNAND6 cells. Note that in this example, the max and min are with reference to the destination point, which is the D pin of the flip-flop UFF3.

A max path is often called a late path, while a min path is often called an early path.
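To make the max/min distinction concrete, here is a small Python sketch that enumerates every path between two end points and picks the ones with the largest and smallest cumulative delay. The cell names come from the text, but the connectivity and the delay values (in ps) are assumptions made up for illustration, and net delays are ignored. Exhaustive enumeration is only practical for tiny examples.

```python
# Assumed connectivity between UFF1 and UFF3; not taken from the figure.
FANOUT = {
    "UFF1/Q": ["UNAND0", "UOR4"],
    "UNAND0": ["UBUF2"],
    "UBUF2": ["UOR2"],
    "UOR2": ["UNAND6"],
    "UOR4": ["UNAND6"],
    "UNAND6": ["UFF3/D"],
    "UFF3/D": [],
}

# Hypothetical cell delays in picoseconds.
CELL_DELAY_PS = {"UNAND0": 120, "UBUF2": 80, "UOR2": 150, "UOR4": 140, "UNAND6": 110}

def all_paths(fanout, start, end, prefix=None):
    """Yield every path from start to end as a list of node names."""
    prefix = (prefix or []) + [start]
    if start == end:
        yield prefix
        return
    for nxt in fanout.get(start, []):
        yield from all_paths(fanout, nxt, end, prefix)

def path_delay(path, cell_delay):
    """Cumulative delay of a path; pins without an entry contribute zero."""
    return sum(cell_delay.get(node, 0) for node in path)

def min_max_paths(fanout, cell_delay, start, end):
    """Return (min path, max path) ranked by cumulative delay."""
    paths = list(all_paths(fanout, start, end))
    def by_delay(p):
        return path_delay(p, cell_delay)
    return min(paths, key=by_delay), max(paths, key=by_delay)

MIN_PATH, MAX_PATH = min_max_paths(FANOUT, CELL_DELAY_PS, "UFF1/Q", "UFF3/D")
```

With these assumed numbers the max (late) path runs through UNAND0, UBUF2, UOR2 and UNAND6 with a delay of 460 ps, and the min (early) path runs through UOR4 and UNAND6 with a delay of 250 ps. The ranking is by delay, so changing the delay values can change which path is the max path even though the cell counts stay the same.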

Figure 3-8 Max and min timing paths.

When a flip-flop to flip-flop path such as from UFF1 to UFF3 is examined, one of the flip-flops launches the data and the other flip-flop captures the data. In this case, since UFF1 launches the data, UFF1 is referred to as the launch flip-flop. And since UFF3 captures the data, UFF3 is referred to as the capture flip-flop. Notice that the launch and capture terminology are always with reference to a flip-flop to flip-flop path. For example, UFF3 would become a launch flip-flop for the path to whatever flip-flop captures the data produced by UFF3.

2.9 Clock Domains

In synchronous logic design, a periodic clock signal latches the new data computed into the flip-flops. The new data inputs are based upon the flip-flop values from a previous clock cycle. The latched data thus gets used for computing the data for the next clock cycle.

A clock typically feeds a number of flip-flops. The set of flip-flops being fed by one clock is called its clock domain. In a typical design, there may be more than one clock domain. For example, 200 flip-flops may be clocked by USBCLK and 1000 flip-flops may be fed by clock MEMCLK. Figure 3-9 depicts the flip-flops along with the clocks. In this example, we say that there are two clock domains.

Figure 3-9 Two clock domains

A question of interest is whether the clock domains are related or independent of each other. The answer depends on whether there are any data paths that start from one clock domain and end in the other clock domain. If there are no such paths, we can safely say that the two clock domains are independent of each other. This means that there is no timing path that starts from one clock domain and ends in the other clock domain.

Figure 3-10 domain crossing.

If indeed there are data paths that cross between clock domains (see Figure 3-10), a decision has to be made as to whether the paths are real or not. An example of a real path is a flip-flop with a 2x speed clock driving into a flip-flop with a 1x speed clock. An example of a false path is where the designer has explicitly placed clock synchronizer logic between the two clock domains. In this case, even though there appears to be a timing path from one clock domain to the next, it is not a real timing path since the data is not constrained to propagate through the synchronizer logic in one clock cycle. Such a path is referred to as a false path - not real - because the clock synchronizer ensures that the data passes correctly from one domain to the next. False paths between clock domains can be specified using the set_false_path specification, such as:

set_false_path -from [get_clocks USBCLK] \
    -to [get_clocks MEMCLK]
# This specification is explained in more detail in Chapter 8.

Even though it is not depicted in Figure 3-10, a clock domain crossing can occur both ways, from the USBCLK clock domain to the MEMCLK clock domain, and from the MEMCLK clock domain to the USBCLK clock domain. Both scenarios need to be understood and handled properly in STA.
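The bookkeeping behind this decision can be sketched in a few lines of Python. This is only an illustration under assumed data: the flip-flop names, the path list and the function `classify_paths` are hypothetical, and a real flow would derive this information from the netlist and the constraints. The sketch sorts flip-flop to flip-flop paths into same-domain paths, crossings declared false, and crossings that still have to be timed.

```python
# Hypothetical design data: which clock drives each flip-flop.
FLOP_CLOCK = {
    "UFF_A": "USBCLK",
    "UFF_B": "USBCLK",
    "UFF_C": "MEMCLK",
    "UFF_D": "MEMCLK",
}

# Hypothetical (launch, capture) flip-flop pairs.
PATHS = [
    ("UFF_A", "UFF_B"),
    ("UFF_B", "UFF_C"),
    ("UFF_C", "UFF_D"),
    ("UFF_D", "UFF_A"),
]

# Mirrors a single directional false-path declaration from USBCLK to MEMCLK.
FALSE_PATHS = {("USBCLK", "MEMCLK")}

def classify_paths(paths, flop_clock, false_paths):
    """Group paths by whether they cross clock domains and need timing."""
    result = {"same_domain": [], "crossing_false": [], "crossing_timed": []}
    for launch, capture in paths:
        clocks = (flop_clock[launch], flop_clock[capture])
        if clocks[0] == clocks[1]:
            result["same_domain"].append((launch, capture))
        elif clocks in false_paths:
            result["crossing_false"].append((launch, capture))
        else:
            result["crossing_timed"].append((launch, capture))
    return result

RESULT = classify_paths(PATHS, FLOP_CLOCK, FALSE_PATHS)
```

Because the declaration is directional, the UFF_B to UFF_C crossing is treated as false while the UFF_D to UFF_A crossing (MEMCLK to USBCLK) remains a path to be timed, which is why both directions have to be considered separately.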

What is the reason to discuss paths between clock domains? Typically a design has a large number of clocks and there can be a myriad number of paths between the clock domains. Identifying which clock domain crossings are real and which clock crossings are not real is an important part of the timing verification effort. This enables the designer to focus on validating only the real timing paths.

Figure 3-11 shows another example of clock domains. A multiplexer selects a clock source - it is either one or the other depending on the mode of operation of the design. There is only one clock domain, but two clocks, and these two clocks are said to be mutually-exclusive, as only one clock is active at one time. Thus, in this example, it is important to note that there can never be a path between the two clock domains for USBCLK and USBCLKx2 (assuming that the multiplexer control is static and that such paths do not exist elsewhere in the design).

Figure 3-11 mutually exclusive clock domain.

2.10 Operating Conditions

Static timing analysis is typically performed at a specific operating condition. An operating condition is defined as a combination of Process, Voltage and Temperature (PVT). Cell delays and interconnect delays are computed based upon the specified operating condition.

There are three kinds of manufacturing process models that are provided by the semiconductor foundry for digital designs: slow process models, typical process models, and fast process models. The slow and fast process models represent the extreme corners of the manufacturing process of a foundry. For robust design, the design is validated at the extreme corners of the manufacturing process as well as environment extremes for temperature and power supply. Figure 3-12(a) shows how a cell delay changes with the process corners. Figure 3-12(b) shows how cell delays vary with power supply voltage, and Figure 3-12(c) shows how cell delays can vary with temperature. Thus it is important to decide the operating conditions that should be used for various static timing analyses.

Figure 3-12 PVT Delay variations

The choice of what operating condition to use for STA is also governed by the operating conditions under which cell libraries are available. Three standard operating conditions are:

i. WCS (Worst-Case Slow): Process is slow, temperature is highest (say 125C) and voltage is lowest (say nominal 1.2V minus 10%). For nanometer technologies that use low power supplies, there can be another worst-case slow corner that corresponds to the slow process, lowest power supply, and lowest temperature. The delays at low temperatures are not always smaller than the delays at higher temperatures. This is because the device threshold voltage (Vt) margin with respect to the power supply is reduced for nanometer technologies. In such cases, at low power supply, the delay of a lightly loaded cell is higher at low temperatures than at high temperatures. This is especially true of high Vt (higher threshold, larger delay) or even standard Vt (regular threshold, lower delay) cells. This anomalous behavior of delays increasing at lower temperatures is called temperature inversion. See Figure 3-12(c).
ii. TYP (Typical): Process is typical, temperature is nominal (say 25C) and voltage is nominal (say 1.2V).
iii. BCF (Best-Case Fast): Process is fast, temperature is lowest (say -40C) and voltage is highest (say nominal 1.2V plus 10%).

The environment conditions for power analysis are generally different than the ones used for static timing analysis. For power analysis, the operating conditions may be:

i. ML (Maximal Leakage): Process is fast, temperature is highest (say 125C) and the voltage is also the highest (say 1.2V plus 10%). This corner corresponds to the maximum leakage power. For most designs, this corner also corresponds to the largest active power.
ii. TL (Typical Leakage): Process is typical, temperature is highest (say 125C) and the voltage is nominal (say 1.2V). This refers to the condition where the leakage is representative for most designs since the chip temperature will be higher due to power dissipated in normal operation.

The static timing analysis is based on the libraries that are loaded and linked in for the STA. An operating condition for the design can be explicitly specified using the set_operating_conditions command.

Chapter 4: STA approach

• Break the circuit into sets of timing paths
• Calculate the delay of each path
• Assume that a signal becomes stable at the latest possible time
• Assume that a signal becomes unstable at the earliest possible time
• Check timing constraints

If the design works at these extremes, we can guarantee it always will.

4.1 Types of paths

Figure 4-1 timing paths

4.2 Delay modelling

To compute the path delay, we need models of:
• Interconnect delay
• Cell delay

Interconnect delay refers to the total time needed to charge or discharge all of the parasitics of a given net. Cell delay is the time from the 50% point of the input transition to the 50% point of the output transition. It is a function of the input transition time (slope) and the output load.

Figure 4-2 delay model

Delay = time delta between two waveforms, as measured at the 50% Vcc point.

4.3 Interconnect models

Interconnect is any net used to connect two different pins on a circuit. An interconnect net consists of segments and vias:
• Segment: a polygon on a metal layer
• Via: the connection between segments on different metal layers

Each metal layer has its own electrical parameters; these parameters are listed in the process file.

We can build an electrical model of each line using resistors and capacitors. The simplest model for a VLSI interconnect is the RC model, where R is the total wire resistance and C is the capacitance.
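As a worked illustration of the RC model, the sketch below treats the whole wire as one lumped resistor and one lumped capacitor driven by an ideal step, for which the 50% delay is ln(2)·R·C (about 0.69·RC). The per-micron resistance and capacitance values and the function names are assumptions chosen for the example, not values from a process file, and a single lumped stage is a simplification of a real distributed wire.

```python
import math

def wire_rc(r_per_um, c_per_um, length_um):
    """Total resistance (ohm) and capacitance (F) of a uniform wire."""
    return r_per_um * length_um, c_per_um * length_um

def lumped_rc_delay(r_ohm, c_farad):
    """50% delay of one lumped RC stage driven by an ideal step input."""
    return math.log(2) * r_ohm * c_farad

# Hypothetical wire parameters: 0.1 ohm/um and 0.2 fF/um.
R_PER_UM = 0.1
C_PER_UM = 0.2e-15

DELAY_1MM = lumped_rc_delay(*wire_rc(R_PER_UM, C_PER_UM, 1000.0))
DELAY_2MM = lumped_rc_delay(*wire_rc(R_PER_UM, C_PER_UM, 2000.0))
```

For the assumed 1 mm wire (100 ohm, 200 fF) this gives roughly 13.9 ps. Since both R and C grow with length, doubling the wire length quadruples the delay in this model, which is one reason long nets are buffered.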

Figure 4-3 RC model

4.4 Cross capacitance

Unlike its resistance, a line's capacitance depends not only on the line topography but also on its neighbors in all directions. A line has capacitance to the "ground" Vss (self capacitance) and capacitance to other lines (cross capacitance).
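One common first-order way to account for cross capacitance in delay estimates, which is not described in this text and is shown here only as an assumption, is to scale each coupling capacitance by a switching factor: 0 when the neighbor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. The Python sketch below uses hypothetical capacitance values in fF.

```python
# Assumed switching (Miller) factors per neighbor activity.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}

def effective_capacitance(c_self, neighbors):
    """Self capacitance plus activity-weighted cross capacitances.

    neighbors is a list of (c_cross, activity) pairs, where activity is
    one of "same", "quiet" or "opposite".
    """
    return c_self + sum(MILLER_FACTOR[activity] * c_cross
                        for c_cross, activity in neighbors)

# Hypothetical line: 10 fF to ground, 4 fF to each of two neighbors.
C_BEST = effective_capacitance(10.0, [(4.0, "same"), (4.0, "same")])
C_QUIET = effective_capacitance(10.0, [(4.0, "quiet"), (4.0, "quiet")])
C_WORST = effective_capacitance(10.0, [(4.0, "opposite"), (4.0, "opposite")])
```

With these numbers the effective load ranges from 10 fF to 26 fF depending on what the neighbors do, so the same wire can look much faster or much slower from one cycle to the next.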

Figure 4-4 impact of cross capacitance

4.5 Gate Delay and o/p Transition Time

The gate delay and the output transition time are functions of both the input slew and the output load.

Figure 4-5 gate delay (input transition Tin, gate/cell, output load Cload)

When each component is built, the delay tables are constructed and attached to the design as the timing description in the library.
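A delay table of this kind is indexed by input slew and output load, and values between grid points are commonly obtained by bilinear interpolation. The Python sketch below shows that lookup under assumed data: the axis values, the table contents and the function names are hypothetical, not taken from a real library. Queries outside the table range are extrapolated from the nearest grid cell.

```python
from bisect import bisect_right

# Hypothetical characterization data.
SLEWS_NS = [0.05, 0.20, 0.50]     # input transition times
LOADS_FF = [1.0, 4.0, 16.0]       # output loads
DELAY_NS = [                      # DELAY_NS[i][j] for SLEWS_NS[i], LOADS_FF[j]
    [0.10, 0.16, 0.40],
    [0.14, 0.20, 0.44],
    [0.22, 0.28, 0.52],
]

def _bracket(axis, x):
    """Index i of the grid interval [axis[i], axis[i+1]] used for x."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lookup(table, slews, loads, slew, load):
    """Bilinear interpolation of table at (slew, load)."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    # Interpolate along the load axis on both bracketing slew rows.
    row_lo = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    row_hi = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    # Then interpolate between the two rows along the slew axis.
    return row_lo * (1 - ts) + row_hi * ts
```

For example, `lookup(DELAY_NS, SLEWS_NS, LOADS_FF, 0.125, 2.5)` falls midway between four grid points (0.10, 0.16, 0.14 and 0.20) and returns their average, about 0.15 ns. The same function can be applied to an output-transition table to obtain the slew passed on to the next stage.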

Figure 4-6 lookup table

4.6 Timing Constraints

Every digital signal has timing constraints imposed by the sequential elements. There are two types of timing constraints:
» Max Delay Constraint
» Min Delay Constraint

All paths in a given circuit should meet both of these constraints; otherwise the path will be flagged as a timing violator.

4.6.1 Max Delay Constraint

4.6.2 Max Delay Failure

4.6.3 min delay constraint

4.6.4 min delay failure

4.7 Setup Time & Hold Time

Setup time: the time the signal has to be stable prior to the clock edge.
Hold time: the time the signal has to be stable after the clock edge.

4.8 Clock Skew

Clock: the clock is a periodic synchronization signal used as a time reference for data transfer in a synchronous digital system.

Sequentially adjacent register pair: two registers with only combinational logic or interconnect between them.

Clock skew: given two sequentially-adjacent registers, Ri and Rj, the clock skew between these two registers is defined as

$T_{skew_{i,j}} = T_{c_i} - T_{c_j}$

where $T_{c_i}$ and $T_{c_j}$ are the clock delays from the clock source to the registers Ri and Rj, respectively.

Skew is only meaningful between sequentially-adjacent pairs of registers.

Figure 4-7 clock skew

Clock skew is important only between the pairs (FF1, FF2) and (FF2, FF3). The skew between the pair (FF1, FF3) is meaningless.
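The effect of skew on the max and min delay constraints can be sketched as follows. This Python example assumes that Ri is the launching register and Rj the capturing register, that the data is captured one clock period after launch, and that all values are hypothetical integers in ps; the class and function names are illustrative only. Setup slack compares the latest data arrival with the capture edge minus the setup time, and hold slack compares the earliest data arrival with the capture edge plus the hold time.

```python
from dataclasses import dataclass

@dataclass
class RegisterPair:
    t_ci: int          # clock delay from source to launch register Ri
    t_cj: int          # clock delay from source to capture register Rj
    t_clk_to_q: int    # clock-to-output delay of Ri
    t_logic_max: int   # longest combinational delay from Ri to Rj
    t_logic_min: int   # shortest combinational delay from Ri to Rj
    t_setup: int       # setup time of Rj
    t_hold: int        # hold time of Rj

    @property
    def skew(self):
        """Clock skew as defined in the text: Tci - Tcj."""
        return self.t_ci - self.t_cj

def setup_slack(pair, period):
    """Max delay check: latest data must arrive before the next capture edge."""
    latest_arrival = pair.t_ci + pair.t_clk_to_q + pair.t_logic_max
    required = pair.t_cj + period - pair.t_setup
    return required - latest_arrival

def hold_slack(pair):
    """Min delay check: earliest data must not overwrite the current capture."""
    earliest_arrival = pair.t_ci + pair.t_clk_to_q + pair.t_logic_min
    required = pair.t_cj + pair.t_hold
    return earliest_arrival - required

SMALL_SKEW = RegisterPair(100, 150, 80, 700, 60, 50, 40)
LARGE_SKEW = RegisterPair(100, 250, 80, 700, 60, 50, 40)
```

With a 1000 ps period, the first pair (skew of -50 ps) has a setup slack of 220 ps and a hold slack of 50 ps. Delaying the capture clock further (skew of -150 ps) raises the setup slack to 320 ps but pushes the hold slack to -50 ps, a min delay failure. A later capture clock helps the max delay constraint and hurts the min delay constraint.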

Chapter 5: STA Algorithm

5.1 Extract combinational block

5.2 Combinational Block

Figure legend: arrival time in green, interconnect delay in red, gate delay in blue.

5.3 Modelling circuit

• Use a labeled directed graph G = (V, E)
• Vertices represent gates, primary inputs and primary outputs
• Edges represent wires
• Labels represent delays

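5.4 Actual arrival time

The actual arrival time A(v) for a node v is the latest time at which a signal can arrive at node v:

$A(v) = \max_{u \in FI(v)} \bigl( A(u) + d(v) + d_{uv} \bigr)$

where $d_{uv}$ is the delay from u to v, d(v) is the delay of node v, and FI(v) is the set of fan-in nodes of v. In the example, v = Z and FI(v) = {X, Y}.

5.5 Algorithm Execution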

for each vertex v in W
    for each edge <v, w> from v
        Temp-delay(w) = Final-delay(v) + delay(w) + delay(<v, w>)
        if Temp-delay(w) > Final-delay(w) then
            Final-delay(w) = Temp-delay(w)
            path(w) = path(v).w
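The loop above can be turned into a small runnable Python sketch. It assumes that the vertices are visited in topological order (so that Final-delay(v) is settled before v's edges are relaxed) and that primary inputs arrive at time 0; neither assumption is stated explicitly in the pseudocode. The graph, the integer delays and the function name are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical circuit: primary inputs a and b, gates X, Y and Z.
NODE_DELAY = {"a": 0, "b": 0, "X": 2, "Y": 3, "Z": 1}
EDGE_DELAY = {
    ("a", "X"): 1,
    ("a", "Y"): 1,
    ("b", "Y"): 2,
    ("X", "Z"): 1,
    ("Y", "Z"): 1,
}

def arrival_times(node_delay, edge_delay):
    """Latest arrival time and the path that produces it, for every node."""
    fanin = {v: set() for v in node_delay}
    fanout = {v: [] for v in node_delay}
    for u, v in edge_delay:
        fanin[v].add(u)
        fanout[u].append(v)
    # Primary inputs (no fan-in) are assumed to arrive at time 0.
    final_delay = {v: 0 if not fanin[v] else float("-inf") for v in node_delay}
    path = {v: [v] for v in node_delay}
    for v in TopologicalSorter(fanin).static_order():
        for w in fanout[v]:
            temp_delay = final_delay[v] + node_delay[w] + edge_delay[(v, w)]
            if temp_delay > final_delay[w]:
                final_delay[w] = temp_delay
                path[w] = path[v] + [w]
    return final_delay, path

FINAL_DELAY, PATH = arrival_times(NODE_DELAY, EDGE_DELAY)
```

For this assumed circuit the arrival time at Z is 7, reached along the path b, Y, Z. Each edge is relaxed once, so the work grows linearly with the size of the graph rather than with the number of paths.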

5.6 Required arrival time

The required arrival time R(v) is the time before which a signal must arrive to avoid a timing violation. The required time comes from the flip-flops and is propagated backwards:

$R(u) = \min_{v \in FO(u)} \bigl( R(v) - d(v) - d_{uv} \bigr)$

where $d_{uv}$ is the delay from u to v and FO(u) is the set of fan-out nodes of u. In the example, u = X and FO(u) = {Y, Z}.

5.7 Required time propagation

Assume the required time at the output is R(f) = 5.80, and propagate the required times backwards.

5.8 Timing Slack

• Slack can be computed from arrival and required times. For each node v: $S(v) = R(v) - A(v)$
• Slack reflects the criticality of a node:
» Positive slack: the node is not on the critical path. Timing constraints are met.
» Zero slack: the node is on the critical path. Timing constraints are barely met.
» Negative slack: there is a timing violation.

Slack distribution is key for timing optimization!
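The backward propagation of required times and the slack classification can be sketched together in Python. The circuit, the integer delays, the arrival times and the required time at the output are all hypothetical values chosen for illustration (the arrival times are assumed to come from a forward pass over the same graph), and the function names are not from any tool.

```python
from graphlib import TopologicalSorter

# Hypothetical circuit and delays: inputs a and b, gates X, Y and Z.
NODE_DELAY = {"a": 0, "b": 0, "X": 2, "Y": 3, "Z": 1}
EDGE_DELAY = {
    ("a", "X"): 1,
    ("a", "Y"): 1,
    ("b", "Y"): 2,
    ("X", "Z"): 1,
    ("Y", "Z"): 1,
}
# Assumed arrival times from a forward pass over this graph.
ARRIVAL = {"a": 0, "b": 0, "X": 3, "Y": 5, "Z": 7}

def required_times(node_delay, edge_delay, required_at_outputs):
    """R(u) = min over v in FO(u) of R(v) - d(v) - d_uv, walked backwards.

    Every node without fan-out must appear in required_at_outputs.
    """
    fanin = {v: set() for v in node_delay}
    fanout = {v: [] for v in node_delay}
    for u, v in edge_delay:
        fanin[v].add(u)
        fanout[u].append(v)
    required = dict(required_at_outputs)
    forward_order = list(TopologicalSorter(fanin).static_order())
    for u in reversed(forward_order):
        if fanout[u]:
            required[u] = min(
                required[v] - node_delay[v] - edge_delay[(u, v)]
                for v in fanout[u]
            )
    return required

def slack_report(arrival, required):
    """Slack S(v) = R(v) - A(v) and its classification for every node."""
    report = {}
    for v in arrival:
        slack = required[v] - arrival[v]
        status = "violation" if slack < 0 else "critical" if slack == 0 else "met"
        report[v] = (slack, status)
    return report

REQUIRED = required_times(NODE_DELAY, EDGE_DELAY, {"Z": 7})
REPORT = slack_report(ARRIVAL, REQUIRED)
```

With a required time of 7 at Z, nodes b, Y and Z have zero slack and form the critical path, while X has a slack of 2 and a has a slack of 1. Tightening the requirement at Z to 6 turns the slack on the critical path negative. The exact comparison with zero works here because the delays are integers; with floating-point delays a small tolerance would be needed.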

Figure Timing slack computation

Chapter 6: Timing Correction Driven by STA

6.1 Transformation

6.2 Resizing

6.3 Cloning

6.4 Buffering

6.5 Redesign Fan-in Tree

6.6 Redesign Fan-out Tree

6.7 Swap Commutative Pins

6.8 Methods for Improving RC delay

6.8.1 Optimize Routing

Optimize the routing topology. In many cases there is no good reason for bad routing!

6.8.2 Use Tapering

6.8.3 Optimize the Size

6.8.4 Increase Spacing

6.8.5 Shielding

Isolate the signal from other attacking (aggressor) signals by surrounding it with DC lines, such as power or ground. This is an area-consuming method and should be used only for very sensitive signals.

Chapter 7: Results and analysis

7.1 Hierarchical Design (and analysis)

The adjacent figure shows the hierarchical division of a chip into leaf cells. Timing analysis is usually run both at the full-chip level, where the building blocks of the circuit are leaf cells, and at the leaf-cell level, where the building blocks are standard cells. Each leaf cell has a set of timing properties for each of its interface signals.

7.2 Timing Convergence Flow

7.3 Max Violations

7.4 Min violations
