PLACEMENT AND ROUTING OF 1oo2 SYSTEM ON FPGA
Josef Börcöck, Ali Hayek, Bashir Machmur, Muhammad Umar
Institute of Computer Architecture and System Programming, University of Kassel, Germany
ABSTRACT This paper discusses the process of implementing the synthesized netlist of a CPU design in a target FPGA device. The place and route tools read the netlist, extract its components and nets, place the components on the target device, and connect them using the specified interconnections. Place and route, also referred to as PAR, follows synthesis and simulation; it is a subset of a larger EDA stage known as design implementation. Efficient placement and routing algorithms play an important role in FPGA architecture research. Together, they are responsible for producing a physical implementation of an application circuit on the FPGA hardware. Plan Ahead is a new design tool from Xilinx that is used for floor planning.
Keywords: FPGA, VHDL, Plan Ahead, Place & Route, 1oo2 system
1. INTRODUCTION Efficient placement and routing algorithms play an important role in FPGA architecture research. Together, the place-and-route algorithms are responsible for producing a physical implementation of an application circuit on the FPGA hardware. The quality of the place-and-route algorithms has a direct bearing on the usefulness of the target FPGA architecture. The benefits of including powerful new features on an FPGA might be lost if the place-and-route algorithms cannot fully exploit these features. Thus, the advancement of FPGA architectures relies heavily on the development of efficient place-and-route algorithms.
When working with an FPGA, one designs a large digital circuit, and a software synthesis tool then breaks it down into pieces small enough to be implemented by single logic blocks. A second computer-aided design (CAD) tool, called a placer/router, is then needed to determine exactly where on the FPGA each part of the design should be located. As the name implies, a placer/router's work has two stages: placement and routing. Placement determines exactly which FPGA logic blocks should implement which logic-block-sized pieces of the circuit design, and routing decides how the FPGA's switch boxes should be configured so that the circuit's signals travel to the appropriate places through the FPGA's channels. The placer tries to arrange the logic blocks in a way that minimizes the distance that signals in the circuit have to travel. The router tries to minimize
the amount of wiring needed and to maximize the speed at which the final circuit will run. Routing is generally much more time-consuming than placement. Placement and routing play an important role in safe systems, so this process is a key factor when designing a safety system.
Fault tolerance techniques are important for increasing the yield of VLSI chips in advanced fabrication technologies. In a regular structure like an FPGA, redundancy is commonly used for fault tolerance. Redundancy is a safety measure that can be implemented in a safety system. One way to improve redundancy in an FPGA system after programming is the place and route process. This idea may sound strange: how can redundancy be improved by the P&R process? The P&R process involves placing the components listed in the netlist and routing among them. Placement and routing tools map any circuit (netlist) to the fabric optimally. Because all cells are identical in nature, if one cell is found to be defective, other cells can be used in its place. This approach is generally called the redundancy-based approach. Several tools available on the market today aim to guarantee the best design solution.
This paper deals with the P&R process. For our design we worked with the Xilinx ISE CAD tool and Xilinx Plan Ahead, and the results are reported. Xilinx's ISE tool is used for all aspects of design implementation, including PAR, design mapping, translation, and generating the bitstream file. Xilinx Virtex FPGAs are based on look-up tables (LUTs). The basic elements of the Virtex devices are Configurable Logic Blocks (CLBs), and each CLB contains eight LUTs. CLBs are made up of FPGA slices, which contain function generators, storage elements, logic gates, multiplexers, and a fast carry look-ahead chain. In design implementation, the mapping phase involves simplified functions being mapped to component primitives within the FPGA slices.
PAR involves the process of interconnecting the primitives within the slices through a matrix of wire segments, programmable switches, and routing resources within the FPGA. The ISE tool is timing driven and therefore provides the ability to specify timing requirements for critical paths in the design through its timing analyzer and constraints editor tools. A typical FPGA CAD tool flow is shown in Figure 1. Initially, a circuit specification of the application is produced as a high-level description in a Hardware Description Language (HDL). The appropriate circuit
specification serves as the input to a Synthesis tool. The Synthesis tool synthesizes the circuit specification into a circuit that consists of basic logic gates and their interconnections. In the Technology Mapping phase, the gate-level netlist is transformed into a functionally equivalent netlist that is expressed in terms of the logic units that are provided by the FPGA device. The mapped netlist serves as an input to the Placement tool, which determines the actual physical location of each netlist logic block in the FPGA layout. After the physical location of each logic block has been determined, the Routing tool determines the FPGA routing resources that are needed to route the nets that connect the placed logic blocks. At the end of a successful routing phase, a stream of configuration bits is produced. The configuration bit stream is used to program SRAM cells in the FPGA fabric so that the target application can be implemented.
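To make the placer's objective concrete, the following is a minimal Python sketch of one common wirelength estimate, the half-perimeter wirelength (HPWL) of each net's bounding box. The paper does not state which cost function the ISE placer uses, so HPWL, the grid coordinates, and the block and net names below are illustrative assumptions only.

```python
# Illustrative only: HPWL is a common placement cost estimate, not
# necessarily the metric used by the Xilinx tools discussed in this paper.

def hpwl(net, placement):
    """Half-perimeter of the bounding box enclosing all blocks on a net."""
    xs = [placement[block][0] for block in net]
    ys = [placement[block][1] for block in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets, placement):
    """Sum the HPWL estimate over every net in the netlist."""
    return sum(hpwl(net, placement) for net in nets)

# Hypothetical netlist: each net lists the logic blocks it connects.
nets = [("a", "b"), ("b", "c", "d"), ("a", "d")]

# Two candidate placements mapping block name -> (column, row).
spread_out = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

cost_spread = total_wirelength(nets, spread_out)
cost_compact = total_wirelength(nets, compact)
print(cost_spread, cost_compact)
```

Under this estimate the compact placement scores lower than the spread-out one, which is the kind of comparison a placer makes when it tries to shorten the distance signals have to travel.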
Figure 1: FPGA CAD tool flow [2].
2. PLACE & ROUTE PROCESS
The place and route process places each macro from the synthesis netlist into an available location on the target silicon and connects the macros using the routing resources available on the target silicon. The place and route process is shown in Figure 2.
Figure 2: Place & Route process [2].
The synthesis netlist is the input to the placement process. The placement process analyzes all of the macros used in the design and their connectivity to try to determine an optimal placement for the macros. The placement algorithms take into account a number of technology-specific factors of the target technology to determine whether a particular placement is good or not. After a trial placement and signal route is attempted, the design is analyzed with respect to timing constraints. If the timing constraints are not met, the place and route software continues to try different placements and signal routes to try to meet the constraints.
Typical target devices have areas of the chip where logical functions are placed, and areas where interconnect signals are routed to connect the logical functions. This is shown in Figure 3. The device is split into a number of logic areas with routing channels that surround the logic areas. Logic areas contain the logic gates that implement the Boolean function of the design. Routing channels contain the signals that are used to connect the logic gates together.
Figure 3: Logic Area Architecture.
For FPGA devices, the routing channels contain programmable interconnect wires. FPGA devices use an onboard RAM to store the values of the programmable switches that are used to form the signal interconnections. By enabling the proper sets of pass transistor gates, signal interconnections between logic gates can be formed, as shown in the example in Figure 4.
Figure 4: Programmable interconnect switch.
To make a connection from logic block 1 to logic block 3, all of the switches shown need to be enabled with a logic 1 value. The logic gates of the device are connected to local routing signals, which can be connected to more global routing signals by pass transistors that bridge the two signals. The control signals of the pass transistors are stored in a loadable RAM. The place and route tool generates the RAM image to be loaded into the RAM on the device. The routing channels contain vertical and horizontal lines. The horizontal lines connect devices within a row, while the vertical lines allow connections across rows. Most routing channels contain wires of different
lengths that allow connections to adjacent logic areas. Sometimes, longer connections are needed, and either a longer line must be used or shorter lines must be connected together to form the connection. This is shown in Figure 5.
Figure 5: Routing Procedure [2].
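The routing step described in this section can be illustrated with a small breadth-first (Lee-style) maze router on a grid of routing cells. This is a sketch under stated assumptions, not the algorithm used by the ISE router: the grid, the blocked cells, and the function name `route` are hypothetical and chosen only for illustration.

```python
from collections import deque

def route(grid, start, goal):
    """Return a shortest path of (row, col) cells from start to goal.

    grid[r][c] == 1 marks a cell already occupied by logic or another net.
    Returns None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start cell.
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        # Horizontal and vertical neighbours only, like channel wires.
        for nr, nc in ((r, c + 1), (r, c - 1), (r + 1, c), (r - 1, c)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in previous:
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# Hypothetical 3x4 routing area; the middle row is mostly blocked.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = route(grid, (0, 0), (2, 0))
print(path)
```

Each step along the returned path stands for one wire segment. In a real device, the pass-transistor switches joining consecutive segments would be the ones set to logic 1 in the configuration RAM image.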
3. FLOOR PLANNING DANGER The danger in floor planning is that, if done incorrectly, it can dramatically decrease the performance of the implementation. Good placement directly corresponds with a design that performs well, and bad placement directly corresponds with a design that performs poorly. This may seem like an obvious statement, but a corollary is that a bad floor plan will lead to bad placement and subsequently to poor performance. Thus, it cannot be assumed that a floor plan of any type will at worst leave performance unchanged; a bad floor plan will make things much worse. The fundamental concept to understand here is that routing is extremely dependent on placement. For a design of any complexity, a good route can only be achieved with a good placement. This is particularly true of FPGAs due to the coarse nature of the routing matrix (in an ASIC, there is more flexibility to be creative with the routing). For example, if one takes a typical design, runs hundreds of combinations of placement and routing effort levels, and plots the data, the result will likely look very similar to the graph shown in Figure 6 [11]. As can be seen from Figure 6, placement has a dominant (first-order) effect on the performance of the design, and routing has a relatively minor effect.
Figure 6: Performance versus placement and routing effort [11].
4. QUICKER INCREMENTAL DESIGN CHANGES With a flat methodology, a minor change to a given logic block requires redoing place and route for the entire design. This adds up to a significant amount of wasted time, since fifty or more place and route iterations, at eight or more hours apiece, are common with today's larger FPGA netlists, particularly when they are heavily constrained with timing requirements. With the new design methodology based on Plan Ahead, place and route time can be reduced by using hierarchy. When the design is broken into smaller pieces, or blocks, there is no need to run place and route on the entire design for each incremental design change. Instead, place and route is run only on the block or blocks that have changed, leaving the rest intact. As an example, see Figure 7. In the flattened methodology on the left, a logic block is highlighted in yellow. Notice how it is scattered over nearly the whole chip. Any design change in this logic block means that the entire flattened design must be placed and routed all over again, and the performance of the design may change significantly. In the hierarchical methodology on the right, the same logic block is highlighted in yellow, and it is localized within a distinct area of the chip. If there is a change in this logic block, only the block itself will be placed and routed again. The rest of the design stays put and the performance of the design remains largely unaffected.
Figure 7: Flat design on left and new hierarchical style on right
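As a rough worked example of the time at stake, the sketch below multiplies out the figures quoted in this section (fifty iterations at eight hours each) and compares them with a hierarchical flow. The 10% share of run time assumed for re-implementing a single changed block is a purely hypothetical number chosen for illustration; the paper gives no such figure.

```python
# Figures quoted in the text for a large, heavily constrained flat design.
iterations = 50
hours_per_flat_run = 8

# Hypothetical assumption (not from the paper): re-running place and route
# on one changed block costs 10% of a full flat run.
block_fraction = 0.10

flat_hours = iterations * hours_per_flat_run

# One full run to establish the floor plan, then block-level reruns only.
hier_hours = hours_per_flat_run + (
    (iterations - 1) * hours_per_flat_run * block_fraction
)

print(f"flat: {flat_hours} h, hierarchical: {hier_hours:.1f} h")
```

The absolute numbers depend entirely on the assumed block fraction; the point is only that the flat flow pays the full run time on every iteration, while the hierarchical flow pays it once.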
5. PLAN AHEAD IMPLEMENTATION The Plan Ahead hierarchical floor planner delivers a faster, more efficient design environment that lets you find and fix problems early in the design process. Xilinx Plan Ahead delivers a better solution by streamlining the design step between synthesis and place and route. Plan Ahead offers:
• A hierarchical, block-based, incremental design methodology
• A convenient method for connectivity, timing, and utilization analysis
• Clock I/O and clock region planning
• Automatic and manual partitioning and physical block sizing/placement
• An easy-to-use integrated graphical user interface
Plan Ahead helps to:
• Improve design performance through floor planning
• Improve productivity with multiple implementation strategies in Explore Ahead
• Enable IP reuse through its block-based implementation flow
The Plan Ahead design flow is shown in Figure 8. Up to the synthesis level, it resembles the typical FPGA CAD tool flow. After synthesis, the netlist can be imported into Plan Ahead and floor planning can begin.
Elements in the netlist, referred to as instances, include leaf-level logic primitives and hierarchical module components or instances.
5.6. MODULE Elements in the netlist that represent hierarchical module instantiations are referred to as modules or components. Leaf-level primitive logic is referred to as instances or primitives.
5.7. PRIMITIVE Elements in the netlist that represent leaf-level logic objects are referred to as primitives (e.g. LUTs and flip-flops).
6. FASTER PLACE AND ROUTE
Figure 8: Design flow using Plan Ahead.
The important concepts involved in floor planning with Plan Ahead are explained below:
5.1. FLOOR PLAN Each Floor plan associates the active netlist with the specific placement and timing constraints defined in it. The single imported Project netlist can support multiple Floor plans using different constraints or devices.
5.2. NETLIST A netlist represents a logical description of the design to be implemented. A netlist should be hierarchical, consisting of a top-level netlist with child netlists for the underlying levels of hierarchy ("modules").
5.3. CONSTRAINT A constraint can be either a logic timing or behaviour requirement or a physical placement requirement. I/O port assignments are also defined by constraints.
There is a trade-off between processing speed and layout quality. Simple constructive placement algorithms, such as direct placing and random placing, place the design quickly but cannot guarantee quality; iterative placement methodologies, such as simulated annealing and the force-directed method, provide high-quality layouts but take a long time to run. During module tests and prototype designs, the speed of an FPGA design tool is as important as its layout quality. Thus, a methodology that offers fast processing and acceptable performance is practical and imperative for large FPGA designs. Place and route algorithms are less efficient and need far more time when operating on a flat design, yet traditional FPGA design tools still primarily operate in that mode. Area groups, which we also used in our design, divide the whole design into smaller areas, effectively breaking it into smaller pieces that the place and route algorithms can handle more easily. Since individual blocks are more manageable, place and route needs less time to complete the overall design. Once the floor planning is done in Plan Ahead, the floor plan is exported to ISE for place and route and bitstream generation. The ISE CAD tool treats each Pblock as an area group and places it as defined in Plan Ahead.
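The trade-off between constructive and iterative placement can be sketched in a few lines of Python. The example below starts from a random (constructive) placement and then refines it with simulated annealing. It is a toy under stated assumptions, not the ISE placer: the two-pin nets, the Manhattan-distance cost, the 4x4 grid, the cooling schedule, and all function names are hypothetical.

```python
import math
import random

def wirelength(nets, pos):
    """Sum of Manhattan distances over two-pin nets."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def random_placement(blocks, side, rng):
    """Constructive placement: drop blocks on distinct random grid sites."""
    sites = [(x, y) for x in range(side) for y in range(side)]
    return dict(zip(blocks, rng.sample(sites, len(blocks))))

def anneal(nets, pos, rng, temp=5.0, cooling=0.95, moves_per_temp=50):
    """Iterative improvement: swap two blocks, accept by Metropolis rule."""
    pos = dict(pos)
    cost = wirelength(nets, pos)
    best, best_cost = dict(pos), cost
    blocks = list(pos)
    while temp > 0.01:
        for _ in range(moves_per_temp):
            a, b = rng.sample(blocks, 2)
            pos[a], pos[b] = pos[b], pos[a]
            new_cost = wirelength(nets, pos)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # undo the rejected swap
        temp *= cooling
    return best, best_cost

# Hypothetical design: 16 blocks connected in a chain, on a full 4x4 grid.
rng = random.Random(0)
blocks = [f"b{i}" for i in range(16)]
nets = [(blocks[i], blocks[i + 1]) for i in range(15)]

start = random_placement(blocks, 4, rng)
start_cost = wirelength(nets, start)
final, final_cost = anneal(nets, start, rng)
print(start_cost, final_cost)
```

The random placement is produced in a single step, while the annealer evaluates several thousand swaps to reach a shorter wirelength. Restricting such a search to one area group at a time shrinks the number of candidate swaps, which is one way to read the speed-up described above.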
5.4. PHYSICAL BLOCK (PBLOCK) Design partitions are referred to as physical blocks or Pblocks. Typically, a single logic instance or a group of logic instances is assigned to a Pblock. The Pblock can have an area, such as a rectangle, defined on the FPGA device to constrain the logic. Pblocks can also be defined without rectangles, in which case ISE will attempt to group the logic during placement. Netlist logic placed inside Pblocks will receive AREA_GROUP constraints for ISE. Pblocks may be specified with specific RANGE types so that they contain only particular types of logic (e.g. SLICE, RAM/MULT, DSP). Pblocks may also be defined with multiple rectangles so that non-rectangular shapes such as Ls and Ts can be created.
Figure 9: Floor Planning using ISE
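To show what the AREA_GROUP constraints passed to ISE might look like, the sketch below generates two UCF-style lines for one Pblock from Python. The `INST ... AREA_GROUP` and `AREA_GROUP ... RANGE` forms follow the UCF syntax as the editor understands it and should be checked against the Xilinx Constraints Guide; the Pblock name, the instance name `cpu0/alu`, and the slice coordinates are hypothetical and do not come from the 1oo2 design.

```python
def area_group_ucf(pblock, instances, slice_range):
    """Build UCF-style lines that tie instances to one area group.

    pblock      -- hypothetical Pblock / area group name
    instances   -- hierarchical instance names assigned to the Pblock
    slice_range -- (x0, y0, x1, y1) corners of one slice rectangle
    """
    x0, y0, x1, y1 = slice_range
    # One assignment line per instance placed in the Pblock.
    lines = [f'INST "{inst}" AREA_GROUP = "{pblock}";' for inst in instances]
    # One line giving the rectangle that constrains the area group.
    lines.append(
        f'AREA_GROUP "{pblock}" RANGE = SLICE_X{x0}Y{y0}:SLICE_X{x1}Y{y1};'
    )
    return lines

ucf = area_group_ucf("pblock_alu0", ["cpu0/alu"], (0, 0, 15, 15))
print("\n".join(ucf))
```

In the flow described here, Plan Ahead writes such constraints itself when the floor plan is exported, so a generator like this is only useful for understanding the output.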
5.5. INSTANCE
software will attempt to group the logic during placement. Netlist logic placed inside Pblocks will receive AREA_GROUP constraints. Figure 10 shows the Pblock implementation. The 1oo2 system has been developed on one FPGA, and the ongoing work is to configure this FPGA so that, if a fault is detected in one system, the FPGA is dynamically reconfigured to act as a backup circuit for the faulty control circuit.
9. CONCLUSION
Figure 10: Floor Planning using Plan Ahead
The rectangles show the Pblocks. Inside each Pblock resides the leaf logic (primitives). For the 1oo2 system, six components (ALU, memory, accumulator, etc.) are used for each processor. One Pblock represents one component in the 1oo2 system. For more about Plan Ahead and how to use it, please refer to the Plan Ahead documentation listed in the references.
7. ANALYSIS Placement and routing have been carried out on various 1oo2 architectures, and the result from one of them is shown here. Device utilization for the 1oo2 system is reported in three categories: 2 out of 16 BUFGMUXs [10] were used (12% utilization), 31 out of 264 external IOBs (11% utilization), and 745 out of 3,072 slices (21% utilization). These results were obtained before floor planning in Plan Ahead. After floor planning, the only major difference was in slice usage: 552 out of 3,072 slices (17.97%), compared with 21% when placement was done automatically by ISE.
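The utilization percentages can be recomputed directly from the reported counts, as in the short sketch below. The counts are taken from the text; the dictionary labels are the editor's own. Note that 745 of 3,072 slices works out to roughly 24% rather than the quoted 21%, so either the slice count or the percentage may contain a typo. The sketch does not attempt to decide which.

```python
# Resource counts as reported in the text: (used, available).
reported = {
    "BUFGMUX": (2, 16),
    "external IOB": (31, 264),
    "slices (ISE automatic)": (745, 3072),
    "slices (after floor planning)": (552, 3072),
}

# Utilization as a percentage of the available resources.
utilization = {name: 100.0 * used / total
               for name, (used, total) in reported.items()}

for name, percent in utilization.items():
    print(f"{name}: {percent:.2f}%")
```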
8. WORK IN PROGRESS Partial reconfiguration [9] is a process of device configuration that allows a limited, predefined portion of an FPGA to be reconfigured while the remainder of the device continues to operate. This is especially valuable where devices operate in a mission-critical environment that cannot be disrupted while some subsystems are being redefined. Using partial reconfiguration, one can dramatically increase the functionality of a single FPGA, allowing for fewer, smaller devices than would otherwise be needed. Important applications for this technology include reconfigurable communication and cryptographic systems. Floor planning is an important step in the partial reconfiguration flow. Floor planning within the Plan Ahead software is based on design partitions referred to as physical blocks, or Pblocks. Pblocks can be defined without rectangles and ISE™
It is bad enough that the number and duration of place and route runs have become a bottleneck, but the results themselves can also be unpredictable and unreliable. Even if no changes are made to the design, one run may produce a design that is faster or slower than the previous one due to the randomness of the algorithms. This new Xilinx tool, which uses a hierarchical methodology, provides a more deterministic process by enabling users to define area groups that guide place and route toward acceptable timing. Users can also lock placement results for individual blocks that already meet timing, so that subsequent place and route iterations do not change their performance. This further stabilizes the place and route process, making the results more reliable and predictable.
REFERENCES
[1] Xilinx, "Virtex-II Platform FPGAs," Complete Data Sheet, 2007.
[2] Douglas L. Perry, VHDL: Programming by Example, Tool Usage for Simulation, Synthesis and Debugging, McGraw-Hill, 2002.
[3] Xilinx, Floor Planner Tutorial.
[4] Xilinx, "Plan Ahead Tutorial," 2008.
[5] Xilinx, Plan Ahead Hierarchical Floor Planner, 2005.
[6] Börcöck J., Electronic Safety Systems: Hardware Concepts, Models and Calculations, Hüthig Verlag, Heidelberg, 2004.
[7] Akshay Sharma, Place and Route Technique for FPGA Architecture Advancement, University of Washington, 2005.
[8] IEC/EN 61508: International Standard 61508, Functional Safety: Safety-Related System, International Electrotechnical Commission, Geneva.
[9] Xilinx, Partial Reconfiguration Tutorial.
[10] Global Clock MUX Buffer with Output State 0, http://toolbox.xilinx.com/docsan/xilinx6/books/data/docs/v4lsc/v4lsc0030_21.html
[11] Steve Kilts, Advanced FPGA Design: Architecture, Implementation, and Optimization, Wiley, 2007.