Basic FPGA Architecture
© 2005 Xilinx, Inc. All Rights Reserved
Objectives
After completing this module, you will be able to:
• Identify the basic architectural resources of the Virtex™-II FPGA
• List the differences between the Virtex-II, Virtex-II Pro, Spartan™-3, and Spartan-3E devices
• List the new and enhanced features of the Virtex-4 device family
For Academic Use Only
Outline
• Overview
• Slice Resources
• I/O Resources
• Memory and Clocking
• Spartan-3, Spartan-3E, and Virtex-II Pro Features
• Virtex-4 Features
• Summary
• Appendix
Overview
• All Xilinx FPGAs contain the same basic resources
  – Slices (grouped into CLBs)
    • Contain combinatorial logic and register resources
  – IOBs
    • Interface between the FPGA and the outside world
  – Programmable interconnect
  – Other resources
    • Memory
    • Multipliers
    • Global clock buffers
    • Boundary scan logic
Virtex-II Architecture
• The Virtex™-II core voltage is 1.5V
[Figure: Virtex-II device floorplan, labeled with I/O Blocks (IOBs), Block SelectRAM™ resources, programmable interconnect, dedicated multipliers, Configurable Logic Blocks (CLBs), and clock management (DCMs, BUFGMUXes)]
Slices and CLBs
• Each Virtex-II CLB contains four slices
  – Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
  – A switch matrix provides access to general routing resources
[Figure: CLB with slices S0 through S3, a switch matrix, two BUFTs, local routing, the SHIFT chain, and CIN/COUT carry connections]
Simplified Slice Structure
• Each slice has four outputs
  – Two registered outputs, two non-registered outputs
  – Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
  – Two independent carry chains per CLB
[Figure: Slice 0 with two LUTs, each followed by carry logic and a register (D, CE, PRE, CLR inputs; Q output)]
Detailed Slice Structure
• The next few slides discuss the slice features
  – LUTs
  – MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  – Carry Logic
  – MULT_ANDs
  – Sequential Elements
Look-Up Tables
• Combinatorial logic is stored in Look-Up Tables (LUTs)
  – Also called Function Generators (FGs)
  – Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant
[Figure: a block of combinatorial logic with inputs A, B, C, D and output Z, implemented by the truth table below]

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
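The point that LUT capacity depends on input count rather than logic complexity can be illustrated with a minimal Python sketch. This is a software model only, not Xilinx tool code; the helper names (`make_lut`, `lut_read`) and the two sample functions are assumptions chosen for illustration and are not the function shown in the truth table above.

```python
from itertools import product

def make_lut(func):
    """Precompute a 16-entry truth table for any 4-input Boolean function."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]

def lut_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs form a 4-bit address into the table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

# Two hypothetical functions of very different logical complexity.
simple_fn = lambda a, b, c, d: a & b
complex_fn = lambda a, b, c, d: ((a ^ b) & (c | (1 - d))) ^ (a & d)

simple_lut = make_lut(simple_fn)
complex_lut = make_lut(complex_fn)

# Both functions occupy exactly one 16-entry table, and each evaluation
# is a single indexed read regardless of how complex the function is.
print(len(simple_lut), len(complex_lut))
```

Because every evaluation is one table read, this model also mirrors why the delay through a LUT does not depend on the function it stores.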
Connecting Look-Up Tables
• MUXF5 combines LUTs in each slice
• MUXF6 combines slices S0 and S1
• MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: CLB with slices S0 through S3, each containing an F5 MUX, plus two F6 MUXes, one F7 MUX, and one F8 MUX]
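The way a dedicated multiplexer widens a function can be sketched in Python using Shannon expansion: two 4-input LUTs hold the two halves of a 5-input function, and a 2:1 mux standing in for MUXF5 picks between them. The target function `f5` and all helper names are hypothetical; this is a behavioral sketch, not a description of the actual slice wiring.

```python
from itertools import product

def make_lut4(func):
    """16-entry truth table for a 4-input function of (a, b, c, d)."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]

def lut4(table, a, b, c, d):
    """Read a 4-input LUT using the inputs as the table address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

def f5(a, b, c, d, e):
    """Hypothetical 5-input target function."""
    return (a & b) ^ (c | d) ^ e

# Shannon expansion on e: one LUT stores f5 with e=0, the other with e=1.
lut_e0 = make_lut4(lambda a, b, c, d: f5(a, b, c, d, 0))
lut_e1 = make_lut4(lambda a, b, c, d: f5(a, b, c, d, 1))

def muxf5(in0, in1, sel):
    """2:1 multiplexer standing in for MUXF5."""
    return in1 if sel else in0

def f5_in_slice(a, b, c, d, e):
    """5-input function built from two LUTs and one mux."""
    return muxf5(lut4(lut_e0, a, b, c, d), lut4(lut_e1, a, b, c, d), e)

matches = all(f5_in_slice(*bits) == f5(*bits)
              for bits in product((0, 1), repeat=5))
print(matches)
```

Applying the same step again with MUXF6, MUXF7, and MUXF8 doubles the number of LUTs combined at each level.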
Fast Carry Logic
• Simple, fast, and complete arithmetic logic
  – Dedicated XOR gate for single-level sum completion
  – Uses dedicated routing resources
  – All synthesis tools can infer carry logic
[Figure: CLB with two carry chains through slices S0 to S3, each entering at CIN and leaving at COUT; one chain continues to S0 of the next CLB and the other to CIN of S2 of the next CLB]
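A minimal Python sketch of a ripple adder built in this style follows. It assumes the common arrangement in which the LUT produces a propagate signal (A XOR B), a carry mux passes or generates the carry, and the dedicated XOR completes the sum; the slide names only the XOR gate, so the mux wiring and all function names here are assumptions.

```python
def to_bits(value, width):
    """Little-endian list of bits for an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

def carry_chain_add(a_bits, b_bits, cin=0):
    """Add two little-endian bit lists, one LUT + carry mux + XOR per bit."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                     # LUT output
        sum_bits.append(propagate ^ carry)    # dedicated XOR completes the sum
        carry = carry if propagate else a     # carry mux: pass CIN or generate
    return sum_bits, carry

total_bits, cout = carry_chain_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(total_bits), cout)  # 200 + 100 = 300 = 44 + carry out of bit 7
```

Only the carry signal crosses from one bit to the next, which is why a dedicated vertical route for it keeps wide adders fast.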
MULT_AND Gate
• Highly efficient multiply and add implementation
  – Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  – The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: LUTs with inputs A and B forming A x B, the MULT_AND gate, CY_MUX (ports S, DI, CI, CO), and CY_XOR]
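One row of a shift-and-add multiplier can be modeled in Python to show how the multiply and the add share a single LUT per bit. The sketch assumes the usual arrangement in which the LUT computes (A AND B) XOR partial-sum, while the MULT_AND gate supplies the same AND term to the DI input of CY_MUX; that wiring and the function names are assumptions, since the slide shows only the component labels.

```python
def to_bits(value, width):
    """Little-endian list of bits for an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

def mult_and_row(partial_bits, a_bits, b_j):
    """Return partial + (a AND b_j), using one LUT per bit."""
    out, carry = [], 0
    for p, a in zip(partial_bits, a_bits):
        prod = a & b_j                  # MULT_AND output (assumed to feed DI)
        s = prod ^ p                    # single LUT: AND term XOR partial sum
        out.append(s ^ carry)           # CY_XOR
        carry = carry if s else prod    # CY_MUX: CO = CI when S = 1, else DI
    return out, carry

row, cout = mult_and_row(to_bits(0b1011, 4), to_bits(0b0110, 4), 1)
print(from_bits(row) + (cout << 4))  # 11 + 6 = 17
```

Without the MULT_AND gate, the AND term would need its own LUT so that it could reach the carry mux, which is the two-LUTs-per-bit cost mentioned above.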
Flexible Sequential Elements
• Either flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs come from LUTs or from an independent CLB input
• Separate set and reset controls
  – Can be synchronous or asynchronous
• All controls are shared within a slice
  – Control signals can be inverted locally within a slice
[Figure: register primitives FDRSE_1 (D, S, CE, R, Q), FDCPE (D, PRE, CE, CLR, Q), and LDCPE (D, PRE, CE, G, CLR, Q)]
Shift Register LUT (SRL16CE)
• Dynamically addressable serial shift registers
  – Maximum delay of 16 clock cycles per LUT (128 per CLB)
  – Cascadable to other LUTs or CLBs for longer shift registers
    • Dedicated connection from Q15 to D input of the next SRL16CE
  – Shift register length can be changed asynchronously by toggling address A
[Figure: LUT configured as a chain of D flip-flops with clock enable (inputs D, CE, CLK), address A[3:0] selecting output Q, and cascade output Q15]
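The addressable-delay behavior can be sketched with a small Python model. The class name follows the primitive named on the slide, but the method names and the power-up contents of zero are assumptions made for this illustration.

```python
class SRL16CE:
    """Behavioral sketch of a 16-bit addressable shift register."""

    def __init__(self):
        self.bits = [0] * 16          # bits[0] holds the most recent sample

    def clock(self, d, ce=1):
        """Rising clock edge: shift in d when clock enable is active."""
        if ce:
            self.bits = [d & 1] + self.bits[:15]

    def q(self, addr):
        """Asynchronous read of the tap selected by A[3:0]."""
        return self.bits[addr & 0xF]

    def q15(self):
        """Cascade output, always the oldest bit."""
        return self.bits[15]

srl = SRL16CE()
pattern = [1, 0, 1, 1, 0, 0, 1, 0]
for sample in pattern:
    srl.clock(sample)

# Address A selects a delay of A + 1 clock cycles.
print(srl.q(0), srl.q(3))  # the newest sample, and the sample from 4 clocks ago
```

Changing the address changes the tap immediately without disturbing the stored data, which is what makes the length dynamically adjustable.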
Shift Register LUT Example
• The SRL can be used to create a No Operation (NOP)
  – This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs) and associated routing and delays
[Figure: two 64-bit paths of 12 cycles each. One path runs Operation A (4 cycles) followed by Operation B (8 cycles); the other runs Operation C (3 cycles) followed by Operation D NOP (9 cycles). Paths are statically balanced.]
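The resource counts in this example can be reproduced with a short Python calculation. The per-CLB figures (eight LUTs and eight registers) and the 16-cycle SRL limit come from earlier slides; the variable names are illustrative only.

```python
import math

BUS_WIDTH = 64        # bits in each path
NOP_CYCLES = 9        # delay needed to balance the 12-cycle paths
SRL_MAX_DEPTH = 16    # maximum delay of one SRL16 LUT
LUTS_PER_CLB = 8
FFS_PER_CLB = 8

# Flip-flop implementation: one register per bit per cycle of delay.
flip_flops = BUS_WIDTH * NOP_CYCLES
ff_clbs = math.ceil(flip_flops / FFS_PER_CLB)

# SRL implementation: one LUT per bit covers any delay up to 16 cycles.
srl_luts = BUS_WIDTH * math.ceil(NOP_CYCLES / SRL_MAX_DEPTH)
srl_clbs = math.ceil(srl_luts / LUTS_PER_CLB)

print(flip_flops, ff_clbs)   # 576 72
print(srl_luts, srl_clbs)    # 64 8
```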
IOB Element
• Input path
  – Two DDR registers
• Output path
  – Two DDR registers
  – Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared
[Figure: IOB with input registers (clocks ICK1, ICK2), output registers and 3-state registers (clocks OCK1, OCK2) feeding DDR MUXes, and the PAD]
SelectIO Standard
• Allows direct connections to external signals of varied voltages and thresholds
  – Optimizes the speed/noise tradeoff
  – Saves having to place interface components onto your board
• Differential signaling standards
  – LVDS, BLVDS, ULVDS
  – LDT
  – LVPECL
• Single-ended I/O standards
  – LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
  – PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
  – GTL, GTLP
  – and more!
Digitally Controlled Impedance (DCI)
• DCI provides
  – Output drivers that match the impedance of the traces
  – On-chip termination for receivers and transmitters
• DCI advantages
  – Improves signal integrity by eliminating stub reflections
  – Reduces board routing complexity and component count by eliminating external resistors
  – Eliminates the effects of temperature, voltage, and process variations by using an internal feedback circuit
Other Virtex-II Features
• Distributed RAM and block RAM
  – Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
  – Block RAM is a dedicated resource on the device (18-kb blocks)
• Dedicated 18 x 18 multipliers next to block RAMs
• Clock management resources
  – Sixteen dedicated global clock multiplexers
  – Digital Clock Managers (DCMs)
Distributed SelectRAM Resources
• Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
  – Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
  – Data can be written to RAM after configuration
• Emulated dual-port RAM
  – One read/write port
  – One read-only port
[Figure: slice LUTs used as the primitives RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, SPO, DPRA0 to DPRA3, DPO)]
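The synchronous-write, asynchronous-read behavior of the emulated dual-port RAM can be sketched in Python. The class is named after the RAM16X1D primitive and uses its port names, but the method names and the all-zero initial contents are assumptions for this model.

```python
class RAM16X1D:
    """Behavioral sketch of a 16 x 1 distributed dual-port RAM."""

    def __init__(self, init=0):
        # Contents are set at configuration time; bit i of init is address i.
        self.mem = [(init >> i) & 1 for i in range(16)]

    def wclk_edge(self, d, we, a):
        """Rising WCLK edge: write d at address A when WE is high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read on the read/write port (address A)."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read on the read-only port (address DPRA)."""
        return self.mem[dpra & 0xF]

ram = RAM16X1D()
ram.wclk_edge(d=1, we=1, a=5)   # write through the read/write port
ram.wclk_edge(d=1, we=0, a=9)   # WE low: nothing is written
print(ram.spo(5), ram.dpo(5), ram.dpo(9))
```

Registering the value returned by `spo` or `dpo` on a clock edge would correspond to using the accompanying flip-flop to create a synchronous read.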
Block SelectRAM Resources
• Up to 3.5 Mb of RAM in 18-kb blocks
  – Synchronous read and write
• True dual-port memory
  – Each port has synchronous read and write capability
  – Different clocks for each port
• Supports initial values
• Synchronous reset on output latches
• Supports parity bits
[Figure: 18-kb block SelectRAM memory with port A (DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA) and port B (DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB)]
Dedicated Multiplier Blocks
• 18-bit twos complement signed operation
• Optimized to implement Multiply and Accumulate functions
• Multipliers are physically located next to block SelectRAM™ memory
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit output; operand sizes shown are 4 x 4 signed, 8 x 8 signed, 12 x 12 signed, and 18 x 18 signed]
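The signed 18 x 18 operation with a 36-bit result can be modeled in a few lines of Python. The function names are illustrative; the model only shows the twos complement arithmetic, not the block's timing or pipelining.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mult18x18(data_a, data_b):
    """Return the 36-bit twos complement product of two raw 18-bit words."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)

# 0x3FFFF is -1 and 0x3FFFE is -2 as 18-bit twos complement values.
print(to_signed(mult18x18(0x3FFFF, 0x3FFFF), 36))  # (-1) * (-1) = 1
print(to_signed(mult18x18(3, 0x3FFFE), 36))        # 3 * (-2) = -6
```

The largest-magnitude product, (-2^17) x (-2^17) = 2^34, still fits in a signed 36-bit result, so the output never overflows.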
Global Clock Routing Resources
• Sixteen dedicated global clock multiplexers
  – Eight on the top-center of the die, eight on the bottom-center
  – Driven by a clock input pad, a DCM, or local routing
• Global clock multiplexers provide the following:
  – Traditional clock buffer (BUFG) function
  – Global clock enable capability (BUFGCE)
  – Glitch-free switching between clock signals (BUFGMUX)
• Up to eight clock nets can be used in each clock region of the device
  – Each device contains four or more clock regions
Digital Clock Manager (DCM)
• Up to twelve DCMs per device
  – Located on the top and bottom edges of the die
  – Driven by clock input pads
• DCMs provide the following:
  – Delay-Locked Loop (DLL)
  – Digital Frequency Synthesizer (DFS)
  – Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global clock buffers
  – All DCM outputs can drive general routing
Spartan-3 versus Virtex-II
• Lower cost
• Smaller process = lower core voltage
  – 0.09 micron versus 0.15 micron
  – Vccint = 1.2V versus 1.5V
• Different I/O standard support
  – New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
  – Default is LVCMOS, versus LVTTL
• More I/O pins per package
• Only one-half of the slices support RAM or SRL16s (SLICEM)
• Fewer block RAMs and multiplier blocks
  – Same size and functionality
• Eight global clock multiplexers
• Two or four DCM blocks
• No internal 3-state buffers
SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
  – Similar to the Virtex™-II
• Slices are grouped in pairs
  – Left-hand SLICEM (Memory)
    • LUTs can be configured as memory or SRL16
  – Right-hand SLICEL (Logic)
    • LUT can be used as logic only
[Figure: CLB with a switch matrix, left-hand SLICEM and right-hand SLICEL pairs (slices X0Y0, X0Y1, X1Y0, X1Y1), SHIFTIN/SHIFTOUT, CIN/COUT carry connections, and fast connects]
Spartan-3E Features
• More gates per I/O than Spartan-3
• Removed some I/O standards
  – Higher-drive LVCMOS
  – GTL, GTLP
  – SSTL2_II
  – HSTL_II_18, HSTL_I, HSTL_III
  – LVDS_EXT, ULVDS
• DDR Cascade
  – Internal data is presented on a single clock edge
• 16 BUFGMUXes on left and right sides
  – Drive half the chip only
  – In addition to eight global clocks
• Pipelined multipliers
• Additional configuration modes
  – SPI, BPI
  – Multi-Boot mode
Virtex-II Pro Features
• 0.13 micron process
• Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT) blocks
  – Serializer and deserializer (SERDES)
  – Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers, and others
  – 8-, 16-, and 32-bit selectable FPGA interface
  – 8B/10B encoder and decoder
• PowerPC™ RISC processor blocks
  – Thirty-two 32-bit General Purpose Registers (GPRs)
  – Low power consumption: 0.9mW/MHz
  – IBM CoreConnect bus architecture support
Virtex-4 Features
• New features
  – Dedicated DSP blocks
  – Phase-matched clock dividers (PMCD)
  – SERDES built into the Virtex™-4 SelectIO™ standard
  – Dynamic reconfiguration port (DRP)
• Enhanced features
  – Block RAM can be configured as a FIFO
  – Advanced clocking networks, including regional clock buffers and source-synchronous support
  – 11.1 Gbps RocketIO™ Multi-Gigabit Transceiver (MGT) blocks
  – Enhanced PowerPC™ processor blocks
Review Questions
• List the primary slice features
• List the three ways a LUT can be configured
Answers
• List the primary slice features
  – Look-up tables and function generators (two per slice, eight per CLB)
  – Registers (two per slice, eight per CLB)
  – Dedicated multiplexers (MUXF5, MUXF6, MUXF7, MUXF8)
  – Carry logic
  – MULT_AND gate
• List the three ways a LUT can be configured
  – Combinatorial logic
  – Shift register (SRL16CE)
  – Distributed memory
Summary
• Slices contain LUTs, registers, and carry logic
  – LUTs are connected with dedicated multiplexers and carry logic
  – LUTs can be configured as shift registers or memory
• IOBs contain DDR registers
• SelectIO™ standards and DCI enable direct connection to multiple I/O standards while reducing component count
• Virtex™-II memory resources include the following:
  – Distributed SelectRAM™ resources and distributed SelectROM (uses CLB LUTs)
  – 18-kb block SelectRAM resources
Summary
• The Virtex™-II devices contain dedicated 18x18 multipliers next to each block SelectRAM™ resource
• Digital clock managers provide the following:
  – Delay-Locked Loop (DLL)
  – Digital Frequency Synthesizer (DFS)
  – Digital Phase Shifter (DPS)
Where Can I Learn More?
• User Guides
  – www.xilinx.com → Documentation → User Guides
• Application Notes
  – www.xilinx.com → Documentation → Application Notes
• Education resources
  – Designing with the Virtex-4 Family course
  – Spartan-3E Architecture free Recorded e-Learning
Double Data Rate Registers
• DDR registers can be clocked
  – By Clock and NOT(Clock) if the duty cycle is 50/50
  – By the CLK0 and CLK180 outputs of a DCM
• If D1 = "1" and D2 = "0", the output is a copy of Clock
  – Use this technique to generate a clock output that is synchronized to DDR output data
[Figure: FDDR primitive with D1 and D2 registers clocked by OCK1 and OCK2, a DDR MUX, and an OBUF driving the PAD]
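The clock-forwarding trick can be checked with a sampled Python model of the output DDR register. The model assumes OCK1 is Clock and OCK2 is NOT(Clock), that the DDR MUX selects the D1 register while Clock is high, and that both registers start at zero; these details and the function name are assumptions for illustration.

```python
def ddr_output(clock, d1, d2):
    """Sampled model of an output DDR register; all arguments are 0/1 lists."""
    reg1 = reg2 = 0
    previous = 0
    out = []
    for clk, a, b in zip(clock, d1, d2):
        if clk and not previous:
            reg1 = a                        # rising edge of Clock (OCK1)
        if previous and not clk:
            reg2 = b                        # falling edge, i.e. rising OCK2
        out.append(reg1 if clk else reg2)   # DDR MUX follows the clock phase
        previous = clk
    return out

clock = [0, 1, 0, 1, 0, 1, 0, 1]
forwarded = ddr_output(clock, d1=[1] * 8, d2=[0] * 8)
print(forwarded == clock)
```

Because the forwarded clock leaves through the same register and mux path as the data, it tracks the DDR data outputs more closely than a clock routed directly to an output buffer would.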
Dual-Port Block RAM Configurations
• Configurations available on each port

Configuration   Depth   Data Bits   Parity Bits
16k x 1         16k     1           0
8k x 2          8k      2           0
4k x 4          4k      4           0
2k x 9          2k      8           1
1k x 18         1k      16          2
512 x 36        512     32          4

• Independent configurations on ports A and B
  – Supports data-width conversion, including parity bits
[Figure: data-width conversion example with an 8-bit input on Port A (8 bits) and a 32-bit output on Port B (32 bits)]
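A short Python check shows how every configuration in the table maps onto the same 18-kb block: the data bits always total 16 kb, and the parity bits, where present, account for the remaining 2 kb. The tuple layout and names are illustrative only.

```python
K = 1024

# (depth, data bits, parity bits) for each port configuration in the table.
CONFIGS = [
    (16 * K, 1, 0),
    (8 * K, 2, 0),
    (4 * K, 4, 0),
    (2 * K, 8, 1),
    (1 * K, 16, 2),
    (512, 32, 4),
]

for depth, data_bits, parity_bits in CONFIGS:
    data_total = depth * data_bits
    block_total = depth * (data_bits + parity_bits)
    print(f"{depth:>6} x {data_bits + parity_bits:<2} "
          f"data={data_total // K} kb total={block_total // K} kb")
```

The parity bits are reachable only in the x9, x18, and x36 widths, which is why the narrower configurations use 16 kb of the 18-kb block.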
Clock Buffer Configurations
• Clock buffer (BUFG)
  – Low-skew clock distribution
• Clock enable buffer (BUFGCE)
  – Holds the clock output Low when Clock Enable (CE) is inactive
  – CE can be active-High or active-Low
  – Changes in CE are only recognized when the clock input is Low to avoid glitches and short clock pulses
• Clock multiplexer (BUFGMUX)
  – Switches from one clock to another, glitch-free
  – After a change on S, the BUFGMUX waits for the currently selected clock input to go Low
  – The output is held Low until the newly selected clock goes Low, then switches
[Figure: BUFG (input I, output O), BUFGCE (inputs I and CE, output O), and BUFGMUX (inputs I0, I1, and S; output O), with a timing diagram of S, I0, I1, and O marking the "wait for low" and "switch" points]
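The BUFGMUX switching sequence described above can be sketched as a small state machine in Python, operating on sampled waveforms. The class name, state names, and test waveforms are assumptions for this model; it follows the three steps listed on the slide and makes no claim about the real circuit's implementation.

```python
from itertools import groupby

class BufgmuxModel:
    """Sampled behavioral sketch of glitch-free clock switching."""

    def __init__(self):
        self.active = 0        # index of the clock currently driving O
        self.state = "run"

    def step(self, i0, i1, s):
        clocks = (i0, i1)
        if self.state == "run" and s != self.active:
            self.state = "wait_old"          # S changed: begin switching
        if self.state == "wait_old":
            if clocks[self.active]:
                return 1                     # finish the current High phase
            self.state = "wait_new"          # old clock is Low
        if self.state == "wait_new":
            if clocks[s] == 0:               # new clock is Low: safe to switch
                self.active, self.state = s, "run"
            return 0                         # output held Low meanwhile
        return clocks[self.active]

def high_run_lengths(samples):
    """Lengths of each consecutive run of 1s."""
    return [len(list(run)) for value, run in groupby(samples) if value]

i0 = [1, 1, 0, 0] * 6            # High phase lasts 2 samples
i1 = [1, 1, 1, 0, 0, 0] * 4      # High phase lasts 3 samples
select = [0] * 5 + [1] * 19      # S changes while I0 is High

mux = BufgmuxModel()
output = [mux.step(a, b, s) for a, b, s in zip(i0, i1, select)]
naive = [b if s else a for a, b, s in zip(i0, i1, select)]

print(high_run_lengths(output))  # only full-length High phases
print(high_run_lengths(naive))   # a plain mux produces a short pulse
```

In this run the plain multiplexer cuts a High phase short at the moment S changes, while the modeled BUFGMUX emits only complete High phases from either clock.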