QsNetIII: An HPC Interconnect for PetaScale Systems
Duncan Roweth, Quadrics Ltd
ISC08, Dresden, June 2008

Quadrics Background

• Develops interconnect products for the HPC market
  – HPC Linux systems
  – AlphaServer SC systems
• Quadrics is owned by the Finmeccanica group
• Quadrics will be 12 years old in July

Interconnect Network – QsNet

• QsNetIII Network
  – Multi-stage switch network
  – Evolution of the QsNetII design
  – Increased use of commodity hardware
  – Increasing support for standard software
• QsNetIII Components
  – ASICs: Elan5 and Elite5
  – Adapters, switches, cables
  – Firmware, drivers, libraries
  – Diagnostics, documentation

Elan5 Adapter Overview

• 2 × 25 Gbit/s links
• PCIe, PCIe2 host interface
• Multiple packet engines
• 512KB of high bandwidth on chip local memory
• SDRAM interface to optional local memory
• Buffer manager, object cache

[Block diagram: Elan5 Adapter. Labelled blocks: two CX4/QsNetIII links; seven packet engines, each with 16K inst cache and 9K data buffers; Fabric x8; Host I/F; PCIe SERDES; PCIe 16 Lanes; Bridge; Local Memory (16K x 8 x 8 banks = 1MB ECC RAM); Object Cache Tags; Buffer Manager; Cmd Launch; Free List; Local Functions; TLB; External cache SDRAM i/f; Ext i/f; External EEPROM; DDRII; PLL; Clocks.]

QsNetIII Adapter Overview

• QM700 PCIe x16
• 128MB adapter memory
• 2 QSFP links
• Half height low profile
• Adapter variants
  – PCIe Gen2
  – Blade formats
  – 10Gbit/s Ethernet 10GBase-CX4

Elite5 - Overview

• Physical layer DDR XAUI
  – 4 x 6.25Gbit/s (2.5Gbytes/s) in each direction
• 32-way crosspoint router
• 32 virtual channels per link
• Fat tree or mesh topologies
• Adaptive routing
• Broadcast & barrier support
• Memory mapped stats & error counters accessed via control network
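
The link figures above can be checked with a short calculation. This is a minimal sketch that assumes 8b/10b line coding (10 line bits per 8 data bits, as used by standard XAUI); the slide does not state the coding, and the function and constant names are illustrative only.

```python
# Worked arithmetic for the Elite5 link rate quoted on the slide.
# Assumption (not stated on the slide): 8b/10b line coding.

LANES = 4                 # lanes per link, from the slide
LANE_RATE_GBIT = 6.25     # Gbit/s per lane, from the slide
DATA_BITS = 8             # assumed 8b/10b coding: data bits ...
LINE_BITS = 10            # ... per line bits


def link_rates(lanes=LANES, lane_rate_gbit=LANE_RATE_GBIT):
    """Return (raw Gbit/s, payload Gbit/s, payload Gbytes/s) per direction."""
    raw_gbit = lanes * lane_rate_gbit
    payload_gbit = raw_gbit * DATA_BITS / LINE_BITS
    payload_gbyte = payload_gbit / 8
    return raw_gbit, payload_gbit, payload_gbyte


raw, payload, gbytes = link_rates()
print(f"raw: {raw} Gbit/s, payload: {payload} Gbit/s, {gbytes} Gbytes/s")
```

Under that assumption the 25 Gbit/s raw rate corresponds to the 2.5 Gbytes/s quoted in each direction.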

QsNetIII Adaptive Routing

• Packet by packet dynamic routing
  – Single cycle routing decision
• Selects route based on
  – Link state, errors etc
  – Number of pending acks
• High radix switches
  – 2 routing decisions for 2048 nodes
• More flexible than QsNetII
  – Operates on groups of links
  – Can adaptively route up or down
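
The selection criteria listed above can be illustrated in software. This is a hypothetical sketch, not the Elite5 hardware logic (which makes the decision in a single cycle): the `Link` fields, the `select_link` function and the tie-break on port number are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """Hypothetical per-link state visible to the router."""
    port: int
    up: bool
    error_count: int
    pending_acks: int


def select_link(group: List[Link], max_errors: int = 0) -> Optional[Link]:
    """Pick one link from a group of equivalent links for the next packet.

    Links that are down or have too many errors are skipped; among the
    rest, the link with the fewest pending acks wins (lowest port number
    breaks ties). Returns None if no link in the group is usable.
    """
    usable = [l for l in group if l.up and l.error_count <= max_errors]
    if not usable:
        return None
    return min(usable, key=lambda l: (l.pending_acks, l.port))


group = [
    Link(port=0, up=False, error_count=0, pending_acks=0),  # failed link
    Link(port=1, up=True, error_count=0, pending_acks=3),   # busy
    Link(port=2, up=True, error_count=0, pending_acks=1),   # least loaded
    Link(port=3, up=True, error_count=2, pending_acks=0),   # errored
]
chosen = select_link(group)
print(chosen.port if chosen else None)
```

Because the choice is made per packet from current link state, traffic moves away from failed or congested links without any change to the route tables.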

Bandwidth scalability – 1024 nodes

• Bandwidth achieved when 1024 nodes all communicate at the same time
• QsNetII provides better average bandwidth and much narrower spread in best to worst case performance

System    Interconnect   Min   Max   Average
Atlas     Infiniband      95   762   263
Thunder   QsNetII        248   403   369

Data from Lawrence Livermore National Lab, published at the Sonoma OpenFabrics workshop June 2007
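
The "narrower spread" claim can be quantified directly from the Min, Max and Average figures above. This is a small sketch; the slide gives no units, so the numbers are treated as unitless, and the dictionary layout and `spread` helper are illustrative.

```python
# Min / Max / Average figures as given on the slide (units not stated there).
results = {
    "Atlas (Infiniband)": {"min": 95, "max": 762, "avg": 263},
    "Thunder (QsNetII)": {"min": 248, "max": 403, "avg": 369},
}


def spread(r):
    """Return (max - min, max / min) for one system's results."""
    return r["max"] - r["min"], r["max"] / r["min"]


for name, r in results.items():
    diff, ratio = spread(r)
    print(f"{name}: range {diff}, max/min ratio {ratio:.2f}, average {r['avg']}")
```

On these figures the best-to-worst ratio is about 8 for Atlas and about 1.6 for Thunder.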

QsNetIII Device Overview

• Manufacturing partner LSI/TSMC, G90 process
• Semi custom ASICs, 500MHz system clock
• High performance BGA package

            Elan      Elite
Package     672 pin   982 pin
Power       17W       18W

QsNetIII – Federated Network Switches

• Node switch chassis
  – 128 links up, 128 down
• Same chassis provides multiple top switch configurations:
  – 64 × 4 for 512-way systems
  – 32 × 8 for 1024-way systems
  – 16 × 16 for 2048-way systems
  – 8 × 32 for 4096-way systems
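
The four top switch configurations follow a simple pattern if they are read as "number of switches × switch radix" (64 × 4, 32 × 8, 16 × 16, 8 × 32). The sketch below reproduces them under two assumptions that the slide does not spell out: a top switch chassis uses all 256 of its links facing down, and each top switch has one link to every node switch chassis. The function name is hypothetical.

```python
NODE_CHASSIS_DOWN_LINKS = 128   # from the slide: 128 links down per node chassis
TOP_CHASSIS_PORTS = 256         # assumption: 128 + 128 links, all facing down


def top_switch_config(system_size):
    """Return (switches per top chassis, radix of each switch).

    Assumes each top switch connects once to every node switch chassis,
    so its radix equals the number of node chassis in the system.
    """
    node_chassis, rem = divmod(system_size, NODE_CHASSIS_DOWN_LINKS)
    if node_chassis == 0 or rem or TOP_CHASSIS_PORTS % node_chassis:
        raise ValueError(f"unsupported system size: {system_size}")
    radix = node_chassis
    return TOP_CHASSIS_PORTS // radix, radix


for size in (512, 1024, 2048, 4096):
    count, radix = top_switch_config(size)
    print(f"{size}-way system: {count} x {radix}-way top switches per chassis")
```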

QsNetIII Network 4096–way

QsNetIII cables

• QSFP connectors throughout
• Optical cables (e.g. Luxtera), 5-300m
  – PVDF Plenum rated
  – LSZH available as an option
• Active copper cables (Gore), 8-20m
• Copper cables (Gore), 1-10m
• No longer Quadrics proprietary
• Bit error rates are a big issue at 5 Gbps and above
  – Optical cables between switches
  – Short copper cables from nodes

QsNetIII for HP BladeSystem

Elan5 mezzanine adapter
• 2 QsNet links
• PCI-E x8 (initially)
• 128 MB of memory

Elite5 switch module
• Full bandwidth
• 16 links to the blades (via backplane)
• 16 links to back of the module

2048-way QsNetIII BladeSystem Network

Building a 16K node system in 2009/10

• Single water cooled rack will provide 1000-2000 standard cores, ~12-25 TF
• 8 Blade switches per rack
• Connect 128 of these racks with 1024-way top switches
• Single fibre cable per node for full bi-section bandwidth
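
The 16K figure follows from the rack numbers above combined with the 16 blade links per Elite5 switch module given on the BladeSystem slide. This is a worked sketch: the top switch count is derived under the assumption that every uplink occupies exactly one top switch port, which the slides do not state.

```python
BLADE_SWITCHES_PER_RACK = 8      # from the slide
LINKS_PER_BLADE_SWITCH = 16      # 16 links to blades, 16 to the back of the module
RACKS = 128                      # from the slide
TOP_SWITCH_PORTS = 1024          # "1024-way top switches"

nodes_per_rack = BLADE_SWITCHES_PER_RACK * LINKS_PER_BLADE_SWITCH
total_nodes = RACKS * nodes_per_rack

# Full bisection bandwidth: one uplink (one fibre) per node.
uplinks_per_rack = nodes_per_rack
# Assumption: each uplink lands on exactly one top switch port.
top_switches = RACKS * uplinks_per_rack // TOP_SWITCH_PORTS

# Per-rack ranges quoted on the slide, scaled to the whole system.
cores = (RACKS * 1000, RACKS * 2000)
teraflops = (RACKS * 12, RACKS * 25)

print(f"nodes per rack: {nodes_per_rack}, total nodes: {total_nodes}")
print(f"1024-way top switches needed: {top_switches}")
print(f"cores: {cores[0]}-{cores[1]}, peak: {teraflops[0]}-{teraflops[1]} TF")
```

With 128 nodes per rack and 128 racks the system reaches 16384 nodes, and the per-rack performance range scales to roughly 1.5-3.2 PF.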

QsNetIII Fault Tolerance

• All of the QsNetII Features
  – CRCs on every packet
  – Automatic retransmission
  – Adaptive routing avoids failed links
  – Redundant routes
  – Redundant, hot pluggable PSUs and fans
+ Full line rate testing of each link as it comes up
  – Switches generate CRPAT, CJPAT or PRBS packets
  – Links are only added to the route tables when they are (a) up, (b) connected to the right place, and (c) able to transfer data without error.
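
The three admission conditions above can be sketched as a simple check. This is an illustration only: PRBS7 (polynomial x^7 + x^6 + 1) is chosen here as one common PRBS pattern, since the slide does not say which PRBS is used, and `LinkStatus`, `count_bit_errors` and `may_add_to_route_table` are hypothetical names, not part of any Quadrics software.

```python
from dataclasses import dataclass


def prbs7(nbits, seed=0x7F):
    """Generate nbits of a PRBS7 sequence (x^7 + x^6 + 1, period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(nbits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(newbit)
        state = ((state << 1) | newbit) & 0x7F
    return bits


def count_bit_errors(sent, received):
    """Count mismatched bits; missing or extra bits count as errors."""
    mismatches = sum(s != r for s, r in zip(sent, received))
    return mismatches + abs(len(sent) - len(received))


@dataclass
class LinkStatus:
    """Hypothetical result of bringing up and testing one link."""
    up: bool
    expected_peer: str
    actual_peer: str
    bit_errors: int


def may_add_to_route_table(status):
    """Apply conditions (a) up, (b) right place, (c) error-free transfer."""
    return (status.up
            and status.actual_peer == status.expected_peer
            and status.bit_errors == 0)


pattern = prbs7(1024)
clean = list(pattern)            # a link that returns the pattern intact
noisy = list(pattern)
noisy[10] ^= 1                   # a link that flips one bit

good = LinkStatus(True, "switch3:port7", "switch3:port7",
                  count_bit_errors(pattern, clean))
bad = LinkStatus(True, "switch3:port7", "switch3:port7",
                 count_bit_errors(pattern, noisy))
print(may_add_to_route_table(good), may_add_to_route_table(bad))
```

Testing at line rate before a link is admitted means a marginal cable is kept out of the route tables rather than being discovered later through retransmissions.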

Software Model – Firmware & Drivers

• Base firmware in the ROMs
• Firmware modules loadable with the device driver
  – Elan, OpenFabrics, 10GE Ethernet, …
• Kernel modules
  – elan5, elan, rms
• Device dependent library (libelan5)
• Device independent library (libelan)
• User libraries

Software Model – Elan Libraries

• Point-to-point message passing
• One-sided put/get
• Transparent rail striping
• Optimised collectives
• Locks and atomic ops
• Global memory allocation

Why Quadrics?

• Focus on the most demanding HPC applications
• Delivers large system scalability
  – All nodes achieve host adapter bandwidth at the same time
  – Minimal spread between best and worst case performance
  – Low and uniform latency
  – Highly optimised collectives
• Single supplier of interconnect hardware, software, support
• Stability of our products
• Track record of delivering production systems
• European company
