The Once and Future SciDAC
(with apologies to T. H. White)
Thom H. Dunning, Jr.
National Center for Supercomputing Applications and Department of Chemistry
University of Illinois at Urbana-Champaign
SciDAC: The Program
“Advances in the simulation of complex scientific and engineering systems provide an unparalleled opportunity for solving major problems that face the nation in the 21st Century.”
SciDAC Goals
Create a Scientific Computing Software Infrastructure that bridges the gap between applied mathematics & computer science and computational science in the physical, chemical, biological, and environmental sciences:
• Scientific Application Codes – Develop mathematical models, computational methods, and scientific codes to take full advantage of the capabilities of terascale computers
• Computing Systems and Mathematical Software – Develop software infrastructure to accelerate the development of scientific codes, achieve maximum efficiency on high-end computers, and enable a broad range of scientists to use simulation in their research
• Collaboratory Software – Develop network technologies and collaboration tools to link geographically separated researchers, to facilitate movement of large (petabyte) data sets, and to ensure that academic scientists can fully participate in these activities
SciDAC Goals II
Create a Scientific Computing Hardware Infrastructure that is robust, agile, and flexible:
• Flagship Computing Facility – To provide computing resources to address a broad range of scientific problems
• Topical Computing Facilities – To ensure that the most effective and efficient resources are used to solve each class of problems
• Experimental Computing Facilities – To guide advances in computer technology to ensure that scientific computing has the resources it needs in the future
• ESnet – To support research in a connected world
SciDAC: Circa 2001
[Diagram: the circa-2001 SciDAC hardware and software infrastructure, linking scientific simulation codes (sponsored by BES, BER, FES, and HENP) with ASCR-supported operating systems, computing systems software (data analysis & visualization, programming environments, scientific data management, problem-solving environments), mathematics, collaboratories, and data grids.]
SciDAC Score Card
• Scientific Challenge Codes – Excellent progress in selected areas, but many areas poorly supported or even neglected
• Computing & Math Software – Excellent progress, but some areas need additional support
• Collaboratory Software – Good progress, but little used
• Flagship Computing Facility – Two facilities established, NERSC and NLCF, but …
• Topical Computing Facilities – QCDOC and MSCF, but many opportunities still unexplored
• Experimental Computing Facilities – Little progress
After 5 Years: Is SciDAC Still Needed?
Yes!
After 5 Years: Does SciDAC Need More Funding?
Yes!
Central Dogma
The central dogma of SciDAC is the close coupling between computer hardware and computer software.
[Diagram: hardware changes feed into software through porting, revision, and rewriting; SciDAC-enhanced multidisciplinary teams translate them into performance gains, which can be dramatic.]
Changes in computer hardware require changes, often major changes, in computer software. Responding to such changes in a timely manner requires a multidisciplinary approach.
The Coming Revolution in Computing
“The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software”
Herb Sutter, Dr. Dobb’s Journal 30(3), March 2005
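To make Sutter's point concrete, here is a minimal sketch, not taken from the talk, of the shift he describes: a serial loop no longer speeds up with each new chip, so the work has to be split across cores explicitly. The toy workload and function names below are illustrative assumptions.

# Illustrative sketch only: serial vs. explicitly parallel execution of the
# same toy workload, the kind of restructuring the "free lunch is over"
# argument says software now has to do. Workload and names are hypothetical.
from multiprocessing import Pool

def simulate_cell(i):
    # Stand-in for a per-cell physics kernel.
    x = 0.0
    for k in range(10_000):
        x += (i * k) % 7
    return x

def run_serial(n_cells):
    # Pre-multicore habit: one instruction stream, speed came from the clock.
    return sum(simulate_cell(i) for i in range(n_cells))

def run_parallel(n_cells, n_workers=4):
    # Post-2005 reality: the programmer must expose concurrency explicitly.
    with Pool(n_workers) as pool:
        return sum(pool.map(simulate_cell, range(n_cells)))

if __name__ == "__main__":
    print(run_serial(256))
    print(run_parallel(256))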
The GHz Race
At the 2000 IEEE International Electron Devices Meeting, Intel announced that it expected to produce a 10 GHz microprocessor by 2005. The fastest Intel microprocessor today runs at 3.8 GHz (Pentium 4), and it was introduced six months ago. At its presentation of the Prescott 6XX series, Intel stated that it is committed to “adding value beyond GHz.”
Increasing Computer Performance
• Increasing Clock Frequency
  – Pentium: 60 MHz to 3,800 MHz in 12 years
  – Resulted in ~80% of the performance increase
The Heat Problem
[Chart: power density (W/cm², log scale from 1 to 1000) versus process technology (1.5 µm down to 0.07 µm) with increasing frequency, for the i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium 4 (Willamette, Prescott). Power density passes that of a hot plate and heads toward that of a nuclear reactor and, eventually, a rocket nozzle. Courtesy of Bob Colwell.]
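As a rough check on the chart's vertical axis, consider a back-of-the-envelope calculation; the wattage and die area below are assumed, order-of-magnitude values, not figures from the slides.

# Back-of-the-envelope power density; the wattage and die area below are
# assumed, order-of-magnitude values for illustration only.
tdp_watts = 100.0        # assumed package power for a Prescott-class part
die_area_cm2 = 1.1       # assumed die area (~110 mm^2)

print(tdp_watts / die_area_cm2)   # ~91 W/cm^2, i.e. order 100 W/cm^2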
Managing the Heat Load
• Liquid cooling system in Apple G5s
• Heat sinks in 6XX series Pentium 4s
Leakage Current: From Minor Nuisance to Chip Killer
Dissipated power ~ CV²f
[Chart: dynamic power and leakage power (W, 0–300) versus process technology (250 nm down to 70 nm); leakage power grows from a minor contribution to a substantial share of the total as feature sizes shrink.]
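The CV²f relation is why the clock race became a heat race: dynamic power grows linearly with frequency at fixed voltage, and historically a higher clock also required a higher supply voltage. A minimal sketch of the scaling, with assumed, illustrative component values:

# Dynamic power scales as P ~ C * V^2 * f. The capacitance, voltage, and
# frequency values below are assumed, illustrative numbers only.
def dynamic_power(c_farads, v_volts, f_hz):
    return c_farads * v_volts**2 * f_hz

base = dynamic_power(c_farads=1e-9, v_volts=1.2, f_hz=3.0e9)

# Doubling frequency at fixed voltage doubles dynamic power ...
print(dynamic_power(1e-9, 1.2, 6.0e9) / base)   # 2.0

# ... and if the higher clock also needs a higher supply voltage,
# power grows faster than linearly with clock speed.
print(dynamic_power(1e-9, 1.4, 6.0e9) / base)   # ~2.7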
Means of Increasing Performance
• Increasing Clock Frequency
  – From 60 MHz to 3,800 MHz in 12 years
  – Has resulted in ~80% of the performance increase
• Execution Optimization
  – More powerful instructions
  – Pipelining, branch prediction, execution of multiple instructions, reordering of the instruction stream, etc.
Microarchitecture Trends
[Chart: processor performance (MIPS, log scale from 10^1 to 10^6) versus year, 1980–2010. The era of instruction parallelism (Pentium superscalar, Pentium Pro speculative out-of-order, Pentium 4 trace cache) gives way to an era of thread parallelism (Pentium 4 and Xeon architecture with Hyper-Threading, then multi-threaded, multi-core designs). Adapted from Johan De Gelas, Quest for More Processing Power, AnandTech, Feb. 8, 2005.]
Means of Increasing Performance
• Increasing Clock Frequency
  – From 60 MHz to 3,800 MHz in 12 years
  – Has resulted in ~80% of the performance increase
• Execution Optimization
  – More powerful instructions
  – Pipelining, branch prediction, execution of multiple instructions, reordering of the instruction stream, etc.
• Larger Caches
  – On-chip caches to ameliorate the growing disparity between processor speed and memory latency and bandwidth
Moore’s Law Still Holds
[Chart: transistors per die (log scale, 10^0 to 10^11) versus year, 1960–2010, for memory chips (1K through 4G) and microprocessors (4004, 8080, 8086, 80286, i386, i486, Pentium, Pentium II, Pentium III, Pentium 4, Itanium). Source: Intel.]
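A quick sanity check of the trend behind the chart, using two widely cited data points (the Intel 4004 of 1971 at roughly 2,300 transistors and the Pentium 4 Willamette of 2000 at roughly 42 million) to estimate the doubling time:

import math

# Two widely cited data points, used only to estimate the doubling time:
# Intel 4004 (1971), ~2,300 transistors; Pentium 4 Willamette (2000), ~42 million.
year0, n0 = 1971, 2.3e3
year1, n1 = 2000, 4.2e7

doublings = math.log2(n1 / n0)                   # ~14 doublings
print((year1 - year0) / doublings)               # ~2.0 years per doubling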
Increasing Caches: Montecito
Means of Increasing Performance
• Increasing Clock Frequency
  – From 60 MHz to 3,800 MHz in 12 years
  – Has resulted in ~80% of the performance increase
• Execution Optimization
  – More powerful instructions
  – Pipelining, branch prediction, execution of multiple instructions, reordering of the instruction stream, etc.
• Larger Caches
  – On-chip caches will continue to increase in size and help mitigate disparities in computer subsystem performance
New Technologies for Computers
• Low Power Processors
IBM Blue Gene Systems
• LLNL BG/L
  – 360 teraflops
  – 64 racks: 65,536 nodes, 131,072 processors
• Node
  – Two 2.8 Gflops processors (see the arithmetic sketch below)
    • System-on-a-Chip design
    • 700 MHz
    • Two fused multiply-adds per cycle
  – Up to 512 Mbytes of memory
  – 27 watts
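The 2.8 Gflops per-processor figure follows from the numbers on the slide; a small arithmetic sketch, counting a fused multiply-add as two floating-point operations (the usual convention):

# Peak floating-point rate of one BG/L processor, from the numbers above.
clock_hz = 700e6        # 700 MHz
fma_per_cycle = 2       # two fused multiply-adds per cycle
flops_per_fma = 2       # each FMA counts as a multiply plus an add

peak_per_processor = clock_hz * fma_per_cycle * flops_per_fma
print(peak_per_processor / 1e9)                  # 2.8 Gflops

# Scaled to the full 64-rack, 131,072-processor LLNL system:
print(131_072 * peak_per_processor / 1e12)       # ~367 Tflops peak, i.e. the
                                                 # ~360-teraflops class quoted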
Technologies for Petascale Computers
• Low Power Processors
  – Need unprecedented application software scalability: application codes must scale to 100,000s of processors (see the scaling sketch below)
  – Need ability to recover from continual processor loss
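Why scaling to 100,000s of processors is so demanding: even a tiny serial or non-scaling fraction of a code caps the achievable speedup. A minimal sketch using Amdahl's law (not named on the slides; the serial fractions are illustrative assumptions):

# Amdahl's law: speedup on p processors when a fraction s of the work is serial.
def amdahl_speedup(serial_fraction, processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for s in (0.01, 0.001, 0.0001):                  # illustrative serial fractions
    print(s, round(amdahl_speedup(s, 131_072)))
# 1% serial    -> speedup ~100
# 0.1% serial  -> speedup ~992
# 0.01% serial -> speedup ~9,291
# Even 0.01% of non-scaling work idles more than 90% of a
# 131,072-processor machine.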
New Technologies for Computers
• Low Power Processors
  – Need unprecedented scalability: application codes must scale to 100,000s of processors
  – Need ability to recover from processor loss
• Multicore Chips
Architecture of Dual-Core Chips
• IBM Power5
  – Shared 1.92 Mbyte L2 cache
• AMD Opteron
  – Separate 1 Mbyte L2 caches
  – CPU0 and CPU1 communicate through the SRQ
• Intel Pentium 4
  – Two processors “glued” together
Intel Processor Roadmap
New Technologies for Computers
• Low Power Processors
  – Need unprecedented scalability: application codes must scale to 100,000s of processors
  – Need ability to recover from processor loss
• Multicore Chips
  – Need to better understand a number of architectural issues
    • Memory bandwidth (see the bandwidth sketch below)
    • Cache contention
    • …
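One way to see the memory-bandwidth concern: the cores on a chip share a largely fixed path to memory, so the bandwidth available per core, and hence the bytes per flop, falls as core counts rise. A minimal sketch with assumed, illustrative numbers, not figures from the slides:

# Illustrative only: per-core share of a fixed memory bandwidth as core
# counts grow. The chip bandwidth and per-core flop rate are assumed values.
chip_mem_bandwidth_gbs = 10.0     # assumed GB/s shared by all cores on a chip
per_core_gflops = 4.0             # assumed peak Gflops per core

for cores in (1, 2, 4, 8):
    bw_per_core = chip_mem_bandwidth_gbs / cores
    bytes_per_flop = bw_per_core / per_core_gflops
    print(f"{cores} cores: {bw_per_core:.1f} GB/s per core, "
          f"{bytes_per_flop:.2f} bytes/flop")
# Arithmetic capability per chip scales with the core count, but the memory
# system generally does not; that is the architectural issue listed above.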
Other Promising Technologies
• Field Programmable Gate Arrays (FPGAs)
  – Capabilities increasing rapidly (riding the silicon technology curve)
  – Need efficient software development tools
• Heterogeneous Computer Systems
  – Different types of processors in a single system (vector processors, superscalar processors, FPGAs)
  – High-speed interconnect linking all processors
  – May be especially advantageous for some applications, e.g., multiphysics applications
• Many Other New Ideas
  – DARPA: High Productivity Computing Systems program
  – Universities: Sterling, Dally, …
SciDAC: Pathway to the Future
Questions?