Next Generation Systems Architectures A Sun Perspective
Franz Haberhauer
Technical Director / Chief Technologist, Client Solutions Germany
[email protected]
Innovation Matters
$1.9B R&D
Sun's Three Franchises
Sun Microsystems – A True Systems Company
[Diagram: the Sun stack — developer tools, apps/web services, middleware (web, app, directory, messaging servers), operating system, and the server, network, and storage hardware.]
The New Software Platform
[Diagram: a shrink-wrapped application runs on a single OS/processor pair — OS400/AS400, DGUX/88K, VMS/VAX, Aegis/68K, Irix/MIPS, MVS/370, Solaris™/SPARC®, Ultrix/Alpha, AIX/Power, HP-UX/Precision, Windows/x86, Linux/x86.]
● The application scales from 1 to perhaps 100's of users
● For the desktop: 1 copy of software = 1 copy of hardware
The New Software Platform (continued)
[Diagram: the same OS/processor pairs, but the application is now built from networked components glued together by middleware (MW) that spans platforms.]
The service must scale to 10,000's or even millions of simultaneous users!
The New Software Platform (continued)
[Diagram: the component/middleware picture consolidates onto two platforms — the Java™ platform spanning the OS/processor pairs, and the Windows platform — with XML as the bridge between them.]
Application Architecture Evolution
Services = graphs of interacting components.
[Diagram: the evolution Client/Server → Objects → Web Application → Web Service → Next, with wire encodings moving from XDR and CORBA/RPC through RMI and COM+ to XML, and with shared directories, security models, and administration underneath. A sample service graph: cache/filter → JSP web tier → EJB app tier → SQL database, with an LDAP directory and messaging on the order of 10^6 messages.]
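The shift the slide describes — from binary RPC encodings (XDR, CORBA, RMI, COM+) to self-describing XML messages — can be sketched with the JDK's own DOM API. The `getQuote` operation and its `symbol` attribute are hypothetical names invented for this example:

```java
// Sketch: a service request expressed as an XML message instead of a
// binary RPC encoding. Operation name and parameters travel as tagged,
// self-describing data rather than positional fields.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlMessage {
    static Document buildRequest(String symbol) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element call = doc.createElement("getQuote"); // operation name is in the message
        call.setAttribute("symbol", symbol);          // parameters are named, not positional
        doc.appendChild(call);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document req = buildRequest("SUNW");
        System.out.println("operation: " + req.getDocumentElement().getTagName());
        System.out.println("symbol: " + req.getDocumentElement().getAttribute("symbol"));
    }
}
```

Because the message carries its own structure, any XML-capable peer can decode it — which is what lets the Java and Windows platforms interoperate.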
New Challenges
● Virtualization
● Provisioning: infrastructure and services
● Telemetry / Observability
[Diagram: SLA-driven capability and capacity mapped onto computing pools, connectivity (Internet/intranet), storage virtualization and storage pools, down to J2ME midlets.]
Service Point Architecture
[Diagram: presentation, app, DB, and directory tiers connected to the Internet/intranet and, through SAN/NAS storage networks, to storage — all wrapped by security, policy, and management.]
Vertical Scaling / Scale Up / Data Facing
[Diagram: a multiprocessor — processors and memory joined by a memory switch, with a shared I/O subsystem.]
● Cache-coherent SMPs: tightly coupled, high bandwidth, low latency
● Large main memory
● One pool of processors: flexible, dynamic scheduling by the OS
● Scaling within the node
● Highly available nodes
● Powerful I/O subsystems

Horizontal Scaling / Scale Out / Network Facing
[Diagram: a cluster/blade server — many nodes, each with its own processor, memory, and I/O, joined by a network switch under cluster management.]
● Loosely coupled
● Cheap, small nodes: 1–4 way, few I/O slots
● Multiple OS instances
● Scaling by the number of nodes
● Load scheduling by a separate component (HW or SW)
  – Load balancer, Oracle RAC, app server, grid job queuing system
  – Usually only when the transaction/session/job is started
● Highly parallel/parallelizable loads (Web, HPTC)
● RAS through replication
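Load scheduling by a separate component, as in the scale-out model, can be as simple as round-robin assignment at session start. A minimal sketch — class and node names are invented for illustration, not Sun product code:

```java
// Sketch: a round-robin balancer assigns each new session/transaction/job
// to the next node in the pool. Scheduling happens once, at start time,
// matching the scale-out model described above.
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    // Thread-safe: getAndIncrement lets many front-ends pick concurrently.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
                new RoundRobinBalancer(List.of("node1", "node2", "node3"));
        for (int s = 0; s < 5; s++) {
            System.out.println("session " + s + " -> " + lb.pick());
        }
    }
}
```

Real schedulers (load balancers, Oracle RAC, grid queuing systems) add health checks and weighting, but the shape — a component outside the nodes deciding placement — is the same.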
Deutsch's 7 Fallacies of Networking
● The network is reliable
● Latency is zero
● Bandwidth is infinite
● The network is secure
● Topology doesn't change
● There is one administrator
● Transport cost is zero
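The first fallacy is the one most often baked into code. A minimal defensive sketch — the simulated flaky call and all names are hypothetical, standing in for a real remote invocation:

```java
// Sketch: because "the network is reliable" is a fallacy, callers should
// retry transient failures instead of assuming the first attempt succeeds.
import java.util.concurrent.Callable;

public class RetryDemo {
    // Retries the call up to maxAttempts times; rethrows the last failure.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // transient failure: latency spike, dropped packet, ...
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated remote call: fails twice, then answers.
        final int[] failuresLeft = {2};
        String answer = callWithRetry(() -> {
            if (failuresLeft[0]-- > 0) throw new java.io.IOException("timeout");
            return "pong";
        }, 5);
        System.out.println("got: " + answer);
    }
}
```

A production version would also bound total time and back off between attempts — latency is not zero either.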
A Horizontally Scaled System: The Blade Shelf

Blade Server Evolution
● Platform Wave 1 (from 2003): evolves the thin-server design to provide increased density, improved environmentals, and better system management for enhanced throughput processing.
● Platform Wave 2 (from 2005): adopts the latest interconnect technology and optimizes the blade design to deliver superior transaction processing.
● Evolution between the waves is achieved through enhanced service management; market adoption shifts from Wave 1 to Wave 2.
A New Meaning of “System”
What we did inside the E10K box, we are doing to the network:
● True scalability: add performance without adding management complexity!
● “Soft configuration” and “soft cabling”
● Multiple, secure domains
But with a big difference: heterogeneous elements. The network becomes like an SMP backplane.
Trend: Traditional Tiers Are Disaggregated and Recomposed
● Disaggregation of server state and function: computation, storage, network, software
● Recomposition into optimized entities: appliances, intelligent storage, compute engines, EJB™ components, software services, load balancers, etc.
● The entities live in the network; collectively, they are the (new) computer.
N1 Grid: the computer that is the network.
Moore's Law
Transistor density doubles every 18–24 months.
[Chart: transistors per chip, 1959–2030, log scale — from the planar transistor (1959) through the 4-bit (2.25K), 8-bit (5K), 16-bit (29K), 32-bit (275K), and 64-bit (3.2M, 42M) generations, past 1M and 1B toward 1T; process nodes 130 nm (100–400M), 90 nm (200–800M), 65 nm (300M–1B).]
UltraSPARC IV Die
[Die photo: two cores (Core 0, Core 1), each with its own instruction cache, data cache, instruction logic, DTLB, and FPU, plus L2 tags with ECC.]
Challenges in Processor Design
● Memory ● Power ● Complexity

The Memory Bottleneck
[Chart: relative performance, 1980–2005, log scale. CPU frequency doubles every 2 years; DRAM speeds double only every 6 years — the gap keeps widening.]
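The widening gap can be made concrete with a back-of-the-envelope calculation from the chart's growth rates (2x every 2 years for CPU, 2x every 6 years for DRAM); the 12-year horizon is an illustrative choice, not a figure from the slide:

```java
// Worked example: how far CPU speed pulls ahead of DRAM speed over time,
// given the doubling periods from the memory-bottleneck chart.
public class MemoryGap {
    static double gapAfterYears(double years) {
        double cpu = Math.pow(2, years / 2.0);  // CPU: 2x every 2 years
        double dram = Math.pow(2, years / 6.0); // DRAM: 2x every 6 years
        return cpu / dram;
    }

    public static void main(String[] args) {
        // After 12 years: CPU 2^6 = 64x, DRAM 2^2 = 4x, so the gap is 16x.
        System.out.println("gap after 12 years: " + gapAfterYears(12) + "x");
    }
}
```

This compounding ratio is why each processor generation spends a larger fraction of its cycles stalled on memory.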
Typical Complex High-Frequency Processor
[Timeline: a single thread alternates short compute bursts (C) with long memory-latency stalls (M); time saved by a higher clock rate only shrinks the compute slices, not the stalls.]
HURRY UP AND WAIT!
Note: up to 75% of cycles are spent waiting for memory.
Throughput Computing
● Basic idea: maximize the application-level work performed (throughput)
● Exploit the rich thread-level parallelism (TLP) in modern workloads:
  – Parallelism
  – Pipeline simplicity
  – Latency tolerance
Chip Multithreading (CMT)
[Timeline: four threads share one core; while one thread stalls on memory (M), another computes (C), so each thread's memory latency is overlapped with useful work.]
CMT – A Simpler Design
A simple core, stepped and repeated across the die, does without the usual complexity drivers:
● Large caches
● Superscalar design
● Out-of-order execution
● Very high clock rates
● Deep pipelines
● Speculative prefetches
The payoff: faster time-to-market, a simpler design, and a limited number of unique transistors.
CMT – Multiple Multithreaded Cores
[Timeline: eight cores with four threads each — 32 hardware threads — keep the chip computing even while individual threads wait on memory.]
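The same idea can be sketched in software: a pool of threads keeps the machine busy even when individual tasks stall. This is an illustrative analogy, not Sun's CMT implementation, and the class and method names are invented for the example:

```java
// Sketch: thread-level parallelism in software. Several threads each work
// on a slice of the data; while one thread stalls (e.g. on memory),
// others can run — the effect CMT achieves in hardware.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    static long sum(long[] data, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (data.length + nThreads - 1) / nThreads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * chunk;
                final int hi = Math.min(data.length, lo + chunk);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += data[i];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println("sum = " + sum(data, 4));
    }
}
```

On a CMT chip, each of these software threads can map onto its own hardware thread, so the overlap costs no context switches.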
How Can CMT Deliver?
[Chart: relative performance per core vs. die usage — a core shrunk to 10% of the die still delivers about 0.5x the performance of a full-die core.]
● One complex core: 100% of the die, 100% performance × 1 core = 1x throughput
● Ten simple cores: 10% of the die each, 50% performance × 10 cores = 5x throughput
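The slide's arithmetic is easy to check; note that the 50%-per-core figure is the slide's design estimate, not a measurement:

```java
// The CMT throughput trade-off: many slower cores beat one fast core
// when the workload has enough threads to fill them.
public class CmtMath {
    static double throughput(int cores, double perCorePerf) {
        return cores * perCorePerf;
    }

    public static void main(String[] args) {
        // One complex core at full per-core performance.
        System.out.println("complex core: " + throughput(1, 1.0) + "x");
        // Ten simple cores, each at half the per-core performance.
        System.out.println("CMT chip:     " + throughput(10, 0.5) + "x");
    }
}
```

The win only materializes for thread-rich workloads; a single-threaded job still sees just the 0.5x core.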
Throughput Networking: Changing the Rules Again
● Two tasks on two hardware threads: one thread handles network/packet processing while the other computes
● Increase packet processing by devoting silicon and threads to network tasks
● No missed packets; no context-switching/interrupt overhead
Chip multithreading increases network efficiency.
Niagara (http://blogs.sun.com/roller/page/jonathan/20040910):
“That's what we call a system. A system built for internet workloads. Not for the expedience of a press release. ... Silicon for our Project Niagara chip: 8 cores × 4 threads per core = a 32-way computer. On a chip. (And before you ask, yes, we are planning a nicer box when we ship :)”
Throughput Gains: An Order of Magnitude Ahead — SPARC: The Next Generation

SPARC Processor Roadmap
[Roadmap chart, from today onward: a data-intensive track (UltraSPARC III at 1050/1200 MHz → UltraSPARC IV at 1200 MHz → UltraSPARC IV+ → SPARC APL → SPARC APL+ → Rock) and a network-intensive track (UltraSPARC IIIi at 1000 MHz → UltraSPARC IIIi+ at 1280 MHz → Niagara), annotated with relative throughput marks 1X/2X/4X/8X/30X and 1Y/2Y/15Y.]
Joining SPARC Forces: Advanced Product Line (APL)
[Diagram: the Sun Fire line (UltraSPARC IV, UltraSPARC IV+ — throughput computing, design excellence, thousands of applications) and the Fujitsu PRIMEPOWER line (SPARC64 V, V+ — mission-critical computing heritage) converge over 2004–2006.]
● Optimized to address all network computing workloads
● Multiple product families (low, mid, high)
● Systems based on SPARC64 (jointly developed) and Niagara/Rock (Sun developed)
Sun x86 System Roadmap
[Timeline: Sun/AMD alliance announced Nov. 17, 2003; Sun Fire V20z (Feb. 10, 2004), Sun W1100z/W2100z workstations and Sun Fire V40z (July 30, 2004); blade systems, 8-socket systems, and the Nauticus network switch coming soon (Q3 2004 and beyond).]
AMD64 Benefits Besides 64 Bits
● Full 32-bit compatibility
  – Both for the i386 ABI and for Linux
● Benefits for 32-bit apps on a 64-bit OS:
  – More memory: the full 4GB virtual address space
  – System call and library optimizations
  – PROT_EXEC
  – Faster kernel: read/write, mmap, etc. (large segmap, SEG_KPM, etc.)
  – Seamless operation
● Apps ported to 64 bits can be faster:
  – Twice the number of integer registers
  – Increased number of SSE registers (128-bit registers)
  – Improved calling conventions
  – Position-independent code (PIC) no longer causes a speed impact
AMD Opteron & HyperTransport Advantages
[Diagram: IA32/IA64 — CPUs share a frontside bus through a Northbridge/Southbridge to memory and the PCI-X bridges. AMD64 — CPUs connect point to point via HyperTransport, each with its own memory and direct paths to the PCI-X bridges.]
● IA32/IA64: the frontside bus shared between memory traffic and I/O is a performance bottleneck
● AMD64: a modern, distributed architecture
Sun's Multi-Platform OS Strategy
Solaris:
● SPARC and x86 (maybe Power, Itanium); 64- and 32-bit
● Horizontal and vertical scaling
● Sun and 3rd-party hardware
● Established ISV acceptance
● Runs Linux apps unchanged (x86)
● Defined release cycle; binary compatibility – “write once, run forever”
● Directed innovation
Linux:
● Complements Solaris on the x86 platform; 32- and 64-bit
● Horizontal scaling
● Sun and 3rd-party hardware – wide variety
● Starting ISV acceptance – Red Hat, SuSE
● Focus on open source; less binary compatibility
● Solaris (UNIX) interoperability
Innovation in Operating Systems: Solaris 10
● Advanced tracing: live monitoring of production systems
● Predictive Self Healing: reduced system downtime, automatic service restart
● Extreme system performance: fast TCP/IP stack
● N1 Grid Containers: software partitioning
● Trusted Solaris: military-grade security
Server Virtualization with N1 Grid Containers
[Diagram: one domain over shared disk storage and file systems hosts isolated containers — www, store, appserver, B2B, oltp — each with its own network address (192.9.9.1–192.9.9.5), independent users, separate networks, and independent storage.]
● Resource management + security isolation + fault isolation
N1 Grid Containers: Zones
[Diagram: the global zone (serviceprovider.com) provides the virtual platform — core services (inetd, rpcbind, ypbind, automountd, snmpd, dtlogin, sendmail, sshd, ...), zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...), platform administration (syseventd, devfsadm, ...), remote admin/monitoring (SNMP, SunMC, WBEM), the physical network devices (hme0, ce0), and the storage complex. On top sit three application environments:
● blue zone (blueslugs.com), zone root /aux0/blueslugs
● foo zone (foo.net), zone root /aux0/foonet
● beck zone (beck.org), zone root /aux0/beck
Each zone runs its own stack drawn from web services (Apache 1.3.22 with J2SE, Apache 2.0), enterprise services (Oracle, Application Server), login services (OpenSSH sshd 3.4), network services (BIND 8.3/9.2, sendmail), and core services (inetd, rpcbind, ypbind, automountd, ldap_cachemgr), and gets its own zoneadmd, zone console (zcons), virtual network interfaces (ce0:1, ce0:2, hme0:1, hme0:2), and file systems (/usr, /opt/yt).]
Virtual Machines versus Containers
[Diagram: hypervisor-based virtualization (IBM LPAR, HP VPAR, EMC VMware) runs multiple OS instances — each guest with its own kernel — over a hypervisor/host OS on the hardware, at roughly 5–15% overhead. Containers run inside one OS instance — a single Solaris 10 kernel on SPARC or x86 — at <1% overhead; 4,000 zones were tested on a V880.]
Server Virtualization: N1 Grid Containers – “N1 Grid in a Box”
[Diagram: one server partitioned into Solaris domains (Domain 1, Domain 2), each running several containers (Containers 1–5).]
● Dynamic Reconfiguration
● Dynamic System Domains
● N1 Grid Containers
N1 Grid – One System to Manage
The Datacenter as a Single System “n computers operating as 1”
[Diagram: the Sun stack — developer tools, Java Systems apps/web services, middleware (web, app, directory, messaging), operating system, and server/network/storage — now managed end to end by an N1 Grid operator.]
Scaling the N1 Grid
[Diagram: from an N1 Grid server to an N1 Grid system to a distributed datacenter — workload, workflow, monitoring & policy automation (N1 Grid Engine, N1 Grid Service Provisioning System) over business application services (N1 Grid Provisioning Server for Blades, N1 Grid Containers, N1 Grid Data Platform) over the system infrastructure.]
N1 Grid Architectural Concepts
● Service virtualization
  – Separating the service from the platform
  – “Movable” service components
● Efficient scaling of the number of instances
  – Cloning reference installations
● Service-level management
  – Performance monitoring and modelling
  – Capacity planning
There May Still Be a Lock-in
[Diagram: the stack again — apps/web services over middleware (web, app, directory, messaging) over the operating system over server/network/storage. Administrative lock-in sits in the upper layers; binary lock-in in the lower ones.]
Sun Software Product Stack
[Diagram: the Sun Java™ Systems — Java Enterprise System, Java Desktop System, Java Mobility System, Java Card System, plus Java Studio and N1 Grid — built on open source software, infrastructure services, and operating systems (Solaris™ on SPARC, Linux on x86) over the hardware.]
Java Enterprise System™
[Roadmap: quarterly releases (Q1–Q4, over two years) of the integrated stack on Solaris — Directory and Identity/Access, J2EE™ application server, Web, Portal, E-Mail/Messaging, Calendar Server, Instant Messaging/Collaboration, Availability, MPEG Streaming, Grid/Virtualization — plus a development environment.]

Components:
● Portal Server – personalized, centralized access to services and information: presentation, transformation, desktop, personalization (XML/JSP, portlets); secure remote access (search, files, mail, applications, VPN, ...); mobile access; device-aware
● Access Manager & Directory Server – central LDAP directory for user profiles, passwords, and service access profiles; user management, authentication, authorization, SSO, PKI
● Web & Application Server – web container and EJB™ container hosting web services (e.g. CRM, shop, CMS, SAP adapters, UDDI)
● Identity Manager – sync/provisioning engine with workflow, rules, and adapters
● Message Queue – message-oriented middleware (JMS, SOAP, XML)
● Communication & Collaboration Server – MTA and mail server, calendar server, IM server, address book
● Cluster & Grid – availability (Cluster) and performance (Grid Engine)
● Java Studio IDE
Java Enterprise System™ – Delivery, Pricing, Licensing
$100 per employee per year.
Summary
● Sun's franchises – innovation matters and pays
● Choice
  – SPARC- or x86-based systems
  – Solaris or Linux on x86-based systems from Sun
● Sun as a true systems vendor
  – Pre-integrated Java Systems
    ● Licensing model fits throughput computing and the N1 Grid
    ● Other components can be integrated
  – N1 Grid to manage n systems as 1
Sun The Network is the Computer Franz Haberhauer
[email protected]