Energy Efficient Computing: from milliwatt to megawatt
Feng Zhao, Assistant Managing Director, Microsoft Research Asia
http://research.microsoft.com/~zhao
Joint work with Aman Kansal, Jie Liu, Suman Nath, Bodhi Priyantha
Talk at Santa Barbara Summit on Energy Efficiency, May 20, 2009
The Power Spectrum
• Sensors, embedded networks:
– running on AA batteries for months to years
• Data centers with 100,000s of servers:
– often located near large hydro power plants
Computing on a dime: 10^-2 W
Computing in a warehouse: 10^7 W
9 orders of magnitude in power difference. Tradeoffs in energy and performance across the scale.
Modular sensor platform
• Lego-like kit to explore design of mobile devices (e.g. cell phones) with multiple processors, radios, storage
– Processor “brick” (ARM)
– Low Power Processor “brick” (MSP430)
– Radio “brick” (WiFi)
– Scalable interconnect
• Optimizing system-wide energy consumption for applications
Lymberopoulos, Priyantha, Zhao, "mPlatform: A Reconfigurable Architecture and Efficient Data Sharing Mechanism for Modular Sensor Nodes." IPSN’07.
Lymberopoulos, Priyantha, Goraczko, Zhao, "Towards Energy Efficient Design of Multi-Radio Platforms for Wireless Sensor Networks." IPSN’08.
Map tasks to components
• Multiple processors
– Radio MAC processor, app processor, DSP
– Each has multiple P-states/C-states
• Map application to components and power states
– Static (design time)
– Dynamic (run time)
Task Allocation
• Task processing requirements may only become known at run time
– Varying sensed event rate, varying application mix supported by platform
• Solution: Adapt power state usage dynamically
– E.g., trade off responsiveness for power savings through increased sleep state usage
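A minimal sketch of run-time power state adaptation, assuming a processor brick with three power states. The state names, power figures, wake latencies, and the `pick_state` helper are all hypothetical and chosen only for illustration; they are not taken from the platform described in the talk. The idea is to pick the lowest-power state whose wake-up delay still fits both the application's latency budget and the expected gap between sensed events.

```python
from dataclasses import dataclass


@dataclass
class PowerState:
    name: str
    power_mw: float         # hypothetical average draw in this state
    wake_latency_ms: float  # hypothetical delay before an event can be served


# Hypothetical states, ordered from most to least responsive.
STATES = [
    PowerState("active", 30.0, 0.0),
    PowerState("idle", 5.0, 2.0),
    PowerState("deep_sleep", 0.1, 50.0),
]


def pick_state(event_rate_hz, max_latency_ms):
    """Return the lowest-power state that can still wake in time.

    A state qualifies if its wake latency is within the latency budget
    and shorter than the expected gap between events.
    """
    gap_ms = float("inf") if event_rate_hz <= 0 else 1000.0 / event_rate_hz
    candidates = [
        s for s in STATES
        if s.wake_latency_ms <= max_latency_ms and s.wake_latency_ms < gap_ms
    ]
    # "active" has zero wake latency, so the candidate list is never empty
    # for a non-negative latency budget.
    return min(candidates, key=lambda s: s.power_mw)


if __name__ == "__main__":
    for rate in (0.1, 100.0, 1000.0):
        print(rate, "Hz ->", pick_state(rate, max_latency_ms=100.0).name)
```

Re-running `pick_state` whenever the observed event rate changes gives the dynamic behavior the slide describes: a slow event stream allows deeper sleep, at the price of a longer response time.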
Data centers
Data centers are often over-provisioned
• Low average CPU utilization
• Over-cooling due to hot spots/large thermal gradient (Δ15-25°F on front)
Load fluctuation
Data Center Genome
Saving energy and improving operation efficiency by networked sensing, data mining, and control.
[Figure: MSR Genomotes. Collect, archive, and understand operations data (Cooling Systems, Power Systems, Networking) for operation monitoring, capacity planning, device provisioning, and resource control.]
Server load
Messenger Connection Services
[Figure: Clients send login requests to a Dispatch Server, which picks a Connection Server (CS) for each connection; Connection Servers report load to the Dispatch Server and use Backend Servers (authentication, address book, etc.).]
Workload and power
[Figure: Weekly Windows Live Messenger traffic on 60 servers. Login rate (per second) and number of connections (millions) vs. time in hours.]
[Figure: Typical Server Power Consumption. Power consumption (Watts) vs. CPU utilization (Sleep, Idle, 20% to 100%) for Intel(R) 2CPU 2.4GHz and Intel(R) 2CPU 3GHz servers.]
• Server loads fluctuate over time
• Servers are installed to handle peak load
• Shutting down unused servers can yield significant energy savings
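The arithmetic behind the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the provisioning method from the talk: the load trace, per-server capacity, per-server wattage, and headroom are hypothetical numbers, and the sketch ignores sleep-state power and the cost of turning servers on and off.

```python
import math


def servers_needed(connections, capacity_per_server, headroom=0.1):
    """Servers required to hold `connections` while keeping spare capacity."""
    usable = capacity_per_server * (1.0 - headroom)
    return max(1, math.ceil(connections / usable))


def energy_kwh(server_counts, watts_per_server, hours_per_sample=1.0):
    """Energy for a schedule with one server count per sample interval."""
    watt_hours = sum(n * watts_per_server * hours_per_sample for n in server_counts)
    return watt_hours / 1000.0


# Hypothetical hourly connection counts, capacity, and power draw.
LOAD = [200_000, 400_000, 850_000, 1_000_000, 600_000, 300_000]
CAPACITY = 100_000
WATTS = 150.0

# Dynamic provisioning follows the load; static provisioning sizes for the peak.
dynamic = [servers_needed(c, CAPACITY) for c in LOAD]
static = [max(dynamic)] * len(LOAD)

if __name__ == "__main__":
    print("dynamic schedule:", dynamic)
    print("dynamic kWh:", energy_kwh(dynamic, WATTS))
    print("static  kWh:", energy_kwh(static, WATTS))
```

The gap between the two totals grows with the ratio of peak load to average load, which is why fluctuating workloads are the interesting case.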
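Load Dispatching Strategies
Load Balancing
– $p_i = \frac{1}{K} + \alpha\left(\frac{1}{K} - \frac{N_i}{N_{tot}}\right)$, where $\alpha$ controls convergence rate
– at steady state ($N_i = N_{tot}/K$), $p_i$ reduces to $\frac{1}{K}$
Load Skewing
– round robin over busiest servers as long as $N_i < N_{tgt}$, e.g. $N_{tgt} = 0.9\,N_{max}$
– Starve a server before shutting down
– Declare servers with $N_i < N_{tail}$ as shutdown candidates
[Figure, one panel per strategy: user requests $L_{tot}(t)$ enter a Load Dispatcher, which sends $L_i(t)$ to server $i$; each server is labeled with $N_i(t)$ and $D_i(t)$; the load-skewing panel also marks $N_{tgt}$.]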
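The two dispatching rules on this slide can be sketched as below. Several details are assumptions rather than facts from the slide: the symbol `alpha` for the convergence-rate parameter, reading `K` as the number of active servers, the strict `<` comparisons, the clipping of negative probabilities, the `tail_fraction` value, and the fallback when every server is in the tail. The function names are hypothetical.

```python
def balancing_probabilities(conns, alpha=1.0):
    """Load balancing: p_i = 1/K + alpha * (1/K - N_i / N_tot).

    `conns` holds N_i for each of the K active servers. Servers below the
    average connection count get a higher share of new logins.
    """
    k = len(conns)
    n_tot = sum(conns)
    if n_tot == 0:
        return [1.0 / k] * k
    raw = [1.0 / k + alpha * (1.0 / k - n / n_tot) for n in conns]
    # Assumption: a large alpha can push a busy server's value below zero,
    # so clip at zero and renormalize to keep a valid distribution.
    clipped = [max(0.0, p) for p in raw]
    total = sum(clipped)
    return [p / total for p in clipped]


def skewing_targets(conns, n_max, tgt_fraction=0.9, tail_fraction=0.1):
    """Load skewing: route logins to the busiest servers still below N_tgt.

    Returns (dispatch_order, shutdown_candidates). Servers below N_tail are
    starved of new logins so they can drain before being shut down.
    """
    n_tgt = tgt_fraction * n_max
    n_tail = tail_fraction * n_max  # hypothetical choice of tail threshold
    tail = [i for i, n in enumerate(conns) if n < n_tail]
    eligible = [i for i, n in enumerate(conns) if n_tail <= n < n_tgt]
    if not eligible:
        # Assumption: if every server is in the tail or full, fall back to
        # any server below the target so logins are not dropped.
        eligible = [i for i, n in enumerate(conns) if n < n_tgt]
    # Busiest first; the dispatcher would round-robin over this order.
    eligible.sort(key=lambda i: conns[i], reverse=True)
    return eligible, tail


if __name__ == "__main__":
    print(balancing_probabilities([100, 300]))
    print(skewing_targets([950, 800, 500, 50], n_max=1000))
```

Balancing spreads connections evenly, so no server ever becomes empty enough to shut down cheaply. Skewing deliberately keeps a few lightly loaded tail servers that can be drained and turned off.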
Energy Saving and Performance Tradeoffs: With vs. Without Forecasting
[Figure: energy (kWh) vs. number of SIDs for Forecasting Balancing Starving, Reactive Balancing Starving (S: 2hr, S: 4hr), and Reactive skewing (Nt: 10K, Nt: 40K).]
• Accurate forecasting gives better energy savings with fewer SIDs.
• When loads are not predictable, reactive skewing can perform well.
Gong Chen, Wenbo He, Jie Liu, Suman Nath, Leonidas Rigas, Lin Xiao, and Feng Zhao, "Energy-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services," NSDI'08, April 16–18, 2008, San Francisco.
JouleMeter: Profiling Application Energy Using Performance Events
[Figure: Source code and Workload feed JouleMeter, which produces a Detailed Energy Profile used for Optimizations.]
Optimizations
– Run-time control (e.g., reduce QoS)
– Auto-optimize (e.g., compiler warning: Change line 136 in Program.c to save …)
– Designer insights (e.g., Network is holding up the CPU in method X …)
Aman Kansal and Feng Zhao, "Fine-Grained Energy Profiling for Power-Aware Application Design," HotMetrics08, June 6, 2008, Annapolis, MD.
How it works
[Figure: pipeline with stages Test Workload Execution, Trace Collection (events, counters, power), Trace Processing, Energy Data, and Error Analysis.]
• Power tracing only required for learning phase or error analysis
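One way to picture the learning phase is a model fitted from traces that contain both a performance counter and measured power, then applied to traces with no power measurement. The sketch below assumes a single counter (CPU utilization) and a linear power model; both are illustrative assumptions, not JouleMeter's actual model, and the readings and function names are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Learning phase: hypothetical utilization samples with power meter readings.
UTIL = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
WATTS = [100.0, 112.0, 124.0, 136.0, 148.0, 160.0]
IDLE_W, SLOPE_W = fit_linear(UTIL, WATTS)


def estimate_energy_joules(util_trace, dt_s):
    """Estimate energy from counters alone, one sample every dt_s seconds."""
    return sum((IDLE_W + SLOPE_W * u) * dt_s for u in util_trace)


def mean_abs_error(util_trace, measured_watts):
    """Error analysis: compare model output with a metered trace."""
    errors = [abs(IDLE_W + SLOPE_W * u - w) for u, w in zip(util_trace, measured_watts)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print("idle W:", IDLE_W, "W per unit utilization:", SLOPE_W)
    print("energy (J):", estimate_energy_joules([0.5, 0.5, 0.9], dt_s=1.0))
```

Once the coefficients are fitted, the power meter is only needed again to check the estimation error, which matches the bullet above.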
Energy Profiles
[Figure panels: Estimation Error (Measured, Estimate, Error; Watts vs. Time (s)); Component Dynamic Energy (CPU, Disk, Memory; Watts vs. Time (s)); Application Dynamic Energy (Total, App 1, App 2; Watts vs. Time (s)); Component Dynamic Energy (Cumulative) (CPU, Memory, Disk; Joules); Application Energy (Total, App 1, App 2, by application/component). Other labels: Component Energy, Energy Model Error, Controlled Load, System.]
Making energy a first-class citizen in design
Many energy saving opportunities
• Exist at multiple layers of systems and apps
• Need to discover, expose and exploit relevant power knobs
Consider “energy complexity”
• Not just algorithmic complexity, but also the tradeoff between energy, performance and other system metrics
Think holistically
• Optimize across workload, performance, energy
• Relate energy savings to end-user experience