Performance Indices
Performance
Memory Usage
Power
80/20
80% of time is spent in standby mode
20% of time is spent in operating mode
80% of operating time uses only a subset of peripherals and functions
20% of operating time is spent with all functions in use
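The 80/20 split above implies that average power depends heavily on the standby draw. A minimal sketch of that time-weighted arithmetic follows; the function name, the per-mille duty-cycle parameter, and the microwatt units are illustrative assumptions, not from the source.

```c
#include <stdint.h>

/*
 * Time-weighted average power for a two-mode duty cycle.
 * active_permille is the share of time spent in operating mode,
 * in parts per thousand (200 = 20%). Names and units are
 * illustrative assumptions.
 */
uint32_t average_power_uw(uint32_t standby_uw, uint32_t active_uw,
                          uint32_t active_permille)
{
    if (active_permille > 1000u) {
        active_permille = 1000u;   /* clamp to 100% */
    }
    uint64_t standby_part = (uint64_t)standby_uw * (1000u - active_permille);
    uint64_t active_part  = (uint64_t)active_uw * active_permille;
    return (uint32_t)((standby_part + active_part) / 1000u);
}
```

With hypothetical figures of 10 uW in standby, 1000 uW when operating, and 20% operating time, `average_power_uw(10, 1000, 200)` evaluates to 208.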
Architecture
Von Neumann
Harvard
Software Techniques
Saving power at boot time
Scaling voltage and frequency
Using sleep modes and idling clock domains
Coordinating sleep and scaling
Power
Reduce power supply voltage
Embedded processor architecture
Run at low clock frequency
Chip level
Disable function units with control signals when not in use
Disconnect parts from the power supply when not in use
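Reducing supply voltage and clock frequency both help because dynamic (switching) power in CMOS logic scales roughly with C * V^2 * f. The sketch below estimates relative dynamic power after a voltage/frequency change. It ignores leakage and static power, and the function name and units are hypothetical.

```c
#include <stdint.h>

/*
 * Relative dynamic power after scaling, in parts per thousand of the
 * original (1000 = unchanged), using the approximation P ~ C * V^2 * f.
 * Assumes millivolt and kilohertz values in ranges typical for
 * embedded parts, so the 64-bit products do not overflow.
 */
uint32_t relative_dynamic_power_permille(uint32_t v_old_mv, uint32_t v_new_mv,
                                         uint32_t f_old_khz, uint32_t f_new_khz)
{
    uint64_t num = (uint64_t)v_new_mv * v_new_mv * f_new_khz * 1000u;
    uint64_t den = (uint64_t)v_old_mv * v_old_mv * f_old_khz;
    return den ? (uint32_t)(num / den) : 0u;
}
```

For example, halving the clock alone gives 500 (half the dynamic power), while halving the clock and also dropping from 1200 mV to 1000 mV gives 347, because voltage enters squared.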
Pick Criteria
Optimal performance for the key algorithm
Low power for the key algorithm
Efficient management of memory (cache)
Memory hierarchy
Register
Different levels of cache
Main memory
Disc space
OS layer
Power Manager
Loosely coupled to OS kernel
Set of library routines; execute in client's context
Configurable as necessary
Use platform-specific adaptation library for V/F scaling
Applications, drivers, and CLK register for notifications
Applications trigger actions
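The Power Manager points above (library routines run in the client's context, clients register for notifications, the application triggers actions) can be sketched as a small callback registry. Every name here (`pm_register`, `pm_change_setpoint`, `pm_platform_set_frequency`, the UART client) is hypothetical and does not come from any real power manager API. The divisor formula assumes a 16x oversampling UART purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PM_MAX_CLIENTS 8

typedef void (*pm_notify_fn)(uint32_t new_freq_khz, void *ctx);

typedef struct {
    pm_notify_fn fn;
    void *ctx;
} pm_client;

static pm_client pm_clients[PM_MAX_CLIENTS];
static size_t pm_client_count;
static uint32_t pm_current_freq_khz;

/* Stand-in for the platform-specific adaptation library (assumption). */
static void pm_platform_set_frequency(uint32_t freq_khz)
{
    pm_current_freq_khz = freq_khz;
}

/* Drivers and applications register to be told about V/F changes. */
int pm_register(pm_notify_fn fn, void *ctx)
{
    if (fn == NULL || pm_client_count >= PM_MAX_CLIENTS) {
        return -1;
    }
    pm_clients[pm_client_count].fn = fn;
    pm_clients[pm_client_count].ctx = ctx;
    pm_client_count++;
    return 0;
}

/* Called by the application; runs in the caller's context. */
void pm_change_setpoint(uint32_t freq_khz)
{
    pm_platform_set_frequency(freq_khz);
    for (size_t i = 0; i < pm_client_count; i++) {
        pm_clients[i].fn(freq_khz, pm_clients[i].ctx);
    }
}

uint32_t pm_current_frequency(void)
{
    return pm_current_freq_khz;
}

/* Example client: a driver that recomputes its baud-rate divisor. */
typedef struct {
    uint32_t baud;
    uint32_t divisor;
} uart_state;

void uart_on_frequency_change(uint32_t new_freq_khz, void *ctx)
{
    uart_state *u = ctx;
    u->divisor = (new_freq_khz * 1000u) / (16u * u->baud);
}
```

Keeping the registry as plain library code, with no kernel hooks, is one way to read "loosely coupled to OS kernel".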
Algorithm
Prevent obsessive optimization
Inline functions
Table lookups
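Inline functions and table lookups often go together: a small table replaces a computation, and an inline wrapper avoids call overhead. A minimal sketch that counts set bits in a byte with a 16-entry table follows; the names are illustrative.

```c
#include <stdint.h>

/* Number of set bits in each 4-bit value 0..15. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Two table lookups replace an eight-iteration bit-testing loop. */
static inline uint8_t popcount8(uint8_t x)
{
    return (uint8_t)(nibble_bits[x & 0x0Fu] + nibble_bits[x >> 4]);
}
```

A 16-entry table costs 16 bytes. A 256-entry table would need a single lookup but uses more memory, which is the usual size-versus-speed trade-off on small targets.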
Optimization for Embedded System
Developing efficient code
Hand-coded assembly
Register variables
Polling
Fixed-point arithmetic
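Fixed-point arithmetic replaces floating point with scaled integers, which helps on processors without an FPU. The sketch below assumes a Q16.16 format (16 integer bits, 16 fractional bits); the format choice and the names are illustrative.

```c
#include <stdint.h>

typedef int32_t q16_16;          /* value = raw / 65536 */
#define Q16_ONE 65536

static inline q16_16 q16_from_int(int32_t n)
{
    return (q16_16)(n * Q16_ONE);   /* caller keeps n within +/-32767 */
}

/* Multiply in 64 bits, then rescale back to Q16.16. */
static inline q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}

/* Pre-scale the dividend so the quotient stays in Q16.16; b must be nonzero. */
static inline q16_16 q16_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}
```

Division by `Q16_ONE` is used instead of a right shift so that negative values behave the same on every compiler; a shift is the more common choice when the target's behaviour is known.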
Layers
Avoid standard library routines
Decreasing code size
Native word size
C++
Top 10 SW power optimization
Architect SW to have natural "idle" points
Use interrupt-driven programming
Place code and data close to the processor to minimize off-chip accesses
Use smart placement to keep frequently accessed code/data close to the CPU
Apply size optimizations to reduce footprint, memory, and corresponding leakage
Optimize for speed to allow more CPU idle time or a reduced CPU frequency
Don't over-calculate
Use DMA for efficient transfers
Use co-processors to efficiently handle/accelerate frequent/specialized processing
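The "idle points" and interrupt-driven items in the list above can be sketched as a main loop that sleeps whenever no interrupt has posted work. This is a host-runnable sketch: `cpu_enter_sleep` is a hypothetical stand-in for a real low-power instruction, and the event names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_RX   (1u << 0)
#define EVT_TICK (1u << 1)

static volatile uint32_t pending_events;
static uint32_t rx_handled, tick_handled, sleep_count;

/* Interrupt handlers only record that work exists. */
void isr_rx(void)   { pending_events |= EVT_RX; }
void isr_tick(void) { pending_events |= EVT_TICK; }

/* Hypothetical sleep hook; on a target this would halt the CPU clock. */
static void cpu_enter_sleep(void) { sleep_count++; }

/* Handle everything posted so far; report whether any work was done. */
static bool run_pending(void)
{
    /* On real hardware this read-and-clear must run with interrupts masked. */
    uint32_t events = pending_events;
    pending_events = 0;
    if (events & EVT_RX)   { rx_handled++; }
    if (events & EVT_TICK) { tick_handled++; }
    return events != 0;
}

/* One pass of the main loop: the natural idle point is the else branch. */
void idle_loop_iteration(void)
{
    if (!run_pending()) {
        cpu_enter_sleep();
    }
}
```

Compared with polling, the CPU does no work between events, so the time spent in `cpu_enter_sleep` is time spent in a low-power mode.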
Use more buffering and batch processing to allow more computation at once and more time in low power modes
Code
Optimization levels
-o0 (register)
Simplifies control flow
Allocates variables to registers
Eliminates unused code
Simplifies expressions and statements
Expands calls to inline functions
-o1 (local)
Performs local copy/constant propagation
Removes unused assignments
Eliminates local common expressions
-o2 (global)
Performs local loop optimizations
Eliminates global common sub-expressions
Eliminates global unused assignments
Performs loop unrolling
-o3 (file)
Removes functions that are never called
Simplifies functions with return values that are never used
Reorders functions so that attributes of the called function are known when the caller is optimized
Identifies file-level variable characteristics
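To make two of the transformations listed above concrete, here is a source-level illustration of a loop optimization (hoisting a loop-invariant expression) and loop unrolling. It is written by hand for clarity; what a given compiler actually emits at each level will differ. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* As a programmer might write it: (a + b) is recomputed every iteration. */
int32_t scale_sum_plain(const int16_t *x, size_t n, int32_t a, int32_t b)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * (a + b);
    }
    return sum;
}

/* Same result with the invariant hoisted and the loop unrolled by four. */
int32_t scale_sum_transformed(const int16_t *x, size_t n, int32_t a, int32_t b)
{
    const int32_t k = a + b;        /* computed once, outside the loop */
    int32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {    /* four elements per loop test */
        sum += x[i] * k;
        sum += x[i + 1] * k;
        sum += x[i + 2] * k;
        sum += x[i + 3] * k;
    }
    for (; i < n; i++) {            /* leftover elements */
        sum += x[i] * k;
    }
    return sum;
}
```

Unrolling trades code size for fewer branch and counter updates, which is why it sits at a higher optimization level than the size-neutral local clean-ups.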
Optimization for Embedded System.mmap - 2007/8/26 -