Hyperthreading – Concepts & Architecture
Agenda
Technical Journey to Hyperthreading
Hardware & Software Requirements
Performance Issues
Hyperthreading
Timeline
Now, parallel computing is available on a single processor
[Figure: timeline with the labels Cluster, Hyperthreading, Symmetric Multi Processing (SMP), Processor]
Parallel Computing - Goals
Parallel computing is when a program uses concurrency to either:
• Decrease the runtime for the solution to a problem
• Increase the size of the problem that can be solved
Single-threaded Processor
Parts of the processor:
• Front end: fetching/decoding/reordering
• Execution core: actual execution
Concurrency illusion:
• Multiple programs in memory
• Only one executes at a time
• Time-slicing via context switching
[Figures: 4-issue CPU with bubbles; 7-unit CPU with pipeline bubbles]
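
Time-slicing on a single-threaded processor can be sketched as a round-robin loop. This is a toy model, not how any particular OS scheduler works: the program names, cycle counts, and the `round_robin` helper are all made up for illustration, and the cost of the context switch itself is ignored.

```python
from collections import deque


def round_robin(programs, time_slice):
    """Simulate time-slicing on a single-threaded CPU.

    programs: dict mapping a (hypothetical) program name to the cycles it needs.
    time_slice: cycles a program may run before a context switch.
    Returns the execution trace as a list of (name, cycles_run) tuples.
    """
    ready = deque(programs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        run = min(time_slice, remaining)
        trace.append((name, run))  # only this one program executes now
        if remaining > run:
            # Context switch: the unfinished program goes to the back of the queue.
            ready.append((name, remaining - run))
    return trace


trace = round_robin({"editor": 5, "compiler": 3}, time_slice=2)
print(trace)
```

Both programs appear to make progress together, but every entry in the trace belongs to exactly one program.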
Single-threaded SMP
Two threads execute at once, so threads spend less time waiting
Twice as much speed and twice as much waste
What is SMP?
“Symmetric MultiProcessors”
Tolerably mislabeled as “Shared-Memory Processors”
Processors all connected to a (large) memory
UMA: Uniform Memory Access, which makes it easy to program
Symmetric: all memory is equally close to all processors
Cache coherence via “snoopy caches”
Super-threading [Time-Slice Multithreading]
Principle: the processor can execute more than one thread at a time
Requires more hardware cleverness
logic switches at each cycle
Leads to less waste
Just a finer grain of interleaving
BUT, each stage of the front end or the execution core only runs instructions from ONE thread!
Does not help with poor instruction parallelism within one thread
Simultaneous Multi Threading (SMT)
Principle: the processor can execute more than one thread at a time, even within a single clock cycle!!
Requires even more hardware cleverness
logic switches within each cycle
Finest level of interleaving
From the OS perspective, there are two “logical” processors
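
The difference between super-threading and SMT can be sketched with a toy issue-slot model. Assumptions (none of them from the slides): each thread is a list of "bundles", a bundle is the number of instructions that thread can issue in one cycle, bundles issue in order, and no bundle is larger than the issue width. The function names are hypothetical.

```python
def super_threading_cycles(thread_a, thread_b):
    """Each cycle issues one bundle from ONE thread, so cycles = bundles."""
    return len(thread_a) + len(thread_b)


def smt_cycles(thread_a, thread_b, width):
    """Each cycle may issue bundles from BOTH threads if they fit in `width` slots."""
    i = j = cycles = 0
    prefer_a = True  # which thread wins when both bundles do not fit together
    while i < len(thread_a) or j < len(thread_b):
        cycles += 1
        if i < len(thread_a) and j < len(thread_b):
            if thread_a[i] + thread_b[j] <= width:
                i += 1  # both bundles share this cycle
                j += 1
            else:
                if prefer_a:
                    i += 1
                else:
                    j += 1
                prefer_a = not prefer_a  # alternate priority on conflicts
        elif i < len(thread_a):
            i += 1
        else:
            j += 1
    return cycles


def slot_utilization(thread_a, thread_b, cycles, width):
    """Fraction of issue slots that carried an instruction."""
    return (sum(thread_a) + sum(thread_b)) / (cycles * width)


a, b, width = [2, 1, 3, 1], [1, 2, 1, 2], 4
st = super_threading_cycles(a, b)
smt = smt_cycles(a, b, width)
print("super-threading:", st, "cycles, utilization", slot_utilization(a, b, st, width))
print("SMT:            ", smt, "cycles, utilization", slot_utilization(a, b, smt, width))
```

In this model super-threading never fills the slots a single thread leaves empty within a cycle, while SMT can pair a narrow bundle from one thread with a narrow bundle from the other.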
Evolution of Hyper-Threading
Two ways of faster computing:
Increase clock speed
• Clock speed cannot be increased beyond a certain limit
• Lot of heat generation
Better utilization of resources
• Better utilization of resources is now the choice
• Memory access takes relatively more time
• During this interval, CPU resources can be used by other threads
• This requires out-of-order execution, register renaming,...
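
A rough way to see why using the memory-wait interval helps: suppose each thread alternates a burst of computation with a wait on memory. The model below is idealized (no switching cost, no contention for shared resources), and the cycle counts are invented for illustration only.

```python
def core_utilization(compute_cycles, memory_cycles, threads):
    """Fraction of time the execution core is busy.

    Each thread computes for `compute_cycles`, then waits `memory_cycles`.
    While one thread waits, other threads may use the core (idealized).
    """
    period = compute_cycles + memory_cycles
    busy = threads * compute_cycles
    return min(1.0, busy / period)


# Hypothetical numbers: 20 cycles of work per 80 cycles of memory wait.
for n in (1, 2):
    print(n, "thread(s):", core_utilization(20, 80, n))
```

With these made-up numbers a second thread doubles the utilization, because the core was idle most of the time anyway.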
Hyper Threading
With these points in mind, Intel came up with its version of Simultaneous Multi Threading (SMT), called Hyper Threading (HT)
Hardware Requirements
Because the additional threads all run on the same CPU elements (FPU, ALU), the only additions needed are in the initial scheduling logic.
Although hyper-threading might seem like a pretty large departure from the kind of conventional, process-switching multithreading done on a single-threaded CPU, it actually doesn't add too much complexity to the hardware.
Intel reports that adding hyper-threading to its Xeon processor added only 5% to its die area.
Intel Xeon – Case Study
Capable of executing at most two threads in parallel on two logical processors.
Must be able to maintain information for two distinct and independent thread contexts.
Done by dividing up the processor's micro-architectural resources into three types: replicated, partitioned, and shared
Intel Xeon – Resources Division
Replicated
• Register renaming logic
• Instruction Pointer
• ITLB
• Return stack predictor
• Various other architectural registers
Partitioned
• Re-order buffers (ROBs)
• Load/Store buffers
• Various queues, like the scheduling queues, uop queue, etc.
Shared
• Caches: trace cache, L1, L2, L3
• Micro-architectural registers
• Execution Units
Replicated Resources
Some resources have to be replicated, such as:
Instruction Pointer
• 1 Instruction Pointer for each logical processor. Xeon: 2 Instruction Pointers
Register Allocation Table
• For mapping architectural registers (8 integer and 8 floating-point) onto 128 General Purpose Registers and 128 Floating Point Registers
• Replicated resource managing a shared resource
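
The idea of a replicated table managing a shared register pool can be sketched as follows. This is a simplified illustration, not Intel's design: the class names, the free-list policy, and the use of "eax" as an architectural register name are assumptions, and real hardware frees an old physical register only when the overwriting instruction retires.

```python
class PhysicalRegisterFile:
    """Shared pool of physical registers (128 in the Xeon example)."""

    def __init__(self, size=128):
        self.free = list(range(size))

    def allocate(self):
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        return self.free.pop(0)

    def release(self, reg):
        self.free.append(reg)


class RegisterAllocationTable:
    """One table per logical processor: architectural -> physical register."""

    def __init__(self, shared_file):
        self.shared_file = shared_file  # the same pool is used by every table
        self.mapping = {}

    def rename_write(self, arch_reg):
        """Map a new value of `arch_reg` to a fresh physical register."""
        old = self.mapping.get(arch_reg)
        new = self.shared_file.allocate()
        self.mapping[arch_reg] = new
        if old is not None:
            # Simplification: release immediately instead of at retirement.
            self.shared_file.release(old)
        return new

    def lookup(self, arch_reg):
        return self.mapping[arch_reg]


regs = PhysicalRegisterFile(128)
rat0 = RegisterAllocationTable(regs)  # logical processor 0
rat1 = RegisterAllocationTable(regs)  # logical processor 1
p0 = rat0.rename_write("eax")
p1 = rat1.rename_write("eax")
print(p0, p1, len(regs.free))
```

Both logical processors write "their" eax, yet the two values land in different physical registers drawn from the one shared file.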
Partitioned Resources
Queues are partitioned resources
Statically Partitioned Queue
Dynamically Partitioned Queue
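
A minimal sketch of the two partitioning styles, under stated assumptions: the static queue gives each logical processor a fixed half of the entries, and the dynamic queue lets both share all entries but caps how many one thread may hold. The cap policy, class names, and sizes are assumptions for illustration, not taken from the slides.

```python
class StaticQueue:
    """Each thread owns a fixed share of the entries."""

    def __init__(self, size, threads=2):
        self.per_thread = size // threads
        self.entries = {t: [] for t in range(threads)}

    def push(self, thread, uop):
        if len(self.entries[thread]) >= self.per_thread:
            return False  # own share is full, even if the other share is empty
        self.entries[thread].append(uop)
        return True


class DynamicQueue:
    """Threads share all entries, subject to a per-thread cap (assumed policy)."""

    def __init__(self, size, per_thread_cap):
        self.size = size
        self.cap = per_thread_cap
        self.entries = []  # list of (thread, uop)

    def push(self, thread, uop):
        used = sum(1 for t, _ in self.entries if t == thread)
        if len(self.entries) >= self.size or used >= self.cap:
            return False
        self.entries.append((thread, uop))
        return True


static_q = StaticQueue(size=8)
dynamic_q = DynamicQueue(size=8, per_thread_cap=6)
# Only thread 0 is active and tries to queue 8 uops.
static_ok = sum(static_q.push(0, n) for n in range(8))
dynamic_ok = sum(dynamic_q.push(0, n) for n in range(8))
print("static accepted:", static_ok, "dynamic accepted:", dynamic_ok)
```

When only one thread is busy, the dynamic queue lets it use more entries, while the cap still leaves room for the other thread to start up.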
Shared Resources
Heart of Hyperthreading
More shared resources => more efficient Hyperthreading, since the goal is to squeeze the maximum amount of computing power out of the minimum amount of die space
Such resources are: registers, load/store units
Shared resources are SMT-unaware
Hyper-Threading Architecture Overview
Confusing Notions
Is a Hyper-Threaded processor the same as a dual-core processor? Answer: NO
Hyper-Threaded = 2 logical processors
Dual-Core = 2 actual processors on a single chip
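
On Linux, the logical/physical distinction is visible in /proc/cpuinfo. The sketch below assumes the x86 field names "processor", "physical id", and "core id" as printed by reasonably recent kernels (older kernels may not report them), and the sample text is made up to resemble one Hyper-Threaded single-core chip. `count_processors` is a hypothetical helper.

```python
def count_processors(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1            # one entry per logical processor
        elif key == "physical id":
            physical_id = value     # which chip (socket) this entry belongs to
        elif key == "core id":
            cores.add((physical_id, value))  # unique (chip, core) pairs
    return logical, len(cores)


# Made-up sample: two logical processors that share one core on one chip.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0
"""

print(count_processors(SAMPLE))
```

To try it on a real machine, pass the contents of /proc/cpuinfo to `count_processors`; a Hyper-Threaded system should report more logical processors than cores.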
HT – System Requirements
HT enabled Processor
• Pentium 4 3.06 GHz, Xeon
HT enabled Chipsets
• Intel 945G Express
HT enabled System BIOS
HT enabled Operating System
• Windows 2000, XP, Linux 2.4.12
HT – Requirements from User
Enable HT in BIOS
To utilize HT:
• Use multi-threaded applications, OR
• Run multiple applications at the same time
Performance Issues - 1
2 Logical Processors != Double Power
Less CPU-intensive programs may not show much gain
Reported gains are 20-40%
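
To put the reported range next to "double power", the gains can be converted into speedup factors. This is plain arithmetic on the 20-40% figure, read here as a throughput gain (an assumption); the `speedup` helper is hypothetical.

```python
def speedup(gain_percent):
    """Convert a percentage gain into a speedup factor."""
    return 1 + gain_percent / 100


IDEAL_TWO_PROCESSORS = 2.0  # what two real processors could reach at best

for gain in (20, 40):
    s = speedup(gain)
    share = s / IDEAL_TWO_PROCESSORS
    print(f"{gain}% gain -> {s:.1f}x speedup, {share:.0%} of an ideal 2x")
```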
Performance Issues - 2
Death-Traps
Main cause: shared resources
Xeon philosophy <-> cooperative multitasking OS
Cases:
Floating Point Unit (FPU):
• One floating-point intensive thread takes up the FPU
• Another similar thread contending for the same FPU gets stalled
Cache:
• No cache-coherency problem as in SMP
• But, cache conflict between two logical processors
• Worst case: two threads accessing different parts of memory and sharing no data => lot of thrashing
Benchmark results:
• Non-SMT may perform better
• With the wrong mix of code, hyper-threading decreases performance
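
The cache worst case can be reproduced with a toy simulation. Assumptions: a small direct-mapped cache (real Xeon caches are set-associative, which softens the effect), two threads whose working sets sit exactly one cache size apart, and strictly alternating accesses. All sizes and names are invented for illustration.

```python
def hit_rate(addresses, num_sets=64, line_size=64):
    """Hit rate of a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_sets
    hits = 0
    for addr in addresses:
        line = addr // line_size
        index = line % num_sets
        tag = line // num_sets
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag  # miss: evict whatever was in this set
    return hits / len(addresses)


def working_set(base, lines=32, line_size=64, passes=10):
    """A thread that sweeps the same `lines` cache lines `passes` times."""
    one_pass = [base + i * line_size for i in range(lines)]
    return one_pass * passes


CACHE_BYTES = 64 * 64  # num_sets * line_size
thread_a = working_set(base=0)
thread_b = working_set(base=CACHE_BYTES)  # different memory, same cache sets

alone = hit_rate(thread_a)
# Two logical processors sharing the cache: accesses alternate A, B, A, B, ...
interleaved = [addr for pair in zip(thread_a, thread_b) for addr in pair]
shared = hit_rate(interleaved)
print("thread A alone:", alone, " A and B sharing the cache:", shared)
```

Each thread on its own fits comfortably in the cache, but when interleaved they keep evicting each other's lines, which is the thrashing described above.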
HT Hardware Hands-On
Need to Enable Hyperthreading through BIOS
Simple test: do the following together, with and without HT
• Compress a 1 GB file
• Play Windows Media Player with a visualization plug-in
Analyze the time taken in the 2 cases
Good benchmark: Embarrassingly Parallel (EP) from NASA
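
A small timing harness in the spirit of the simple test, as a sketch only. It swaps in two simultaneous compression jobs for the compress-plus-media-player mix, uses a small in-memory buffer instead of a 1 GB file, and relies on CPython's zlib module, which is generally understood to release the interpreter lock while compressing. Run it once with HT enabled and once with HT disabled in the BIOS and compare; `time_jobs` is a hypothetical helper.

```python
import os
import threading
import time
import zlib


def compress_job(data, rounds):
    """CPU-bound work: compress the same buffer several times."""
    for _ in range(rounds):
        zlib.compress(data)


def time_jobs(num_threads, data, rounds=3):
    """Run `num_threads` compression jobs at once; return elapsed seconds."""
    threads = [
        threading.Thread(target=compress_job, args=(data, rounds))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# 256 KiB of random bytes stands in for the 1 GB file (kept small on purpose).
data = os.urandom(1 << 18)
one = time_jobs(1, data)
two = time_jobs(2, data)
print(f"1 job: {one:.3f} s, 2 jobs at once: {two:.3f} s")
```

If two jobs at once take much less than twice the single-job time, the second logical processor is contributing; timings vary from run to run, so repeat the measurement a few times.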
4 Processors View in Task Manager
Key Point
Hyper-Threading Technology gives better utilization of processor resources
Hyper-Threading Technology gives more computing power for multithreaded applications
Thread Level Parallelism on a single processor
References
"Hyper-Threading Technology." Intel.
Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton. "Hyper-Threading Technology Architecture and Microarchitecture." Intel.
Susan Eggers, Hank Levy, Steve Gribble. Simultaneous Multithreading Project. University of Washington.
Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. "Simultaneous Multithreading: A Platform for Next-generation Processors." IEEE Micro, September/October 1997, pages 12-18.
Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm, and Dean Tullsen. "Converting Thread-Level Parallelism Into Instruction-Level Parallelism via Simultaneous Multithreading." ACM Transactions on Computer Systems, August 1997, pages 322-354.
Thank You