Memory Hierarchy Design (1)
Memory Hierarchy Design – Part 1: Memory Hierarchy and Cache ABC

What is a memory hierarchy?
• multiple levels of memories, moving outward from the CPU with
  – increasing size
  – decreasing speed and cost
• hardware-controlled caching

Why a memory hierarchy?
• slow memory and fast CPU
• increasing gap between memory and CPU performance
Computer Architecture
11-1
[Figure 5.1 plot: performance (log scale, up to ~3000×) versus year, 1980–2000, with separate CPU and memory curves; axis tick labels omitted.]
FIGURE 5.1 Starting with 1980 performance as a baseline, the performance of memory and CPUs are plotted over time.
Cache ABC
• Where can a block be placed in the upper level? (placement policy)
• How is a block found if it is in the upper level? (block identification)
• Which block should be replaced on a miss? (block replacement)
• What happens on a write? (write strategy)

Placement Policy
• Direct mapped: (Block number) mod (Number of cache blocks)
• Set associative: (Block number) mod (Number of sets)
• Fully associative: a block can go anywhere in the cache.
• Associativity (m-way): the number of blocks in a set is m.
  – m = 1 implies a direct-mapped cache
  – m = total number of blocks in the cache implies a fully associative cache
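The three placement policies above reduce to one modular mapping once the number of sets is fixed. A minimal Python sketch (the function name `cache_set` is my own, not from the text):

```python
def cache_set(block_number: int, num_blocks: int, associativity: int) -> int:
    """Map a memory block number to its cache set.

    m = 1          -> direct mapped (one block per set)
    m = num_blocks -> fully associative (a single set 0)
    """
    num_sets = num_blocks // associativity
    return block_number % num_sets

# Figure 5.2's example: an 8-block cache, memory block 12
print(cache_set(12, 8, 1))  # direct mapped: 12 mod 8 = 4
print(cache_set(12, 8, 2))  # 2-way (4 sets): 12 mod 4 = 0
print(cache_set(12, 8, 8))  # fully associative: set 0
```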
[Figure 5.2 diagram: an 8-block cache (block frames 0–7) shown three ways, with memory block frame addresses 0–31.
• Fully associative: block 12 can go anywhere.
• Direct mapped: block 12 can go only into block 4 (12 mod 8).
• Set associative (sets 0–3): block 12 can go anywhere in set 0 (12 mod 4).]
FIGURE 5.2 This example cache has eight block frames and memory has 32 blocks.
Block Identification
• Cached data are identified by blocks.
• Multiple memory blocks can be mapped to the same cache block.
  – Each cache block needs a tag to distinguish them.
• Three portions of a memory address: [ Tag | Index | Block offset ], where Tag and Index together form the block address.

FIGURE 5.3 The three portions of an address in a set-associative or direct-mapped cache.

• Let the index be b bits. We have 2^b sets.
• Decreasing b implies increasing the associativity.
• tag bits = block address bits − index bits
Replacement Policy
• Random
• Least-recently used (LRU)

Write Strategy
• Write-through: both the cache and the lower-level memory are updated on a write hit.
• Write-back: only the cache is updated on a write hit; the lower-level memory is updated when the block is replaced.

Write Miss
• Write allocate: the block is allocated on a write miss.
• Write around (no-write allocate): the block is not allocated on a write miss.
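LRU replacement for a single set can be sketched with an ordered structure (a software model only, not how hardware tracks recency; the class name is my own):

```python
from collections import OrderedDict

class LRUSet:
    """One m-way cache set with LRU replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data; least recently used first

    def access(self, tag) -> bool:
        """Return True on a hit; on a miss, fill the block, evicting the LRU one."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[tag] = None
        return False

s = LRUSet(ways=2)
s.access('A')            # miss
s.access('B')            # miss
s.access('A')            # hit: A becomes most recently used
s.access('C')            # miss: evicts B, not A
print(s.access('B'))     # False: B was the LRU victim
```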
An Example – Alpha AXP 21064

[Figure 5.5 diagram: the CPU address is split into a 21-bit tag, an 8-bit index, and a 5-bit block offset. The index selects one of 256 blocks, each holding a valid bit <1>, a tag <21>, and data <256>. The stored tag is compared (=?) against the address tag; a 4:1 mux selects the requested word for the CPU, and writes go through a write buffer to the lower-level memory.]

FIGURE 5.5 The organization of the data cache in the Alpha AXP 21064 microprocessor.
• 1-way set-associative (direct-mapped) cache (8 KB)
• 2^8 = 256 sets
• block size = 32 bytes
• write-through with a write buffer

Write buffer with write merging
• each buffer entry holds four words
• write merging reduces traffic and saves buffer space (Figure 5.6)
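Write merging as described above can be sketched as coalescing word writes that fall in the same aligned block into one four-word buffer entry (a simplified model, not the 21064's exact hardware; Figure 5.6 is not reproduced here):

```python
def merge_writes(writes, words_per_entry=4):
    """Coalesce a stream of (word address, value) writes into buffer entries.

    Writes to the same aligned block merge into one entry; without merging,
    each write would occupy its own entry.
    """
    entries = {}   # block base address -> list of words (None = empty slot)
    order = []     # buffer entries in allocation order
    for addr, value in writes:
        block = (addr // words_per_entry) * words_per_entry
        if block not in entries:
            entries[block] = [None] * words_per_entry
            order.append(block)
        entries[block][addr % words_per_entry] = value
    return [(b, entries[b]) for b in order]

# Four sequential word writes occupy one merged entry instead of four:
print(merge_writes([(100, 'a'), (101, 'b'), (102, 'c'), (103, 'd')]))
```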
Cache Performance

Average access time = Hit time + Miss rate × Miss penalty

Improving Cache Performance
• Reduce the miss rate
• Reduce the miss penalty
• Reduce the hit time
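The formula above as a one-line sketch (the numbers in the example are hypothetical, not from the text):

```python
def avg_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = Hit time + Miss rate x Miss penalty."""
    return hit_time + miss_rate * miss_penalty

# e.g., a 1-cycle hit, 5% miss rate, and 40-cycle miss penalty:
print(avg_access_time(1, 0.05, 40))  # 3.0 cycles
```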