Computer Architecture Chapter 5 Memory Hierarchy Design
Chapter Overview 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Main Memory 5.7 Virtual Memory 5.8 Protection and Examples of Virtual Memory
Introduction 5.1 Introduction
The Big Picture: Where are We Now? The Five Classic Components of a Computer
5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty
Processor Input Control Memory
5.5 Reducing Hit Time 5.6 Main Memory 5.7 Virtual Memory 5.8 Protection and Examples of Virtual Memory
Datapath
Output
Topics In This Chapter: SRAM Memory Technology DRAM Memory Technology Memory Organization
Introduction Capacity Access Time Cost CPU Registers 100s Bytes 1s ns Cache K Bytes 4 ns 1-0.1 cents/bit Main Memory M Bytes 100ns- 300ns $.0001-.00001 cents Disk /bit G Bytes, 10 ms (10,000,000 ns) 105- 10 6 cents/bit Tape infinite secmin8 10
The Big Picture: Where are We Now?
Levels of the Memory Hierarchy
Upper Level faste r
Staging Xfer Unit
Register s Instr. Operands Cach e Block s Memor y Page s Dis k File s Tap e
prog. /compiler 1-8 bytes cache cntl 8-128 bytes OS 512-4K bytes user/operato r Mbytes
Large Lower r Level
The ABCs of Caches 5.1 Introduction
In this section we will:
5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty
Learn lots of definitions about caches – you can’t talk about something until you understand it (this is true in computer science at least!)
5.5 Reducing Hit Time 5.6 Main Memory 5.7 Virtual Memory 5.8 Protection and Examples of Virtual Memory
Answer some fundamental questions about caches: Q1: Where can a block be placed in the upper level? (Block placement) Q2: How is a block found if it is in the upper level? (Block identification) Q3: Which block should be replaced on a miss? (Block replacement) Q4: What happens on a write? (Write strategy)
Cache Memory The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory. Although cache is much smaller than main memory, its access time is a fraction of that of main memory. Unlike main memory, which is accessed by address, cache is typically accessed by content; hence, it is often called content addressable memory . Because of this, a single large cache memory isn’t always desirable-- it takes longer to search.
Cache Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module
Cache/Main Memory Structure
Cache operation – overview CPU requests contents of memory location Check cache for this data If present, get from cache (fast) If not present, read required block from main memory to cache Then deliver from cache to CPU Cache includes tags to identify which block of main memory is in each cache slot
Cache Read Operation - Flowchart
Comparison of Cache Sizes Processor
Type
L1 cachea
L2 cache
L3 cache
Mainframe
Year of Introduction 1968
IBM 360/85
16 to 32 KB
—
—
PDP-11/70
Minicomputer
1975
1 KB
—
—
VAX 11/780
Minicomputer
1978
16 KB
—
—
IBM 3033
Mainframe
1978
64 KB
—
—
IBM 3090
Mainframe
1985
128 to 256 KB
—
—
Intel 80486
PC
1989
8 KB
—
—
Pentium
PC
1993
8 KB/8 KB
256 to 512 KB
—
PowerPC 601
PC
1993
32 KB
—
—
PowerPC 620
PC
1996
32 KB/32 KB
—
—
PowerPC G4
PC/server
1999
32 KB/32 KB
256 KB to 1 MB
2 MB
IBM S/390 G4
Mainframe
1997
32 KB
256 KB
2 MB
IBM S/390 G6
Mainframe
1999
256 KB
8 MB
—
Pentium 4
PC/server
2000
8 KB/8 KB
256 KB
—
IBM SP
2000
64 KB/32 KB
8 MB
—
CRAY MTAb
High-end server/ supercomputer Supercomputer
2000
8 KB
2 MB
—
Itanium
PC/server
2001
16 KB/16 KB
96 KB
4 MB
SGI Origin 2001
High-end server
2001
32 KB/32 KB
4 MB
—
Itanium 2
PC/server
2002
32 KB
256 KB
6 MB
IBM POWER5
High-end server
2003
64 KB
1.9 MB
36 MB
CRAY XD-1
Supercomputer
2004
64 KB/64 KB
1MB
—
The ABCs of Caches
Definitions The Principle of Locality
The Principle of Locality: Program access a relatively small portion of the address space at any instant of time. Three Different Types of Locality: Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse) Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straightline code, array access) Sequential Locality : Sequential order of program execution except branch instructions.
A few terms Inclusion Property Coherence Property Access frequency Access time Cycle time Latency Bandwidth Capacity Unit of transfer
Definitions
The ABCs of Caches
Memory Hierarchy: Terminology Hit: data appears in some block in the upper level (example: Block X) Hit Rate: the fraction of memory access found in the upper level Hit Time: Time to access the upper level which consists of Upper level access time + Time to determine hit/miss Miss: data needs to be retrieve from a block in the lower level (Block Y) Miss Rate = 1 - (Hit Rate) Miss Penalty: Time to replace a block in the upper level + Time to deliver the block the processor Consider a memory with three levels Average memory access time (assuming hit at 3rd level) h1 * t1 + (1 – h1) [t1 + h2 * t2 + (1 – h2) * ( t2 + t3)] where t1, t2 and t3 are access times at the three levels Access frequency of level Mi: fi = (1- h1) (1- h2)…(1-hi)hi
Effective Access time =
(fi * ti)
The ABCs of Caches
Definitions Cache Measures
Hit rate : fraction found in that level So high that usually talk about Miss rate Average memory-access time = Hit time + Miss rate x Miss penalty (ns or clocks) Miss penalty : time to replace a block from lower level, including time to replace in CPU access time : time to lower level = f(latency to lower level) transfer time : time to transfer block =f(Bandwidth between upper & lower levels)
Measures CPU Execution time = (CPU Clock Cycles + Memory Stall Cycles) * Clock Cycle Time CPU clock cycles includes cache hit and CPU is stalled during miss
Memory Stall cycles = Number of misses * Miss penalty = IC * (Misses / Instruction) * Miss penalty = IC * (Memory Accesses / Instruction) * Miss Rate * Miss penalty Miss rate and miss penalties are different for reads and writes
Memory Stall Cycles = IC
* (Reads / Instruction) * Read Miss Rate * Read Miss penalty + IC * (Writes / Instruction) * Write Miss Rate * Write Miss penalty
Miss Rate = Misses / Instruction
= (Miss rate * Memory Accesses ) / Instruction Count = Miss rate * (Memory Accesses / Instruction)
Typical Cache Organization
Definitions
The ABCs of Caches Memory Address 0 1 2 3 4 5 6 7 8 9 A B C D E F
Memor y
Simplest Cache: Direct Mapped 4 Byte Direct Mapped Cache Cache Index 0 1 2 3
Location 0 can be occupied by data from: Memory location 0, 4, 8, ... etc. In general: any memory location whose 2 LSBs of the address are 0s Address<1:0> => cache index Which one should we place in the cache? How can we tell which one is in the cache?
Cache Memory Where can a block be placed in the Cache?
Block 12 is placed in an 8 block cache: Fully associative, direct mapped, 2-way set associative S.A. Mapping = Block Number Modulo Number Sets
Fully associative: block 12 can go anywhere Bloc k no.
0123456 7
Direct mapped: block 12 can go only into block 4 (12 mod 8)
Bloc k no.
0123456 7
Set associative: block 12 can go anywhere in set 0 (12 mod 4)
Bloc k no.
Block-frame address
Bloc k no.
111111111122222222223 0 1 2 3 4 5 6 7 8 9 30 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
0123456 7
Set Set Set Set 0 1 2 3
The ABCs of Caches
How is a block found if it is in the cache?
Each entry in the cache stores words Tag on each block No need to check index or block offset
Address Tag
Byte Offset Index
The ABCs of Caches
How is a block found if it is in the cache?
Each entry in the cache stores words Tag on each block No need to check index or block offset
Address Tag
Byte Offset Index
Cache Memory Take advantage of spatial locality Store multiple words
The diagram below is a schematic of what cache looks like.
Block 0 contains multiple words from main memory, identified with the tag 00000000. Block 1 contains words identified with the tag 11110101. The other two blocks are not valid.
Cache Memory As an example, suppose a program generates the address 1AA. In 14-bit binary, this number is: 00000110101010. The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block.
Cache Memory
Block address
31 Cache Tag
Example: 0x50
Cache Organizations I : DirectMapped Cache
9 Cache Index Ex: 0x01
4 Byte Select Ex: 0x00
0
Stored as part of the cache “state” Valid Bit
Cache Tag 0x50
:
Cache Data Byte 31 : Byte 63 :
Byte 1 Byte 0 0 Byte 33 Byte 32 1 2 3
:
: Byte 1023
:
Byte 992 31
Tag s-r
Direct Mapping Address LineStructure or Slot r
Word w
14
8
24 bit address 2 bit word identifier (4 byte block) 22 bit block identifier 8 bit tag (=22-14) 14 bit slot or line
No two blocks in the same line have the same Tag field Check contents of cache by finding line and checking Tag
2
Direct Mapping Cache Line Table Cache line Main Memory blocks held 0 0, m, 2m, 3m…2s-m 1 1,m+1, 2m+1…2s-m+1 m-1 m-1, 2m-1,3m-1…2s-1
Direct Mapping Cache Organization
Direct Mapping pros & cons Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high
Associative Mapping A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive
Fully Associative Cache Organization
Associative Mapping Address Structure
Tag 22 bit
Word 2 bit
22 bit tag stored with each 32 bit block of data Compare tag field with tag entry in cache to check for hit Least significant 2 bits of address identify which 16 bit word is required from 32 bit data block e.g. Address Tag Data Cache line FFFFFC FFFFFC 24682468 3FFF
Cache Memory
Block address
31 Cache Tag
Example: 0x50
Cache Organizations II : SetAssociative Cache
9
8
4 Byte Select Ex: 0x00
Cache Index Ex: 0x01 mod 16
0
Stored as part of the cache “state” Valid Bit
Cache Data Byte 31 :
Cache Tag
0x50
:
Byte 63
:
Byte 1
Byte 0 Set 0
Byte 33 Byte 32
Set 1
:
: Byte 1023
:
Byte 992
Set 15
Set Associative Mapping Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set e.g. Block B can be in any line of set i
e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set
Set Associative Mapping Example 13 bit set number Block number in main memory is modulo 213 000000, 00A000, 00B000, 00C000 … map to same set
Two Way Set Associative Cache Organization
Set Associative Mapping Address Structure Tag 9 bit
Set 13 bit
Use set field to determine cache set to look in Compare tag field to see if we have a hit e.g Address Tag Data Set number 1FF 7FFC 1FF 12345678 1FFF 001 7FFC 001 11223344 1FFF
Word 2 bit
Two Way Set Associative Mapping Example
Replacement Algorithms (1) Direct mapping No choice Each block only maps to one line Replace that line
Replacement Algorithms (2) Associative & Set Associative Hardware implemented algorithm (speed) Least Recently used (LRU) e.g. in 2 way set associative Which of the 2 block is lru?
First in first out (FIFO) replace block that has been in cache longest
Least frequently used replace block which has had fewest hits
Random
Write Policy Must not overwrite a cache block unless main memory is up to date Multiple CPUs may have individual caches I/O may address main memory directly
Write through All writes go to main memory as well as cache Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date Lots of traffic Slows down writes Remember bogus write through caches!
Write back Updates initially made in cache only Update bit for cache slot is set when update occurs If block is to be replaced, write to main memory only if update bit is set Other caches get out of sync I/O must access main memory through cache N.B. 15% of memory references are writes
Cache Memory
Let’s Do An Example: The Memory Addresses We’ll Be Here’s a number of addresses. We’ll be asking for the data at these Using addresses and see what happens to the cache when we do so. Address
1090 1440 5000 1470
Tag 3 1 3 1 3 1 3 1
Set 9 8
Offset 5
0000000000000000000001 001 0 9 8 0 5 0000000000000000000001 0
9 8
xxxxxxxxxxxxxxxxxxxxxxx 9 8
xxxxxxxxxxxxxxxxxxxxxx x
Cache: 1. Is Direct Mapped 2. Contains 512 bytes. 3. Has 16 sets. 4. Each set can hold 32 bytes or 1 cache line.
110 1 xxx x xxx x
5
5
4
Result 0
0001 0 4 0 0000 0 4
0100 0 4
0
0
xxxxx
Miss Miss
Set Address
Initially the cache is empty. Cache: 1. Is Direct Mapped 2. Contains 512 bytes. 3. Has 16 sets. 4. Each set can hold 32 bytes or 1 cache line.
Here’s the Cache We’ll Be V Tag Touching Data (Can hold a 32-byte cache line.)
0 (0000)
N
1 (0001)
N
2 (0010)
N
3 (0011)
N
4 (0100)
N
5 (0101)
N
6 (0110)
N
7 (0111)
N
8 (1000)
N
9 (1001)
N
10 (1010)
N
11 (1011)
N
12 (1100)
N
13 (1101)
N
14 (1110)
N
15 (1111)
N
Cache Memory We want to READ data from address 1090 = 010|0010|00010 Add.
Tag
Set
Offset
256
0000
1000
00000
512
0001
0000
00000
Doing Some Cache Action Set Address
V
Tag
Data (Always holds a 32-byte cache line.)
0 (0000)
N
1 (0001)
N
2 (0010)
Y 00000….10 Data from memory loc. 1088 - 1119 N
3 (0011)
N
4 (0100)
N
5 (0101)
N
6 (0110)
N
1024
0010
0000
00000
1090
0010
0010
00010
7 (0111)
N
1099
0010
0010
01011
8 (1000)
N
1440
0010
1101
00000
9 (1001)
N
1470
0010
1101
11110
10 (1010)
N
1600
0011
0010
00000
11 (1011)
N
12 (1100)
N
1620
0011
0010
10100
13 (1101)
N
2048
0100
0000
00000
14 (1110)
N
4096
1000
0000
00000
15 (1111)
N
5000
1001
1100
01000
Cache Memory We want to READ data from address 1440 = 010|1101|00000 Offset
Doing Some Cache Action Set Address
V
0 (0000)
N
1 (0001)
N
2 (0010)
Y
Tag
Data (Always holds a 32-byte cache line.)
Add.
Tag
Set
256
0000
1000
00000
3 (0011)
N
512
0001
0000
00000
4 (0100)
N
1024
0010
0000
00000
5 (0101)
N
6 (0110)
N
1090
0010
0010
00010
7 (0111)
N
1099
0010
0010
01011
8 (1000)
N
1440
0010
1101
00000
9 (1001)
N
1470
0010
1101
11110
10 (1010)
N
1600
0011
0010
00000
11 (1011)
N
1620
0011
0010
10100
12 (1100)
N
13 (1101)
N Y 00000….10 Data from memory loc. 1440 - 1471
2048
0100
0000
00000
14 (1110)
N
4096
1000
0000
00000
15 (1111)
N
5000
1001
1100
01000
00000….10
Data from memory loc. 1088 - 1119
Cache Memory We want to READ data from address 5000 = 1001|1100|01000 Offset
Doing Some Cache Action Set Address
V
0 (0000)
N
1 (0001)
N
2 (0010)
Y
3 (0011)
N
4 (0100)
N
5 (0101)
N
Add.
Tag
Set
256
0000
1000
00000
512
0001
0000
00000
1024
0010
0000
00000
6 (0110)
N
1090
0010
0010
00010
7 (0111)
N
1099
0010
0010
01011
8 (1000)
N
1440
0010
1101
00000
9 (1001)
N
10 (1010)
N
11 (1011)
N
Tag
00000…….10
Data (Always holds a 32-byte cache line.)
Data from memory loc. 1088 - 1119
1470
0010
1101
11110
1600
0011
0010
00000
12 (1100)
Y 00000….1001 Data from memory loc. 4992 - 5023 N
1620
0011
0010
10100
13 (1101)
Y
2048
0100
0000
00000
14 (1110)
N
4096
1000
0000
00000
15 (1111)
N
5000
1001
1100
01000
00000…0010
Data from memory loc. 1440 - 1471
Cache Memory We want to READ data from address 1470 = 0010|1101|11110 Add.
Tag
Set
Offset
256
0000
1000
00000
512
0001
0000
00000
Doing Some Cache Action Set Address
V
0 (0000)
N
1 (0001)
N
2 (0010)
Y
3 (0011)
N
4 (0100)
N
5 (0101)
N
6 (0110)
N
7 (0111)
N
Tag
Data (Always holds a 32-byte cache line.)
00000…….10
Data from memory loc. 1088 - 1119
00000….1001
Data from memory loc. 4992 - 5023
1024
0010
0000
00000
1090
0010
0010
00010
1099
0010
0010
01011
8 (1000)
N
1440
0010
1101
00000
9 (1001)
N
1470
0010
1101
11110
10 (1010)
N
1600
0011
0010
00000
11 (1011)
N
1620
0011
0010
10100
12 (1100)
Y
2048
0100
0000
00000
13 (1101)
Y 00000….0010 00000…00010 Data from memory 1440 - 1471 Data from memory loc. 1440loc. - 1471
4096
1000
0000
00000
14 (1110)
N
15 (1111)
N
5000
1001
1100
01000
Cache Memory We want to READ data from address 1600 = 0011|0010|00000 Offset
Doing Some Cache Action Set Address
V
Tag
Data (Always holds a 32-byte cache line.)
0 (0000)
N
1 (0001)
N
2 (0010)
Y 00000…….10 Data from memory 1060 - 1091 Y 00000….0011 Data from memory loc. 1600loc. - 1631
Add.
Tag
Set
256
0000
1000
00000
3 (0011)
N
512
0001
0000
00000
4 (0100)
N
1024
0010
0000
00000
5 (0101)
N
6 (0110)
N
1090
0010
0010
00010
7 (0111)
N
1099
0010
0010
01011
8 (1000)
N
1440
0010
1101
00000
9 (1001)
N
1470
0010
1101
11110
10 (1010)
N
1600
0011
0010
00000
11 (1011)
N
1620
0011
0010
10100
12 (1100)
Y
00000….1001
Data from memory loc. 4992 - 5023
13 (1101)
Y
00000…00010
Data from memory loc. 1440 - 1471
2048
0100
0000
00000
14 (1110)
N
4096
1000
0000
00000
15 (1111)
N
5000
1001
1100
01000
Cache Memory We want to WRITE data to address 256 = 0000|1000|00000 Offset
Doing Some Cache Action Set Address
V
0 (0000)
N
1 (0001)
N
2 (0010)
Y
Tag
Data (Always holds a 32-byte cache line.)
Add.
Tag
Set
256
0000
1000
00000
3 (0011)
N
512
0001
0000
00000
4 (0100)
N
1024
0010
0000
00000
5 (0101)
N
1090
0010
0010
00010
6 (0110)
N
7 (0111)
N
1099
0010
0010
01011
8 (1000)
Y 00000….0000 Data from memory loc. 256 - 287 N
1440
0010
1101
00000
9 (1001)
N
1470
0010
1101
11110
10 (1010)
N
1600
0011
0010
00000
11 (1011)
N
1620
0011
0010
10100
12 (1100)
Y
00000….1001
Data from memory loc. 4992 - 5023
2048
0100
0000
00000
13 (1101)
Y
00000…00010
Data from memory loc. 1440 - 1471
14 (1110)
N
4096
1000
0000
00000
15 (1111)
N
5000
1001
1100
01000
00000….0011
Data from memory loc. 1600 - 1631
Cache Memory We want to WRITE data to address 1620 = 0011|0010|10100 Offset
Doing Some Cache Action Set Address
V
Tag
Data (Always holds a 32-byte cache line.)
0 (0000)
N
1 (0001)
N
2 (0010)
Y 00000…….10 Data from memory 1060 - 1091 Y 00000….0011 Data from memory loc. 1600loc. - 1631
Add.
Tag
Set
256
0000
1000
00000
3 (0011)
N
512
0001
0000
00000
4 (0100)
N
1024
0010
0000
00000
5 (0101)
N
6 (0110)
N
1090
0010
0010
00010
7 (0111)
N
1099
0010
0010
01011
8 (1000)
Y
1440
0010
1101
00000
9 (1001)
N
1470
0010
1101
11110
10 (1010)
N
1600
0011
0010
00000
11 (1011)
N
1620
0011
0010
10100
12 (1100)
2048
0100
0000
00000
4096
1000
0000
00000
5000
1001
1100
01000
00000….0000
Data from memory loc. 256 - 287
Y
00000….1001
Data from memory loc. 4992 - 5023
13 (1101)
Y
00000…00010
Data from memory loc. 1440 - 1471
14 (1110)
N
15 (1111)
N
Cache Memory We want to WRITE data to address 1099 = 0010|0010|01011 Add.
Tag
Set
Offset
Doing Some Cache Action Set Address
V
Tag
Data (Always holds a 32-byte cache line.)
0 (0000)
N
1 (0001)
N
2 (0010)
Y 00000….0010 Data from memory loc. 1088loc. - 1119 Y 00000…00011 Data from memory 1600 - 1631
3 (0011)
N
4 (0100)
N
256
0000
1000
00000
512
0001
0000
00000
5 (0101)
N
1024
0010
0000
00000
6 (0110)
N
1090
0010
0010
00010
7 (0111)
N
1099
0010
0010
01011
8 (1000)
Y
1440
0010
1101
00000
9 (1001)
N
10 (1010)
N
11 (1011)
N
00000….0000
Data from memory loc. 256 - 287
1470
0010
1101
11110
1600
0011
0010
00000
12 (1100)
Y
00000….1001
Data from memory loc. 4992 - 5023
1620
0011
0010
10100
13 (1101)
Y
00000…00010
Data from memory loc. 1440 - 1471
2048
0100
0000
00000
14 (1110)
N
4096
1000
0000
00000
15 (1111)
N
5000
1001
1100
01000
Cache Memory What happens on a write? Write through —The information is written to both the block in the cache and to the block in the lower-level memory. Write back —The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced. is block clean or dirty?
WT always combined with write buffers so that don’t wait for lower level memory
Cache Memory Write Buffer for Write Through Cache Processor
DRA M
Write Buffer
A Write Buffer is needed between the Cache and Memory Processor: writes data into the cache and the write buffer; Memory controller: write contents of the buffer to memory.
Write buffer is just a FIFO: Typical number of entries: 4; Must handle bursts of writes;
Cache Memory
Write-miss Policy: Write Allocate vs . Not Allocate
Assume: a 16-bit (sub-block) write to memory location 0x0 and causes a miss. Do we allocate space in cache and possibly read in the block?
Yes: Write Allocate (Write back caches) No: Not Write Allocate (Write through)
Example: WriteMem[100] WriteMem[100] ReadMem[200] WriteMem[200] WriteMem[100] NWA: four misses and one hit WA: two misses and three hits
Pentium 4 Cache 80386 – no on chip cache 80486 – 8k using 16 byte lines and four way set associative organization Pentium (all versions) – two on chip L1 caches Data & instructions
Pentium III – L3 cache added off chip Pentium 4 L1 caches 8k bytes 64 byte lines four way set associative
L2 cache Feeding both L1 caches 256k 128 byte lines 8 way set associative
L3 cache on chip
Intel Cache Evolution Problem
Solution
Processor on which feature first appears
External memory slower than the system bus.
Add external cache using faster memory technology.
386
Increased processor speed results in external bus becoming a bottleneck for cache access.
Move external cache on-chip, operating at the same speed as the processor.
486
Internal cache is rather small, due to limited space on chip
Add external L2 cache using faster technology than main memory
486
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.
Create separate data and instruction caches.
Pentium
Increased processor speed results in external bus becoming a bottleneck for L2 cache access.
Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.
Pentium Pro
Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
Move L2 cache on to the processor chip.
Pentium II
Add external L3 cache.
Pentium III
Move L3 cache on-chip.
Pentium 4
Reducing Cache Misses 5.1 Introduction 5.2 The ABCs of Caches 5.3 Reducing Cache Misses 5.4 Reducing Cache Miss Penalty 5.5 Reducing Hit Time 5.6 Main Memory 5.7 Virtual Memory 5.8 Protection and Examples of Virtual Memory
Classifying Misses: 3 Cs
Compulsory —The first access to a block is not in the cache, so the block must be brought into the cache. Also called cold start misses or first reference misses . (Misses in even an Infinite Cache) Capacity —If the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in Fully Associative Size X Cache) Conflict —If block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory & capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses . (Misses in N-way Associative, Size X Cache)