CS433: Computer System Organization Main Memory Virtual Memory Translation Lookaside Buffer
Acknowledgement Slides
from previous CS433 semesters TLB figures taken from http://www.cs.binghamton.edu/~kang/cs350/Chap Memory
figures from http://larc.ee.nthu.edu. tw/~cthuang/courses/ee3450/lectures/flecture /F0713.gif
Topics Today
Main Memory (chap 5.8, 5.9)
Simple main memory Wider memory Interleaved memory Memory technologies
Virtual Memory (chap 5.10, 5.11)
Basics Address translation
Simple Main Memory
Consider a memory with these paremeters:
1 cycle to send an address 6 cycles to access each word 1 cycle to send word back to CPU/cache
What is the miss penalty for a 4-word block? (1 + 6 + 1) per word x 4 words = 32 cycles
Wider Main Memory
Make the memory wider
Higher bandwidth by reading more words at once
Same problem…
1 cycle to send an address 6 cycles to access each doubleword 1 cycle to send word back to CPU/cache Miss penalty: (1 + 6 + 1) x 2 doublewords = 16
Wider Main Memory
Make the memory wider
Higher bandwidth Read more words in parallel
Cost:
Wider bus Larger expansion size Minimum increment must correspond to width Double width minimum increment must be doubled
Interleaved Main Memory
Organize memory in banks
Subsequent words map to different banks Example: word A in bank (A mod M) Within a bank, word A in location (A div M)
Interleaved Main Memory
Same problem…
1 cycle to send an address 6 cycles to access each word 1 cycle to send word back to CPU/cache Miss penalty: 1 + 6 + 1 x 4 = 11 cycles
Memory Technologies Dynamic
Optimized for density, not speed
One transistor per cell
Cycle time about twice the access time
Random Access Memory (DRAM)
Destructive reads
Must refresh every few ms
Access every row
Memory Technologies Static
Optimized for speed, then density
Random Access Memory (SRAM)
About 4 – 6 transistors per cell Separate address pins
Static = no refresh Access time = cycle time
Memory Technologies ROM
Read-only 1 transistor per cell Nonvolatile
Flash
Allows memory to be modified Nonvolatile Slow write
Virtual Memory Motivation:
Give each running program its own private address
Restrict process from modifying other processes Want programs to be protected from each other bug in one program can’t corrupt memory in another program
Programs can run in any location in physical memory
Simplify loading the program
Virtual Memory Motivation:
Want programs running simultaneously to share underlying physical memory Want to use disk as another level in the memory hierarchy
Treat main memory as a cache for disk
Virtual Memory The
program sees virtual addresses Main memory sees physical addresses Need translation from virtual to physical
Virtual Memory vs Cache Virtual
Memory
Longer miss penalty Handled by OS Size dependent of physical address space
Cache
Handled by hardware Size independent of physical address space
Alias
2 virtual addresses map to the same physical address Consistency problems
Virtual Memory Block
replacement strategy
LRU
Write-back
strategy
With dirty bit Allows blocks to be written to disk when the block is replaced
Memory-Management Scheme Segmentation Paging Segmentation
+ Paging
Segmentation Supports user view of memory. Virtual memory is divided into variable length regions called segments Virtual address consists of a segment number and a segment offset
<segment-number, offset>
Fragmentation Memory
allocation is a dynamic process that uses a best fit or first fit algorithm Fragmentation is NOT segmentation
External Internal Data
Internal vs External
External fragmentation:
free space divided into many small pieces result of allocating and deallocating pieces of the storage space of many different sizes one may have plenty of free space, it may not be able to all used, or at least used as efficiently as one would like to Unused portion of main memory
Internal fragmentation:
result of reserving a piece of space without ever intending to use it Unused portion of page
Paging Virtual
memory is divided into fixed-size blocks called pages
typically a few kilobytes should be a natural unit of transfer to/from disk
Page
LRU, MRU, Clock… etc
Page
replacement placement
Fully associative - efficient
Paging Page
Identification
Virtual address is divided into
The virtual page number is translated into a physical page number Provides indirection
Indirection is always good
Translation cached in a buffer
Paging vs Segmentation
Paging:
Block replacement easy Fixed-length blocks
Segmentation:
Block replacement hard Variable-length blocks Need to find contiguous, variablesized, unused part of main memory
Paging vs Segmentation
Paging:
Invisible to application programmer No external fragmentation There is internal fragmentation Unused portion of page Units of code and data are broken up into separate pages
Segmentation:
Visible to application programmer No internal fragmentation Unused pieces of main memory There is external fragmentation Keeps blocks of code or data as single units
Segmentation + Paging Segmentation
is typically combined with
paging Paging is transparent to the programmer Segmentation is visible to the programmer Each segment is broken into fixed-size pages
Simplify page replacement Memory doesn’t have to be contiguous
Page Table Page
table is a collection of PTEs that maps a virtual page number to a PTE
PTEs = page table entry The page table tells where a particular virtual page address is stored on disk and in the main memory
Each
process has its own page table Size depends on # of pages, page size, and virtual address space
Page Table
virtual address is divided into
virtual page number page offset
Page Table
Page-table base-register tells where the page table starts Page offset directly maps to physical address page offset Reference bit is set when a page is accessed
Page Table
Each virtual memory reference can cause two physical memory accesses
One to fetch the page table One to fetch the data
Page Table
Address translation must be performed for every memory access
Need optimization!
TLB Problems:
Every virtual memory reference 2 physical memory accesses Every memory access -> address translation
To
overcome this problem a high-speed cache is set up for page table entries
Called a Translation Lookaside Buffer (TLB)
TLB The
Translation Lookaside Buffer
a small cache contains translations for the most recently used pages
The
processor consults the TLB first when accessing memory
Address Translation
Virtual page number (Tag, Index) Index tells which row to access
Address Translation
TLB is a cache Fully associative for efficiency
TLB Given
a virtual address, processor examines the TLB If page table entry is present (TLB hit)
the frame number is retrieved and the real address is formed
TLB If
page table entry is not found in the TLB (TLB miss)
Hardware checks page table and loads new Page Table Entry into TLB Hardware traps to OS, up to OS to decide what to do
OS knows which program caused the TLB fault the software handler will get the address mapping
TLB Miss If
page is in main memory (page hit) If the mapping is in page table, we add the entry to the TLB, evicting an old entry from the TLB
TLB If
page is on disk (page fault) load the page off the disk into a free block of memory update the process's page table
TLB
TLB