Architecture

July 2020
Q1. Suppose that a 2M x 16 main memory is built using 256K x 8 RAM chips and memory is word-addressable.
a. How many RAM chips are necessary?
b. How many RAM chips are there per memory word?
c. How many address bits are needed for each RAM chip?
d. How many address bits are needed for all of memory?

Answer:
a. 16 (8 rows of 2 columns)
b. 2
c. 256K = 2^18, so 18 bits
d. 2M = 2^21, so 21 bits

Q5. The parameters of a hierarchical memory system are specified as follows:
Main memory size = 8K blocks
Cache memory size = 512 blocks
Block size = 16 words
Determine the size of the tag field under the following conditions:
a. Fully associative mapping
b. Direct mapping
c. Set-associative mapping with 16 blocks/set

Answer (field widths in bits):

       TAG   SET/BLOCK   WORD
a)      13       0         4
b)       4       9         4
c)       8       5         4
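The bit-width arithmetic behind Q1 and Q5 can be checked with a short script. All variable and function names below are ours, not from the source:

```python
from math import log2

# Q1: word-addressable 2M x 16 memory built from 256K x 8 RAM chips
MEM_WORDS, MEM_WIDTH = 2 * 2**20, 16
CHIP_WORDS, CHIP_WIDTH = 256 * 2**10, 8

chips_per_word = MEM_WIDTH // CHIP_WIDTH                            # b. 2
total_chips = (MEM_WORDS * MEM_WIDTH) // (CHIP_WORDS * CHIP_WIDTH)  # a. 16
chip_addr_bits = int(log2(CHIP_WORDS))                              # c. 18
mem_addr_bits = int(log2(MEM_WORDS))                                # d. 21

# Q5: tag/set/word field widths for each mapping
MAIN_BLOCKS, CACHE_BLOCKS, BLOCK_WORDS = 8 * 2**10, 512, 16
word_bits = int(log2(BLOCK_WORDS))     # 4-bit word-in-block field
block_bits = int(log2(MAIN_BLOCKS))    # 13 bits of main-memory block number

def tag_bits(blocks_per_set):
    """Tag width = block-number bits minus set-index bits."""
    sets = CACHE_BLOCKS // blocks_per_set
    return block_bits - int(log2(sets))

fully_assoc = tag_bits(CACHE_BLOCKS)   # one set  -> tag = 13
direct      = tag_bits(1)              # 512 sets -> tag = 4
set_assoc16 = tag_bits(16)             # 32 sets  -> tag = 8
```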

Q6. What is the average access time of a system having three levels of memory hierarchy: a cache memory, a semiconductor main memory, and magnetic disk secondary memory? The access times of these memories are 20 ns, 200 ns, and 2 ms, respectively. The cache hit ratio is 80 per cent and the main memory hit ratio is 99 per cent.

Q7. Processor X has a clock speed of 1 GHz and takes 1 cycle for integer operations, 2 cycles for memory operations, and 4 cycles for floating-point operations. Empirical data show that programs run on Processor X are typically composed of 35% floating-point operations, 30% memory operations, and 35% integer operations. You are designing Processor Y, an improvement on Processor X that will run the same programs, and you have two options to improve performance:
1. Increase the clock speed to 1.2 GHz, but memory operations take 3 cycles.
2. Decrease the clock speed to 900 MHz, but floating-point operations take only 3 cycles.
Compute the speedup for both options and decide which option Processor Y should take.

Answer: First, compute the CPI for each processor:
Processor X: 1 * 0.35 + 2 * 0.30 + 4 * 0.35 = 0.35 + 0.60 + 1.40 = 2.35 cycles/instruction
Y, Option 1: 1 * 0.35 + 3 * 0.30 + 4 * 0.35 = 0.35 + 0.90 + 1.40 = 2.65 cycles/instruction
Y, Option 2: 1 * 0.35 + 2 * 0.30 + 3 * 0.35 = 0.35 + 0.60 + 1.05 = 2.00 cycles/instruction
Next, compute how long it takes to execute an "average" instruction, by dividing CPI (cycles/instruction) by the clock rate (cycles/second) to give seconds/instruction:
Processor X: 2.35 / 1.0 = 2.35 ns/instruction
Y, Option 1: 2.65 / 1.2 = 2.21 ns/instruction
Y, Option 2: 2.00 / 0.9 = 2.22 ns/instruction
Then, compute the speedups:
Y, Option 1: 2.35 / 2.21 = 1.063
Y, Option 2: 2.35 / 2.22 = 1.059
So the speedups can be phrased either as "Option 1 is 6.3% faster than X, and Option 2 is 5.9% faster than X" or as "Option 1 is 1.063 times faster than X, and Option 2 is 1.059 times faster than X". Since Option 1 yields the larger speedup, Processor Y should take Option 1.

Q9. A disk has a rotational speed of 6000 RPM, a seek time of 15 ms, and negligible controller overhead. Each track has 256 sectors and each sector is 512 B. The disk is connected to memory via an I/O bus capable of transferring 4 MB/s. The disk contains a cache to buffer in-flight data, and this cache allows the disk to overlap data transfer over the I/O bus with the next disk access.

(a) (4 points) What is the maximum bandwidth of the disk? What is the minimum amount of time (in seconds) in which a program could possibly scan 40 MB of data transferred from the disk?
Answer: To achieve maximum bandwidth, the disk must read sequential data from a track with no seek overhead or delay waiting for the proper sector. The disk can read one full track (128 KB) in the time it takes for one rotation (10 ms): 128 KB / 10 ms = 12,800 KB/s = 12.5 MB/s. However, the I/O bus is the bottleneck in the system (it has a bandwidth of only 4 MB/s), resulting in a minimum of 10 seconds to scan 40 MB.

(b) (5 points) How long does it take to transfer 128 KB of data from disk to memory, assuming the data is found sequentially on one track (the disk must still seek and rotate to find the start of the data)?
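Q6 above is stated without a worked answer. The sketch below assumes the common hierarchical model, in which a level's access time is paid only when every faster level misses; that model choice, and all names, are ours. It also recomputes Q7 without rounding the intermediate ns/instruction values:

```python
# Q6 (answer not given in the source): three-level average access time
# under an assumed hierarchical (miss-propagation) model, all times in ns.
t_cache, t_main, t_disk = 20, 200, 2_000_000
h_cache, h_main = 0.80, 0.99

avg_ns = (h_cache * t_cache
          + (1 - h_cache) * h_main * t_main
          + (1 - h_cache) * (1 - h_main) * t_disk)
# 16 + 39.6 + 4000 = 4055.6 ns

# Q7 cross-check: CPI weighted by the given instruction mix
def cpi(int_c, mem_c, fp_c):
    return 0.35 * int_c + 0.30 * mem_c + 0.35 * fp_c

x  = cpi(1, 2, 4) / 1.0   # ns/instruction at 1 GHz
y1 = cpi(1, 3, 4) / 1.2   # Option 1: 1.2 GHz, 3-cycle memory ops
y2 = cpi(1, 2, 3) / 0.9   # Option 2: 900 MHz, 3-cycle FP ops
speedup1, speedup2 = x / y1, x / y2
# Unrounded: 1.064 and 1.058; the source's 1.063/1.059 come from
# rounding the intermediate times to 2.21 and 2.22 ns first.
```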
Answer: We are given that the seek time (t_seek) is 15 ms. 6000 RPM is equivalent to 100 RPS, or 10 ms per rotation; on average, the disk must wait for half a rotation (t_rotation), or 5 ms. The time to read the whole track from the disk is 10 ms, but the transfer is constrained by the I/O bus, so t_transfer is 31.25 ms (a 128 KB transfer at 4 MB/s takes 31.25 ms). Therefore, the total time for the disk to read 128 KB of sequential data is t_seek + t_rotation + t_transfer = 15 ms + 5 ms + 31.25 ms = 51.25 ms.

(c) (5 points) How long does it take to transfer 128 KB of data from disk to memory, assuming the data is found in sectors randomly scattered across the disk?
Answer: As before, the seek time (t_seek) is 15 ms, and the average rotational delay (t_rotation) is 5 ms. 128 KB of data occupies 2^(17-9) = 2^8 = 256 sectors. The time to read a single sector (t_transfer) is 10 ms * (1/256) = 0.04 ms. The time for the I/O bus to transfer a single sector is 512 B / 4 MB/s = 0.128 ms. Although the I/O bus takes longer to transfer the 512 B, this extra time can be overlapped with the seek and rotation time for the next sector. The total disk access time for each of the first 255 sectors is t_seek + t_rotation + t_transfer = 15 + 5 + 0.04 = 20.04 ms. The total disk access time for the last sector is t_seek + t_rotation + t_transfer = 15 + 5 + 0.128 = 20.128 ms. Notice that the last sector takes slightly longer, since its I/O bus transfer cannot be overlapped with a following seek and rotation. The total time to move 128 KB of randomly scattered data from disk to memory is therefore 20.04 * 255 + (15 + 5 + 0.128) = 5130.328 ms.

Q. For this problem, assume that you have a processor with a cache connected to main memory via a bus. A successful cache access by the processor (a hit) takes 1 cycle. After an unsuccessful cache access (a miss), an entire cache block must be fetched from main memory over the bus. The fetch is not initiated until the cycle following the miss. A bus transaction consists of one cycle to send the address to memory, four cycles of idle time for main-memory access, and then one cycle to transfer each word in the block from main memory to the cache. Assume that the processor continues execution only after the last word of the block has arrived. In other words, if the block size is B words (at 32 bits/word), a cache miss costs 1 + 1 + 4 + B cycles.
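The disk arithmetic in parts (b) and (c) can be checked with a short script. The names are ours; the per-sector constants 0.04 ms and 0.128 ms reuse the source's rounded figures:

```python
# Q9 timing sketch: 6000 RPM disk, 15 ms seek, 256 x 512 B sectors per
# track, 4 MB/s I/O bus (1 KB = 1024 B for the track-size arithmetic).
ROTATION_MS = 60_000 / 6000          # 10 ms per rotation
SEEK_MS = 15
TRACK_KB = 256 * 512 / 1024          # 128 KB per track
BUS_KB_PER_MS = 4 * 1024 / 1000      # 4 MB/s expressed in KB/ms

# (b) sequential: seek + half rotation + bus-limited transfer of 128 KB
transfer_ms = TRACK_KB / BUS_KB_PER_MS            # 31.25 ms
seq_ms = SEEK_MS + ROTATION_MS / 2 + transfer_ms  # 51.25 ms

# (c) random: 256 single-sector reads; each sector's bus transfer
# overlaps the next seek + rotation, except for the final sector
per_sector_ms = SEEK_MS + ROTATION_MS / 2 + 0.04   # 20.04 ms (disk-limited)
last_sector_ms = SEEK_MS + ROTATION_MS / 2 + 0.128 # 20.128 ms (bus-limited)
rand_ms = 255 * per_sector_ms + last_sector_ms     # 5130.328 ms
```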
The following table gives the average cache miss rates of a 1 Mbyte cache for various block sizes:

Write an expression for the average memory access time for a 1-Mbyte cache and a B-word block size (in terms of the miss ratio m and B).

Answer: Average access time = (1 - m)(1 cycle) + m(6 + B cycles) = 1 + m(5 + B) cycles.

What block size yields the best average memory access time?

If bus contention adds three cycles to the main-memory access time, which block size yields the best average memory access time?

If the bus width is quadrupled to 128 bits, reducing the time spent in the transfer portion of a bus transaction to 25% of its previous value, what is the optimal block size? Assume that a minimum of one transfer cycle is needed, and do not include the contention cycles introduced in the previous part.
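The miss-cost model can be sketched once and parameterized so the later variants fall out: extra idle cycles for bus contention, and words moved per transfer cycle for the wider bus. The function name and parameters are ours, and the per-block-size miss ratios must come from the table above, which is not reproduced here:

```python
from math import ceil

def avg_access_cycles(m, B, idle=4, words_per_cycle=1):
    """Average memory access time in cycles for miss ratio m and
    B-word blocks: 1 cycle on a hit; on a miss, 1 (miss) + 1 (address)
    + idle + ceil(B / words_per_cycle) transfer cycles."""
    miss_cost = 1 + 1 + idle + ceil(B / words_per_cycle)
    return (1 - m) * 1 + m * miss_cost

# Base case:       avg_access_cycles(m, B)                    = 1 + m(5 + B)
# With contention: avg_access_cycles(m, B, idle=7)            = 1 + m(8 + B)
# 128-bit bus:     avg_access_cycles(m, B, words_per_cycle=4)
```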
