Ca 2 Marks & Big Ques.pdf
Uploaded by sathya priya, December 2019.
CS8491 - COMPUTER ARCHITECTURE QUESTION BANK (2 Marks & 13 Marks)

CS8491

COMPUTER ARCHITECTURE

L T P C 3 0 0 3

OBJECTIVES:
• To learn the basic structure and operations of a computer.
• To learn the arithmetic and logic unit and the implementation of fixed-point and floating-point arithmetic units.
• To learn the basics of pipelined execution.
• To understand parallelism and multi-core processors.
• To understand the memory hierarchies, cache memories and virtual memories.
• To learn the different ways of communication with I/O devices.

UNIT I BASIC STRUCTURE OF A COMPUTER SYSTEM 9
Functional Units – Basic Operational Concepts – Performance – Instructions: Language of the Computer – Operations, Operands – Instruction Representation – Logical Operations – Decision Making – MIPS Addressing.

UNIT II ARITHMETIC FOR COMPUTERS 9
Addition and Subtraction – Multiplication – Division – Floating-Point Representation – Floating-Point Operations – Subword Parallelism.

UNIT III PROCESSOR AND CONTROL UNIT 9
A Basic MIPS Implementation – Building a Datapath – Control Implementation Scheme – Pipelining – Pipelined Datapath and Control – Handling Data Hazards & Control Hazards – Exceptions.

UNIT IV PARALLELISM 9
Parallel Processing Challenges – Flynn's Classification – SISD, MIMD, SIMD, SPMD, and Vector Architectures – Hardware Multithreading – Multi-core Processors and Other Shared-Memory Multiprocessors – Introduction to Graphics Processing Units, Clusters, Warehouse-Scale Computers and Other Message-Passing Multiprocessors.

UNIT V MEMORY & I/O SYSTEMS 9
Memory Hierarchy – Memory Technologies – Cache Memory – Measuring and Improving Cache Performance – Virtual Memory, TLBs – Accessing I/O Devices – Interrupts – Direct Memory Access – Bus Structure – Bus Operation – Arbitration – Interface Circuits – USB.

TOTAL: 45 PERIODS

OUTCOMES: On completion of the course, the students should be able to:
• Understand the basic structure of computers, operations and instructions.
• Design an arithmetic and logic unit.
• Understand pipelined execution and design a control unit.
• Understand parallel processing architectures.
• Understand the various memory systems and I/O communication.

TEXT BOOKS:
1. David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Fifth Edition, Morgan Kaufmann / Elsevier, 2014.
2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian, Computer Organization and Embedded Systems, Sixth Edition, Tata McGraw Hill, 2012.

PART A: TWO-MARK QUESTIONS AND ANSWERS

UNIT-I BASIC STRUCTURE OF A COMPUTER SYSTEM

1. Define Computer Architecture.
• It is concerned with the structure and behavior of the computer.
• It includes the information formats, the instruction set and techniques for addressing memory.

2. Define Computer Organization.
• It describes the function and design of the various units of a digital computer that store and process information.
• It refers to the operational units and their interconnections that realize the architectural specifications.

3. What are the components of a computer?
• Input unit
• Memory unit
• Arithmetic and logic unit
• Output unit
• Control unit

4. Draw the block diagram of a computer.
(Block diagram not reproduced.)

5. What is execution time / response time?
Response time, also called execution time, is the total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.

6. What are CPU execution time, user CPU time and system CPU time?
CPU time: the actual time the CPU spends computing for a specific task.
User CPU time: the CPU time spent in the program itself.
System CPU time: the CPU time spent in the operating system performing tasks on behalf of the program.

7. What are clock cycle, clock period and CPI?
Clock cycle: the time for one clock period, usually of the processor clock, which runs at a constant rate.
Clock period: the length of each clock cycle.
CPI (clock cycles per instruction): the average number of clock cycles each instruction takes to execute.

8. State and explain the performance equation.
Let N denote the number of machine instructions, and suppose that the average number of basic steps needed to execute one machine instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R cycles per second, the processor time is given by
T = (N x S) / R
This is often referred to as the basic performance equation.

9. Define MIPS and MIPS rate.
One alternative to time as the metric is MIPS (Million Instructions Per Second):
MIPS = Instruction count / (Execution time x 10^6)
This measurement is also called native MIPS, to distinguish it from some alternative definitions of MIPS. The MIPS rate is the rate at which instructions are executed at a given time.

10. Define throughput and throughput rate.
• Throughput: the total amount of work done in a given time.
• Throughput rate: the rate at which the total amount of work is done in a given time.

11. What are the various types of operations required for instructions?
• Data transfers between the main memory and the CPU registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers

12. What is a Program?
A program is a set of instructions that specify the operations, operands and the sequence by which processing has to occur.

13. What is a Computer Instruction?

A computer instruction is a binary code that specifies a sequence of micro-operations for the computer.

14. What is an Instruction Code?
An instruction code is a group of bits that instruct the computer to perform a specific operation.

15. What is an Operation Code (Opcode)?
The operation code of an instruction is a group of bits that define operations such as add, subtract, multiply, shift and complement.

16. Define Instruction Format.
Instructions are represented as numbers; therefore, entire programs can be stored in memory to be read or written just like numbers (data). This simplifies the software/hardware of computer systems. Each instruction is encoded in binary, called machine code.

17. What are the most common fields of an instruction format?
• An operation code field that specifies the operation to be performed.
• An address field that designates a memory address or register.
• A mode field that specifies the way the operand or the effective address is determined.

18. Explain the following address instructions.
• Three-address instruction: ADD A, B, C (A and B are the source operands and C is the destination operand).
• Two-address instruction: ADD A, B
• One-address instruction:
  LOAD A
  ADD B
  STORE C

19. What is straight-line sequencing?
The CPU control circuitry automatically fetches and executes instructions, one at a time, in order of increasing addresses. This is called straight-line sequencing.

20. Write down the MIPS assembly language notation for arithmetic operations.

Category   | Instruction   | Example         | Meaning         | Comments
Arithmetic | add           | add $s1,$s2,$s3 | $s1 = $s2 + $s3 | Three register operands
           | subtract      | sub $s1,$s2,$s3 | $s1 = $s2 - $s3 | Three register operands
           | add immediate | addi $s1,$s2,20 | $s1 = $s2 + 20  | Used to add constants

21. Write down the MIPS assembly language notation for data transfer operations.

Category      | Instruction         | Example         | Meaning                          | Comments
Data transfer | load word           | lw  $s1,20($s2) | $s1 = Memory[$s2 + 20]           | Word from memory to register
              | store word          | sw  $s1,20($s2) | Memory[$s2 + 20] = $s1           | Word from register to memory
              | load half           | lh  $s1,20($s2) | $s1 = Memory[$s2 + 20]           | Halfword memory to register
              | load half unsigned  | lhu $s1,20($s2) | $s1 = Memory[$s2 + 20]           | Halfword memory to register
              | store half          | sh  $s1,20($s2) | Memory[$s2 + 20] = $s1           | Halfword register to memory
              | load byte           | lb  $s1,20($s2) | $s1 = Memory[$s2 + 20]           | Byte from memory to register
              | load byte unsigned  | lbu $s1,20($s2) | $s1 = Memory[$s2 + 20]           | Byte from memory to register
              | store byte          | sb  $s1,20($s2) | Memory[$s2 + 20] = $s1           | Byte from register to memory
              | load linked word    | ll  $s1,20($s2) | $s1 = Memory[$s2 + 20]           | Load word as 1st half of atomic swap
              | store cond. word    | sc  $s1,20($s2) | Memory[$s2 + 20] = $s1; $s1=0 or 1 | Store word as 2nd half of atomic swap
              | load upper immed.   | lui $s1,20      | $s1 = 20 * 2^16                  | Loads constant in upper 16 bits

22. Write down the MIPS assembly language notation for logical operations.

Category | Instruction         | Example          | Meaning            | Comments
Logical  | and                 | and  $s1,$s2,$s3 | $s1 = $s2 & $s3    | Three reg. operands; bit-by-bit AND
         | or                  | or   $s1,$s2,$s3 | $s1 = $s2 | $s3    | Three reg. operands; bit-by-bit OR
         | nor                 | nor  $s1,$s2,$s3 | $s1 = ~($s2 | $s3) | Three reg. operands; bit-by-bit NOR
         | and immediate       | andi $s1,$s2,20  | $s1 = $s2 & 20     | Bit-by-bit AND reg with constant
         | or immediate        | ori  $s1,$s2,20  | $s1 = $s2 | 20     | Bit-by-bit OR reg with constant
         | shift left logical  | sll  $s1,$s2,10  | $s1 = $s2 << 10    | Shift left by constant
         | shift right logical | srl  $s1,$s2,10  | $s1 = $s2 >> 10    | Shift right by constant

23. Write down the MIPS assembly language notation for conditional branch operations.

Category           | Instruction                      | Example          | Meaning                              | Comments
Conditional branch | branch on equal                  | beq  $s1,$s2,25  | if ($s1 == $s2) go to PC + 4 + 100   | Equal test; PC-relative branch
                   | branch on not equal              | bne  $s1,$s2,25  | if ($s1 != $s2) go to PC + 4 + 100   | Not equal test; PC-relative
                   | set on less than                 | slt  $s1,$s2,$s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; for beq, bne
                   | set on less than unsigned        | sltu $s1,$s2,$s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than unsigned
                   | set less than immediate          | slti $s1,$s2,20  | if ($s2 < 20) $s1 = 1; else $s1 = 0  | Compare less than constant
                   | set less than immediate unsigned | sltiu $s1,$s2,20 | if ($s2 < 20) $s1 = 1; else $s1 = 0  | Compare less than constant unsigned

24. Write down the MIPS assembly language notation for unconditional branch operations.

Category             | Instruction   | Example  | Meaning                   | Comments
Unconditional branch | jump          | j 2500   | go to 10000               | Jump to target address
                     | jump register | jr $ra   | go to $ra                 | For switch, procedure return
                     | jump and link | jal 2500 | $ra = PC + 4; go to 10000 | For procedure call
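The register-transfer meanings in the tables above can be mimicked with a tiny Python sketch. This is illustrative only, not a real MIPS simulator; the register contents and memory address are assumed values:

```python
# Toy model of a MIPS register file and memory (all values assumed)
regs = {"$s1": 0, "$s2": 7, "$s3": 5, "$zero": 0}
mem = {}

def add(rd, rs, rt):        # add rd,rs,rt  : rd = rs + rt
    regs[rd] = regs[rs] + regs[rt]

def slt(rd, rs, rt):        # slt rd,rs,rt  : rd = 1 if rs < rt else 0
    regs[rd] = 1 if regs[rs] < regs[rt] else 0

def sw(rt, offset, base):   # sw rt,offset(base) : Memory[base + offset] = rt
    mem[regs[base] + offset] = regs[rt]

def lw(rt, offset, base):   # lw rt,offset(base) : rt = Memory[base + offset]
    regs[rt] = mem[regs[base] + offset]

add("$s1", "$s2", "$s3")    # $s1 = 7 + 5 = 12
sw("$s1", 20, "$zero")      # Memory[0 + 20] = 12
lw("$s3", 20, "$zero")      # $s3 = Memory[20] = 12
slt("$s2", "$zero", "$s3")  # $s2 = 1, since 0 < 12
print(regs["$s1"], regs["$s2"], regs["$s3"])  # 12 1 12
```

Each helper corresponds line-for-line to the "Meaning" column of the tables, which is exactly how the instruction semantics are defined.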

25. What are addressing modes?
The different ways in which the location of an operand can be specified in an instruction are called addressing modes.

26. What are the different types of addressing modes?
• Immediate addressing mode
• Register addressing mode
• Base or displacement addressing mode
• PC-relative addressing mode
• Pseudo-direct addressing mode

27. Define Register mode and Absolute mode with examples.
Register mode: The operand is the contents of a processor register. The name (address) of the register is given in the instruction.
Absolute mode (Direct mode):
• The operand is in a memory location.
• The address of this location is given explicitly in the instruction.
Eg: MOVE LOC,R2
The above instruction uses both the register and absolute modes. The processor register is temporary storage whose data is accessed using register mode; absolute mode can represent global variables in the program.

Mode          | Assembler Syntax | Addressing Function
Register mode | Ri               | EA = Ri
Absolute mode | LOC              | EA = LOC

where EA = Effective Address.

28. What is Immediate addressing mode?
The operand is given explicitly in the instruction.
Eg: Move 200immediate, R0 places the value 200 in register R0. The immediate mode is used to specify the value of a source operand. In assembly language, the subscript notation is not convenient, so the # symbol is used instead. The instruction can be rewritten as Move #200, R0.

Mode      | Assembler Syntax | Addressing Function
Immediate | #value           | Operand = value
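The basic performance equation T = (N x S) / R and the native MIPS rating from Unit I can be checked numerically. The workload numbers below (instruction count, CPI, clock rate) are assumed purely for illustration:

```python
# Basic performance equation: T = (N x S) / R  (all workload numbers assumed)
N = 2_000_000        # machine instructions executed
S = 1.5              # average basic steps (clock cycles) per instruction, i.e. CPI
R = 2_000_000_000    # clock rate: 2 GHz

T = (N * S) / R                 # processor (CPU) time in seconds
mips = N / (T * 1_000_000)      # native MIPS rating
print(T, mips)                  # 0.0015 seconds, ~1333 MIPS
```

Note that the MIPS rating here equals R / (S x 10^6), so a machine can raise its MIPS figure merely by using simpler (lower-CPI) instructions, which is why MIPS alone is a poor cross-machine metric.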

UNIT-2 ARITHMETIC FOR COMPUTERS

1. State the principle of operation of a carry look-ahead adder.
The input carry needed by a stage is computed directly from carry signals obtained from all the preceding stages i-1, i-2, ..., 0, rather than waiting for normal carries to ripple slowly from stage to stage. An adder that uses this principle is called a carry look-ahead adder.

2. What are the main features of Booth's algorithm?
1) It handles both positive and negative multipliers uniformly.
2) It achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s.

3. How can we speed up the multiplication process? (CSE Nov/Dec 2003)
There are two techniques to speed up the multiplication process:
1) The first technique guarantees that the maximum number of summands that must be added is n/2 for n-bit operands.

2) The second technique reduces the time needed to add the summands.

4. What is bit-pair recoding? Give an example.
Bit-pair recoding halves the maximum number of summands. Group the Booth-recoded multiplier bits in pairs and observe the following: the pair (+1, -1) is equivalent to the pair (0, +1). That is, instead of adding -1 times the multiplicand M at shift position i and +1 x M at position i+1, the same result is obtained by adding +1 x M at position i.
Eg: 11010 - bit-pair recoding value is 0 -1 -2

5. What is the advantage of using Booth's algorithm?
1) It handles both positive and negative multipliers uniformly.
2) It achieves efficiency in the number of additions required when the multiplier has a few large blocks of 1s.
3) The speed gained by skipping over 1s depends on the data.

6. Write the algorithm for restoring division.
Do the following n times:
1) Shift A and Q left one binary position.
2) Subtract M from A and place the answer back in A.
3) If the sign of A is 1, set q0 to 0 and add M back to A.
where A = Accumulator, M = Divisor, Q = Dividend.

7. Write the algorithm for non-restoring division.
Step 1: Do the following n times:
1) If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
2) Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.

8. When can you say that a number is normalized?
When the decimal point is placed to the right of the first (nonzero) significant digit, the number is said to be normalized.

9. Explain the special values in floating-point numbers.
The end values 0 and 255 of the excess-127 exponent E' are used to represent special values:
• When E' = 0 and the mantissa fraction M is zero, the value exact 0 is represented.
• When E' = 255 and M = 0, the value infinity is represented.
• When E' = 0 and M is nonzero, denormal values are represented.
• When E' = 255 and M is nonzero, the value represented is called Not a Number (NaN).

10. Write the add/subtract rule for floating-point numbers.
1) Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
2) Set the exponent of the result equal to the larger exponent.
3) Perform addition/subtraction on the mantissas and determine the sign of the result.
4) Normalize the resulting value, if necessary.

11. Write the multiply rule for floating-point numbers.
1) Add the exponents and subtract 127.

2) Multiply the mantissas and determine the sign of the result.
3) Normalize the resulting value, if necessary.

12. What is the purpose of guard bits used in floating-point arithmetic?
Although the mantissas of the initial operands are limited to 24 bits, it is important to retain extra bits during intermediate calculations, called guard bits.

13. What are the ways to truncate the guard bits?
There are several ways to truncate the guard bits:
1) Chopping
2) Von Neumann rounding
3) Rounding

14. Define the carry-save addition (CSA) process.
Instead of letting the carries ripple along the rows, they can be saved and introduced into the next row at the correct weighted position. The delay through a CSA is less than the delay through a ripple-carry adder.

15. What are the generate and propagate functions?
The generate function is given by Gi = xi yi and the propagate function is given by Pi = xi + yi.

16. What are floating-point numbers?
In some cases, the binary point is variable and is automatically adjusted as computation proceeds. In such cases, the binary point is said to float, and the numbers are called floating-point numbers.

17. In floating-point numbers, when do you say that an underflow or overflow has occurred?
In single-precision numbers, when the exponent is less than -126, an underflow has occurred; when the exponent is greater than +127, an overflow has occurred.

18. What are the difficulties faced when we use floating-point arithmetic?
Mantissa overflow: The addition of two mantissas of the same sign may result in a carry-out of the most significant bit.
Mantissa underflow: In the process of aligning mantissas, digits may flow off the right end of the mantissa.
Exponent overflow: It occurs when a positive exponent exceeds the maximum possible exponent value.
Exponent underflow: It occurs when a negative exponent exceeds the minimum possible exponent value.

19. Conforming to the IEEE standard, mention any four situations under which a processor sets an exception flag.
Underflow: If the number requires an exponent less than -126 (or, in double precision, less than -1022) to represent its normalized form, underflow occurs.
Overflow: If the number requires an exponent greater than +127 (or, in double precision, greater than +1023) to represent its normalized form, overflow occurs.
Divide by zero: It occurs when any number is divided by zero.
Invalid: It occurs if operations such as 0/0 are attempted.
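The restoring-division steps in Question 6 can be sketched in Python for unsigned operands. The function name, the 4-bit width and the operand values are assumed for illustration:

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division on n-bit unsigned operands.
    A = accumulator, Q = dividend, M = divisor (Question 6 notation)."""
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # 1) Shift A and Q left one binary position (MSB of Q moves into A).
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        # 2) Subtract M from A.
        A -= M
        # 3) If A went negative, set q0 = 0 and restore A; otherwise set q0 = 1.
        if A < 0:
            A += M
        else:
            Q |= 1
    return Q, A   # quotient, remainder

print(restoring_divide(13, 4, 4))  # (3, 1)
```

After n iterations Q holds the quotient and A the remainder, matching 13 = 4 x 3 + 1.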

20. Why are floating-point numbers more difficult to represent and process than integers? (CSE May/June 2007)
An integer value requires only half the memory space of an equivalent IEEE double-precision floating-point value. Applications that use only integer-based arithmetic therefore also have significantly smaller memory requirements. A floating-point operation usually runs hundreds of times slower than an equivalent integer-based arithmetic operation.

21. Give the Booth's recoding and bit-pair recoding of the number 1000111101000101. (CSE May/June 2006)

Multiplier:         1   0   0   0   1   1   1   1   0   1   0   0   0   1   0   1
Booth's recoding:  -1   0   0  +1   0   0   0  -1  +1  -1   0   0  +1  -1  +1  -1
Bit-pair recoding:    -2      +1       0      -1      +1       0      +1      +1

22. Draw the full adder circuit and give the truth table. (CSE May/June 2007)
(Circuit diagram not reproduced.) Truth table:

Inputs      | Outputs
A  B  Cin   | Carry  Sum
0  0  0     |   0     0
0  0  1     |   0     1
0  1  0     |   0     1
0  1  1     |   1     0
1  0  0     |   0     1
1  0  1     |   1     0
1  1  0     |   1     0
1  1  1     |   1     1
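The truth table above follows from the standard full-adder equations Sum = A xor B xor Cin and Carry = AB + ACin + BCin (the majority function). A short Python check, purely illustrative:

```python
from itertools import product

def full_adder(a, b, cin):
    s = a ^ b ^ cin                          # sum bit: odd number of 1 inputs
    carry = (a & b) | (a & cin) | (b & cin)  # carry: majority of the three inputs
    return carry, s

# Enumerate all eight input rows, exactly as in the truth table above.
for a, b, cin in product((0, 1), repeat=3):
    carry, s = full_adder(a, b, cin)
    assert 2 * carry + s == a + b + cin      # (carry, sum) encodes the arithmetic sum
    print(a, b, cin, "->", carry, s)
```

The assertion inside the loop is the defining property of a full adder: the output pair is the 2-bit binary sum of the three input bits.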

23. What is a guard bit, and what are the ways to truncate guard bits?
The first of two extra bits kept on the right during intermediate calculations of floating-point numbers is called a guard bit. There are several ways to truncate the guard bits:
• Chopping
• Von Neumann rounding
• Rounding

24. Write the add/subtract rule for floating-point numbers.
• Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
• Set the exponent of the result equal to the larger exponent.
• Perform addition/subtraction on the mantissas and determine the sign of the result.
• Normalize the resulting value, if necessary.

25. Why are floating-point numbers more difficult to represent and process than integers?
An integer value requires only half the memory space of an equivalent IEEE double-precision floating-point value; applications that use only integer-based arithmetic therefore have significantly smaller memory requirements. A floating-point operation usually runs hundreds of times slower than an equivalent integer-based arithmetic operation.

26. Define sub-word parallelism.
A sub-word is a lower-precision unit of data contained within a word. In sub-word parallelism, multiple sub-words are packed into a word and whole words are then processed. With appropriate sub-word boundaries this technique results in parallel processing of sub-words. Since the same instruction is applied to all sub-words within the word, this is a form of SIMD (Single Instruction, Multiple Data) processing.

27. Define ALU.
The arithmetic and logic unit is the brain of the computer: the device that performs arithmetic operations such as addition, subtraction and multiplication, or logical operations such as OR, AND, NOT and EX-OR.

28. What is arithmetic overflow?
Arithmetic overflow is the condition in a digital computer in which a calculated value is greater than what a given register or storage location can store or represent.

29. What are floating-point numbers? When do you say that an underflow or overflow has occurred?
In some cases, the binary point is variable and is automatically adjusted as computation proceeds. In such cases, the binary point is said to float, and the numbers are called floating-point numbers. In single-precision numbers, when the exponent is less than -126 an underflow has occurred; when the exponent is greater than +127 an overflow has occurred.

30. How does overflow occur in subtraction?
Overflow occurs because computer arithmetic is not closed with respect to subtraction. Overflow cannot occur in subtraction if the operands have the same sign. To detect and compensate for overflow, one needs n+1 bits if an n-bit number representation is employed.
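Booth and bit-pair recoding (Questions 4 and 21) can be generated mechanically. The sketch below assumes the MSB-first bit-string form used in Question 21; the function names are illustrative:

```python
def booth_recode(bits):
    """Booth recoding of a two's-complement bit string (MSB first).
    The digit at each position is b[i-1] - b[i], scanning with an
    implicit 0 to the right of the least significant bit."""
    b = [int(c) for c in bits]
    digits = []
    for i in range(len(b)):
        right = b[i + 1] if i + 1 < len(b) else 0  # next lower-order bit
        digits.append(right - b[i])
    return digits  # MSB first

def bit_pair_recode(booth):
    """Pair Booth digits (MSB first): each pair (d_hi, d_lo) -> 2*d_hi + d_lo."""
    return [2 * booth[i] + booth[i + 1] for i in range(0, len(booth), 2)]

booth = booth_recode("1000111101000101")
print(booth)                   # [-1, 0, 0, 1, 0, 0, 0, -1, 1, -1, 0, 0, 1, -1, 1, -1]
print(bit_pair_recode(booth))  # [-2, 1, 0, -1, 1, 0, 1, 1]
```

Both recodings preserve the value of the multiplier: summing digit x weight reproduces the two's-complement value -28859 of the 16-bit string, which is the property Booth multiplication relies on.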

UNIT-3

PROCESSOR AND CONTROL UNIT

1. Define pipelining. What steps are required for a pipelined processor to process an instruction? What are the advantages of pipelining?
Pipelining is a technique in which multiple instructions are overlapped in execution. It decomposes a sequential process into sub-operations, with each sub-process executed in a special dedicated segment that operates concurrently with all other segments.
• IF - Instruction Fetch
• ID - Instruction Decode
• EX - Execution or address calculation
• MEM - Data Memory Access
• WB - Write Back
Advantages: The main advantage of pipelining is that the cycle time of the processor is reduced and instruction throughput is increased.

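The throughput advantage of pipelining can be illustrated with an idealized cycle count (one cycle per stage, no stalls; the instruction count and stage count below are assumed):

```python
# Idealized pipeline timing: k stages, n instructions, one cycle per stage.
def pipelined_cycles(n, k):
    return k + (n - 1)      # k cycles to fill the pipe, then one result per cycle

def unpipelined_cycles(n, k):
    return n * k            # each instruction occupies all k stages serially

n, k = 100, 5               # 100 instructions, classic 5-stage MIPS pipeline
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
print(speedup)              # 500 / 104, about 4.81, approaching k for large n
```

The limit of the speedup as n grows is the stage count k, which is why hazards and stalls (the subject of the next questions) matter: every bubble pushes the real machine away from this ideal.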
2. What are hazards? State the different types of hazards that can occur in a pipeline.
A hazard (also called a hurdle) is a situation that prevents the next instruction in the instruction stream from executing during its designated clock cycle. Hazards introduce stalls (idle stages) into the pipeline. The types of hazards that can occur in pipelining are:
1. Data hazards
2. Instruction hazards
3. Structural hazards

3. What is meant by branch prediction?
Branch prediction is a method of resolving a branch hazard. It assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
Dynamic branch prediction: prediction of branches at runtime using runtime information.

4. What is a branch prediction buffer?
• A branch prediction buffer is also called a branch history table.
• It is a small memory indexed by the lower portion of the address of the branch instruction.
• It contains one or more bits indicating whether the branch was recently taken or not.

5. What is an exception?
An exception is an unscheduled event that disrupts program execution; it is used, for example, to detect overflow. Exceptions are created to handle unexpected events from within the processor. Two kinds of exceptions are available in the MIPS architecture:
• Execution of an undefined instruction
• An arithmetic overflow exception

6. Define sign extension.
In load, store and branch operations, the immediate is added to a 32-bit operand (a register or the PC contents) to create an operand or perhaps a new address, for instance in a branch instruction. Since the immediate may be positive or negative, to add it to a 32-bit argument we must sign-extend it to 32 bits. The sign-extension unit performs this.
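Sign extension as described in Question 6 can be sketched in Python (the function name is illustrative; the hardware does this with simple wiring, replicating bit 15 into bits 16-31):

```python
def sign_extend_16_to_32(imm16):
    """Extend a 16-bit two's-complement immediate to 32 bits."""
    imm16 &= 0xFFFF                  # keep only the low 16 bits
    if imm16 & 0x8000:               # sign bit set: fill upper 16 bits with 1s
        return imm16 | 0xFFFF0000
    return imm16                     # non-negative: upper bits stay 0

print(hex(sign_extend_16_to_32(0xFFFE)))  # 0xfffffffe, i.e. -2 in 32 bits
```

A positive immediate such as 20 is unchanged, while 0xFFFE (-2 in 16 bits) becomes 0xFFFFFFFE (-2 in 32 bits), so the subsequent 32-bit addition produces the correct signed result.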

7. Define data hazards.
A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed and the pipeline stalls.

8. Define instruction hazards.
The pipeline may be stalled because of a delay in the availability of an instruction. For example, this may result from a miss in the cache, requiring the instruction to be fetched from main memory. Such hazards are called instruction hazards or control hazards.

9. Define structural hazards.
A structural hazard is the situation in which two instructions require the use of a given hardware resource at the same time. The most common case in which this hazard may arise is access to memory.

10. What are the classifications of data hazards?
A pair of instructions can produce a data hazard by reading or writing the same memory location. Assume that instruction i is executed before instruction j. The hazards can be classified as:
1. RAW hazard
2. WAW hazard
3. WAR hazard

11. Define RAW hazard (read after write).
Instruction j tries to read a source operand before instruction i writes it.

12. Define WAW hazard (write after write).
Instruction j tries to write an operand before instruction i writes it.

13. Define WAR hazard (write after read).
Instruction j tries to write an operand before instruction i reads it.

14. How can data hazards be prevented in pipelining?
Data hazards in instruction pipelining can be prevented by the following techniques:
a) Operand forwarding
b) Software approach

15. How is the compiler used in pipelining?
A compiler translates a high-level language program into a sequence of machine instructions. To reduce N, we need a suitable machine instruction set and a compiler that makes good use of it. An optimizing compiler takes advantage of various features of the target processor to reduce the product N x S, which is the total number of clock cycles needed to execute a program. The number of cycles depends not only on the choice of instructions but also on the order in which they appear in the program. The compiler may rearrange program instructions to achieve better performance; of course, such changes must not affect the result of the computation.

16. How do addressing modes affect instruction pipelining?
Degradation of performance in an instruction pipeline may be due to address dependency, where an operand address cannot be calculated without information needed by the addressing mode. For example, an instruction with register-indirect mode cannot proceed to fetch the operand if the previous instruction is loading the address into the register. Hence operand access is delayed, degrading the performance of the pipeline.

17. What is locality of reference?
Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed relatively infrequently. This is referred to as locality of reference.

18. What is the need for a reduced instruction set computer (RISC)?
• Relatively few instruction types and addressing modes.
• Fixed and easily decoded instruction formats.
• Fast single-cycle instruction execution.
• Hardwired rather than microprogrammed control.

19. Define memory access time.
The time that elapses between the initiation of an operation and its completion, for example the time between the READ and MFC signals, is referred to as memory access time.

20. Define memory cycle time.
The minimum time delay required between the initiation of two successive memory operations, for example the time between two successive READ operations.

21. Define static memories.
Memories that consist of circuits capable of retaining their state as long as power is applied are known as static memories.

22. List the various branching techniques used in a microprogrammed control unit.
a) Bit-ORing
b) Using conditional variables
c) Wide branch addressing

23. How is an interrupt handled during an exception?
• The CPU identifies the source of the interrupt.
• The CPU obtains the memory address of the interrupt handler.
• The PC and other CPU status information are saved.
• The PC is loaded with the address of the interrupt handler, and the handling program handles it.

24. List the methods used to improve system performance.
1. Processor clock
2. Basic performance equation
3. Pipelining
4. Clock rate
5. Instruction set
6. Compiler
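A one-bit branch-prediction buffer entry (Questions 3-4) can be simulated directly: the entry remembers the branch's last outcome and predicts it will repeat. The branch history below is an assumed loop pattern, purely for illustration:

```python
def one_bit_predictor(outcomes):
    """Simulate a single 1-bit branch-history entry: predict the last outcome."""
    prediction, mispredictions = True, 0   # assume 'taken' initially
    for taken in outcomes:
        if prediction != taken:
            mispredictions += 1
        prediction = taken                 # remember the most recent outcome
    return mispredictions

# A loop branch: taken 9 times, not taken on loop exit, run twice.
history = ([True] * 9 + [False]) * 2
print(one_bit_predictor(history))  # 3
```

The loop-exit mispredicts, and so does the first iteration of the next run, which is why a loop branch taken 90% of the time sees two mispredictions per loop rather than one; a 2-bit saturating counter reduces this to one.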

UNIT-4

PARALLELISM

1. What is meant by ILP?
Pipelining exploits the potential parallelism among instructions. This parallelism is called instruction-level parallelism (ILP). There are two primary methods for increasing the potential amount of instruction-level parallelism:
1. Increasing the depth of the pipeline to overlap more instructions.
2. Multiple issue.

2. What is multithreading?
Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion. To permit this sharing, the processor must duplicate the independent state of each thread. The two main approaches to multithreading are:
• Fine-grained multithreading: a version of hardware multithreading that switches between threads after every instruction.
• Coarse-grained multithreading: switches threads only on costly stalls; thus it is much less likely to slow down the execution of an individual thread.

3. What is meant by speculation?
One of the most important methods for finding and exploiting more ILP is speculation. It is an approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions. For example, we might speculate on the outcome of a branch so that instructions after the branch can be executed earlier.

4. Define superscalar processor.
Superscalar is an advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution. Dynamic multiple-issue processors are also known as superscalar processors, or simply superscalars.

5. Differentiate UMA from NUMA.
Uniform memory access (UMA) is a multiprocessor in which the latency to any word in main memory is about the same no matter which processor requests the access. Non-uniform memory access (NUMA) is a type of single-address-space multiprocessor in which some memory accesses are much faster than others, depending on which processor asks for which word.

6. What are strong and weak scaling?
There are two basic ways to measure the parallel performance of a given application, depending on whether it is CPU-bound or memory-bound; these are referred to as strong and weak scaling respectively.
Strong scaling: the problem size stays fixed while the number of processing elements is increased. This is used as justification for programs that take a long time to run.
Weak scaling: the problem size assigned to each processing element stays constant, and additional elements are used to solve a larger total problem. This type of measurement is justification for programs that take a lot of memory or other system resources.
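Strong and weak scaling (Question 6) correspond to Amdahl's and Gustafson's laws respectively. A numeric sketch with an assumed 5% serial fraction and 16 processing elements:

```python
# Strong vs. weak scaling for a program with serial fraction s (values assumed)
s, p = 0.05, 16                       # serial fraction, processing elements

# Strong scaling (Amdahl's law): fixed total problem size
strong_speedup = 1 / (s + (1 - s) / p)

# Weak scaling (Gustafson's law): problem size grows with p
weak_speedup = s + (1 - s) * p

print(round(strong_speedup, 2), round(weak_speedup, 2))  # 9.14 15.25
```

Even a 5% serial fraction caps strong-scaling speedup well below the processor count, while weak scaling stays near-linear, which is why the two measurements justify different kinds of programs.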

7.

Explain various types of Dependences in ILP.  Data Dependences  Name Dependences  Control Dependences 8. What are multiprocessors? Mention the categories of multiprocessors? Multiprocessor are used to increase performance and improve availability. The different categories are SISD, SIMD, MIMD. 9. What are two main approaches to multithreading?  Fine-grained multithreading  Coarse-grained multithreading 9. a. What is the need to use multiprocessors? 1. Microprocessors as the fastest CPUs Collecting several much easier than redesigning 1 2. Complexity of current microprocessors Do we have enough ideas to sustain 1.5X/yr? Can we deliver such complexity on schedule? 3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS) 4. Emergence of embedded and server markets driving microprocessors in addition to desktops Embedded functional parallelism, producer/consumer model 5. Server figure of merit is tasks per hour vs. latency. 10. Write the advantages of Multithreading. If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the unused computing resources, which thus can lead to faster overall execution, as these resources would have been idle if only a single thread was executed. If a thread cannot use all the computing resources of the CPU (because instructions depend on each other's result), running another thread permits to not leave these idle. If several threads work on the same set of data, they can actually share their cache, leading to better cache usage or synchronization on its values. 11. Write the disadvantages of Multithreading. Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs). Execution times of a single-thread are not improved but can be degraded, even when only one thread is executing. 
This is due to slower frequencies and/or additional pipeline stages that are necessary to accommodate thread-switching hardware. Hardware support for multithreading is more visible to software, thus requiring more changes to both application programs and operating systems than multiprocessing.
12. What is CMT?
Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no longer scaling in performance, because it is only possible to extract a limited amount of parallelism from

a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become prohibitive in all but water-cooled systems.
13. What is SMT?
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
14. Write the advantages of CMP.
CMPs have several advantages over single-processor solutions in energy and silicon area efficiency:
i. By incorporating smaller, less complex cores onto a single chip
ii. Dynamically switching between cores and powering down unused cores
iii. Increased throughput performance by exploiting parallelism
iv. Multiple computing resources can take better advantage of instruction, thread, and process-level parallelism
15. What are the disadvantages of SMT?
Simultaneous multithreading cannot improve performance if any of the shared resources are limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that they have to test whether simultaneous multithreading is good or bad for their application in various situations and insert extra logic to turn it off if it decreases performance.
16. What are the types of multithreading?
 Block multithreading
 Interleaved multithreading
17. What is thread-level parallelism (TLP)?
 Explicit parallel programs already have TLP (inherent)
 Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in hardware
18.
List the major MIMD styles.
 Centralized shared memory ("Uniform Memory Access" time, or "Shared Memory Processor")
 Decentralized memory (one memory module per CPU): gets more memory bandwidth and lower memory latency. Drawbacks: longer communication latency and a more complex software model.
19. Distinguish between shared-memory multiprocessor and message-passing multiprocessor.
 A multiprocessor with a shared address space, where that address space can be used to communicate data implicitly via load and store operations, is a shared-memory multiprocessor.
 A multiprocessor with multiple address spaces, where communication of data is done by explicitly passing messages among processors, is a message-passing multiprocessor.
20. Draw the basic structure of a symmetric shared-memory multiprocessor.

21. What is multicore?
At its simplest, multi-core is a design in which a single physical processor contains the core logic of more than one processor. It's as if an Intel Xeon processor were opened up and inside were packaged all the circuitry and logic for two (or more) Intel Xeon processors. The multi-core design takes several such processor "cores" and packages them as a single physical processor. The goal of this design is to enable a system to run more tasks simultaneously and thereby achieve greater overall system performance.
22. Write the software implications of a multicore processor.
Multi-core systems will deliver benefits to all software, but especially multi-threaded programs. All code that supports HT Technology or multiple processors, for example, will benefit automatically from multicore processors, without need for modification. Most server-side enterprise packages and many desktop productivity tools fall into this category.
23. What is coarse-grained multithreading?
It switches threads only on costly stalls. Thus it is much less likely to slow down the execution of an individual thread.

UNIT-5 MEMORY AND I/O SYSTEMS
1. What are the multimedia applications which use caches?
Some multimedia application areas where cache is extensively used are:
*Multimedia entertainment
*Education
*Office systems
*Audio and video mail
2. Explain virtual memory technique.
Techniques that automatically move program and data blocks into the physical memory when they are required for execution are called virtual memory techniques.
3. What are virtual and logical addresses?
The binary addresses that the processor issues for either instructions or data are called virtual or logical addresses.
4. Define translation buffer.
Most commercial virtual memory systems incorporate a mechanism that can avoid the bulk of the main memory accesses called for by the virtual-to-physical address translation. This may be done with a cache memory called a translation buffer.
5. What is branch delay slot?
The location containing an instruction that may be fetched and then discarded because of the branch is called the branch delay slot.
6. What is optical memory?
Optical, or light-based, techniques for data storage. Such memories usually employ optical disks, which resemble magnetic disks in that they store binary information in concentric tracks on electromechanically rotated disks. The information is read or written optically, however, with a laser replacing the read/write arm of a magnetic disk drive. Optical memories offer high storage capacities, but their access rates are generally less than those of magnetic disks.
7. What are static and dynamic memories?
Static memories are memories which require no periodic refreshing. Dynamic memories are memories which require periodic refreshing.
8. What are the components of a memory management unit?
A facility for dynamic storage relocation that maps logical memory references into physical memory addresses, and a provision for sharing common programs stored in memory by different users.
9. What is the role of MAR and MDR?
The MAR (memory address register) is used to hold the address of the location to or from which data are to be transferred, and the MDR (memory data register) contains the data to be written into or read out of the addressed location.
10. Distinguish between static RAM and dynamic RAM.
Static RAMs are fast, but they come at high cost because their cells require several transistors. Less expensive RAM can be implemented if simpler cells are used. However, such cells do not retain their state indefinitely; hence they are called dynamic RAMs.

11. Distinguish between asynchronous DRAM and synchronous DRAM.
A specialized memory controller circuit provides the necessary control signals, RAS and CAS, that govern the timing; the processor must take into account the delay in the response of the memory. Such memories are referred to as asynchronous DRAMs. A DRAM whose operation is directly synchronized with a clock signal is known as a synchronous DRAM.
12. What do you mean by associative mapping technique?
The tag of an address received from the CPU is compared to the tag bits of each block of the cache to see if the desired block is present. This is called the associative mapping technique.
13. What is SCSI?
Small computer system interface can be used for all kinds of devices, including RAID storage subsystems and optical disks for large-volume storage applications.
14. What are the two types of latencies associated with storage?
The latency associated with storage is divided into two categories:
1. Seek latencies, which can be classified into overlapped seek, mid-transfer seek, and elevator seek.
2. Rotational latencies, which can be reduced either by zero-latency read or write, or by the interleave factor.
15. What do you mean by disk spanning?
Disk spanning is a method of attaching drives to a single host adapter. All drives appear as a single contiguous logical unit. Data is written to the first drive first, and when that drive is full, the controller switches to the second drive; the second drive is then written to until it is full.
16. What is SCSI?
Small computer system interface can be used for all kinds of devices, including RAID storage subsystems and optical disks for large-volume storage applications.
17. Define the term RELIABILITY.
Means features that help to avoid and detect faults.
A reliable system does not silently continue and deliver results that include incorrect and corrupted data; instead, it corrects the corruption when possible or else stops.
18. Define the term AVAILABILITY.
Means features that allow the system to stay operational even after faults do occur. A highly available system could disable the malfunctioning portion and continue operating at reduced capacity.
19. How is an interrupt handled during an exception?
* CPU identifies the source of the interrupt
* CPU obtains the memory address of the interrupt handler
* PC and other CPU status information are saved
* PC is loaded with the address of the interrupt handler, and the handling program handles it
20. What is I/O-mapped input/output?
A memory reference instruction activates the READ M (or WRITE M) control line and does not affect the I/O device. Separate I/O instructions are required to activate the READ I/O and WRITE I/O lines, which cause a word to be transferred between the addressed I/O port and the CPU. The memory and I/O address spaces are kept separate.
21. Specify the types of DMA transfer techniques.
--Single transfer mode (cycle stealing mode)

--Block transfer mode (burst mode)
--Demand transfer mode
--Cascade mode
22. What is an interrupt?
An interrupt is an event that causes the execution of one program to be suspended and another program to be executed.
23. What are the uses of interrupts?
*Recovery from errors
*Debugging
*Communication between programs
*Use of interrupts in operating systems
24. Define vectored interrupts.
In order to reduce the overhead involved in the polling process, a device requesting an interrupt may identify itself directly to the CPU. Then, the CPU can immediately start executing the corresponding interrupt-service routine. The term vectored interrupts refers to all interrupt-handling schemes based on this approach.
25. Name any three of the standard I/O interfaces.
*SCSI (small computer system interface) bus standard
*Backplane bus standard
*IEEE 796 bus (multibus signals)
*NUBUS & IEEE 488 bus standard
26. What is an I/O channel?
An I/O channel is actually a special-purpose processor, also called a peripheral processor. The main processor initiates a transfer by passing the required information to the input/output channel. The channel then takes over and controls the actual transfer of data.
27. What is a bus?
A collection of wires that connects several devices is called a bus.
28. Define word length.
Each group of n bits is referred to as a word of information, and n is called the word length.
29. Why is program-controlled I/O unsuitable for high-speed data transfer?
In program-controlled I/O, considerable overhead is incurred because several program instructions have to be executed for each data word transferred between the external devices and main memory. Many high-speed peripheral devices have a synchronous mode of operation; that is, data transfers are controlled by a clock of fixed frequency, independent of the CPU.
30. What is the function of an I/O interface?
The function is to coordinate the transfer of data between the CPU and external devices.
31. What is NUBUS?
A NUBUS is a processor-independent, synchronous bus standard intended for use in 32-bit microprocessor systems. It defines a backplane into which up to 16 devices may be plugged, each in the form of a circuit board of standard dimensions.

32. Name some of the I/O devices.
*Video terminals
*Video displays
*Alphanumeric displays
*Graphics displays
*Flat panel displays
*Printers
*Plotters
33. What are the steps taken when an interrupt occurs?
*Identify the source of the interrupt
*Obtain the memory address of the required ISP (interrupt service procedure)
*The program counter and CPU information are saved
*Transfer control back to the interrupted program
34. Define interface.
The word interface refers to the boundary between two circuits or devices.
35. What is programmed I/O?
Data transfer to and from peripherals may be handled using this mode. Programmed I/O operations are the result of I/O instructions written in the computer program.
36. Define synchronous bus.
- A synchronous bus contains a synchronous clock that is used to validate each and every signal.
- Synchronous buses are affected by noise only when the clock signal occurs.
- Synchronous bus designers must contend with metastability when attempting to bridge different clock frequencies.
- Metastability arises in any flip-flop when its timing constraints are violated.
37. Define asynchronous bus.
- Asynchronous buses can mistake noise pulses at any time for valid handshake signals.
- Asynchronous bus designers must deal with events that arrive asynchronously.
- They must contend with metastability in the events that drive bus transactions.
- When a flip-flop experiences metastability, its effects can propagate into downstream circuitry unless proper design techniques are used.
38. Differentiate programmed I/O and interrupt I/O.
Programmed I/O is the process of I/O instructions written in a computer program. In the programmed I/O technique, transferring data requires constant monitoring of the peripheral by the CPU; once a data transfer is initiated, the CPU has to wait for the next transfer. Interrupt I/O is done by using interrupts and some special commands. In interrupt-initiated I/O, once a data transfer is initiated, the CPU executes the next program without wasting time, and the interface keeps monitoring the device.
When the interface determines that the device is ready to transfer data, it generates an interrupt request; the CPU stops the current task, executes the transfer, and then returns to the previously running task.
39. Define memory interleaving.
In computing, interleaved memory is a design made to compensate for the relatively slow speed of dynamic random-access memory (DRAM) or core memory, by spreading memory addresses evenly across memory banks.
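As a sketch of the idea (with a hypothetical 4-bank system using low-order interleaving), the bank number and the offset within the bank can be computed directly from the address:

```python
NUM_BANKS = 4  # hypothetical; a power of two keeps the arithmetic simple

def interleave(address):
    """Low-order interleaving: consecutive addresses go to consecutive banks."""
    bank = address % NUM_BANKS      # which bank holds this word
    offset = address // NUM_BANKS   # location within that bank
    return bank, offset

# Consecutive addresses 0..7 cycle through banks 0,1,2,3,0,1,2,3,
# so a sequential access stream keeps all four banks busy.
for addr in range(8):
    print(addr, interleave(addr))
```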

40. What is the purpose of the dirty/modified bit in cache memory?
A set dirty bit indicates that the associated cache line has been changed since it was read from memory ("dirty"), meaning that the processor has written data to that line and the new value has not propagated all the way to main memory.
41. Define memory hierarchy. What is the need to implement memory as a hierarchy?
In computer architecture, the memory hierarchy is a concept used for storing and discussing performance issues in computer architectural design, algorithm prediction, and lower-level programming constructs involving locality of reference. The memory hierarchy in computer storage distinguishes each level by response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies. Computer pioneers correctly predicted that programmers would want unlimited amounts of fast memory. An economical solution to that desire is a memory hierarchy, which takes advantage of locality and cost.
42. What are the various memory technologies?
There are four primary technologies used in memory hierarchies:
 SRAM technology (cache)
 DRAM technology (main memory)
 Flash semiconductor technology
 Magnetic disks
43. Define hit ratio.
Hit ratio is the fraction of memory accesses found in the upper level. It is often used as a measure of the performance of the memory hierarchy.
44. Define virtual memory. State the advantages of virtual memory.
Techniques that automatically move program and data blocks into physical memory when they are required for execution are called virtual memory techniques.
Advantages:
 It can eliminate external fragmentation
 Pages are mapped appropriately anyway
 It has more efficient swapping
 Allocating memory is easy and cheap
45. Define cache and cache miss.
Cache was the name chosen to represent the level of the memory hierarchy between the processor and main memory. A cache miss is a request for data from the cache that cannot be filled because the data is not present in the cache.
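The hit ratio defined above feeds directly into average memory access time (AMAT). A minimal sketch, with hypothetical timings (1 ns cache hit time, 50 ns miss penalty):

```python
def amat(hit_ratio, hit_time_ns, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    miss_rate = 1.0 - hit_ratio
    return hit_time_ns + miss_rate * miss_penalty_ns

# With a 95% hit ratio, 1 ns hit time, and 50 ns miss penalty:
print(amat(0.95, 1.0, 50.0))  # 1 + 0.05 * 50 = 3.5 ns
```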

PART-B

1. Explain various memory technologies.
There are four primary technologies used in the memory hierarchy:
1. Main memory is implemented by DRAM (Dynamic Random Access Memory)
2. Upper-level cache memory (closer to the processor) is implemented by SRAM (Static Random Access Memory)
3. Flash memory, which is nonvolatile
4. The lowest level in the memory hierarchy, with slow and large storage, called magnetic disk
Static memories
A memory circuit which is capable of retaining its state as long as power is applied is known as static memory.


Two transistor inverters are cross-connected to implement a basic flip-flop. The cell is connected to one word line and two bit lines by transistors T1 and T2. When the word line is at ground level, the transistors are turned off and the latch retains its state.
 Read operation: In order to read the state of the SRAM cell, the word line is activated to close switches T1 and T2. Sense/Write circuits at the bottom monitor the state of b and b'.
 Write operation: The state of the cell is set by placing the appropriate bit values on b and b' and then activating the word line.
Static RAMs (SRAMs):
 Consist of circuits that are capable of retaining their state as long as power is applied.
 Volatile memories, because their contents are lost when power is interrupted.
 Access times of static RAMs are in the range of a few nanoseconds.
 However, the cost is usually high.
Advantages of SRAM
 SRAM can be accessed quickly
 Access times of just a few nanoseconds
Disadvantages of SRAM
 High cost
 Volatile memory
Dynamic memory system

Large dynamic memory systems can be implemented using DRAM chips in a similar way to static memory systems. Placing large memory systems directly on the motherboard will occupy a large amount of space. Also, this arrangement is inflexible, since the memory system cannot be expanded easily. Packaging considerations have led to the development of larger memory units known as SIMMs (Single In-line Memory Modules) and DIMMs (Dual In-line Memory Modules). Memory modules are an assembly of memory chips on a small board that plugs vertically into a single socket on the motherboard.
 Occupy less space on the motherboard.
 Allow for easy expansion by replacement.
Dynamic RAMs (DRAMs):
 Do not retain their state indefinitely.
 Contents must be periodically refreshed.
 Contents may be refreshed while accessing them for reading.
 Less expensive RAMs can be implemented if simpler cells are used.

Memory Technology            | Typical Access Time        | Price per GiB in 2012
SRAM semiconductor memory    | 0.5 – 2.5 ns               | $500 - $1000
DRAM semiconductor memory    | 50 – 70 ns                 | $10 - $20
Flash semiconductor memory   | 5,000 – 50,000 ns          | $0.75 - $1.00
Magnetic disk                | 5,000,000 – 20,000,000 ns  | $0.05 - $0.10

During a read operation, the information stored in the cells selected by the word line is transmitted to the output data lines.

Example: a 16 × 8 organization, i.e., 16 word lines of 8 bits each.
During a write operation, the Sense/Write circuit receives input information and stores it in the cells of the selected word line.
There are two control signals (R/W and CS):
R/W – Read/Write
CS – Chip Select
The circuit can store 128 bits and requires 14 external connections for address, data, and control lines, and also needs two lines for the power supply and ground connections.
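The 14-connection figure can be checked with a little arithmetic: 16 words need log2(16) = 4 address lines, each 8-bit word needs 8 data lines, and R/W plus CS add 2 control lines:

```python
import math

words = 16
bits_per_word = 8

address_lines = int(math.log2(words))   # 4 lines select one of 16 words
data_lines = bits_per_word              # 8 lines carry the word
control_lines = 2                       # R/W and CS

total = address_lines + data_lines + control_lines
print(total)  # 14 external connections (plus power and ground)
```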

The table below shows how the density, cost, and access time of DRAMs have changed over the years.
Synchronous DRAMs
 Operation is directly synchronized with the processor clock signal.
 The outputs of the sense circuits are connected to a latch.
 During a read operation, the contents of the cells in a row are loaded into the latches.
 During a refresh operation, the contents of the cells are refreshed without changing the contents of the latches.
 Data held in the latches corresponding to the selected columns are transferred to the output.
 For a burst mode of operation, successive columns are selected using a column address counter and the clock.
 The CAS signal need not be generated externally. New data is placed on the output during the rising edge of the clock.

Timing diagram

 The row address is latched under control of the RAS signal, and the memory takes 2 or 3 cycles to activate the selected row.
 The column address is latched under control of the CAS signal, and after a delay of 1 clock cycle, the first set of data bits is placed on the data lines.
 The SDRAM automatically increments the column address to access the next three sets of bits in the selected row, which are placed on the data lines in the next 3 clock cycles.

 SDRAMs have built-in refresh circuitry. This circuit behaves as a refresh counter, which provides the addresses of the rows that are selected for refreshing. Each row must be refreshed at least every 64 ms.
 SDRAMs can be used with clock speeds above 100 MHz.
Latency, Bandwidth, and DDR SDRAMs
Memory latency is the time it takes to transfer a word of data to or from memory.
Memory bandwidth is the number of bits or bytes that can be transferred in one second.
DDR SDRAMs (Double Data Rate Synchronous Dynamic RAM)
An SDRAM performs all actions on the rising edge of the clock signal. The faster version is called DDR SDRAM because it performs data transfers on both edges of the clock, i.e., the rising and falling edges. The bandwidth is essentially doubled for long burst transfers, but the latency of these devices is similar to that of SDRAM. The cell array is organized in two banks, and each bank can be accessed separately.
Rambus memory
A very wide bus is expensive and requires a lot of space on a motherboard. An alternative approach is to implement a narrow bus that is much faster. This approach (used by Rambus Inc.) is called Rambus.
Key feature: a fast signaling method used to transfer information between chips.
Advantages
 The Rambus channel provides a complete specification for the design of such a communication link.
 Rambus allows a clock frequency of 400 MHz.
 If data is transferred on both edges of the clock, the effective data transfer rate is 800 MHz.
 Rambus uses multiple banks of cell arrays to access more than one word at a time; such a memory chip is called an RDRAM (Rambus DRAM).
 RDRAM chips can be assembled into larger modules called RIMMs (Rambus In-line Memory Modules).
 It uses a master-slave relationship: the master is the processor and the slaves are the RDRAMs.
Flash memory:
 Has a similar approach to EEPROM.
 One difference:
 EEPROM – it is possible to read and write the contents of a single cell.
 Flash memory – it is possible to read the contents of a single cell, but only possible to write an entire block of cells.
 Flash devices have greater density.
 Higher capacity and lower storage cost per bit.
 The power consumption of flash memory is very low, making it attractive for use in equipment that is battery-driven.
 Single flash chips are not sufficiently large, so larger memory modules are implemented using flash cards and flash drives.
Flash card – A large module made by mounting flash chips on a small card. The card simply plugs into an accessible slot. These cards also vary in memory size.
Flash drives – Similar to hard disks; the difference is that hard disks can store many gigabytes (GB), but flash drives store less than one GB. Hard disks provide an extremely low cost per bit, whereas flash drives have a higher cost per bit.

Magnetic Disk The magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400 to 15,000 revolutions per minute. The metal platters are covered with magnetic recording material on both sides, similar to the material found on a cassette or videotape. To read and write information on a hard disk, a movable arm containing a small electromagnetic coil called a read-write head is located just above each surface. The entire drive is permanently sealed to control the environment inside the drive, which, in turn, allows the disk heads to be much closer to the drive surface.

Each disk surface is divided into concentric circles, called tracks. There are typically tens of thousands of tracks per surface. Each track is in turn divided into sectors that contain the information; each track may have thousands of sectors. Sectors are typically 512 to 4096 bytes in size. The sequence recorded on the magnetic media is a sector number, a gap, the information for that sector including error correction code, a gap, the sector number of the next sector, and so on. The disk heads for each surface are connected together and move in conjunction, so that every head is over the same track of every surface. The term cylinder is used to refer to all the tracks under the heads at a given point on all surfaces.
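As a back-of-the-envelope example (all geometry numbers hypothetical, not from the text), the raw capacity implied by this layout is surfaces × tracks × sectors × sector size:

```python
surfaces = 4                  # e.g. 2 platters, both sides used (hypothetical)
tracks_per_surface = 50_000   # "tens of thousands of tracks per surface"
sectors_per_track = 2_000     # "thousands of sectors" per track
bytes_per_sector = 512        # low end of the 512-4096 byte range

capacity_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity_bytes / 10**9, "GB")  # 204.8 GB for this hypothetical geometry
```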

Seek time – time required to move the read/write head to the proper track. Rotational delay/Rotational Latency time – the amount of time it takes for the desired sector of a disk (i.e., the sector from which data is to be read or written) to rotate under the read-write heads of the disk drive. This is the time for half a rotation of the disk.
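The average rotational delay (half a rotation) follows directly from the spindle speed; a small sketch for the 5400 and 15,000 RPM figures mentioned earlier:

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational latency = time for half a rotation, in milliseconds."""
    seconds_per_rotation = 60.0 / rpm
    return 0.5 * seconds_per_rotation * 1000.0

print(round(avg_rotational_delay_ms(5400), 2))   # ~5.56 ms
print(round(avg_rotational_delay_ms(15000), 2))  # 2.0 ms
```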

Disk access time – The sum of the seek time and the rotational latency is called the disk access time.
2. Discuss in detail about cache memory. Explain the mapping functions in cache memory used to determine how memory blocks are placed in the cache.
The processor is much faster than the main memory.
 As a result, the processor has to spend much of its time waiting while instructions and data are being fetched from the main memory.
 To achieve good performance, the speed of the main memory cannot be increased beyond a certain point.
Cache memory is an architectural arrangement which makes the main memory appear faster to the processor than it really is. Cache memory is based on the property of computer programs known as "locality of reference". Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently. These instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly. This is called "locality of reference".
Temporal locality of reference:
 A recently executed instruction is likely to be executed again very soon.
Spatial locality of reference:
 Instructions with addresses close to a recently executed instruction are likely to be executed soon.



The processor issues a Read request; a block of words is transferred from the main memory to the cache, one word at a time.
 Subsequent references to the data in this block of words are found in the cache.
 At any given time, only some blocks in the main memory are held in the cache. Which blocks in the main memory are in the cache is determined by a "mapping function".
 When the cache is full and a block of words needs to be transferred from the main memory, some block of words in the cache must be replaced. This is determined by a "replacement algorithm".
Cache Hit
The existence of a cache is transparent to the processor. The processor issues Read and Write requests in the same manner. If the data is in the cache, it is called a Read or Write hit.
 Read hit:
 The data is obtained from the cache.
 Write hit:
 The cache has a replica of the contents of the main memory.
 The contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol.

 Update the contents of the cache, and mark it as updated by setting a bit known as the dirty bit or modified bit. The contents of the main memory are updated when this block is replaced. This is the write-back or copy-back protocol.
Cache Miss
If the data is not present in the cache, then a Read miss or Write miss occurs.
Read miss:
 The block of words containing the requested word is transferred from the memory.
 After the block is transferred, the desired word is forwarded to the processor.
 The desired word may also be forwarded to the processor as soon as it is transferred, without waiting for the entire block to be transferred. This is called load-through or early restart.
Write miss:
 If the write-through protocol is used, then the contents of the main memory are updated directly.
 If the write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is then overwritten with new information.
Cache Coherence Problem
 The copies of the data in the cache and the main memory may differ. This is called the cache coherence problem.
 One option is to force a write-back before the main memory is updated from the disk.
Mapping functions
Mapping functions determine how memory blocks are placed in the cache.
 A simple processor example:
 Cache consisting of 128 blocks of 16 words each.
 Total size of the cache is 2048 (2K) words.
 Main memory is addressable by a 16-bit address.
 Main memory has 64K words.
 Main memory has 4K blocks of 16 words each.
 Three mapping functions:
 Direct mapping
 Associative mapping
 Set-associative mapping
Direct mapping
Block j of the main memory maps to block j modulo 128 of the cache, i.e., block 0 maps to 0, block 129 maps to 1, and so on. More than one memory block is mapped onto the same position in the cache. This may lead to contention for cache blocks even if the cache is not full. The contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
The memory address is divided into three fields:
 The low-order 4 bits determine one of the 16 words in a block.
 When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.
 The high-order 5 bits determine which of the 32 possible memory blocks that map to this cache block is currently present in the cache. These are the tag bits.
This mapping is simple to implement, but not very flexible.
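The field breakdown above (4-bit word, 7-bit cache block, 5-bit tag within a 16-bit address) can be sketched as bit manipulation:

```python
def direct_map_fields(address):
    """Split a 16-bit address into (tag, cache_block, word) for a
    direct-mapped cache of 128 blocks of 16 words each."""
    word = address & 0xF                  # low-order 4 bits
    cache_block = (address >> 4) & 0x7F   # next 7 bits
    tag = (address >> 11) & 0x1F          # high-order 5 bits
    return tag, cache_block, word

# Memory block 129 maps to cache block 129 % 128 = 1.
# Word 0 of memory block 129 has address 129 * 16 = 2064.
print(direct_map_fields(2064))  # (1, 1, 0): tag 1, cache block 1, word 0
```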

Direct Mapping
Associative mapping
A main memory block can be placed into any cache position.

Associative mapping
The memory address is divided into two fields:
 The low-order 4 bits identify the word within a block.
 The high-order 12 bits, or tag bits, identify a memory block when it is resident in the cache.
 Flexible, and uses cache space efficiently.
Replacement algorithms can be used to replace an existing block in the cache when the cache is full. The cost is higher than for a direct-mapped cache because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
Set-Associative mapping
Blocks of the cache are grouped into sets. The mapping function allows a block of the main memory to reside in any block of a specific set. Divide the cache into 64 sets, with two blocks per set. Memory blocks 0, 64, 128, etc. map to set 0, and they can occupy either of the two positions within the set.
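The 64-set, two-way example above boils down to a modulo operation on the memory block number:

```python
NUM_SETS = 64  # 128 cache blocks grouped into 64 two-way sets

def set_index(memory_block):
    """A memory block may reside in either way of set (block mod 64)."""
    return memory_block % NUM_SETS

# Blocks 0, 64, and 128 all compete for the two positions of set 0.
print([set_index(b) for b in (0, 64, 128)])  # [0, 0, 0]
```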

Set Associative mapping
The memory address is divided into three fields:
 A 6-bit field determines the set number.
 The high-order 6-bit field is compared to the tag fields of the two blocks in the set.
Set-associative mapping is a combination of direct and associative mapping.
 Various replacement algorithms are used:
 First-in-first-out replacement algorithm
 Random replacement algorithm
 Least-recently-used replacement algorithm
 Most-recently-used replacement algorithm
3. Explain in detail about virtual memories.
When a program does not completely fit into the main memory, the parts of it not currently being executed are stored in secondary memory and transferred to main memory when required. Here, the virtual memory technique is used to extend the apparent size of the physical memory. It uses secondary storage such as disks to extend the apparent size of the physical memory and automatically transfers program and data blocks into the physical memory when they are required for execution; this is called the virtual memory technique.
Virtual memory organization
The segments which are currently being executed are kept in main memory, and the remaining segments are stored in secondary storage such as magnetic disk. If a required segment is not in main memory, an executed segment is replaced by the required segment. In modern computers, the OS performs these tasks. The processor generates virtual or logical addresses.
Logical address -> (translated using hardware or software) -> Physical address
Virtual addresses are translated into physical addresses by a combination of hardware and software (MMU) subsystems.
 If a virtual address refers to a part of the program that is currently in the main memory, it is accessed immediately.
 If the address refers to a part of the program that is not currently in the main memory, it is first transferred to the main memory before it can be used.

The memory management unit (MMU) translates virtual addresses into physical addresses. If the desired data or instructions are in the main memory, they are fetched as described previously. If they are not in the main memory, they must be transferred from secondary storage; the MMU causes the operating system to bring the data from secondary storage into the main memory.
Address translation
All programs and data are composed of fixed-length units called pages, commonly ranging from 2K to 16K bytes in length. Whenever the translation mechanism determines that a required page is not resident, pages are moved between the main memory and the disk, i.e. swapping is required. The cache bridges the speed gap between the processor and main memory and is implemented in hardware; the virtual memory mechanism bridges the size and speed gaps between main memory and secondary memory. Each virtual address generated by the processor, whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits).
Page table: holds information about the main memory location of each page, namely the main memory address where the page is stored and the status of the page.
Page frame: an area in the main memory that can hold one page.
Page table base register: holds the starting address of the page table. The address of the corresponding entry in the page table is obtained by adding the virtual page number to the content of the page table base register.
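The page-table translation just described can be sketched in Python. This is a minimal model, assuming a 4 KB page size and using a plain dict as the page table; the names are illustrative, not a real MMU interface.

```python
# Sketch of MMU-style address translation: virtual page number indexes the
# page table, and the offset within the page is carried over unchanged.
PAGE_SIZE = 4096

page_table = {0: 5, 1: 9}   # virtual page number -> page frame number

def translate(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        # A real MMU would raise a page-fault exception for the OS to handle.
        raise LookupError("page fault: OS must load the page")
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))   # virtual page 1 maps to frame 9
```

Translating 0x1234 gives offset 0x234 within virtual page 1; with frame 9 the physical address becomes 9 * 4096 + 0x234 = 0x9234.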

Control bits give the status of the page:
Valid bit - whether the page is actually loaded in main memory.
Dirty bit - whether the page has been modified or not.
Access bits - the permissions for that page (i.e. read/write permissions, etc.).
Translation Look-aside Buffer (TLB)
A small cache called the Translation Look-aside Buffer (TLB) is included in the MMU.
 The TLB holds the page table entries of the most recently accessed pages.




In addition to the page table entry, the TLB must hold the virtual page number for each page. The high-order bits of the virtual address generated by the processor select the virtual page; these bits are compared to the virtual page numbers in the TLB. If there is a match, a hit occurs and the corresponding address of the page frame is read. If there is no match, a miss occurs and the page table within the main memory must be consulted. Set-associative mapped TLBs are found in commercial processors.
When the MMU detects a page fault, the following actions occur:
 The MMU asks the operating system to intervene by raising an exception.
 Processing of the active task which caused the page fault is interrupted.
 Control is transferred to the operating system.
 The operating system copies the requested page from secondary storage to the main memory.
 Once the page is copied, control is returned to the task which was interrupted.
Servicing a page fault requires transferring the requested page from secondary storage to the main memory, and this transfer may incur a long delay. While the page is being transferred, the operating system may:
 Suspend the execution of the task that caused the page fault.

 Begin execution of another task whose pages are in the main memory.
 This enables efficient use of the processor. When the page transfer is completed, an interrupt will be accepted and the suspended task can resume.
4. Discuss DMA controller with block diagram.
Direct Memory Access (DMA)
 A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.
 The control unit which performs these transfers is a part of the I/O device's interface circuit. This control unit is called a DMA controller.
 The DMA controller performs functions that would normally be carried out by the processor: for each word, it provides the memory address and all the control signals; to transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.
 The DMA controller can transfer a block of data from an external device to the memory without any intervention from the processor. However, the operation of the DMA controller must be under the control of a program executed by the processor; that is, the processor must initiate the DMA transfer.
 To initiate the DMA transfer, the processor informs the DMA controller of:
 the starting address,
 the number of words in the block,
 the direction of transfer (I/O device to the memory, or memory to the I/O device).
 Once the DMA controller completes the DMA transfer, it informs the processor by raising an interrupt signal.
 Two registers are used to store the starting address and the word count.
Status and control register flags:
 R/W - direction of transfer; when set for a Read, the controller transfers data from memory to the I/O device.
 Done flag - set to 1 when the DMA controller completes the data transfer.
 IE and IRQ - when the interrupt-enable (IE) flag is 1, completion of a transfer causes the controller to raise an interrupt request (IRQ).

A DMA controller connects a high-speed network to the computer bus. The disk controller, which controls two disks, also has DMA capability and provides two DMA channels. It can perform two independent DMA operations, as if each disk had its own DMA controller; the registers that store the memory address, the word count, and the status and control information are duplicated.

 The processor and the DMA controllers have to use the bus in an interwoven fashion to access the memory.
 DMA devices are given higher priority than the processor to access the bus.
 Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.
 The processor originates most memory access cycles on the bus.
 The DMA controller can be said to "steal" memory access cycles from the bus; this interweaving technique is called "cycle stealing".
 An alternate approach is to give a DMA controller exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as block or burst mode.
5. Explain in detail about the bus arbitration techniques.
Bus arbitration: the processor and the DMA controllers both need to initiate data transfers on the bus and access main memory. The device that is allowed to initiate transfers on the bus at any given time is called the bus master. When the current bus master relinquishes its status as the bus master, another device can acquire this status. The process by which the next device to become the bus master is selected, and bus mastership is transferred to it, is called bus arbitration.
 Centralized arbitration: a single bus arbiter performs the arbitration.
 Distributed arbitration: all devices participate in the selection of the next bus master.
Centralized Bus Arbitration
The bus arbiter may be the processor or a separate unit connected to the bus. Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers. A DMA controller requests control of the bus by asserting the Bus Request (BR) line. In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free. The BG1 signal is connected to all DMA controllers in a daisy-chain fashion. When the BBSY signal is 0, it indicates that the bus is busy.
When BBSY becomes 1, the DMA controller which asserted BR can acquire control of the bus.

Distributed arbitration  All devices waiting to use the bus share the responsibility of carrying out the arbitration process.  Arbitration process does not depend on a central arbiter and hence distributed arbitration has higher reliability.

 Each device is assigned a 4-bit ID number.  All the devices are connected using 5 lines, 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.  To request the bus a device:  Assert the Start-Arbitration signal.  Places its 4-bit ID number on the arbitration lines.

 The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines. 6. Explain in detail about hardware multithreading. Hardware-level support for storing state for multiple threads, permitting fast context switches. There are two types of hardware multithreading  Block or Coarse-grained multi-threading  Fine-grained or Interleaved multi-threading COARSE-GRAIN MULTITHREADING: The simplest type of multi-threading occurs when one thread runs until it is blocked by an event that normally would create a long latency stall. Such a stall might be a cache-miss that has to access off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting for the stall to resolve, a threaded processor would switch execution to another thread that was ready to run. Only when the data for the previous thread had arrived, would the previous thread be placed back on the list of ready-to-run threads. For example:  Cycle i : instruction j from thread A is issued  Cycle i+1: instruction j+1 from thread A is issued  Cycle i+2: instruction j+2 from thread A is issued, load instruction which misses in all caches  Cycle i+3: thread scheduler invoked, switches to thread B  Cycle i+4: instruction k from thread B is issued  Cycle i+5: instruction k+1 from thread B is issued Conceptually, it is similar to cooperative multi-tasking used in real-time operating systems in which tasks voluntarily give up execution time when they need to wait upon some type of the event. HARDWARE COSTS: The goal of multi-threading hardware support is to allow quick switching between a blocked thread and another thread ready to run. To achieve this goal, the hardware cost is to replicate the program visible registers as well as some processor control registers (such as the program counter). Switching from one thread to another thread means the hardware switches from using one register set to another. 
Many families of microcontrollers and embedded processors have multiple register banks to allow quick context switching for interrupts. Such schemes can be considered a type of block multithreading among the user program thread and the interrupt threads. Such additional hardware has these benefits:
 The thread switch can be done in one CPU cycle.
 It appears to each thread that it is executing alone and not sharing any hardware resources with any other threads. This minimizes the amount of software change needed within the application and the operating system to support multithreading.
In order to switch efficiently between active threads, each active thread needs to have its own register set. For example, to quickly switch between two threads, the register hardware needs to be instantiated twice.
FINE-GRAINED MULTITHREADING (or) INTERLEAVED MULTI-THREADING:
In this type, the processor switches to a different thread every CPU cycle. For example:
 Cycle i: an instruction from thread A is issued
 Cycle i+1: an instruction from thread B is issued
 Cycle i+2: an instruction from thread C is issued
 The purpose of this type of multithreading is to remove all data dependency stalls from the execution pipeline.

 Since one thread is relatively independent from other threads, there's less chance of one instruction in one pipe stage needing an output from an older instruction in the pipeline.  Conceptually, it is similar to pre-emptive multi-tasking used in operating systems.  One can make the analogy that the time-slice given to each active thread is one CPU cycle. HARDWARE COSTS:  In addition to the hardware costs discussed in the Block type of multithreading, interleaved multithreading has an additional cost of each pipeline stage tracking the thread ID of the instruction it is processing.  Also, since there are more threads being executed concurrently in the pipeline, shared resources such as caches and TLBs need to be larger to avoid thrashing between the different threads. MULTITHREADED CATEGORIES:

Multicore Processor Single-core computer

Multi-core architectures

Multi-core CPU chip
 The cores fit on a single processor socket.
 Also called CMP (Chip Multi-Processor).
6. Explain in detail about Flynn's Classification of parallel hardware.
 It was proposed by researcher Michael J. Flynn in 1966.

 It is the most commonly accepted taxonomy of computer organization.
 In this classification, computers are classified by whether they process a single instruction at a time or multiple instructions simultaneously, and whether they operate on one data set or multiple data sets.

Four categories of multiprocessor systems by their instruction and data streams:
SISD - Single Instruction stream, Single Data stream (uniprocessor)
SIMD - Single Instruction stream, Multiple Data streams (vector processor)
MISD - Multiple Instruction streams, Single Data stream (no examples today)
MIMD - Multiple Instruction streams, Multiple Data streams (multiple processors)
SPMD - Single Program, Multiple Data (the MIMD programming model), i.e. a single program that runs on all processors of an MIMD computer.

Single Instruction, Single Data stream (SISD)
A sequential computer which exploits no parallelism in either the instruction or data streams. A single control unit (CU) fetches a single instruction stream (IS) from memory; the CU then generates appropriate control signals to direct a single processing element (P) to operate on a single data stream (DS), i.e. one operation at a time. Examples of SISD architecture are the traditional uniprocessor machines like a PC (currently manufactured PCs have multiple processors) or old mainframes.
In computing, SISD (single instruction, single data) is a term referring to a computer architecture in which a single processor, a uniprocessor, executes a single instruction stream to operate on data stored in a single memory. This corresponds to the von Neumann architecture. SISD is one of the four main classifications defined in Flynn's taxonomy, in which classifications are based upon the number of concurrent instruction and data streams present in the computer architecture. According to Michael J. Flynn, SISD machines can still have concurrent processing characteristics: instruction fetching and pipelined execution of instructions are common examples found in most modern SISD computers.

Single Instruction stream, Multiple Data streams (SIMD)
A computer which exploits multiple data streams against a single instruction stream to perform operations which may be naturally parallelized. Examples of SIMD are an array processor or a GPU. SIMD is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, SIMD works best on arrays processed in for loops, exploiting data-level parallelism but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment. SIMD is particularly applicable to common tasks like adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions in order to improve the performance of multimedia use.
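The SIMD idea above can be sketched in plain Python: one operation is applied to many data elements at once, instead of one element per step as in SISD style. The function name and the pixel example are illustrative only.

```python
# Sketch of SIMD: a single (conceptual) instruction applied to multiple data
# points simultaneously. In hardware, the lanes execute in parallel.
def simd_add(va, vb):
    return [a + b for a, b in zip(va, vb)]

pixels = [10, 20, 30, 40]
brighter = simd_add(pixels, [5, 5, 5, 5])   # e.g. a simple image adjustment
print(brighter)
```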

The vector architectures easily capture the flexibility in data widths, so it is easy to make a vector operation work on thirty-two 64-bit data elements, sixty-four 32-bit data elements, one hundred twenty-eight 16-bit data elements, or two hundred fifty-six 8-bit data elements. The parallel semantics of a vector instruction allow an implementation to execute these operations using a deeply pipelined functional unit, an array of parallel functional units, or a combination of parallel and pipelined functional units.

Figure: (a) a single add pipeline, which can complete one addition per clock cycle; (b) four add pipelines, or lanes, which can complete four additions per clock cycle.
Vector arithmetic instructions usually only allow element N of one vector register to take part in operations with element N from other vector registers. By constructing a highly parallel vector unit, this can be structured as multiple parallel vector lanes. Adding more lanes increases the peak throughput of the vector unit.

Figure: structure of a four-lane vector unit. Four lanes can reduce the number of clocks per vector instruction by roughly a factor of four. Vector architectures are a very efficient way to execute data-parallel programs, and they are easier to evolve over time than the multimedia extensions to the x86 architecture.
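The four-lane idea can be sketched as follows: elements are striped across the lanes so that each lane handles every fourth element, and all lanes (in hardware) work in the same clock cycle. The striping scheme here is a simplifying assumption for illustration.

```python
# Sketch of a four-lane vector unit: lane k owns elements k, k+4, k+8, ...
LANES = 4

def lane_slices(vector):
    # Which elements each lane would process.
    return [vector[lane::LANES] for lane in range(LANES)]

def vector_add(va, vb):
    result = [0] * len(va)
    for lane in range(LANES):                  # lanes run in parallel in hardware
        for i in range(lane, len(va), LANES):  # this lane's stripe of elements
            result[i] = va[i] + vb[i]
    return result

print(lane_slices(list(range(8))))
print(vector_add(list(range(8)), [1] * 8))
```

With four lanes, an 8-element vector add finishes in two lane-steps instead of eight sequential ones, matching the "roughly a factor of four" reduction stated above.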

Multiple Instruction Stream Single Data stream (MISD) Multiple instructions operate on a single data stream. Uncommon architecture, which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.

In computing, MISD (multiple instruction, single data) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data-parallel techniques; specifically, they allow better scaling and use of computational resources than MISD does. However, one prominent example of MISD in computing is the Space Shuttle flight control computers.
Multiple Instruction streams, Multiple Data streams (MIMD)
Multiple autonomous processors simultaneously execute different instructions on different data. Distributed systems are generally recognized to be MIMD architectures, either exploiting a single shared memory space or a distributed memory space. A multi-core superscalar processor is an MIMD processor.

In computing, MIMD (multiple instructions, multiple data) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

7. Explain in detail about Basic MIPS implementation. Draw the block diagram for the MIPS implementation with all necessary control lines and multiplexers. (13 Marks)
8. Discuss in detail about building a datapath. (13 Marks) Program Counter, Instruction Memory, Adder, Register File, ALU, Data Memory, Sign Extend, Multiplexers, Datapath Elements for Branch (8 Marks). Using all these functional units, build the datapath block diagram (5 Marks).
9. What is pipelining? Discuss about the pipelined datapath and control. Definition (2 Marks); block diagram with all control lines and all stages (11 Marks).
10. Explain in detail about various categories of hazards with examples. Structural hazards (2 Marks); data hazards with avoidance techniques (5 Marks); control hazards with branching techniques (6 Marks).
11. Explain how floating point addition is carried out in a computer system. Give an example for a binary floating point addition. Refer: 2.7, Page No: 2.50
12. Explain in detail about the multiplication algorithm with suitable example and diagram. Refer: 2.4, Page No: 2.18
13. Discuss in detail about the division algorithm with diagram and examples. Refer: 2.5, Page No: 2.32

2. Briefly explain about measuring and improving the performance of a computer? Running a program on two different desktop computers, you'd say that the faster one is the desktop computer that gets the job done first. If you were running a datacenter that had several servers running jobs submitted by many users, you'd say that the faster computer was the one that completed the most jobs during a day. As an individual computer user, you are interested in reducing response time (the time between the start and completion of a task), also referred to as execution time.

Datacenter managers are often interested in increasing throughput or bandwidth—the total amount of work done in a given time. To maximize performance, we want to minimize response time or execution time for some task. Thus, we can relate performance and execution time for a computer X:
Performance_X = 1 / Execution time_X
This means that for two computers X and Y, if the performance of X is greater than the performance of Y, we have
Performance_X > Performance_Y
1 / Execution time_X > 1 / Execution time_Y
Execution time_Y > Execution time_X
That is, the execution time on Y is longer than that on X, if X is faster than Y. In discussing a computer design, we often want to relate the performance of two different computers quantitatively. We will use the phrase “X is n times faster than Y”—or equivalently “X is n times as fast as Y”—to mean
Performance_X / Performance_Y = n
If X is n times as fast as Y, then the execution time on Y is n times as long as it is on X:
Execution time_Y / Execution time_X = n
Measuring Performance: The computer that performs the same amount of work in the least time is the fastest. Program execution time is measured in seconds per program. CPU execution time, or simply CPU time, which recognizes this distinction, is the time the CPU spends computing for this task and does not include time spent waiting for I/O or running other programs. CPU time can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks on behalf of the program, called system CPU time. The term system performance refers to elapsed time on an unloaded system, and CPU performance refers to user CPU time. CPU Performance and Its Factors
CPU execution time for a program = CPU clock cycles for the program × Clock cycle time
Alternatively, because clock rate and clock cycle time are inverses,
CPU execution time for a program = CPU clock cycles for the program / Clock rate
This formula makes it clear that the hardware designer can improve performance by reducing the number of clock cycles required for a program or the length of the clock cycle. As we will see in later chapters, the designer often faces a trade-off between the number of clock cycles needed for a program and the length of each cycle. Many techniques that decrease the number of clock cycles may also increase the clock cycle time. Instruction Performance

However, since the compiler clearly generated instructions to execute, and the computer had to execute the instructions to run the program, the execution time must depend on the number of instructions in a program.
CPU clock cycles = Instructions for a program × Average clock cycles per instruction
The term clock cycles per instruction, which is the average number of clock cycles each instruction takes to execute, is often abbreviated as CPI. Since different instructions may take different amounts of time depending on what they do, CPI is an average of all the instructions executed in the program. The Classic CPU Performance Equation

The basic performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:
CPU time = Instruction count × CPI × Clock cycle time
or, since the clock rate is the inverse of clock cycle time:
CPU time = (Instruction count × CPI) / Clock rate
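The classic performance equation can be worked through numerically. All the input values below are hypothetical, chosen only to show the algebra.

```python
# Worked sketch of the classic CPU performance equation:
# CPU time = (instruction count * CPI) / clock rate.
def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

ic = 10_000_000        # instructions executed by the program (hypothetical)
cpi = 2.0              # average clock cycles per instruction (hypothetical)
clock = 2_000_000_000  # a 2 GHz clock rate

print(cpu_time(ic, cpi, clock))   # seconds of CPU time
```

Note how the same program (same instruction count) runs twice as fast if either the CPI is halved or the clock rate is doubled, the trade-off discussed in the text.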
3. Briefly explain about manufacturing process of integrated chips with neat diagram? The manufacture of a chip begins with silicon, a substance found in sand. Because silicon does not conduct electricity well, it is called a semiconductor. With a special chemical process, it is possible to add materials to silicon that allow tiny areas to transform into one of three devices:
 Excellent conductors of electricity (using either microscopic copper or aluminum wire)
 Excellent insulators from electricity (like plastic sheathing or glass)
 Areas that can conduct or insulate under special conditions (as a switch)
Transistors fall in the last category. A VLSI circuit, then, is just billions of combinations of conductors, insulators, and switches manufactured in a single small package. The figure shows the process for integrated-chip manufacturing. The process starts with a silicon crystal ingot, which looks like a giant sausage. Today, ingots are 8–12 inches in diameter and about 12–24 inches long. An ingot is finely sliced into wafers no more than 0.1 inches thick. These wafers then go through a series of processing steps, during which patterns of chemicals are placed on each wafer, creating the transistors, conductors, and insulators.

The simplest way to cope with imperfection is to place many independent components on a single wafer. The patterned wafer is then chopped up, or diced, into these components, called dies and more informally known as chips. To reduce cost, the next-generation process shrinks a large die by using smaller sizes for both transistors and wires; this improves the yield and the die count per wafer. Once you've found good dies, they are connected to the input/output pins of a package, using a process called bonding. These packaged parts are tested a final time, since mistakes can occur in packaging, and then they are shipped to customers. The cost of an integrated circuit can be expressed in three simple equations:
Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + (Defects per area × Die area / 2))^2
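These cost relations can be sketched numerically. Every input value below is hypothetical, chosen only to illustrate how die area affects yield and cost per die.

```python
# Numeric sketch of the integrated-circuit cost equations.
def dies_per_wafer(wafer_area, die_area):
    return wafer_area / die_area            # approximation; ignores edge dies

def wafer_yield(defects_per_area, die_area):
    return 1 / (1 + defects_per_area * die_area / 2) ** 2

def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    n = dies_per_wafer(wafer_area, die_area)
    return wafer_cost / (n * wafer_yield(defects_per_area, die_area))

# Halving the die area raises both the die count and the yield,
# so the cost per die falls by more than a factor of two.
print(cost_per_die(5000, 70000, 100, 0.002))
print(cost_per_die(5000, 70000, 50, 0.002))
```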
4. Write short notes on operations and operands in computer hardware? The words of a computer's language are called instructions, and its vocabulary is called an instruction set. Operations in MIPS: Every computer must be able to perform arithmetic. The MIPS assembly language notation add a, b, c instructs a computer to add the two variables b and c and to put their sum in a. This notation is rigid: each MIPS arithmetic instruction performs only one operation and must always have exactly three variables.
EXAMPLE: to add the four variables b, c, d, and e and store the result in a:
add a, b, c # The sum of b and c is placed in a
add a, a, d # The sum of b, c, and d is now in a
add a, a, e # The sum of b, c, d, and e is now in a
Thus, it takes three instructions to sum the four variables.

Name: 32 registers
Example: $s0–$s7, $t0–$t9, $zero, $a0–$a3, $v0–$v1, $gp, $fp, $sp, $ra
Comments: Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. $gp (28) is the global pointer, $sp (29) is the stack pointer, $fp (30) is the frame pointer, and $ra (31) is the return address.

Name: 2^30 memory words
Example: Memory[0], Memory[4], …, Memory[4294967292]
Comments: Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

Operands in MIPS: The operands of arithmetic instructions are restricted; they must be from a limited number of special locations built directly in hardware called registers. The size of a register in the MIPS architecture is 32 bits; groups of 32 bits occur so frequently that they are given the name word in the MIPS architecture. Design Principle 2: Smaller is faster.

A very large number of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther. So, 32 registers were used in the MIPS architecture. The MIPS convention is to use two-character names following a dollar sign to represent a register, e.g. $s0, $s1.
Example: f = (g + h) – (i + j); using registers:
add $t0,$s1,$s2 # register $t0 contains g + h
add $t1,$s3,$s4 # register $t1 contains i + j
sub $s0,$t0,$t1 # f gets $t0 – $t1, which is (g + h) – (i + j)
Memory Operands: Programming languages have simple variables that contain single data elements, as in these examples, but they also have more complex data structures, such as arrays and structures. These complex data structures can contain many more data elements than there are registers in a computer. The processor can keep only a small amount of data in registers, but computer memory contains billions of data elements. So, MIPS must include instructions that transfer data between memory and registers. Such instructions are called data transfer instructions. To access a word in memory, the instruction must supply the memory address. In MIPS, words must start at addresses that are multiples of 4. This requirement is called an alignment restriction, and many architectures have it (since in MIPS each 32 bits form a word in memory, the address of one word to the next jumps in multiples of 4). Byte addressing also affects the array index: to get the proper byte address in the code below, the offset to be added to the base register $s3 must be 4 × 8, or 32.
EXAMPLE: g = h + A[8]; (implemented based on byte addresses). To get A[8] from memory, use lw and calculate 8 × 4 = 32, which is the actual offset value:
lw $t0,32($s3) # Temporary reg $t0 gets A[8]
Then use the result of A[8], stored in $t0:
add $s1,$s2,$t0 # g = h + A[8]
The instruction complementary to load is traditionally called store; it copies data from a register to memory.
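The byte-offset arithmetic used in the lw/sw examples can be sketched in Python; the base address value below is hypothetical, chosen only to illustrate the base + offset computation.

```python
# Sketch of MIPS byte addressing for word arrays: element i of a word array
# lives at base + 4*i, which is the constant offset written in lw/sw.
WORD = 4  # bytes per MIPS word

def word_offset(index):
    return index * WORD   # e.g. A[8] -> offset 32, A[12] -> offset 48

def effective_address(base, index):
    return base + word_offset(index)   # base register value + constant offset

print(word_offset(8), word_offset(12))   # the offsets used for A[8] and A[12]
print(hex(effective_address(0x1000_0000, 8)))
```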
The format of a store is similar to that of a load: the name of the operation, followed by the register to be stored, then the offset to select the array element, and finally the base register.

Once again, the MIPS address is specified in part by a constant and in part by the contents of a register. The actual MIPS name is sw, standing for store word.
EXAMPLE: A[12] = h + A[8];
lw $t0, 32($s3) # Temporary reg $t0 gets A[8]; note 8 × 4 = 32 is used.
add $t0, $s2,$t0 # Temporary reg $t0 gets h + A[8]
sw $t0, 48($s3) # Stores h + A[8] back into A[12]; note 12 × 4 = 48 is used.
Constant or Immediate Operands: For example, to add the constant 4 to register $s3, we could use the code
lw $t0, AddrConstant4($s1) # $t0 = constant 4
add $s3,$s3,$t0 # $s3 = $s3 + $t0 ($t0 == 4)
An alternative that avoids the load instruction is to offer versions of the arithmetic instructions in which one operand is a constant, for example the add immediate (addi) instruction:
addi $s3, $s3, 4 # $s3 = $s3 + 4
Constant operands occur frequently, and by including constants inside arithmetic instructions, operations are much faster and use less energy than if constants were loaded from memory.
5. Write short notes on Instructions and its types that are used in MIPS? or List the three MIPS instruction formats used to represent the instructions?
R-type Instructions
J-type Instructions
I-type Instructions
6. Write short notes on Logical operators / operations?
MIPS provides bitwise logical instructions such as and, or, and nor (with the immediate forms andi and ori), together with the shift instructions sll (shift left logical) and srl (shift right logical).

7. Write short notes on Decision making and branching instructions? Branch and Conditional branches: Decision making is commonly represented in programming languages using the if statement, sometimes combined with go to statements and labels. MIPS assembly language includes two decision-making instructions, similar to an if statement with a go to: beq (branch if equal) and bne (branch if not equal). Example: beq $s1, $s2, L1 # go to L1 if $s1 == $s2

8. Write short notes on addressing and addressing modes? OR What are addressing modes? Explain the various addressing modes with suitable examples. Multiple forms of addressing are generically called addressing modes; an addressing mode defines how the operands are identified by an instruction. The MIPS addressing modes are the following: 1. Immediate addressing, where the operand is a constant within the instruction itself. Example: constant data specified in an instruction: addi $s3, $s3, 4

2. Register addressing, where the operand is a register.
Example: add $t0, $s1, $s2
All the operands of the instruction are registers. It adds the values of $s1 and $s2 and stores the result in $t0.
3. Base or displacement addressing, where the operand is at the memory location whose address is the sum of a register and a constant in the instruction.
Example: lw $t0, 32($s1)
The address of the operand is the sum of the offset value (32) and the register value ($s1).
4. PC-relative addressing, where the branch address is the sum of the PC (Program Counter) and a constant in the instruction.
Example: beq $s1, $s2, Label
5. Pseudodirect addressing, where the jump address is the 26 bits of the instruction concatenated with the upper bits of the PC.
Example: Direct addressing means specifying a complete 32-bit address in the instruction itself. However, since MIPS instructions are 32 bits, we can't do that. In theory, only 30 bits are needed to specify the address of an instruction in memory, but MIPS uses 6 bits for the opcode, so there are still not enough bits for true direct addressing. Instead, MIPS does pseudo-direct addressing, which occurs in j instructions:

Opcode (B31-26): oooooo
Target (B25-0):  tt tttt tttt tttt tttt tttt tttt

26 bits are used for the target. The address for pseudo-direct addressing is computed as:

PC <- PC31-28 :: IR25-0 :: 00

Take the top 4 bits of the PC, concatenate them with the 26 bits that make up the target, and concatenate that with 00. This produces a 32-bit address, which becomes the new PC. This allows a jump to 1/16 of all possible MIPS addresses (since you don't control the top 4 bits of the PC). Draw the diagram.
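The address computation described above can be sketched in Python. This is a hedged illustration; the function name and the example PC/target values are mine, not from the text:

```python
# Sketch of MIPS pseudo-direct address formation for a j instruction:
# new PC = top 4 bits of PC ++ 26-bit target field ++ 00.

def pseudo_direct_target(pc: int, target26: int) -> int:
    """Combine PC[31:28], the 26-bit target field, and trailing '00'."""
    assert 0 <= target26 < (1 << 26), "target field is 26 bits"
    return (pc & 0xF0000000) | (target26 << 2)

# Hypothetical example: PC in the low 256 MB region, target field 0x40.
new_pc = pseudo_direct_target(0x00400018, 0x0000040)
print(hex(new_pc))  # 0x100
```

Note that the upper 4 bits of the old PC are preserved, which is why only 1/16 of the address space is reachable by a single j instruction.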

UNIT II

2. Explain the multiplication algorithm along with an example.
Multiplication of decimal numbers in longhand can be used to show the steps of multiplication and the names of the operands.
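The longhand shift-and-add process that the sequential multiplier hardware implements can be sketched in Python. This is an unsigned illustration only; the function name is mine:

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned sequential multiplication: test the multiplier's low bit,
    conditionally add the multiplicand into the product, then shift."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:       # low bit set: add the multiplicand
            product += multiplicand
        multiplicand <<= 1       # shift the multiplicand left
        multiplier >>= 1         # shift the multiplier right
    return product

# Longhand-style example: 1000 x 1001
print(shift_add_multiply(1000, 1001))  # 1001000
```

Real hardware keeps the product and multiplier in one shifting register instead of widening the multiplicand, but the per-bit decision is the same.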

B.E/B.TECH. DEGREE EXAMINATION, APRIL/MAY 2015
Third Semester
Computer Science and Engineering
CS 6303 - COMPUTER ARCHITECTURE
(Regulation 2013)
Part - A (10 × 2 = 20 marks)
1. List the eight great ideas invented by computer architects.
(i) Design for Moore's Law
(ii) Use Abstraction to Simplify Design
(iii) Make the Common Case Fast
(iv) Performance via Parallelism
(v) Performance via Pipelining
(vi) Performance via Prediction
(vii) Hierarchy of Memories
(viii) Dependability via Redundancy
2. Distinguish pipelining from parallelism.
- Performance can be improved by doing operations in parallel. Parallel execution is easier when tasks are independent, but often they need to cooperate with each other.
- Pipelining is a technique in which multiple instructions are overlapped in execution. It exploits parallelism among the instructions in a sequential instruction stream.

3. How does overflow occur in subtraction?
Overflow occurs in subtraction when we subtract a negative number from a positive number and get a negative result, or when we subtract a positive number from a negative number and get a positive result. It means a borrow occurred from the sign bit.
4. What do you mean by subword parallelism?
By partitioning the carry chains within a 128-bit adder, a processor could use parallelism to perform simultaneous operations on short vectors of sixteen 8-bit operands, eight 16-bit operands, four 32-bit operands, or two 64-bit operands. The cost of such partitioned adders was small. This concept is called subword parallelism.
5. What are R-type instructions?
R-type instructions follow the format below. Two registers are used as source registers and one register is used as the destination register.

op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

The meaning of each field is as follows:
op    - Basic operation of the instruction, traditionally called the opcode.
rs    - The first register source operand.
rt    - The second register source operand.
rd    - The register destination operand. It gets the result of the operation.
shamt - Shift amount.
funct - Function. This field, often called the function code, selects the specific variant of the operation in the op field.
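The R-format field packing described above can be sketched in Python. A hedged illustration; the function name is mine, and the register numbers follow the standard MIPS convention ($s1 = 17, $s2 = 18, $t0 = 8):

```python
def encode_r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-format fields (6, 5, 5, 5, 5, 6 bits) into a 32-bit word."""
    for field, width in ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        assert 0 <= field < (1 << width), "field out of range"
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2  ->  op=0, rs=$s1(17), rt=$s2(18), rd=$t0(8), funct=0x20
word = encode_r_type(0, 17, 18, 8, 0, 0x20)
print(hex(word))  # 0x2324020
```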

6. What is a branch prediction buffer?
A branch prediction buffer or branch history table is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that says whether the branch was recently taken or not.
7. Differentiate between strong scaling and weak scaling.
Strong scaling: Speedup achieved on a multiprocessor without increasing the size of the problem is called strong scaling.
Weak scaling: Speedup achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors is called weak scaling.
8. Compare UMA and NUMA multiprocessors.
Uniform Memory Access (UMA): Latency to a word in memory does not depend on which processor asks for it. That is, the time taken to access a word is uniform for all processors.
Non-Uniform Memory Access (NUMA):
- Some memory accesses are much faster than others, depending on which processor asks for which word.
- Here, the main memory is divided and attached to different microprocessors or to different memory controllers on the same chip.
- Programming challenges are harder for NUMA than for UMA.
- NUMA machines can scale to larger sizes.
- NUMAs can have lower latency to nearby memory.

9. What is the need to implement memory as a hierarchy? Computers use different types of memory units for different purposes. Each memory unit has its own advantages and disadvantages. A structure that uses multiple levels of memories is called a memory hierarchy. A memory hierarchy consists of multiple levels of memory with different speeds and sizes. The faster memories are more expensive per bit than the slower memories and thus are smaller. As the distance from the processor increases, the size of the memories and the access time both increase.

10. Point out how DMA can improve I/O speed. The CPU is still responsible for initiating each block transfer. Then the DMA interface controller can take the control and responsibility of transferring data, so that data can be transferred without the intervention of the CPU. The CPU and the I/O controller interact with each other only when control of the bus is requested.

PART - B (5 × 16 = 80 Marks)

11. (a) Discuss about the various techniques to represent instructions in a computer system. (16) (Refer Section 2.4)
Or
(b) What is the need for addressing in a computer system? Explain the different addressing modes with suitable examples. (16) (Refer Section 2.7)
12. (a) Explain the sequential version of the multiplication algorithm and its hardware. (16) (Refer Section 3.3.1)
Or
(b) Explain how floating point addition is carried out in a computer system. Give an example for a binary floating point addition. (16) (Refer Section 3.5.4)
13. (a) Explain the different types of pipeline hazards with suitable examples. (16) (Refer Section 4.5.1)
Or
(b) Explain in detail how exceptions are handled in MIPS architecture. (16) (Refer Section 4.9)
14. (a) Discuss about SISD, MIMD, SIMD, SPMD and vector systems. (16) (Refer Section 5.4)
Or
(b) What is hardware multithreading? Compare and contrast fine-grained multithreading and coarse-grained multithreading. (16) (Refer Section 5.5)
15. (a) Elaborate on the various memory technologies and their relevance. (16) (Refer Section 6.3)
Or
(b) What is virtual memory? Explain the steps involved in virtual memory address translation. (16) (Refer Section 6.5)

B.E/B.Tech. DEGREE EXAMINATION, MAY/JUNE 2016
Computer Science and Engineering
CS 6303 - COMPUTER ARCHITECTURE
(Regulation 2013)
Time : Three Hours    Maximum : 100 Marks
Answer ALL questions
PART - A (10 × 2 = 20 Marks)
1. How to represent instructions in a computer system?
Instructions are kept in the computer as a series of high and low electronic signals and may be represented as numbers. Since registers are referred to in instructions, there must be a convention to map register names into numbers. Three types of instruction formats are used in MIPS. They are:
- R-Type or R-Format (Register)
- I-Type or I-Format (Immediate)
- J-Type (Addressing in Branches and Jumps)

2. Distinguish between auto-increment and auto-decrement addressing modes.
These are special cases of register indirect mode: the register whose number is included in the instruction code contains the address of the operand.
In auto-increment mode, after operand addressing, the contents of the register are incremented. Example: (R1)+
In auto-decrement mode, before operand addressing, the contents of the register are decremented. Example: -(R1)
3. Define ALU.
An arithmetic logic unit (ALU) is a digital circuit used to perform arithmetic and logic operations. It represents the fundamental building block of the central processing unit (CPU) of a computer.
4. What is subword parallelism?
By partitioning the carry chains within a 128-bit adder, a processor could use parallelism to perform simultaneous operations on short vectors of sixteen 8-bit operands, eight 16-bit operands, four 32-bit operands, or two 64-bit operands. The cost of such partitioned adders was small. Given that the parallelism occurs within a wide word, the extensions are classified as subword parallelism.
5. What are the advantages of pipelining?
- The cycle time of the processor is reduced.
- Pipelining increases the instruction throughput.
6. What is an exception?
Exceptions are also called interrupts. An exception is an unscheduled event that disrupts program execution. It is also used to detect an overflow condition. Events other than branches or jumps that change the normal flow of instruction execution come under exceptions.
7. State the need for instruction-level parallelism.
Pipelining improves performance by increasing the rate at which instructions can be executed. However, there are limits to how much pipelining can improve performance. As more and more pipeline stages are added to a processor, the delay of the pipeline register is also increased, thereby reducing the benefit of increasing the pipeline depth.

Instruction-level parallelism enables more instructions to be performed simultaneously during a single clock cycle.
8. What is fine-grained multithreading?
Fine-grained multithreading switches between threads after every instruction. It makes interleaved execution of multiple threads at the same time. This interleaving is done in round-robin fashion on every clock cycle.
9. Define memory hierarchy.
Computers use different types of memory units for different purposes. Each memory unit has its own advantages and disadvantages. A structure that uses multiple levels of memories is called a memory hierarchy. A memory hierarchy consists of multiple levels of memory with different speeds and sizes. The faster memories are more expensive per bit than the slower memories and thus are smaller. As the distance from the processor increases, the size of the memories and the access time both increase.
10. State the advantages of virtual memory.
The primary advantage or objective of virtual memory systems is the ability to load and execute a process that requires a larger amount of memory than what is available, by loading the process in parts and then executing them. It eliminates external fragmentation.
PART - B (5 × 16 = 80 Marks)
11. (a) Discuss about the various components of a computer system. (16) Refer section 1.3
(b) Elaborate the different types of addressing modes with suitable examples. (16) Refer section 2.7
12. (a) Explain briefly about floating point addition and subtraction algorithms. (16) Refer section 3.5.4
(b) Define Booth's multiplication algorithm with a suitable example. (16) (Out of syllabus)
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one of two predetermined values A and S to a product P, then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively, and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a length equal to (x + y + 1).
   - A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
   - S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
   - P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   - If they are 01, find the value of P + A. Ignore any overflow.
   - If they are 10, find the value of P + S. Ignore any overflow.
   - If they are 00, do nothing. Use P directly in the next step.
   - If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
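The steps above can be turned into a small Python sketch. This is my own illustrative implementation of the listed steps, not code from the text:

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Booth's algorithm per the steps above: m and r are the signed
    multiplicand and multiplier; x and y are their bit widths."""
    total = x + y + 1
    mask = (1 << total) - 1
    A = (m & ((1 << x) - 1)) << (y + 1)        # m in the top x bits
    S = ((-m) & ((1 << x) - 1)) << (y + 1)     # -m in two's complement
    P = (r & ((1 << y) - 1)) << 1              # zeros, then r, then a 0
    for _ in range(y):
        pair = P & 0b11
        if pair == 0b01:
            P = (P + A) & mask                 # ignore overflow
        elif pair == 0b10:
            P = (P + S) & mask                 # ignore overflow
        sign = P & (1 << (total - 1))          # arithmetic right shift:
        P = (P >> 1) | sign                    # preserve the sign bit
    P >>= 1                                    # drop the extra low bit
    if P & (1 << (x + y - 1)):                 # reinterpret as signed
        P -= 1 << (x + y)
    return P

print(booth_multiply(3, -4, 4, 4))  # -12
```

With 4-bit operands the result fits in x + y = 8 bits, so 3 × (-4) correctly yields -12.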

13. (a) What is pipelining? Discuss about pipelined datapath and control. (16) Refer section 4.6
(b) Briefly explain about various categories of hazards with examples. (16) Refer section 4.7 and 4.8
14. (a) Explain in detail about Flynn's classification. (16) Refer section 5.4
(b) Write short notes on: (16)
(i) Hardware multithreading. Refer section 5.5
(ii) Multicore processors. Refer section 5.6


15. (a) Define cache memory. Explain the various mapping techniques associated with cache memories. (16) Refer section 6.4.3
(b) Explain about the DMA controller with the help of a block diagram. (16) Refer section 7.3


B.E/B.TECH. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2014.
Third Semester
Computer Science and Engineering
CS 6303 - COMPUTER ARCHITECTURE
(Regulation 2013)
Time : Three hours    Maximum : 100 marks
Answer ALL questions
PART A - (10 × 2 = 20 marks)
1. State Amdahl's Law.
Amdahl's Law is used to find the execution time of a program after making an improvement. It can be represented as an equation as follows:

Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected

Hence, Amdahl's Law can be used to estimate performance improvements.
2. Brief about relative addressing mode with an example.
In this addressing mode, the branch address is the sum of the PC and a constant in the instruction. The figure depicts the format of PC-relative addressing.

The branch instruction uses the I-format: a 6-bit op field, two 5-bit register fields (rs, rt), and a 16-bit address field. The branch target is formed by adding the 16-bit address (as a word offset) to the PC.

op (6 bits) | rs (5 bits) | rt (5 bits) | Address (16 bits)

[Figure: PC-relative addressing - the 16-bit address field selects a memory word at PC + offset.]

Example: bne $s0, $s1, Exit   # go to Exit if $s0 ≠ $s1

In the above example, the branch address is calculated by adding the PC value to the constant in the instruction.
3. Define the Little-Endian arrangement.
In the Little-Endian type, the lower byte addresses are used for the less significant bytes (the rightmost bytes) of the word.
4. What is DMA?
The CPU is responsible only for initiating each block transfer. Then the interface controller can take the control and responsibility of transferring data, so that data can be transferred without the intervention of the CPU. The CPU and the I/O controller interact with each other only when control of the bus is requested. This level of I/O control is called Direct Memory Access (DMA).
5. What is the need for speculation?

Speculation is an approach that allows the compiler or processor to guess the outcome of an instruction, so as to enable execution to begin for other instructions that depend on the speculated instruction. Speculation, performed by the compiler or the hardware, can increase the amount of ILP that can be exploited via prediction. For example, if a branch is predicted earlier, instructions following the branch can be executed earlier.
6. What is an exception?
Exceptions are also called interrupts. An exception is an unscheduled event that disrupts program execution. It is also used to detect an overflow condition. Events other than branches or jumps that change the normal flow of instruction execution come under exceptions.
7. What is Flynn's classification? (Refer Section 5.4.2)
Flynn's classification is based on the multiplicity of instruction streams and data streams observed by the CPU during program execution. One categorization of parallel hardware is based on the number of instruction streams and the number of data streams. The table below shows the categories:

                            Data Streams
Instruction Streams    Single                     Multiple
Single                 SISD: Intel Pentium 4      SIMD: SSE instructions of x86
Multiple               MISD: No examples          MIMD: Intel Core i7

8. Brief about multithreading. (Refer Section 5.5)
Multithreading was first introduced to exploit thread-level parallelism within a processor. Hardware multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.
9. Differentiate programmed I/O and interrupt-driven I/O. (Refer Section 7.2)
Program-driven I/O means the program is polling or checking some hardware item, e.g. a mouse, within a loop. It is easy to program and understand, but it is slow and inefficient. With interrupt-driven I/O, the same mouse will trigger a signal to the program to process the mouse event. It is fast and efficient, but difficult to implement.

10. What is the purpose of the Dirty/Modified bit in cache memory?
It indicates whether the page has been modified during residency in the memory. This information is needed to determine whether the page should be written back to the disk before it is removed from memory.
PART B - (5 × 16 = 80 marks)
11. (a) (i) Assume a two-address format specified as source, destination. Examine the following sequence of instructions and explain the addressing modes used and the operation done in every instruction. (10)
(1) Move (R5)+, R0
(2) Add (R5)+, R0
(3) Move R0, (R5)
(4) Move 16(R5), R3
(5) Add #40, R5
(Out of Syllabus)
(ii) Consider the computer with three instruction classes and CPI measurements as given below, and instruction counts for each instruction class for the same program from two different compilers. Assume that the computer's clock rate is 4 GHz. Which code sequence will execute faster according to execution time? (6)

Instruction class:      A    B    C
CPI for this class:     1    2    3

Instruction counts:     A    B    C
Code from Compiler 1:   2    1    2
Code from Compiler 2:   4    1    1

(Refer Example 1.4)
Or
(b) (i) Explain the components of a computer system. (8) (Refer Section 1.3)
(ii) State the CPU performance equation and discuss the factors that affect performance. (8) (Refer Section 1.5)
12. (a) (i) Multiply the following pair of signed numbers using Booth's bit-pair recoding of the multiplier: A = +13 (multiplicand) and B = -6 (multiplier). (10) (Out of Syllabus)
(ii) Briefly explain the carry-lookahead adder. (6) (Out of Syllabus)
Or
(b) Divide (12)10 by (3)10 using the restoring and non-restoring division algorithms with step-by-step intermediate results and explain. (16) (Out of Syllabus)
13. (a) Explain the datapath and its control in detail. (16) (Refer Section 4.3 & 4.4)
Or
(b) What is a hazard? Explain its types with suitable examples. (16) (Refer Section 4.5.1)
14. (a) Explain instruction-level parallel processing. State the challenges of parallel processing. (16) (Refer Section 5.2 & 5.3)
Or
(b) Explain the terms: (16)
(i) Multicore processor. (Refer Section 5.6)
(ii) Hardware multithreading. (Refer Section 5.5)
15. (a) (i) Explain mapping functions in cache memory to determine how memory blocks are placed in cache. (8) (Refer Section 6.4.3)
(ii) Explain in detail about the bus arbitration techniques in DMA. (8) (Refer Section 7.3.4)
Or
(b) (i) Draw different memory address layouts and brief about the technique used to increase the average rate of fetching words from the main memory. (8) (Refer Section 6.3)
(ii) Explain in detail about any two standard input and output interfaces required to connect the I/O device to the bus. (8) (Refer Section 7.3)
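Question 11(a)(ii) above can be worked through numerically. A hedged Python sketch, assuming the instruction counts are in billions (the comparison itself does not depend on the unit):

```python
# CPI comparison for the two compilers at a 4 GHz clock.
cpi = {"A": 1, "B": 2, "C": 3}
counts = {
    "Compiler 1": {"A": 2, "B": 1, "C": 2},
    "Compiler 2": {"A": 4, "B": 1, "C": 1},
}
clock_rate = 4e9  # Hz

# Clock cycles = sum over classes of (CPI of class * instruction count).
cycles = {name: sum(cpi[c] * n[c] for c in cpi) for name, n in counts.items()}
for name, cyc in cycles.items():
    print(name, "->", cyc, "billion cycles,",
          cyc * 1e9 / clock_rate, "seconds")
# Compiler 1 needs 10 billion cycles, Compiler 2 only 9,
# so the code from Compiler 2 executes faster.
```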

B.E/B.TECH. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2015
Third Semester
Computer Science and Engineering
CS 6303 - COMPUTER ARCHITECTURE
(Regulation 2013)
Part - A (10 × 2 = 20 marks)
1. What is instruction set architecture?
The words of a computer's language are called instructions, and its vocabulary is called an instruction set. It is also called the instruction set architecture.
2. How is CPU execution time for a program calculated?
CPU execution time, or simply CPU time, is the time the CPU spends computing for this task; it does not include time spent waiting for I/O or running other programs. CPU time can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks on behalf of the program, called system CPU time.
3. What are the overflow/underflow conditions for addition and subtraction?
The following table shows the combinations of operations, operands, and results that indicate an overflow.

Operation   Operand A   Operand B   Result indicating overflow
A + B       ≥ 0         ≥ 0         < 0
A + B       < 0         < 0         ≥ 0
A - B       ≥ 0         < 0         < 0
A - B       < 0         ≥ 0         ≥ 0
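The sign rules in the table can be checked with a small Python sketch. The function name and the wrap-around model (emulating what fixed-width hardware returns) are my own:

```python
def overflows(a: int, b: int, op: str, bits: int = 32) -> bool:
    """Detect two's-complement overflow for A+B or A-B using the sign
    rules in the table above."""
    lo, size = -(1 << (bits - 1)), 1 << bits
    result = a + b if op == "+" else a - b
    # Wrap into the representable range: what bits-wide hardware returns.
    wrapped = ((result - lo) % size) + lo
    if op == "+":
        return (a >= 0 and b >= 0 and wrapped < 0) or \
               (a < 0 and b < 0 and wrapped >= 0)
    return (a >= 0 > b and wrapped < 0) or (a < 0 <= b and wrapped >= 0)

print(overflows(0x7FFFFFFF, 1, "+"))  # True: max positive + 1 wraps negative
print(overflows(-5, 3, "-"))          # False: -8 is representable
```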

4. State the representation of a double precision floating point number.
A floating-point value represented in two 32-bit words is called double precision. The representation of double precision floating-point numbers takes two MIPS words as shown below, where s is the sign of the floating-point number (0 - positive, 1 - negative), exponent is the value of the 11-bit exponent field, and fraction is the 52-bit number in the fraction field.

First word:  bit 31: S (1 bit) | bits 30-20: Exponent (11 bits) | bits 19-0: Fraction (20 bits)
Second word: bits 31-0: Fraction, continued (32 bits)

5. What is a hazard? What are its types?
There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These events are called hazards, and there are three different types:
(i) Structural hazard
(ii) Data hazard
(iii) Control hazard
6. What is meant by branch prediction?

Branch prediction is a method to resolve a control hazard that assumes the outcome of a branch and proceeds from that assumption rather than waiting to determine the actual outcome. It is well suited to loops: at the bottom of a loop is a branch that jumps back to the top. Since such branches are likely to be taken and they branch backward, we could always predict taken for branches that jump to an earlier address.
7. What is ILP?
Instruction Level Parallelism (ILP) is a measure of the number of instructions that can be performed simultaneously during a single clock cycle. The potential overlap among instructions is called instruction-level parallelism.

8. Define superscalar processor.
A dynamic multiple-issue processor is also called a superscalar processor, or simply superscalar. In the simplest superscalar processors, instructions are issued in order. The processor decides whether zero, one, or more instructions can be issued in a given clock cycle. More than one instruction is executed per clock cycle by selecting them during execution. The compiler tries to schedule instructions and move dependences apart, so as to improve the instruction issue rate.
9. What are the various memory technologies?
There are four primary technologies used now in memory hierarchies. They are:
1. Static Random Access Memory (SRAM)
2. Dynamic Random Access Memory (DRAM)
3. ROM and flash memory
4. Magnetic disk
10. Define hit ratio.
The fraction of memory accesses found in a level of the memory hierarchy is called the hit rate or hit ratio. It is used as a measure of the performance of the memory hierarchy.

PART - B (5 × 16 = 80 Marks)
11. a) Explain in detail the various components of a computer system with a neat diagram. (16)
(Refer Section 1.3) (Page no. 1.6 - 1.9)
(OR)
b) What is an addressing mode? Explain the various addressing modes with suitable examples. (16)
(Refer Section 2.7) (Page no. 2.33 - 2.36)
12. a) Explain in detail the multiplication algorithm with suitable examples and diagrams. (16)
(Refer Section 3.3.1) (Page no. 3.25 - 3.28)
(OR)
b) Discuss the division algorithm in detail with diagrams and examples. (16)
(Refer Section 3.4.1) (Page no. 3.34 - 3.37)
13. a) Explain the basic MIPS implementation with the necessary multiplexers and control lines. (16)
(Page nos. 4.1 - 4.3, 4.8, 4.16, figure 4.6)
(OR)
b) Explain how the instruction pipeline works. What are the various situations where an instruction pipeline can stall? Illustrate with an example. (16)
(Refer Section 4.5) (Page no. 4.19 - 4.24)
14. a) Explain in detail Flynn's classification of parallel hardware. (16)
(Refer Section 5.4) (Page no. 5.14 - 5.21 till 5.4.6)
(OR)
b) Explain in detail about hardware multithreading. (16)
(Refer Section 5.5) (Page no. 5.23 - 5.26)
15. a) What is virtual memory? Explain in detail how virtual memory is implemented, with a neat diagram. (16)
(Refer Section 6.5) (Page no. 6.43 - 6.48)
(OR)
b) Draw the typical block diagram of a DMA controller and explain how it is used for direct data transfer between memory and peripherals. (16)
(Refer Section 7.3) (Page no. 7.7 - 7.11 till 7.3.3)

Question Paper Code - 71675
B.E/B.TECH. DEGREE EXAMINATION, APRIL/MAY 2017
Third/Fifth/Sixth Semester
Computer Science and Engineering
CS 6303 - COMPUTER ARCHITECTURE
(Common to ECE, E&I, I&C, R&AE, IT)
(Regulation 2013)
PART-A
1. List the major components of a computer system.
- Input unit
- Output unit
- Memory unit
- Arithmetic and logic unit
- Control unit

2. State the need for indirect addressing mode. Give an example.
The effective address of the operand is the contents of a register or main memory location whose address appears in the instruction. The register or memory location that contains the address of the operand is a pointer. When an execution takes place in such a mode, the instruction may be told to go to a specific address. Once it's there, instead of finding an operand, it finds an address where the operand is located.
3. Subtract (11010)2 - (10000)2 using the 1's complement and 2's complement methods.
1's complement method:
(i) Take the 1's complement of the subtrahend: 01111
(ii) Add it to the minuend: 11010 + 01111 = 1 01001
(iii) If there is a carry, add it to the result: 01001 + 1 = 01010
(iv) The result is 01010
2's complement method:
(i) Take the 2's complement of the subtrahend: 10000
(ii) Add it to the minuend: 11010 + 10000 = 1 01010
(iii) If there is a carry, discard it: 01010
(iv) The result is 01010
4. Write the rules to perform addition on floating point numbers.
In order to add, we need the exponents of the two numbers to be the same. Determine which exponent is the smaller exponent. Rewrite that number using the larger exponent, so that the two exponents are now the same.
5. Name the control signals required to perform arithmetic operations.
1. RegDst
2. RegWrite
3. ALUSrc

4. MemtoReg
6. Define hazard. Give an example of a data hazard.
When an instruction cannot execute in the next clock cycle of a pipeline, the event is called a hazard.
A data hazard occurs when the pipeline must be stalled because one step must wait for another to complete. For example, suppose we have an add instruction followed immediately by a subtract instruction that uses the sum ($s0):
add $s0, $t0, $t1
sub $t2, $s0, $t3
7. What is instruction-level parallelism?
Instruction Level Parallelism (ILP) is a measure of the number of instructions that can be performed simultaneously during a single clock cycle. The potential overlap among instructions is called instruction-level parallelism.
8. Distinguish implicit multithreading and explicit multithreading.
Explicit multithreading concurrently executes instructions from different explicit threads. Implicit multithreading is the concurrent execution of multiple threads extracted from a single sequential program.
9. Define memory interleaving.
Memory interleaving is a category of techniques for increasing memory speed. For example, with separate memory banks for odd and even addresses, the next byte of memory can be accessed while the current byte is being refreshed.
10. Summarize the sequence of events involved in handling an interrupt request from a single device.
If there is an interrupt, the interrupt handler will be triggered. The handler will stop the instruction that is currently processing, save its configuration in a register, and load the program counter of the interrupt from a location given by the interrupt vector table. After the processor finishes processing the interrupt, the handler will restore the instruction and its configuration from the saved register, and the process will resume where it left off.
PART-B
11 (a) Explain the important measures of the performance of a computer and derive the basic performance equation.
Refer section 1.5 (Page No. 1.17, 1.19 and 1.20)
Or
(b) Explain direct, immediate, relative and indexed addressing modes with examples.
Refer section 2.7 (Page No. 2.33 - 2.35)
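The complement-method subtraction in question 3 of this paper can be checked mechanically. A minimal Python sketch (function names are mine), using 5-bit operands as in the question:

```python
def sub_ones_complement(a: int, b: int, bits: int = 5) -> int:
    """Subtract via 1's complement: add the complement of b, then add
    back any end-around carry."""
    mask = (1 << bits) - 1
    total = a + (b ^ mask)            # a + 1's complement of b
    carry, result = total >> bits, total & mask
    return (result + carry) & mask    # end-around carry

def sub_twos_complement(a: int, b: int, bits: int = 5) -> int:
    """Subtract via 2's complement: add (~b + 1) and discard the carry."""
    mask = (1 << bits) - 1
    return (a + ((b ^ mask) + 1)) & mask

print(bin(sub_ones_complement(0b11010, 0b10000)))  # 0b1010
print(bin(sub_twos_complement(0b11010, 0b10000)))  # 0b1010
```

Both methods produce 01010, matching the worked answer.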

12 (a) (i) Demonstrate multiplication of two binary numbers with an example. Design an arithmetic element to perform this multiplication. Refer section 3.3.1 (Page No. 3.28)

(ii) Describe non-restoring division with an example. (Out of Syllabus)
Or
(b) (i) Design an arithmetic element to perform the basic floating point operations. Refer section 3.5.4, 3.5.5 (Page No. 3.56, 3.64)
(ii) What is meant by subword parallelism? Explain. Refer section 3.6 (Page No. 3.79)
13 (a) Discuss the modified datapath to accommodate pipelined executions with a diagram. Refer section 4.6.1 (Page No. 4.26)
Or
(b) (i) Explain the hazards caused by unconditional branching statements. Refer section 4.8 (Page No. 4.38 - 4.44)
(ii) Describe operand forwarding in a pipelined processor with a diagram. Refer section 4.7.1 (Page No. 4.32 - 4.34)
14 (a) (i) Discuss the challenges in parallel processing with necessary examples. Refer section 5.3 (Page No. 5.12 - 5.13)
(ii) Explain Flynn's classification of parallel processing with necessary diagrams. Refer section 5.4 (Page No. 5.16 - 5.20)
Or
(b) Explain the four principal approaches to multithreading with necessary diagrams. Refer section 5.5 (Page No. 5.23 - 5.26)
15 (a) Explain the different mapping functions that can be applied on cache memories in detail. Refer section 6.4.3 (Page No. 6.23 - 6.30)
Or
(b) (i) Explain virtual memory address translation in detail with necessary diagrams. Refer section 6.5.2 (Page No. 6.44 - 6.46)
(ii) What is meant by Direct Memory Access? Explain the use of DMA controllers in a computer system. Refer section 7.3 (Page No. 7.7 - 7.11)
PART-C
16 (a) (i) Explain mapping functions in cache memory to determine how memory blocks are placed in cache. Refer section 6.4.3 (Page No. 6.23 - 6.30)
(ii) Explain in detail about the bus arbitration techniques in DMA. Refer section 7.3.4 (Page No. 7.11 - 7.15)
Or
(b) A pipelined processor uses the delayed branch technique. Recommend any one of the following possibilities for the design of the processor.
In the first possibility, the processor has a 4-stage pipeline and one delay slot. In the second possibility, it has a 6-stage pipeline and two delay slots. Compare the performance of these two alternatives, taking only the branch penalty into account. Assume that 20% of the instructions are branch instructions and that an optimizing compiler has an 80% success rate in filling the single delay slot. For the second alternative, the compiler is able to fill the second slot 25% of the time.
ANSWER
- The first possibility is the better option when compared with the second alternative.
- It fills the single delay slot with an 80% success rate, whereas the second alternative fills its second slot only 25% of the time.
- Hence the first alternative should be selected.
- Diagrams should be drawn for the four-stage and six-stage pipelines with one delay slot and two delay slots respectively, by considering an example code in which 20% of the instructions are branch instructions.
- From the diagrams it is clear that the first option is the better one.
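Taking only the branch penalty into account, the comparison can be computed under a simple model. This is my own assumption, not from the text: each unfilled delay slot costs one stall cycle.

```python
branch_frac = 0.20  # 20% of instructions are branches

# Option 1: 4-stage pipeline, one delay slot, filled 80% of the time.
stalls_1 = branch_frac * (1 - 0.80)

# Option 2: 6-stage pipeline, two delay slots: the compiler fills the
# first 80% of the time and the second only 25% of the time.
stalls_2 = branch_frac * ((1 - 0.80) + (1 - 0.25))

print(stalls_1, stalls_2)  # roughly 0.04 vs 0.19 stall cycles/instruction
```

Under this model, option 1 incurs about 0.04 stall cycles per instruction against about 0.19 for option 2, supporting the answer that the first alternative is better on branch penalty alone.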

Question Paper Code : 40902
B.E/B.Tech. DEGREE EXAMINATION, APRIL/MAY 2018
Third/Fifth/Sixth Semester
Computer Science and Engineering
CS 6303 - COMPUTER ARCHITECTURE
(Common to : Electronics and Communication Engineering / Electronics and Instrumentation Engineering / Instrumentation and Control Engineering / Robotics and Automation Engineering / Information Technology)
(Regulation 2013)
Time : Three Hours    Maximum : 100 Marks
Answer ALL questions
PART - A (10 × 2 = 20 Marks)
1. Write the equation for the dynamic power required per transistor.
The power required per transistor is the product of the energy of a transition and the frequency of transitions:

Power ∝ 1/2 × Capacitive load × Voltage² × Frequency switched

2. Classify the instructions based on the operations they perform and give one example for each category.
Classifications of instructions:
Arithmetic - add $s1, $s2, $s3
Data transfer - lw $s1, 20($s2)
Logical - and $s1, $s2, $s3
Conditional branch - beq $s1, $s2, 25
Unconditional branch - j 2500

 Solved Anna University Question Papers 

3. Show the IEEE 754 binary representation of the number (−0.75)₁₀ in single precision.
The number (−0.75)₁₀ is represented in binary as −0.11₂ × 2⁰, and in normalized scientific notation it is −1.1₂ × 2⁻¹.
The general representation for a single-precision number is
(−1)^S × (1 + Fraction) × 2^(Exponent − 127)
Adding the bias 127 to the exponent of −1.1₂ × 2⁻¹ yields a stored exponent field of (−1) + 127 = 126:
(−1)¹ × (1 + .1000 0000 0000 0000 0000 000₂) × 2^(126 − 127)
The single-precision binary representation of (−0.75)₁₀ is then:
Sign (1 bit):      1
Exponent (8 bits): 0111 1110
Fraction (23 bits): 1000 0000 0000 0000 0000 000
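The bit pattern above can be confirmed mechanically; a minimal check using Python's struct module:

```python
import struct

# Reinterpret the IEEE 754 single-precision encoding of -0.75 as an integer
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF

assert sign == 1
assert exponent == 0b0111_1110            # 126 = (-1) + bias 127
assert fraction == 0b100_0000_0000_0000_0000_0000
assert f"{bits:032b}" == "1" + "01111110" + "1" + "0" * 22
```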

4. Define a datapath in a CPU.
Datapath: the component of the processor that performs arithmetic operations. The datapath performs the arithmetic operations, while the control unit tells the datapath, memory, and I/O devices what to do according to the instructions of the program.

5. What is the ideal CPI of a pipelined processor?
The ideal CPI of a pipelined processor is 1: in the absence of stalls, one instruction completes every clock cycle.

6. What is meant by exception? Give one example of a MIPS exception.
An exception, also called an interrupt, is an unscheduled event that disrupts program execution; it is used, for example, to detect overflow. Example: arithmetic overflow.

7. Protein String Matching Code has a 4-day execution time on the current machine, spending 20% of the time on integer instructions, 35% on I/O, and the rest on other operations. Which is the better tradeoff between the following two proposals? First: a compiler optimization that reduces the number of integer instructions by 25% (assume each integer instruction takes the same amount of time). Second: a hardware optimization that reduces the latency of each I/O operation from 6 µs to 5 µs.
The second proposal is better. I/O consumes 35% of the time (1.4 days), so reducing each I/O operation from 6 µs to 5 µs (a factor of 5/6) saves about 0.23 days, whereas reducing the integer instructions (20% of the time, i.e. 0.8 days) by 25% saves only 0.2 days. Hence the hardware optimization gives the larger reduction in execution time.
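The comparison can be made quantitative in a few lines of Python, a plain recomputation of the tradeoff above:

```python
total = 4.0                       # days of execution time
t_int = 0.20 * total              # 0.8 days on integer instructions
t_io = 0.35 * total               # 1.4 days on I/O

# First proposal: 25% fewer integer instructions (same time per instruction)
t_first = total - 0.25 * t_int    # 3.8 days

# Second proposal: each I/O operation drops from 6 us to 5 us (factor 5/6)
t_second = total - t_io + t_io * (5 / 6)

assert t_second < t_first         # the I/O (hardware) optimization wins
```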

 SQ.72  8.

 Computer Architecture 

Give examples for each class in Flynn’s Classification. Flynn’s Classification Data Streams

Instruction Streams

Single

Single SISD : Intel Pentium 4

Multiple SIMD: SSE Instructions of x86

Multiple

MISD: No examples

MIMD: Intel Core I7

9. Distinguish SRAM and DRAM.
Static RAM (SRAM):
 Uses transistors to store a single bit of data.
 Does not need periodic refreshing to maintain data.
 More complex structure than DRAM.
 More expensive than DRAM.
 Faster than DRAM.
 Used in cache memory.
Dynamic RAM (DRAM):
 Uses a separate capacitor to store each bit of data.
 Needs periodic refreshing to maintain the charge in the capacitors.
 Simpler structure than SRAM.
 Less expensive than SRAM.
 Slower than SRAM.
 Used in main memory.

10. What is the use of a DMA controller?
Direct memory access (DMA) is a method that allows an input/output (I/O) device to send or receive data directly to or from the main memory, bypassing the CPU to speed up memory operations. The process is managed by a chip known as a DMA controller (DMAC).

PART – B (5 × 13 = 65 Marks)

11.

a) i) Consider three different processors P1, P2 and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and a CPI of 2.2.

(a) Which processor has the highest performance expressed in instructions per second?
CPU execution time can be calculated using the following formula:
CPU Time = (Instruction count × CPI) / Clock rate
Let the instruction count be n and assume it is the same for all three processors. Now calculate the CPU time for each processor:
CPU Time_P1 = (n × 1.5) / (3 × 10⁹) = 0.5n × 10⁻⁹ sec
CPU Time_P2 = (n × 1.0) / (2.5 × 10⁹) = 0.4n × 10⁻⁹ sec
CPU Time_P3 = (n × 2.2) / (4 × 10⁹) = 0.55n × 10⁻⁹ sec
The CPU time of processor P2 is the smallest, so P2 has the highest performance.

(b) If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions in each processor. (5)
(i) Number of clock cycles
CPU time for P1, P2 and P3 = 10 sec. The formula for CPU time can be rewritten to find the number of clock cycles:
CPU time = CPU clock cycles / Clock rate
CPU clock cycles = CPU time × Clock rate
CPU clock cycles_P1 = 10 sec × 3 × 10⁹ Hz = 30 × 10⁹ cycles
CPU clock cycles_P2 = 10 sec × 2.5 × 10⁹ Hz = 25 × 10⁹ cycles
CPU clock cycles_P3 = 10 sec × 4 × 10⁹ Hz = 40 × 10⁹ cycles
(ii) Instruction count
The formula for CPU clock cycles can be rewritten to find the instruction count:
CPU clock cycles = Instruction count × CPI
Instruction count = CPU clock cycles / CPI
Instruction count_P1 = 30 × 10⁹ / 1.5 = 20 × 10⁹ instructions
Instruction count_P2 = 25 × 10⁹ / 1.0 = 25 × 10⁹ instructions
Instruction count_P3 = 40 × 10⁹ / 2.2 = 18.18 × 10⁹ instructions
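These figures can be double-checked with a short Python sketch that recomputes the same formulas:

```python
clock = {"P1": 3.0e9, "P2": 2.5e9, "P3": 4.0e9}   # Hz
cpi = {"P1": 1.5, "P2": 1.0, "P3": 2.2}

# (a) performance in instructions per second = clock rate / CPI
perf = {p: clock[p] / cpi[p] for p in clock}
assert max(perf, key=perf.get) == "P2"            # P2 is fastest

# (b) cycles and instructions for a 10-second program
cycles = {p: 10 * clock[p] for p in clock}
instructions = {p: cycles[p] / cpi[p] for p in clock}
assert cycles["P1"] == 30e9 and cycles["P2"] == 25e9 and cycles["P3"] == 40e9
assert instructions["P1"] == 20e9 and instructions["P2"] == 25e9
assert round(instructions["P3"] / 1e9, 2) == 18.18
```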

ii) Explain in detail the components of a computer system. (8)
Refer Section 1.3
(or)

b) i) Translate the following C code to MIPS assembly code. Use a minimum number of instructions. Assume that i and k correspond to registers $s3 and $s5, and the base of the array save is in $s6. What is the MIPS assembly code corresponding to this C segment? (5)

while (save[i] == k)
    i += 1;

The first step is to load save[i] into a temporary register. Before we can load save[i], we need its address, and before we can add i to the base of array save to form the address, we must multiply the index i by 4 because of byte addressing. We can use shift left logical, since shifting left by 2 bits multiplies by 4. We also add the label Loop so that we can branch back to this instruction at the end of the loop:

Loop: sll  $t1,$s3,2     # Temp reg $t1 = i * 4

To get the address of save[i], we add $t1 and the base of save in $s6:

      add  $t1,$t1,$s6   # $t1 = address of save[i]

Now we can use that address to load save[i] into a temporary register:

      lw   $t0,0($t1)    # Temp reg $t0 = save[i]

The next instruction performs the loop test, exiting if save[i] ≠ k:

      bne  $t0,$s5,Exit  # go to Exit if save[i] != k

The next instruction adds 1 to i:

      addi $s3,$s3,1     # i = i + 1

The end of the loop branches back to the while test at the top of the loop, and we add the Exit label after it:

      j    Loop          # go to Loop
Exit:

ii) What is an addressing mode in a computer? Classify MIPS addressing modes and give one example instruction for each category. (8)
Refer Section 2.7
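The loop's semantics can be mirrored in Python to sanity-check the assembly; the array and values below are made up for illustration:

```python
def while_save(save, i, k):
    """Mirror of the MIPS loop: advance i while save[i] == k."""
    while save[i] == k:   # lw + bne ... Exit
        i += 1            # addi $s3,$s3,1 ; j Loop
    return i

# i starts at 0; save[0..2] equal k, save[3] does not
assert while_save([5, 5, 5, 9], 0, 5) == 3
assert while_save([1, 2], 0, 7) == 0   # loop body never executes
```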

12. a) i) Perform X+Y and Y−X using 2's complement for the given two binary numbers X = 0000 1011 1110 1111 and Y = 1111 0010 1001 1101. (5)

Perform X+Y:
      X = 0000 1011 1110 1111
(+)   Y = 1111 0010 1001 1101
    X+Y = 1111 1110 1000 1100

Perform Y−X:
Step 1: Find the 2's complement of X.
Step 2: Add the result to Y.
(i) Find the 2's complement of X:
                  X = 0000 1011 1110 1111
1's complement of X = 1111 0100 0001 0000
2's complement of X = 1111 0100 0001 0001
(ii) Add to Y:
      1111 0100 0001 0001
(+)   1111 0010 1001 1101
    1 1110 0110 1010 1110  (the carry out of the 16th bit is discarded)
Y−X = 1110 0110 1010 1110
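Both results can be checked with 16-bit modular arithmetic in Python:

```python
MASK = 0xFFFF                        # keep results to 16 bits
X = 0b0000_1011_1110_1111
Y = 0b1111_0010_1001_1101

x_plus_y = (X + Y) & MASK            # carry out of bit 15 is discarded
assert x_plus_y == 0b1111_1110_1000_1100

neg_x = (~X + 1) & MASK              # 2's complement of X
assert neg_x == 0b1111_0100_0001_0001

y_minus_x = (Y + neg_x) & MASK       # Y - X as Y + (-X)
assert y_minus_x == 0b1110_0110_1010_1110
```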


ii) Multiply the following signed 2's complement numbers using the Booth algorithm: A = 001110 and B = 111001, where A is the multiplicand and B is the multiplier. (8)

M = 001110 (multiplicand, +14)
Q = 111001 (multiplier, −7)
−M = 110010
M is unchanged throughout; A, Q and Q₋₁ evolve as follows:

Step | A      | Q      | Q₋₁ | Action
0    | 000000 | 111001 | 0   | Initial values
1    | 110010 | 111001 | 0   | Q₀Q₋₁ = 10: A = A − M
     | 111001 | 011100 | 1   | Arithmetic shift right A,Q,Q₋₁
2    | 000111 | 011100 | 1   | Q₀Q₋₁ = 01: A = A + M
     | 000011 | 101110 | 0   | Arithmetic shift right A,Q,Q₋₁
3    | 000011 | 101110 | 0   | Q₀Q₋₁ = 00: no operation
     | 000001 | 110111 | 0   | Arithmetic shift right A,Q,Q₋₁
4    | 110011 | 110111 | 0   | Q₀Q₋₁ = 10: A = A − M
     | 111001 | 111011 | 1   | Arithmetic shift right A,Q,Q₋₁
5    | 111001 | 111011 | 1   | Q₀Q₋₁ = 11: no operation
     | 111100 | 111101 | 1   | Arithmetic shift right A,Q,Q₋₁
6    | 111100 | 111101 | 1   | Q₀Q₋₁ = 11: no operation
     | 111110 | 011110 | 1   | Arithmetic shift right A,Q,Q₋₁
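The iteration can be replayed mechanically; a minimal Python sketch of the A/Q/Q₋₁ registers for 6-bit operands:

```python
def booth_multiply(m, q, n=6):
    """Booth's algorithm on n-bit two's-complement operands m and q."""
    mask = (1 << n) - 1
    a, q_1 = 0, 0
    for _ in range(n):
        q0 = q & 1
        if (q0, q_1) == (1, 0):
            a = (a - m) & mask               # A = A - M
        elif (q0, q_1) == (0, 1):
            a = (a + m) & mask               # A = A + M
        # arithmetic shift right of the combined A,Q,Q-1
        q_1 = q0
        q = (q >> 1) | ((a & 1) << (n - 1))  # Q receives A's old LSB
        a = (a >> 1) | ((a >> (n - 1)) << (n - 1))  # sign bit replicated
    return (a << n) | q                      # product = AQ, 2n bits

product = booth_multiply(0b001110, 0b111001)  # (+14) x (-7)
assert product == 0b111110_011110             # -98 in 12-bit 2's complement
assert product - (1 << 12) == -98
```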

Product = AQ = 111110 011110, i.e. (−98)₁₀, which agrees with (+14) × (−7) = −98.

(or)

b) i) Draw the block diagram of the integer divider and explain the division algorithm. (5)

Refer section 3.4.1

ii) Add the numbers (0.75)₁₀ and (−0.275)₁₀ in binary using the floating-point addition algorithm. (8)

(i) Convert the numbers into binary:
(0.75)₁₀ = 0.11₂ × 2⁰ = 1.1₂ × 2⁻¹
(−0.275)₁₀ ≈ −0.010001₂ × 2⁰ = −1.0001₂ × 2⁻² (0.275 is not exactly representable in binary; the fraction is truncated here)
(ii) Equalize exponents by shifting the smaller number right:
−1.0001₂ × 2⁻² = −0.10001₂ × 2⁻¹
(iii) Add significands:
1.10000 + (−0.10001) = 0.11111, giving 0.11111₂ × 2⁻¹
(iv) Normalize:
Sum = 1.1111₂ × 2⁻²

13. a) Design a simple datapath with the control unit and explain it in detail. (13)
Refer section 4.4.2
(or)
b) Discuss the limitations of pipelining a processor's datapath. Suggest methods to overcome them. (13)
Pipelining increases CPU instruction throughput – the number of instructions completed per unit of time – but it does not reduce the execution time of an individual instruction. In fact, it usually slightly increases the execution time of each instruction because of overhead in the pipeline control. The increase in instruction throughput means that a program runs faster and has a lower total execution time. Limitations on the practical depth of a pipeline arise from:
 Pipeline latency. The fact that the execution time of each instruction does not decrease puts limitations on pipeline depth.
 Imbalance among pipeline stages. Imbalance among the pipe stages reduces performance, since the clock can run no faster than the time needed for the slowest pipeline stage.
 Pipeline overhead. Pipeline overhead arises from the combination of pipeline register delay (setup time plus propagation delay) and clock skew. Once the clock cycle is as small as the sum of the clock skew and latch overhead, no further pipelining is useful, since there is no time left in the cycle for useful work.
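The stage-imbalance and overhead points can be illustrated numerically; the 200/100 ps latencies and 20 ps register overhead below are assumed figures, not from the text:

```python
stage_ps = [200, 100, 200, 200, 100]   # IF, ID, EX, MEM, WB latencies (assumed)
register_overhead_ps = 20              # pipeline register delay + clock skew (assumed)

single_cycle_ps = sum(stage_ps)        # unpipelined: one long cycle per instruction
pipelined_clock_ps = max(stage_ps) + register_overhead_ps

speedup = single_cycle_ps / pipelined_clock_ps
assert single_cycle_ps == 800
assert pipelined_clock_ps == 220
# imbalance (slowest stage) plus overhead keep speedup well below the depth of 5
assert speedup < len(stage_ps)
```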

14. a) i) List the limitations of instruction-level parallelism. (5)
Refer section 5.2.3
ii) What are the challenges in parallel processing? (8)
Refer section 5.3
(or)
b) i) Compare and contrast fine-grained multithreading, coarse-grained multithreading and simultaneous multithreading. (9)
Refer section 5.5
ii) Classify shared-memory multiprocessors based on the memory access latency. (4)
Refer section 5.6.1

15. a) i) What is the need for cache memory? List the three mapping methods of cache memory and explain any two. (10)
Refer section 6.4.3
ii) Define virtual memory. What is the advantage of using virtual memory? (3)
Virtual memory is a memory-management capability of an OS that uses hardware and software to allow a computer to compensate for physical-memory shortages by temporarily transferring data from random access memory (RAM) to disk storage. It allows us to run more applications on the system than the physical memory alone could support. Virtual memory is simulated memory backed by a file on the hard drive, often called the page file or swap file; the operating system uses hard-disk space to extend physical RAM.

(or)

b)

i) Discuss about Programmed I/Os associated with computers. (6)
Refer section 7.2
ii) Write the sequence of operations carried out by a processor when interrupted by a peripheral device connected to it. (7)
Refer section 7.4.1

PART – C (1 × 15 = 15 Marks)

16. a) The following sequence of instructions is executed in the basic 5-stage pipelined processor: (15)

or r1, r2, r3
or r2, r1, r4
or r1, r1, r2

a) Indicate dependences and their type.
Instructions:
I1 : or r1, r2, r3
I2 : or r2, r1, r4
I3 : or r1, r1, r2
Dependences:
1. True data dependences (read after write):
 I2 depends on I1: I2 can execute only when I1 has completed, because I2 needs r1, which is written by I1.
 Similar true data dependences exist between I1 and I3 (through r1) and between I2 and I3 (through r2).
2. Output dependence (write after write):
 I1 and I3 have an output dependence: both write r1, so I3 must not write r1 before I1 does.

b) Assume there is no forwarding in this pipelined processor. Indicate hazards and add NOP instructions to eliminate them.
Without forwarding, data hazards occur, as the pipeline diagram shows:

        CC1  CC2  CC3  CC4  CC5  CC6  CC7
I1:     IF   ID   EX   MEM  WB            (r1 written in WB at CC5)
I2:          IF   ID   EX   MEM  WB       (needs r1 in ID at CC3)
I3:               IF   ID   EX   MEM  WB  (needs r1 and r2 in ID at CC4)

 r1 is available only at CC5, but I2 and I3 need it at CC3 and CC4. This is a data hazard.
 NOP instructions can be inserted to avoid the hazard: after the instruction fetch (IF) of I2, its ID stage must be delayed until CC5, when r1 becomes available.
 A similar situation exists between I2 and I3 for r2.

I1 : or r1, r2, r3
I2 : NOP
I3 : NOP
I4 : or r2, r1, r4
I5 : NOP
I6 : NOP
I7 : or r1, r1, r2

c) Assume there is full forwarding. Indicate hazards and add NOP instructions to eliminate them.
With forwarding, each result is passed from the pipeline registers straight to the ALU of the dependent instruction:

        CC1  CC2  CC3  CC4  CC5  CC6  CC7
I1:     IF   ID   EX   MEM  WB            (r1 produced in EX at CC3)
I2:          IF   ID   EX   MEM  WB       (r1 forwarded into EX at CC4)
I3:               IF   ID   EX   MEM  WB  (r1 and r2 forwarded into EX at CC5)

When forwarding is applied, there is no need for NOP instructions: r1 computed by I1 in EX at CC3 is forwarded to the EX stages of I2 and I3, and r2 computed by I2 is forwarded to I3's EX stage, so all three instructions proceed without stalls.

(or)

b) Explain in detail the DMA controller with suitable diagrams. Discuss how it improves the overall performance of the system. (15)
Refer section 7.3
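The dependence analysis in question 16(a) can be cross-checked mechanically; a minimal sketch (registers as strings; the helper name is made up for illustration):

```python
def find_dependences(instrs):
    """instrs: list of (dest, src1, src2) tuples in program order.
    Returns (raw, waw) lists of index pairs (i, j) with i before j."""
    raw, waw = [], []
    for j, (dest_j, *srcs_j) in enumerate(instrs):
        for i, (dest_i, *_) in enumerate(instrs[:j]):
            if dest_i in srcs_j:
                raw.append((i, j))    # read-after-write (true) dependence
            if dest_i == dest_j:
                waw.append((i, j))    # write-after-write (output) dependence
    return raw, waw

seq = [("r1", "r2", "r3"),   # I1: or r1,r2,r3
       ("r2", "r1", "r4"),   # I2: or r2,r1,r4
       ("r1", "r1", "r2")]   # I3: or r1,r1,r2
raw, waw = find_dependences(seq)
assert set(raw) == {(0, 1), (0, 2), (1, 2)}   # the three true dependences
assert waw == [(0, 2)]                        # I1 and I3 both write r1
```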

www.Vidyarthiplus.com

VIDYARTHIPLUS - ANNA UNIVERSITY ONLINE STUDENTS COMMUNITY
Department of Computer Science and Engineering
CS6303 COMPUTER ARCHITECTURE Question Bank
Year : II year (2014-15) ODD SEMESTER

UNIT-1 OVERVIEW AND INSTRUCTIONS
PART-A
1. What are the eight great ideas in computer architecture?
2. What are the five classic components of a computer?
3. What is the function of the datapath and control path?
4. What is instruction set architecture?
5. Define application binary interface.
6. Differentiate DRAM and SRAM.
7. Compare volatile and nonvolatile memory.
8. List the advantages of a network computer.
9. Define VLSI.
10. Differentiate throughput and response time.
11. Write the CPU performance equation.
12. If computer A runs a program in 10 seconds, and computer B runs the same program in 15 seconds, how much faster is A than B?
13. Write the formula for CPU execution time for a program.
14. Write the formula for CPU clock cycles required for a program.
15. How will you measure the dynamic power dissipation?
16. Define stored program concept.
17. What are the fields in a MIPS instruction?
18. List the advantages of a multiprocessor over a uniprocessor.
19. What are the different types of operands? Give examples.
20. List the different addressing modes.

PART-B
1. i) Discuss in detail the eight great ideas of computer architecture. (8)
   ii) Explain in detail the technologies for building processors and memory. (8)
2. Explain the various components of a computer system with a neat diagram. (16)
3. Discuss in detail the various measures of performance of a computer. (16)
4. Define addressing mode and explain the basic addressing modes with an example for each.
5. Explain operations and operands of computer hardware in detail. (16)
6. i) Discuss the logical operations and control operations of the computer. (12)
   ii) Write short notes on the power wall. (6)
7. Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
   a. Which processor has the highest performance expressed in instructions per second?
   b. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.
   c. We are trying to reduce the execution time by 30% but this leads to an increase


of 20% in the CPI. What clock rate should we have to get this time reduction?
8. Assume a program requires the execution of 50 × 10⁶ FP instructions, 110 × 10⁶ INT instructions, 80 × 10⁶ L/S instructions, and 16 × 10⁶ branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock rate.
   a. By how much must we improve the CPI of FP instructions if we want the program to run two times faster?
   b. By how much must we improve the CPI of L/S instructions if we want the program to run two times faster?
   c. By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and branch is reduced by 30%?
9. Explain branching operations with an example.
10. Explain the following addressing modes in detail with diagrams: i) immediate addressing, ii) register addressing, iii) base or displacement addressing, iv) PC-relative addressing, v) pseudodirect addressing.

UNIT-II ARITHMETIC OPERATIONS
PART-A
1. Add 6₁₀ to 7₁₀ in binary and subtract 6₁₀ from 7₁₀ in binary.
2. Write the overflow conditions for addition and subtraction.
3. Draw the multiplication hardware diagram.
4. List the steps of the multiplication algorithm.
5. What is fast multiplication?
6. List the steps of the division algorithm.
7. What are scientific notation and normalization? Give an example.
8. Give the representation of a single-precision floating-point number.
9. Define overflow and underflow with examples.
10. Give the representation of a double-precision floating-point number.
11. What are the floating-point instructions in MIPS?
12. What are the steps of floating-point addition?
13. List the steps of floating-point multiplication.
14. Define guard and round.
15. Write the IEEE 754 floating-point format.
16. What is meant by subword parallelism?
17. Multiply 1000₁₀ × 1001₁₀.
18. Divide 1,001,010₁₀ by 1000₁₀.
19. For the following C statement, what is the corresponding MIPS assembly code? f = g + (h − 5)
20. For the following MIPS assembly instructions, what is a corresponding C statement? add f, g, h ; add f, i, f


PART-B
1. Explain the multiplication algorithm in detail with diagram and examples.
2. Discuss the division algorithm in detail with diagram and examples.
3. Explain in detail floating-point addition with an example.
4. Explain in detail floating-point multiplication.
5. Give the algorithm for multiplication of signed 2's complement numbers and illustrate with an example.
6. Multiply the following pair of signed 2's complement numbers: A = 010111, B = 101100.
7. Add the numbers 0.5₁₀ and −0.4375₁₀ using the binary floating-point addition algorithm.
8. Multiply 1.10 × 10¹⁰ and 9.200 × 10⁻⁵ using binary floating-point multiplication.
9. Calculate the division of A and B, where A = 3.264 × 10³ and B = 6.52 × 10².
10. Show the IEEE 754 binary representation of the number −0.75₁₀ in single and double precision.

UNIT-III PROCESSOR AND CONTROL UNIT
PART-A
1. What is meant by a datapath element?
2. What is the use of the PC register?
3. What is meant by a register file?
4. What are the two state elements needed to store and access an instruction?
5. Draw the diagram of the portion of the datapath used for fetching an instruction.
6. Define sign extend and vectored interrupts.
7. What is meant by branch target address?
8. Differentiate branch taken from branch not taken.
9. What is meant by delayed branch?
10. Write the instruction format for the jump instruction.
11. What are hazards? Write its types.
12. What is meant by forwarding?
13. What is a pipeline stall?
14. What is meant by branch prediction?
15. What are the 5 pipeline stages?
16. What are exceptions and interrupts?
17. What is meant by pipelining?
18. What are the five steps in MIPS instruction execution?
19. What are the three instruction classes and their instruction formats?
20. Write the formula for calculating the time between instructions in a pipelined processor.

PART-B
1. Explain the basic MIPS implementation of the instruction set.
2. Explain the basic MIPS implementation with the necessary multiplexers and control lines.
3. What are control hazards? Explain the methods for dealing with control hazards.
4. Discuss the influence of pipelining in detail.
5. Explain how the instruction pipeline works. What are the various situations where an instruction pipeline can stall? What can be its resolution?
6. What is a data hazard? How do you overcome it? What are its side effects?
7. Discuss the data and control path methods in pipelining.
8. Explain dynamic branch prediction.
9. How are exceptions handled in MIPS?
10. Explain in detail building a datapath.
11. Explain in detail the control implementation scheme.

UNIT-IV PARALLELISM
PART-A
1. What is meant by ILP?
2. What is multiple issue? Write any two approaches.
3. What is meant by speculation?
4. Define static multiple issue.
5. Define issue slots and issue packet.
6. Define VLIW.
7. Define superscalar processor.
8. What is meant by loop unrolling?
9. What is meant by anti-dependence? How is it removed?
10. Differentiate in-order execution from out-of-order execution.
11. What is meant by hardware multithreading?
12. What are the two main approaches to hardware multithreading?
13. What is SMT?
14. Compare SMT with hardware multithreading.
15. What are the three multithreading options?
16. Define SMP.
17. Differentiate UMA from NUMA.
18. What is a multicore microprocessor?
19. What is a parallel processing program?
20. Define a cluster.

PART-B
1. Explain instruction-level parallelism.
2. Explain the difficulties faced by parallel processing programs.
3. Explain shared memory multiprocessors.
4. Explain in detail Flynn's classification of parallel hardware.
5. Explain clusters and other message-passing multiprocessors.
6. Explain in detail hardware multithreading.
7. Explain SISD and MIMD.
8. Explain SIMD and SPMD.
9. Explain multicore processors.
10. Explain the different types of multithreading.


UNIT-V MEMORY AND I/O SYSTEM
PART-A
1. What are the temporal and spatial localities of reference?
2. Write the structure of the memory hierarchy.
3. What are the various memory technologies?
4. Differentiate SRAM from DRAM.
5. What is flash memory?
6. Define rotational latency.
7. What is a direct-mapped cache?
8. Consider a cache with 64 blocks and a block size of 16 bytes. To what block number does byte address 1200 map?
9. How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
10. What are the writing strategies in cache memory?
11. What are the steps to be taken on an instruction cache miss?
12. Define AMAT.
13. What are the various block placement schemes in cache memory?
14. Define MTTF and AFR.
15. Define availability.
16. What are the three ways to improve MTTF?
17. Define TLB.
18. What is meant by virtual memory?
19. Differentiate physical address from logical address.
20. What is meant by address mapping?

PART-B
1. Explain in detail the memory technologies.
2. Explain in detail the memory hierarchy with a neat diagram.
3. Describe the basic operations of the cache in detail with diagrams.
4. Discuss the various mapping schemes used in cache design. (10)
   A byte-addressable computer has a small data cache capable of holding eight 32-bit words. Each cache block contains one 32-bit word. When a given program is executed, the processor reads data from the following sequence of hex addresses: 200, 204, 208, 20C, 2F4, 2F0, 200, 204, 218, 21C, 24C, 2F4. The pattern is repeated four times. Assuming that the cache is initially empty, show the contents of the cache at the end of each pass, and compute the hit rate for a direct-mapped cache. (6)
5. Discuss the methods used to measure and improve the performance of the cache.
6. Explain the virtual memory address translation and TLB with the necessary diagrams.
7. Draw the typical block diagram of a DMA controller and explain how it is used for direct data transfer between memory and peripherals.
8. Explain in detail interrupts with diagrams.
9. Describe in detail programmed Input/Output with a neat diagram.
10. Explain in detail the I/O processor.
