Computer Architecture (kiến Trúc Máy Tính)

  • Uploaded by: nguyenthanhkien
  • 0
  • 0
  • May 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Computer Architecture (kiến Trúc Máy Tính) as PDF for free.

More details

  • Words: 14,096
  • Pages: 249
Computer Architecture This course for using in HEDSPI Project

Nguyen Thanh Kien Department of Computer Engineering Faculty of Information Technology Hanoi University of Technology

Acknowledge  These materials are used as reference for this slide: – “Computer Architecture, Background and Motivation” slides of UCSB, B.Parhami. – Computer architecture - Behrooz Parhami – Computer organization and design - John L. Hennessy & David A. Patterson

Page  2

© NTK 2009

Reference  Text Books: – Computer architecture - Behrooz Parhami – Computer organization and design - John L. Hennessy & David A. Patterson

 Reference books – Computer Architecture and Organization - John P. Hayes – Computer Organization and Architecture – Designing for Performance William Stallings – Computer Architecture: Single and Parallel Systems - Mehdi R. Zargham – Assembly Language Programming and Organization of the IBM-PC Ytha Yu, Charles Marut Page  3

© NTK 2009

About  Lecturer: – Nguyen Thanh Kien – Department of Computer Engineering, Faculty of Information Technology, Hanoi University of Technology – Mobile: +84 983588135 – Email: [email protected] [email protected] – Address: • Room 322, C1, Hanoi University of Technology • No.1, Dai Co Viet, Hai Ba Trung, Hanoi.

Page  4

© NTK 2009

Content 1. Introduction - Computer system technology and Computer Performance 2. Instruction Set Architecture 3. Arithmetic for Computer 4. CPU Organization

Page  5

© NTK 2009

Content 1. Introduction - Computer system technology and Computer Performance 2. Instruction Set Architecture 3. Arithmetic for Computer 4. CPU Organization

Page  6

© NTK 2009

Content 1. Introduction - Computer system technology and Computer Performance 2. Instruction Set Architecture 3. Arithmetic for Computer 4. CPU Organization

Page  7

© NTK 2009

2. Instruction Set Architecture  2.1. Instructions and addressing  2.2. Procedures and Data

Page  8

© NTK 2009

2.1. Instructions and addressing 2.1.1 Abstract View of Hardware 2.1.2 Instruction Formats 2.1.3 Simple Arithmetic / Logic Instructions 2.1.4 Load and Store Instructions 2.1.5 Jump and Branch Instructions 2.1.6 Addressing Modes

Page  9

© NTK 2009

2.1. Instructions and addressing 2.1.1 Abstract View of Hardware 2.1.2 Instruction Formats 2.1.3 Simple Arithmetic / Logic Instructions 2.1.4 Load and Store Instructions 2.1.5 Jump and Branch Instructions 2.1.6 Addressing Modes

Page  10

© NTK 2009

2.1.1 Abstract View of Hardware ... Loc 0 Loc 4 Loc 8 4 B / location

m ≤ 2 32

Memory

up to 2 30 words ...

EIU (Main proc.)

$0 $1 $2

$31

ALU

Execution & integer unit

Chapter 11

(Coproc. 1)

Integer mul/div Hi

Chapter 10

FPU

Lo

Chapter 12

FP arith

$0 $1 $2

Loc Loc m−8 m−4

Floatingpoint unit

$31

TMU

BadVaddr Trap & (Coproc. 0) Status memory Cause unit EPC

Memory and processing subsystems for MiniMIPS. Page  11

© NTK 2009

Data Types

Byte =Byte 8 bits

Halfword= 2 bytes Halfword Word =Word 4 bytes Doubleword = 8 bytes Doubleword

MiniMIPS registers hold 32-bit (4-byte) words. Other common data sizes include byte, halfword, and doubleword.

Page  12

© NTK 2009

$0 $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16 $17 $18 $19 $20 $21 $22 $23 $24 $25 $26 $27 $28 $29 $30 $31 Page  13

 0   

$zero $at Reserved for assembler use $v0 Procedure results $v1 $a0 Procedure $a1 Saved arguments $a2 $a3 $t0 $t1 $t2 Temporary $t3 values $t4 $t5 $t6 $t7 $s0 $s1 Saved $s2 across $s3 Operands procedure $s4 calls $s5 $s6 $s7 More $t8 temporaries $t9 $k0 Reserved for OS (kernel) $k1 $gp Global pointer $sp Stack pointer Saved $fp Frame pointer $ra Return address

A 4- byte word sits in consecutive memory addresses according to the big- endian order (most significant byte has the lowest address) Byte numbering:

© NTK 2009

3

2

3 2 1 0

1

Register Conventions

0

When loading a byte into a register, it goes in the low end Byte Word Doubleword

A doubleword sits in consecutive registers or memory locations according to the big- endian order (most significant word comes first)

Registers and data sizes in MiniMIPS.

Registers Used in This Chapter 10 temporary registers

8 operand registers

Page  14

© NTK 2009

2.1.2 Instruction Formats High-level language statement:

a = b + c

Assembly language instruction:

add $t8, $s2, $s1

Machine language instruction:

000000 10010 10001 11000 00000 100000 ALU-type Register Register Register Addition Unused instruction 18 17 24 opcode

Instruction cache P C

Register file

Data cache (not used)

$17 $18

Instruction fetch

Register readout

Register file

ALU $24

Operation

Data read/store

Register writeback

A typical instruction for MiniMIPS and steps in its execution. Page  15

© NTK 2009

Add, Subtract, and Specification of Constants MiniMIPS add & subtract instructions; e.g., compute: g = (b + c) − (e + f) add add sub

$t8,$s2,$s3 $t9,$s5,$s6 $s7,$t8,$t9

# put the sum b + c in $t8 # put the sum e + f in $t9 # set g to ($t8) − ($t9)

Decimal and hex constants Decimal Hexadecimal

25, 123456, −2873 0x59, 0x12b4c6, 0xffff0000

Machine instruction typically contains an opcode one or more source operands possibly a destination operand Page  16

© NTK 2009

MiniMIPS Instruction Formats

R

I

J

31

31

31

op

25

rs

20

rt

15

6 bits

5 bits

5 bits

Opcode

Source register 1

Source register 2

op

25

rs

20

rt

rd

10

5 bits Destination register 15

sh

fn

5

5 bits

6 bits

Shift amount

Opcode extension

operand / offset

6 bits

5 bits

5 bits

16 bits

Opcode

Source or base

Destination or data

Immediate operand or address offset

op

25

jump target address

0

0

6 bits

1 0 0 0 0 0 0 0 0 0 0 0 26 0 bits 0 0 0 0 0 0 0 1 1 1 1 0 1

Opcode

Memory word address (byte address divided by 4)

MiniMIPS instructions come in only three formats: register (R), immediate (I), and jump (J). Page  17

0

© NTK 2009

2.1.3 Simple Arithmetic/Logic Instructions Add and subtract already discussed; logical instructions are similar add sub and or xor nor

R

31

$t0,$s0,$s1 $t0,$s0,$s1 $t0,$s0,$s1 $t0,$s0,$s1 $t0,$s0,$s1 $t0,$s0,$s1

op

25

rs

# # # # # #

20

rt

set set set set set set 15

$t0 $t0 $t0 $t0 $t0 $t0 rd

to to to to to to 10

($s0)+($s1) ($s0)-($s1) ($s0)∧($s1) ($s0)∨($s1) ($s0)⊕($s1) (($s0)∨($s1))′ sh

5

fn

0

0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 x 0 ALU instruction

Source register 1

Source register 2

Destination register

Unused

add = 32 sub = 34

The arithmetic instructions add and sub have a format that is common to all two-operand ALU instructions. For these, the fn field specifies the arithmetic/logic operation to be performed. Page  18

© NTK 2009

Arithmetic/Logic with One Immediate Operand An operand in the range [−32 768, 32 767], or [0x0000, 0xffff], can be specified in the immediate field. addi andi ori xori

$t0,$s0,61 $t0,$s0,61 $t0,$s0,61 $t0,$s0,0x00ff

# # # #

set set set set

$t0 $t0 $t0 $t0

to to to to

($s0)+61 ($s0)∧61 ($s0)∨61 ($s0)⊕ 0x00ff

For arithmetic instructions, the immediate operand is sign-extended

I

31

op

25

rs

20

rt

15

operand / offset

0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 0 Errors 0 1 addi = 8 Source Destination Immediate operand

Instructions such as addi allow us to perform an arithmetic or logic operation for which one operand is a small constant.

Page  19

0

© NTK 2009

2.1.4 Load and Store Instructions I

31

op

25

rs

20

rt

15

operand / offset

0

1 0 x 0 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 lw = 35 sw = 43

Memory A[0] A[1] A[2] . . .

A[i]

Base register

lw lw

Data register

$t0,40($s3) $t0,A($s3) Address in base register

Offset = 4i Element i of array A

Offset relative to base Note on base and offset: The memory address is the sum of (rs) and an immediate value. Calling one of these the base and the other the offset is quite arbitrary. It would make perfect sense to interpret the address A($s3) as having the base A and the offset ($s3). However, a 16-bit base confines us to a small portion of memory space.

MiniMIPS lw and sw instructions and their memory addressing convention that allows for simple access to array elements via a base address and an offset (offset = 4i leads us to the i th word). Page  20

© NTK 2009

lw, sw, and lui Instructions lw sw $s3” lui

I

31

$t0,40($s3) $t0,A($s3)

# load mem[40+($s3)] in $t0 # store ($t0) in mem[A+($s3)] # “($s3)” means “content of

$s0,61

# The immediate value 61 is # loaded in upper half of $s0 # with lower 16b set to 0s

op

25

rs

20

rt

15

operand / offset

0

0 0 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 lui = 15

Unused

Destination

Immediate operand

0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Content of $s0 after the instruction is executed

The lui instruction allows us to load an arbitrary 16-bit value into the upper half of a register while setting its lower half to 0s. Page  21

© NTK 2009

Initializing a Register Example Show how each of these bit patterns can be loaded into $s0: 0010 0001 0001 0000 0000 0000 0011 1101 1111 1111 1111 1111 1111 1111 1111 1111 Solution The first bit pattern has the hex representation: 0x2110003d lui ori

$s0,0x2110 # put the upper half in $s0 $s0, $s0, 0x003d# put the lower half in $s0

Same can be done, with immediate values changed to 0xffff for the second bit pattern. But, the following is simpler and faster: nor Page  22

$s0,$zero,$zero # because (0 ∨ 0)′ = 1 © NTK 2009

2.1.5 Jump and Branch Instructions Unconditional jump and jump through register instructions j jr

verify $ra

J

31

op

# go to mem loc named “verify” # go to address that is in $ra; # $ra may hold a return address jump target address

25

0 0 0 0 1 0

0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1

j=2

x x x x 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 From PC

R

31

op

Effective target address (32 bits) 25

rs

20

rt

15

rd

10

sh

5

fn

0

0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 ALU instruction

Source register

Unused

Unused

Unused

jr = 8

The jump instruction j of MiniMIPS is a J-type instruction which is shown along with how its effective target address is obtained. The jump register (jr) instruction is R-type, with its specified register often being $ra. Page  23

© NTK 2009

Conditional Branch Instructions Conditional branches use PC-relative addressing bltz $s1,L beq $s1,$s2,L bne $s1,$s2,L

I

31

op

25

20

rt

15

operand / offset

31

op

Source 25

rs

Zero 20

rt

Relative branch distance in words 15

operand / offset

0

0 0 0 1 0 x 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 beq = 4 bne = 5

Source 1

Source 2

Relative branch distance in words

Conditional branch instructions of MiniMIPS. Page  24

0

0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 bltz = 1

I

rs

# branch on ($s1)< 0 # branch on ($s1)=($s2) # branch on ($s1)≠($s2)

© NTK 2009

Comparison Instructions for Conditional Branching slt

$s1,$s2,$s3

slti

$s1,$s2,61

R

31

op

25

20

if ($s2)<($s3), set $s1 to 1 else set $s1 to 0; often followed by beq/bne if ($s2)<61, set $s1 to 1 else set $s1 to 0

rt

15

rd

10

sh

31

op

fn

0

Source 1 register 25

rs

Source 2 register 20

rt

Destination

15

Unused

operand / offset

slt = 42

0

0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 slti = 10

Source

Destination

Immediate operand

Comparison instructions of MiniMIPS. Page  25

5

0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 ALU instruction

I

rs

# # # # #

© NTK 2009

Examples for Conditional Branching If the branch target is too far to be reachable with a 16-bit offset (rare occurrence), the assembler automatically replaces the branch instruction beq $s0,$s1,L1 with: bne j L2: ...

$s1,$s2,L2 L1

# skip jump if (s1)≠(s2) # goto L1 if (s1)=(s2)

Forming if-then constructs; e.g., if (i == j) x = x + y bne $s1,$s2,endif add $t1,$t1,$t2 endif: ...

# branch on i≠j # execute the “then” part

If the condition were (i < j), we would change the first line to: slt beq

$t0,$s1,$s2 $t0,$0,endif

# set $t0 to 1 if i<j # branch if ($t0)=0; # i.e., i not< j or i≥j

  Page  26

© NTK 2009

if-then-else Statements Example Show a sequence of MiniMIPS instructions corresponding to: if (i<=j) x = x+1; z = 1; else y = y–1; z = 2*z Solution

slt bne addi addi j else: addi add endif:...

$t0,$s2,$s1 $t0,$zero,else $t1,$t1,1 $t3,$zero,1 endif $t2,$t2,-1 $t3,$t3,$t3

# # # # # # #

  Page  27

© NTK 2009

j
while Statements Example The simple while loop: while (A[i]==k) i=i+1; Assuming that: i, A, k are stored in $s1,$s2,$s3 Solution loop: add

$t1,$s1,$s1 # t1 = 4*i

add

$t1,$t1,$t1 #

add

$t1,$t1,$s2 # t1 = A + 4*i

lw

$t0,0($t1)

# t0 = A[i]

bne

$t0,$s3,endwhl

#

addi $s1,$s1,1

#

j

#

loop

endwhl: … Page  28

# © NTK 2009

switch Statements Example The simple switch switch(test) { case 0: a=a+1; break; case 1: a=a-1; break; case 2: b=2*b; break; default: }

Assuming that: test,a,b are stored in $s1,$s2,$s3 Page  29

beq beq beq b case_0: addi b case_1: sub b case_2: add b default: continue: © NTK 2009

s1,t0,case_0 s1,t1,case_1 s1,t2,case_2 default s2,s2,1 continue

#a=a+1

s2,s2,t1 continue

#a=a-1

s3,s3,s3 continue

#b=2*b

2.1.6 Addressing Modes Addressing mode is the method by which the location of an operand is specified within an instruction. MiniMIPS uses sixs addr modes: Addressing

Instruction

Other elements involved

Extend, if required

Immediate Reg spec

Register

Reg file

Constant offset

Base Reg base

Reg file

Reg data

Constant offset

Page  30

addi, ori… Reg data

Mem Add addr

Mem Add addr

PC

Pseudodirect

jal

Some place in the machine

Implied

PC-relative

Operand

PC

lw, sw…

Mem Memory data

Mem Memory data

branch instr

Mem addr Memory Mem data © NTK 2009

Schematic representation of addressing modes in MiniMIPS.

j

Finding the Maximum Value in a List of Integers Example List A is stored in memory beginning at the address given in $s1. List length is given in $s2. Find the largest integer in the list and copy it into $t0. Solution

 

lw $t0,0($s1) # initialize maximum to A[0] addi $t1,$zero,0 # initialize index i to 0 loop: add $t1,$t1,1 # increment index i by 1 beq $t1,$s2,done # if all elements examined, quit add $t2,$t1,$t1 # compute 2i in $t2 add $t2,$t2,$t2 # compute 4i in $t2 add $t2,$t2,$s1 # form address of A[i] in $t2 lw $t3,0($t2) # load value of A[i] into $t3 slt $t4,$t0,$t3 # maximum < A[i]? beq $t4,$zero,loop # if not, repeat with no change addi $t0,$t3,0 # if so, A[i] is the new maximum j loop # change completed; now repeat # continuation of the program Page  31done: ... © NTK 2009

The 20 MiniMIPS Instructions Covered So Far

Copy Arithmetic

Logic

Memory access Control transfer Table 5.1

Page  32

Instruction

Usage

Load upper immediate Add Subtract Set less than Add immediate Set less than immediate AND OR XOR NOR AND immediate OR immediate XOR immediate Load word Store word Jump Jump register Branch less than 0 Branch equal Branch not equal

lui add sub slt addi slti and or xor nor andi ori xori lw sw j jr bltz beq bne

© NTK 2009

rt,imm rd,rs,rt rd,rs,rt rd,rs,rt rt,rs,imm rd,rs,imm rd,rs,rt rd,rs,rt rd,rs,rt rd,rs,rt rt,rs,imm rt,rs,imm rt,rs,imm rt,imm(rs) rt,imm(rs) L rs rs,L rs,rt,L rs,rt,L

op fn 15 0 0 0 8 10 0 0 0 0 12 13 14 35 43 2 0 1 4 5

32 34 42 36 37 38 39

8

2. Instruction Set Architecture  2.1. Instructions and addressing  2.2. Procedures and Data

Page  33

© NTK 2009

2.2. Procedures and Data  2.2.1 Simple Procedure Calls  2.2.2 Using the Stack for Data Storage  2.2.3 Parameters and Results  2.2.4 Data Types  2.2.5 Arrays and Pointers  2.2.6 Additional Instructions

Page  34

© NTK 2009

2.2.1 Simple Procedure Calls #Laboratory Exercise 4, Home Assignment 1 #include .text .set noreorder .globl start .ent start start: li a0,-45 #load input parameter jal abs #jum and link to abs procedure nop .end start .ent abs abs: sub v0,zero,a0 #put -(a0) in v0; in case (a0)<0 bltz a0,done nop add v0,a0,zero done: jr ra .end abs

Page  35

#if (a0)<0 then done #else put (a0) in v0

© NTK 2009

2.2.1 Simple Procedure Calls  A procedure is: a subprogram that when called (initiated, invoked) performs a specific task, perhaps leading to one or more results, based on the input parameters (arguments) with which it is provided and returns to the point of call, having perturbed nothing else.  In assembly language, a procedure is associated with a symbolic name that denotes its starting address. The jal instruction in MIPS is intended specifically for procedure calls: – it performs the control transfer (unconditional jump) to the starting address of the procedure, – while also saving the return address in register $ra. Page  36

© NTK 2009

Illustrating a Procedure Call main

PC

jal

proc

Prepare to call Prepare to continue

proc Save, etc.

Restore jr

$ra

Relationship between the main program and a procedure.

Page  37

© NTK 2009

2.2.1 Simple Procedure Calls Using a procedure involves the following sequence of actions: 1. 2. 3. 4. 5. 6.

Put arguments in places known to procedure (reg’s $a0-$a3) Transfer control to procedure, saving the return address (jal) Acquire storage space, if required, for use by the procedure Perform the desired task Put results in places known to calling program (reg’s $v0-$v1) Return control to calling point (jr)

MiniMIPS instructions for procedure call and return from procedure:

Page  38

jal

proc

# jump to loc “proc” and link; # “link” means “save the return # address” (PC)+4 in $ra ($31)

jr

rs

# go to loc addressed by rs © NTK 2009

$0 $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16 $17 $18 $19 $20 $21 $22 $23 $24 $25 $26 $27 $28 $29 $30 $31 Page  39

 0   

$zero $at $v0 $v1 $a0 $a1 $a2 $a3 $t0 $t1 $t2 $t3 $t4 $t5 $t6 $t7 $s0 $s1 $s2 $s3 $s4 $s5 $s6 $s7 $t8 $t9 $k0 $k1 $gp $sp $fp $ra

A 4-b yte word sits in consecutive memory addresses according to the big-endian order (most significant byte has the lowest address)

Reserved for assembler use Procedure results Procedure arguments

Saved

Byte numbering:

3

2

3Recalling Register 2 Conventions 1 0

1

0

When loading a byte into a register, it goes in the low end Byte

Temporary values

Word Doublew ord

Operands

Saved across procedure calls A doubleword sits in consecutive registers or memory locations according to the big-endian order (most significant word comes first)

More temporaries Reserved for OS (kernel) Global pointer Stack pointer Frame pointer Return address

Saved

© NTK 2009

Registers and data sizes in MiniMIPS.

A Simple MiniMIPS Procedure Example Procedure to find the absolute value of an integer. $v0 ←

|($a0)|

Solution The absolute value of x is –x if x < 0 and x otherwise. abs: sub

$v0,$zero,$a0

bltz $a0,done add $v0,$a0,$zero done: jr $ra

# # # # #

put -($a0) in $v0; in case ($a0) < 0 if ($a0)<0 then done else put ($a0) in $v0 return to calling program 

In practice, we seldom use such short procedures because of the overhead that they entail. In this example, we have 3-4 instructions of overhead for 3 instructions of useful computation.   

Page  40

© NTK 2009

Nested Procedure Calls main

PC

jal

abc

Prepare to call Prepare to continue

abc

Procedure abc Save xyz jal

Text version is incorrect

xyz

Restore jr

$ra

Example of nested procedure calls.

Page  41

Procedure xyz

© NTK 2009

jr

$ra

2.2.2 Using the Stack for Data Storage

sp

b a

Push c

sp

c b a

Pop x

sp sp = sp – 4 mem[sp] = c

b a

x = mem[sp] sp = sp + 4

Effects of push and pop operations on a stack. push: addi sw Page  42

$sp,$sp,-4 $t4,0($sp)

pop: lw addi © NTK 2009

$t5,0($sp) $sp,$sp,4

Memory Map in MiniMIPS Hex address

00000000 00400000

10000000

Addressable with 16-bit signed offset

$28 $29 $30

10008000 1000ffff

Reserved

1 M words

Program

Text segment 63 M words

Static data Data segment Dynamic data

$gp

448 M words $sp

$fp

80000000

Stack

Stack segment

7ffffffc

Second half of address space reserved for memory-mapped I/O Page  43

© NTK 2009

Overview of the memory address space in MiniMIPS.

2.2.3 Parameters and Results Stack allows us to pass/return an arbitrary number of values $sp Local variables

z y .. .

Saved registers $sp

$fp

c b a .. .

Frame for current procedure

Old ($fp)

c b a .. .

$fp Before calling

After calling

Use of the stack by a procedure. Page  44

© NTK 2009

Frame for current procedure

Frame for previous procedure

Example of Using the Stack Saving $fp, $ra, and $s0 onto the stack and restoring them at the end of the procedure

$sp $sp $fp $fp

Page  45

proc: sw addi addi sw sw . ($s0) . ($ra) . ($fp) lw lw addi lw jr

$fp,-4($sp) $fp,$sp,0 $sp,$sp,–12 $ra,-8($fp) $s0,-12($fp)

# # # # #

save the old frame pointer save ($sp) into $fp create 3 spaces on top of stack save ($ra) in 2nd stack element save ($s0) in top stack element

$s0,-12($fp) $ra,-8($fp) $sp,$fp, 0 $fp,-4($sp) $ra

# # # # #

put top stack element in $s0 put 2nd stack element in $ra restore $sp to original state restore $fp to original state return from procedure

© NTK 2009

2.2.4 Data Types

Data size (number of bits), data type (meaning assigned to bits) Signed integer: Unsigned integer: Floating-point number: Bit string:

byte byte byte

word word word word

doubleword doubleword

Converting from one size to another Type

Page  46

8-bit number Value

32-bit version of the number

Unsigned 0010 1011 Unsigned 1010 1011

43 171

0000 0000 0000 0000 0000 0000 0010 1011 0000 0000 0000 0000 0000 0000 1010 1011

Signed Signed

+43 –85

0000 0000 0000 0000 0000 0000 0010 1011 1111 1111 1111 1111 1111 1111 1010 1011

0010 1011 1010 1011

© NTK 2009

ASCII Characters ASCII (American standard code for information interchange) 3

4

6

7

SP

0

@

P

`

p

DC1

!

1

A

Q

a

q

STX

DC2



2

B

R

b

r

ETX

DC3

#

3

C

S

c

s

EOT

DC4

$

4

D

T

d

t

5

ENQ

NAK

%

5

E

U

e

u

6

ACK

SYN

&

6

F

V

f

v

7

BEL

ETB



7

G

W

g

w

8

BS

CAN

(

8

H

X

h

x

9

HT

EM

)

9

I

Y

i

y

a

LF

SUB

*

:

J

Z

j

z

b

VT

ESC

+

;

K

[

k

{

c

FF

FS

,

<

L

\

l

|

d

CR

GS

-

=

M

]

m

}

e

SO

RS

.

>

N

^

n

~

f

SI

US

/

?

O

_

o

DEL

0 1 2 3 4

Page  47

0 NUL f

1

2

DLE

SOH

5

© NTK 2009

8-9

a-

More

More

controls

symbols

8-bit ASCII code (col #, row #)hex e.g., code for + is (2b) hex or (0010 1011)two

Loading and Storing Bytes Bytes can be used to store ASCII characters or small integers. MiniMIPS addresses refer to bytes, but registers hold words.

I

31

lb

$t0,8($s3)

lbu

$t0,8($s3)

sb

$t0,A($s3) op

25

rs

# # # # # 20

rt

load rt with mem[8+($s3)] sign-extend to fill reg load rt with mem[8+($s3)] zero-extend to fill reg LSB of rt to mem[A+($s3)] 15

immediate / offset

1 0 x x 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 lb = 32 lbu = 36 sb = 40

Base register

Data register

Address offset

Load and store instructions for byte-size data elements. Page  48

0

© NTK 2009

Meaning of a Word in Memory Bit pattern (02114020) hex

0000 0010 0001 0001 0100 0000 0010 0000

00000010000100010100000000100000 Add instruction 00000010000100010100000000100000 Positive integer 00000010000100010100000000100000 Four-character string

A 32-bit word has no inherent meaning and can be interpreted in a number of equally valid ways in the absence of other cues (e.g., context) for the intended meaning. Page  49

© NTK 2009

2.2.5 Arrays and Pointers Index: Use a register that holds the index i and increment the register in each step to effect moving from element i of the list to element i + 1 Pointer: Use a register that points to (holds the address of) the list element being examined and update it in each step to point to the next element Array index i

Add 1 to i; Compute 4i; Add 4i to base

Base

Array A

A[i] A[i + 1]

Pointer to A[i]

Add 4 to get the address of A[i + 1]

Array A

A[i] A[i + 1]

Stepping through the elements of an array using the indexing method and the pointer updating method. Page  50

© NTK 2009

Selection Sort To sort a list of numbers, repeatedly perform the following: Find the max element, swap it with the last item, move up the “last” pointer A

A

first

first max

A first

x

y

last last

last

Start of iteration

y

x

Maximum identified

End of iteration

One iteration of selection sort. Page  51

© NTK 2009

Selection Sort Using the Procedure max A

A

first

Inputs to proc max

first

In $a0

max

In $v1

In $a1

y Outputs from proc max last

last

Start of iteration

Page  52

x

In $v0 last

sort: beq jal lw sw sw addi element j done: ...

A first

y

x

Maximum identified

End of iteration

$a0,$a1,done max $t0,0($a1) $t0,0($v0) $v1,0($a1) $a1,$a1,-4

# # # # # #

single-element list is sorted call the max procedure load last element into $t0 copy the last element to max loc copy max value to last element decrement pointer to last

sort

# repeat sort for smaller list # continue with rest of program © NTK 2009

2.2.6 Additional Instructions MiniMIPS instructions for multiplication and division: mult div

$s0, $s1 $s0, $s1

mfhi mflo

$t0 $t0 R

31

op

# # # # #

25

rs

20

rt

set set and set set

Hi,Lo to ($s0)×($s1) Hi to ($s0)mod($s1) Lo to ($s0)/($s1) $t0 to (Hi) $t0 to (Lo)

rd

15

10

sh

5

fn

0

0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 x 0 ALU instruction

Source register 1

Source register 2

Unused

Unused

mult = 24 div = 26

The multiply (mult) and divide (div) instructions of MiniMIPS. R

31

op

25

rs

20

rt

15

rd

10

sh

5

fn

0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 x 0 ALU instruction

Unused

Unused

Destination register

Unused

mfhi = 16 mflo = 18

MiniMIPS instructions for copying the contents of Hi and Lo registers into general registers . Page  53

© NTK 2009

Logical Shifts MiniMIPS instructions for left and right shifting: sll srl sllv srlv

R

$t0,$s1,2 $t0,$s1,2 $t0,$s1,$s0 $t0,$s1,$s0 31

op

25

20

$t0=($s1) $t0=($s1) $t0=($s1) $t0=($s1) rt

15

left-shifted by 2 right-shifted by 2 left-shifted by ($s0) right-shifted by ($s0)

rd

10

sh

5

31

op

Unused

25

0

rs

Source register

20

rt

Destination register

15

rd

Shift amount

10

sh

sll = 0 srl = 2

5

fn

0

0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 x 0 ALU instruction

Amount register

Source register

Destination register

Unused

The four logical shift instructions of MiniMIPS. Page  54

fn

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 x 0 ALU instruction

R

rs

# # # #

© NTK 2009

sllv = 4 srlv = 6

Unsigned Arithmetic and Miscellaneous Instructions MiniMIPS instructions for unsigned arithmetic (no overflow exception): addu subu multu divu

$t0,$s0,$s1 $t0,$s0,$s1 $s0,$s1 $s0,$s1

addiu $t0,$s0,61

# # # # # # # #

set $t0 to ($s0)+($s1) set $t0 to ($s0)–($s1) set Hi,Lo to ($s0)×($s1) set Hi to ($s0)mod($s1) and Lo to ($s0)/($s1) set $t0 to ($s0)+61; the immediate operand is sign extended

To make MiniMIPS more powerful and complete, we introduce later: sra $t0,$s1,2 srav $t0,$s1,$s0 syscall Page  55

# sh. right arith (Sec. 10.5) # shift right arith variable # system call (Sec. 7.6) © NTK 2009

The 20 MiniMIPS Instructions from Chapter 6 (40 in all so far)

Copy

Arithmetic Table 6.2 (partial)

Shift

Memory access Control transfer Page  56

Instruction

Usage

Move from Hi

mfhi

rd

Move from Lo

mflo

rd

Add unsigned

addu

rd,rs,rt

Subtract unsigned

subu

rd,rs,rt

Multiply

mult

rs,rt

Multiply unsigned

multu rs,rt

Divide

div

rs,rt

Divide unsigned

divu

rs,rt

Add immediate unsigned

addiu rs,rt,imm

Shift left logical

sll

rd,rt,sh

Shift right logical

srl

rd,rt,sh

Shift right arithmetic

sra

rd,rt,sh

Shift left logical variable

sllv

rd,rt,rs

Shift right logical variable

srlv

rt,rd,rs

Shift right arith variable

srav

rd,rt,rd

Load byte

lb

rt,imm(rs)

Load byte unsigned

lbu

rt,imm(rs)

Store byte

sb

rt,imm(rs)

Jump and link

jal

L

System call

syscall

© NTK 2009

op fn 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 32 36 40 3 0

16 18 33 35 24 25 26 27 0 2 3 4 6 7

12

The 37 + 3 MiniMIPS Instructions Covered So Far Instruction

Usage

Instruction

Usage

Load upper immediate

lui

rt,imm

Move from Hi

mfhi

rd

Add

add

rd,rs,rt

Move from Lo

mflo

rd

Subtract

sub

rd,rs,rt

Add unsigned

addu

rd,rs,rt

Set less than

slt

rd,rs,rt

Subtract unsigned

subu

rd,rs,rt

Add immediate

addi

rt,rs,imm

Multiply

mult

rs,rt

Set less than immediate

slti

rd,rs,imm

Multiply unsigned

multu rs,rt

AND

and

rd,rs,rt

Divide

div

rs,rt

OR

or

rd,rs,rt

Divide unsigned

divu

rs,rt

XOR

xor

rd,rs,rt

Add immediate unsigned

addiu rs,rt,imm

NOR

nor

rd,rs,rt

Shift left logical

sll

rd,rt,sh

AND immediate

andi

rt,rs,imm

Shift right logical

srl

rd,rt,sh

OR immediate

ori

rt,rs,imm

Shift right arithmetic

sra

rd,rt,sh

XOR immediate

xori

rt,rs,imm

Shift left logical variable

sllv

rd,rt,rs

Load word

lw

rt,imm(rs)

Shift right logical variable

srlv

rd,rt,rs

Store word

sw

rt,imm(rs)

Shift right arith variable

srav

rd,rt,rs

Jump

j

L

Load byte

lb

rt,imm(rs)

Jump register

jr

rs

Load byte unsigned

lbu

rt,imm(rs)

Branch less than 0

bltz

rs,L

Store byte

sb

rt,imm(rs)

Branch equal Page  57 Branch not equal

beq

rs,rt,L

jal

L

bne

rs,rt,L

Jump and link © NTK 2009 System call

syscall

Content 1. Introduction - Computer system technology and Computer Performance 2. Instruction Set Architecture 3. Arithmetic for Computer 4. CPU Organization

Page  58

© NTK 2009

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  59

© NTK 2009

3.1. Introduction

Page  60

© NTK 2009

Number Representation  Numbers are normally represented using a positional number system:

N (b ) = an an −1an − 2 ...a1a0 .a−1a− 2 ...a− m – Base/radix: b (the number of digits) – Digits: 0..(b-1) • 0 ≤ ai ≤ (b-1) – Binary: b=2, digits:0,1 – Decimal: b=10, digits: 0,1,2,3,4,5,6,7,8,9 – Octal: b=8, digits: 0,1,2,3,4,5,6,7 © NTK 2009 – Hexadecimal: b=16, digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

Page  61

Number Representation

N (b ) = an an −1an − 2 ...a1a0 .a−1a− 2 ...a− m N (10 ) = an .b n + an −1.b n −1 + ... + a1.b1 + a0 .b 0 + a−1.b −1 + ... + a− m .b − m

N (10 ) =

n

∑ a .b

i =− m

i

i

11101.11(2) = 1x24+1x23+1x22+0x21+1x20+1x2-1+1x2-2= 29.75(10) Page  62

© NTK 2009

Number Representation

 Decimal: – b=10 – Digits: 0,1,2,3,4,5,6,7,8,9

N (10 ) = an an −1an − 2 ...a1a0 .a−1a− 2 ...a−m

N (10 ) = – Eg:

n

∑ a .10

i =− m

i

i

539.45(10) = 5x102+3x101+9x100+4x10-1+5x10-2 Page  63

© NTK 2009

ai = 0..9

Number Representation

 Binary: – b=2

bit – binary digit

– Digits: 0,1

N ( 2 ) = an an −1an −2 ...a1a0 .a−1a− 2 ...a−m

N (10 ) =

n

∑ a .2

i =− m

i

i

– Eg: 1011.011(2) = 1x23 + 0x22 + 1x21 + 1x20 + 0x2-1 + 1x2-2 + 1x2-3 Page  64

© NTK 2009

ai = 0,1

Number Representation  Binary (cnt’) – n-bit binary number can represent which range? • an-1...a1a0 from 0 to 2n-1

– MSB – Most Significant Bit

N ( 2 ) = an −1an − 2 ...a1a0

– LSB – Least Significant Bit

MSB

0001 = 1

0110 = 6

1011 = 11

0010 = 2

0111 = 7

1100 = 12

0011 = 3

1000 = 8

1101 = 13

0100 = 4

1001 = 9

1110 = 14

0101 = 5

1010 = 10

1111 = 15

Page  65

© NTK 2009

LSB

Number Representation  Octal: – b=8

N (8) = an an −1...a1a0 .a−1a− 2 ...a− m

– Digits: 0,1,2,3,4,5,6,7

ai = 0..7

– Eg: 503.071(8) = 5x82 + 0x81 + 3x80 + 0x8-1 + 7x8-2 + 1x8-3

 Hexadecimal:

N (16 ) = an an −1...a1a0 .a−1a− 2 ...a− m

– b=16

ai = 0..F

– Digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F – Eg: 503.071(16) = 5x162 + 0x161 + 3x160 + 0x16-1 + 7x16-2 + 1x16-3 Page  66

© NTK 2009

Convert from base b to base 10

 Base b to base 10 conversion

N (b ) = an an −1an − 2 ...a1a0 .a−1a− 2 ...a− m N (10 ) = an .b n + an −1.b n −1 + ... + a1.b1 + a0 .b 0 + a−1.b −1 + ... + a− m .b − m

 Eg: – 1010.11(2)= 1x23+0x22+1x21+0x20+1x2-1+1x2-2=10.75(10) – 1010.11(8)=? – A12(16)=? Page  67

© NTK 2009

Convert from base 10 to base b

Base 10 to base b conversion – For integer part: • Divide integer part by b until the result is 0 • Write remainders in reverse order to get the converted  result. – For the odd part after “.” • Multiply by b until the result is 0

Page  68

© NTK 2009

Convert from base 10 to base 2

 Eg1: 6.625(10) = ?(2) – The odd part after “.”

– The integer part 6 0

• 0.625 x 2 = 1.25

2

• 0.25 x 2 = 0.5

3

2

1

1

2

1

0

• 0.5

6.625(10) = 110.101(2)

 Eg2: 120.5625(10) = 1111000.1001(2) Page  69

x 2 = 1.0

© NTK 2009

Convert from base 2 to base 2n  Group from right to left n-bit groups and replace the equivalent values in base 2n  Eg:

 101011(2) = ?(8)

1010.110(2)=12.6(8)

 101011(2) = ?(16)

1010.110(2)=A.C(16)

Page  70

© NTK 2009

Convert from base 2n to base 2

 Each digit in base 2n is replaced by n bit in base 2.  Eg:

 37A.B(16)=?(2)

Page  71

© NTK 2009

Convert from base i to base j

 If both i and j are powers of 2, use base 2 as an intermediate base: – Eg: base 8 → base 2 → base 16 – 735.37(8)=?(16)

 Else, use base 10 as an intermediate base: – Eg: base 5 → base 10 → base 2

Page  72

© NTK 2009

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  73

© NTK 2009

3.2. Unsigned Numbers  The general form of signed number A: an-1an-2...a2a1a0  Value of A:

A = an −1 2 n −1 + an − 2 2 n − 2 + ... + a1 21 + a0 20 n −1

A = ∑ ai 2i i =0

 Range of representation: – Use n bit to represent 2’s complement numbers – Range: 0 => 2n-1

Page  74

© NTK 2009

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  75

© NTK 2009

3.3. Signed Numbers  1’s complement and 2’s complement number – A binary integer A is represented by n bit: • 1’s complement number of A is (2n - 1) – A • 2’s complement number of A is 2n - A • Notes: 2’s complement number of A = 1’s complement number + 1 – Eg: • n=8, A = 00110101 • 1’s complement number of A is (28 - 1) - 00110101= • 2’s complement number of A is 28 - 00110101=

Page  76

© NTK 2009

2’s complement representation of signed numbers  Most left bit is sign bit  Positive and 0 numbers are expressed in usual binary format. – The largest number can be represented is 2n-1-1 – n=8 => largest signed number: 28-1-1 = 127

 Negative number a is stored as the binary equivalent of 2n-A in one n-bit system. – -3 is stored as 28-3=11111101 in a 8-bit system – The most negative number can be stored is -2n-1

Page  77

© NTK 2009

2’s complement representation of signed numbers  The general form of signed number A: an-1an-2...a2a1a0 n−2

 Value of A:

A = − an −1 2 n −1 + ∑ ai 2i i =0

 Range of representation: – Use n bit to represent 2’s complement numbers – Range: -2n-1 => 2n-1-1

Page  78

© NTK 2009

2’s complement representation of signed numbers  +10 = 0000 1010  - 10 = 28-10 = 1 0000 0000 –

0000 1010 1111 0110

- 10 = 1111 0110  +10 + (-10) = ?

Page  79

© NTK 2009

MIPS signed number representation  32 bit signed numbers: 0000 0000 0000 ... 0111 0111 1000 1000 1000 ... 1111 1111 1111

Page  80

0000 0000 0000 0000 0000 0000 0000two = 0(10) 0000 0000 0000 0000 0000 0000 0001two = + 1(10) 0000 0000 0000 0000 0000 0000 0010two = + 2(10) 1111 1111 0000 0000 0000

1111 1111 0000 0000 0000

1111 1111 0000 0000 0000

1111 1111 0000 0000 0000

1111 1111 0000 0000 0000

1111 1111 0000 0000 0000

1110two 1111two 0000two 0001two 0010two

= = = = =

+ + – – –

2,147,483,646(10) 2,147,483,647(10) 2,147,483,648(10) 2,147,483,647(10) 2,147,483,646(10)

1111 1111 1111 1111 1111 1111 1101two = – 3(10) 1111 1111 1111 1111 1111 1111 1110two = – 2(10) 1111 1111 1111 1111 1111 1111 1111two = – 1(10)

© NTK 2009

2’s complement representation of signed numbers  Procedure to find binary representation of negative number in 2’s complement: – Find the binary equivalent of the magnitude – Complement each bit (0=>1, 1=>0) – Add 1

 Eg: find representation of -13 in 8-bit signed number system using 2’s complement: • Magnitude:

13 = 0000 1101

• 1’s complement: • Add 1: • -13 = Page  81

1111 0010 +

1 1111 0011 © NTK 2009

2’s complement representation of signed numbers  To find the magnitude of a negative number: – Complement each bit – Add 1

 Eg: -5 11111011

-1

00000100

00000000

1

1

5 00000101

Page  82

11111111

1

© NTK 2009

00000001

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  83

© NTK 2009

3.4. Addition & Subtraction  3.4.1. Addition of unsigned numbers  3.4.2. Addition of signed numbers  3.4.3. Subtraction of signed numbers

Page  84

© NTK 2009

3.4.1. Addition of Unsigned Numbers 

Unsigned binary addition similar to decimal addition. carry

decimal binary 1100 11110

A

2565

10110

B

6754

11011

sum

9319

110001

Eg: 10101(2) + 11011(2) = ? (2)

Page  85

© NTK 2009

Addition of Unsigned Numbers  Overflow – Occur when the result of addition is out of range of representation (the result can not be stored in the predefined number of bits) – Eg: X = 1001 0110 = 150

X = 1100 0101

= 197

Y = 0001 0011 = 19

Y = 0100 0110

=

S = 1010 1001 = 169

S = 0000 1011=11≠ 267

Cout = 0

Cout = 1 → carry-out

⇒ Overflow occurs when Cout = 1 Page  86

© NTK 2009

70

3.4.2. Addition of Signed Numbers  The reason that 2’s complement is so popular is the simplicity of addition.  To add any two numbers, no matter what the sign of each is, we just do binary addition on their representation.

Page  87

-5 1011

-5

1011

-5

1011

+7 0111

+5 0101

+3

0011

+2 0010

0 0000

-2

1110

© NTK 2009

Addition of Signed Numbers  Overflow – Occur when the result of addition is out of range of representation (the result can not be stored in the predefined number of bits) – Occur when? • Add two numbers of the opposite sign? • Add two positive numbers? • Add two negative numbers?

⇒ Overflow occurs when adding two numbers with the same sign and the result is in different sign

Page  88

© NTK 2009

3.4.3. Subtraction of Signed Numbers  Principle: – Subtraction is addition of negative number.

a – b = a + (-b)  Eg: 7 – 5 = ?

5

0101

7

0111

1010

-5

+1011

+

2

0010

1

-5 1011

Page  89

© NTK 2009

 Overflow when adding or subtracting signed numbers:

Page  90

© NTK 2009

Addition & Subtraction in MIPS  Unsigned integers are commonly used for memory addresses where overflow ignored, so MIPS provide two kinds of addition and subtraction: – add, addi, sub cause exception when overflow – addu, addiu, subu do not cause exception when overflow – MIPS has a register called exception program counter (EPC) to store the address of the instruction that caused the exception. – Instruction mfc0 is used to copy EPC into a general purpose register mfc0 s1,epc

Page  91

© NTK 2009

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  92

© NTK 2009

3.5. Multiplication  3.5.1. Multiplication of unsigned numbers  3.5.2. Multiplication of signed numbers

Page  93

© NTK 2009

3.5.1. Multiplication of unsigned numbers       1000     x 1001       1000     0000   0000 1000 1001000

Multiplicand (+8) Multiplier      (+9)

Product         (+72)

 Multiplication of two n-bit unsigned numbers, the product is one 2n-bit unsigned number.

Page  94

© NTK 2009

Multiplication implementation - 1st version

 Multiplicand, ALU, product are 64 bit  Multiplier is 32 bit

Page  95

© NTK 2009

Multiplication algorithm  If each step take one clock cycle, this multiplication algorithm will require almost 100 clock cycles to multiply two 32-bit numbers.

Page  96

© NTK 2009

Multiplication implementation – 2nd version

 Multiplicand, ALU, multiplier are 32 bit  Product is 64 bit

Page  97

© NTK 2009

3.5.2. Multiplication of signed numbers  Use unsigned multiplication: – Convert multiplicand & multiplier to positive numbers – Multiply using unsigned multiplication algorithm – Change the sign of product: • If the signs of multiplicand and multiplier are the same, the product is the result of step 2. • If the signs disagree, the product is two’s complement of the result of step 2.

Page  98

© NTK 2009

Faster multiplication

Page  99

© NTK 2009

Multiply in MIPS  Product is stored in a pair of 32-bit special registers, called Hi & Lo  To produce properly signed and unsigned product, MIPS has two multiplication instructions: – mult – multiply signed – multu – multiply unsigned

 To fetch the 32-bit product, MIPS provide two instructions: – mfhi – mflo

Page  100

© NTK 2009

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  101

© NTK 2009

3.6. Division  3.6.1. Division of unsigned numbers  3.6.2. Division of signed numbers

Page  102

© NTK 2009

3.6.1. Division of unsigned numbers  Divide 1001010(10) by 1000(10)

Page  103

© NTK 2009

Division implementation – 1st version

 Divisor, ALU, remainder are 64 bit  Quotient is 32 bit

Page  104

© NTK 2009

Division algorithm

Page  105

© NTK 2009

Division implementation – 2nd version

Page  106

© NTK 2009

3.6.2. Division of signed numbers  Use unsigned division algorithm: – Convert dividend and divisor to positive numbers – Divide using unsigned division algorithm – Change the sign of product: • Dividend

Divisor

Quotient

Remainder

+

+

Keep

Keep

+

-

Negate

Keep

-

+

Negate

Negate

-

-

Keep

Negate

Page  107

© NTK 2009

Faster division

Page  108

© NTK 2009

Divide in MIPS  Remainder is stored in Hi, quotient is stored in Lo  To produce properly signed and unsigned result, MIPS has two division instructions: – div – divide signed – divu – divide unsigned

 To fetch the 32-bit result, MIPS provide two instructions: – mfhi – mflo

Page  109

© NTK 2009

3. Arithmetic for Computer – Introduction – Unsigned Numbers – Signed Number – Addition and Subtraction – Multiplication – Division – Floating Point

Page  110

© NTK 2009

3.7. Floating Point

 Floating-point numbers can be represented in the form:

(-1)sx1.xxxxx(2)x2yyyy s: sign (s=0 => positive, s=1 => negative) x,y={0,1} 15.75(10) =1111.11(2) = 1.11111x23

Page  111

© NTK 2009

Floating-point number representation

Page  112

© NTK 2009

IEEE 754 standard  To represent floating-point numbers  Base b=2  Three basic forms: – Single-precision number representation, 32 bit – Double-precision number representation, 64 bit – Extended-precision number representation, 80 bit

 Formats: 31 30 S

23 22 e

m

63 62 S

52 51

S

0

e

m

79 78 Page  113

0

64 63 e

0 © NTK 2009

m

IEEE 754 standard 31 30 S

23 22 e

m

63 62 S

52 51

0

e

m

79 78 S

0

64 63

0

e

m

X = (-1)S x 1.m x 2e-b  s: sign bit (s=0 => positive, s=1 => negative)  e: excess  b: bias – Single-precision 32-bit : b = 127 – Double-precision 64-bit : b = 1023 – Extended-precisioin 80-bit : b = 16383 Page  114

© NTK 2009

IEEE 754 standard  Example: One real number X is represented using IEEE 754 standard with the following format: 1100 0001 0101 0110 0000 0000 0000 0000 Show the decimal value of X  Solution:

1100 0001 0101 0110 0000 0000 0000 0000 s=1

e=130

m

X = (-1)S x 1.m x 2e-b = (-1)1 x 1.1010110 x 2130-127 = -1.101011 x 23= -1101.011 = -13.375(10) Page  115

© NTK 2009

IEEE 754 standard  Example: One real number X is represented using IEEE 754 standard with the following format: 0011 1111 1000 0000 0000 0000 0000 0000 Show the decimal value of X Solution: X = (-1)S x 1.m x 2e-127 = (-1)0 x 1.0 x 2127-127 =1

Page  116

© NTK 2009

IEEE 754 standard  Example: Represent the real number -19.7890625(10) using 32 bit IEEE 754 standard  19.7890625 = 10011.1100101(2) = 1.00111100101 x 24  -19.7890625 = (-1)1 x 1.00111100101 x 2131-127 = 1 1000 0011 001111001010...0

Page  117

© NTK 2009

Special convention

Page  118

© NTK 2009

Range of Representation

underflow

overflow −∞

-b

-a

-0 +0

overflow a

b

 32 bit: a = 2-127 ≈ 10-38

b = 2+127 ≈ 10+38

 64 bit: a = 2-1023 ≈ 10-308

b = 2+1023 ≈ 10+308

 80 bit: a = 2-16383 ≈ 10-4932

b = 2+16383 ≈ 10+4932

Page  119

© NTK 2009

+∞

Floating-point addition

Page  120

© NTK 2009

Block diagram of an arithmetic unit dedicated to floating-point addition

Page  121

© NTK 2009

Floating-point multiplication

Page  122

© NTK 2009

Floating-point instructions in MIPS

Operations Addition Subtraction Multiplication Division Comparison

Single add.s sub.s mul.s div.s c.x.s

bclt bclf

Branch if true Branch if false

Page  123

Double add.d sub.d mul.d div.d c.x.d

© NTK 2009

x = eq, neq, lt, le, gt, ge

Floating-point instructions in MIPS  MIPS has dedicated registers and instructions used for floating-point operations: – 32 floating-point registers: $f0,$f1,$f2...,$f31 – Separate load and store for floating-point registers: lwc1, swc1,

Page  124

lwc1

f4,0(sp)

# Load 32 bit F.P number into f4

lwc1

f6,4(sp)

# Load 32 bit F.P number into f6

add.s f2,f4,f6

# f2 = f4 + f6

swc1

# Store F.P number from f2

f2,8(sp)

© NTK 2009

Floating-point instructions in MIPS  A double precision register is combination of an even-old pair of single precision registers, using even register as its name lwc1 f4,0(sp)

# Load upper part of 64 bit F.P number into f4

lwc1 f5,4(sp)

# Load lower part of 64 bit F.P number into f5

lwc1 f6,8(sp)

# Load upper part of 64 bit F.P number into f6

lwc1 f7,12(sp)

# Load lower part of 64 bit F.P number into f7

add.d f2,f4,f6

# (f2,f3) = (f4,f5) + (f6,f7)

swc1 f2,16(sp)

# Store upper part of 64 bit F.P number from f2

swc1 f2,20(sp)

# Store upper part of 64 bit F.P number from f2

Page  125

© NTK 2009

MIPS floating-point assembly language

Page  126

© NTK 2009

Content 1. Introduction - Computer system technology and Computer Performance 2. Instruction Set Architecture 3. Arithmetic for Computer 4. CPU Organization

Page  127

© NTK 2009

4. CPU Organization  4.1. Building a datapath  4.2. Single cycle implementation  4.3. Multi cycle implementation  4.4. Exceptions  4.5. Pipelining

Page  128

© NTK 2009

4.1. Building a datapath  A basic MIPS implementation  Implement a subset of the core MIPS instruction set: – Memory-reference instructions: lw, sw – Arithmetic-logical instructions: add, sub, and, or, slt – Branch instructions: beq, j

Page  129

© NTK 2009

Review of instruction classes

R

I

J

31

31

31

Page  130

op

25

rs

20

rt

15

6 bits

5 bits

5 bits

Opcode

Source register 1

Source register 2

op

25

rs

20

rt

rd

10

5 bits Destination register 15

sh

fn

5

5 bits

6 bits

Shift amount

Opcode extension

operand / offset

6 bits

5 bits

5 bits

16 bits

Opcode

Source or base

Destination or data

Immediate operand or address offset

op

25

jump target address

0

0

0

6 bits

1 0 0 0 0 0 0 0 0 0 0 0 26 0 bits 0 0 0 0 0 0 0 1 1 1 1 0 1

Opcode

Memory word address (byte address divided by 4)

© NTK 2009

A basic MIPS implementation  For every instruction, the first two steps are identical: – Send the program counter PC to instruction memory and fetch the instruction from that memory. – Read one or two registers, using fields of the instruction to select the registers to read. – Instructions class except for jump use ALU: • Memory-reference instructions use ALU for address calculation • Arithmetic-logical instructions use ALU for operation execution • Branch instructions use ALU for comparison – Next actions differ according to instructions: • Memory-reference instructions: access memory to write / read • Arithmetic-logical instructions: write data from ALU back to registers • Branch instructions: change next instruction address based in comparison Page  131

© NTK 2009

Abstract view of a basic MIPS implementation – v1.0

3 1 2 Page  132

© NTK 2009

Abstract view of a basic MIPS implementation – v1.0  Drawback in version 1.0 of the basic MIPS implementation: – In several places, data going into a particular unit as coming from two different sources. • Value written into PC can come from one of two adders. • Data written into registers can come from either ALU or date memory – Several of the units must be controlled depending on the type of instruction. • Data memory must read on load and write on store. • Registers must be written on a load and an arithmetic-logical instruction.

Page  133

© NTK 2009

Abstract view of a basic MIPS implementation – v2.0

Page  134

© NTK 2009

Abstract view of a basic MIPS implementation – ver 2  Benefits: – Simple, easy to understand – Single-cycle datapath

 Requirement: – Instruction memory and data memory are separate because: • Format of data and instruction is different • Having separate memories is less expensive • The processor operates in one cycle and cannot use single-portet memory for two different access within that cycle.

Page  135

© NTK 2009

Logic design conventions  To design the machine, how the logic implementing the machine will operate and how the machine is clocked needed to be decided.  MIPS design consists of two types of logic elements: – Combinational elements – Sequential elements: • Flip-flops, memories, registers

Page  136

© NTK 2009

Please revise Digital Logic Design

Building a datapath  Fetching instructions and incrementing PC  Implement R-format ALU operations  Loading and storing data  Branching with beq

Page  137

© NTK 2009

Fetching instructions and incrementing PC

 Read data from instruction memory to output  PC = PC + 4

Page  138

© NTK 2009

Implement R-format ALU operations

 R-format ALU instructions: add, sub, and, or, slt add s1,s2,s3

# s1 = s2 + s3

1. Read s2,s3 from register file according to two register numbers to outputs 2. ALU execute operation add 3. Write result back to register s1,©need write control signal NTK 2009

Page  139

Loading and storing data lw t1, offset_value(t2) sw t1, offset_value(t2) 3. Compute memory address by adding the base register t2 with 16-bit sign extended field offset_value. 4. Read data from register t1 to write to calculated address in data memory or Read data from calculated address in data memory to write to register t1.

Page  140

© NTK 2009

Branching with beq beq t1,t2,offset  Compute branch target address by adding sign-extended offset to PC – Two notes in the definition of branch instruction: • The instruction set architecture specifies that the base for branch address calculation is the address of the instruction following the branch. So we can always compute PC+4 in fetching period. • The offset field is shifted left 2 bits so that it’s a word offset.

 Use ALU to evaluate branch condition: – Read two registers t1, t2 from register file to inputs of ALU – Subtract two inputs of ALU, assert control signal Page  141

© NTK 2009

Branching with beq

Page  142

© NTK 2009

Creating a single datapath  Single datapath: – Execute every instruction in one clock cycle. – No resource used more than once per instruction, so any elements needed more than once must be duplicated. • Separate memories for instruction and data.

Page  143

© NTK 2009

A single datapath for memory and R-type instructions

Page  144

© NTK 2009

A single datapath for a basic MIPS

Control signals are not connected Page  145

© NTK 2009

4. CPU Organization  4.1. Building a datapath  4.2. Single cycle implementation  4.3. Multi cycle implementation  4.4. Exceptions  4.5. Pipelining

Page  146

© NTK 2009

4.2. Single cycle implementation  ALU Control  Main Control Unit

Page  147

© NTK 2009

ALU Control  ALU has four control inputs:

 Depend on the instruction class, one of first five functions of ALU will be performed (NOR is needed for other parts): – Load, store instructions: use ALU to compute memory address by addition – R-type instructions: ALU performs one of five actions (add, sub, and, or, slt) based on 6-bit function field in the instruction. – Branch beq: ALU performs subtraction Page  148

© NTK 2009

ALU control inputs

Page  149

© NTK 2009

Truth table which uses don’t care values to have compact minimization form

Designing the Main Control Unit

Page  150

© NTK 2009

Three instruction classes

 The opcode is always contained in bits 31:26. We refer to this field as Opcode[5:0]  Two registers to be read (R-type, beq, sw) are always at 25:21 and 20:16  The base register for load and store instruction is always in bit 25:21  16 bit offset for load/store/beq is always in 15:0  The destination register is in one of two places: – For load, it’s in 20:16 – For R-type, it’s in 15:11

Page  151

© NTK 2009

A simple datapath with all control lines identified

Page  152

© NTK 2009

Single datapath with control unit

153

Single control & datapath extended to handle jump

154

Why a single-cycle implementation is not used today?  Although single-cycle design will work correctly, it will not be used in modern designs because it’s inefficient: – Clock cycle must have the same length for every instruction in this single-cycle design => Clock cycle is determined by the longest possible path in the machine. • Longest instruction: Load instruction: uses 5 functional units in series: instruction memory, register file, ALU, data memory, register file. => Not good since several instruction could fit in a shorter clock cycle – Some functional units (ALU) must be duplicated => Hardware cost

Page  155

© NTK 2009

4. CPU Organization  4.1. Building a datapath  4.2. Single cycle implementation  4.3. Multicycle implementation  4.4. Exceptions  4.5. Pipelining

Page  156

© NTK 2009

4.3. Multicycle implementation  Multicycle implementation: – Break each instruction into a series of steps corresponding to the functional unit operations needed. • Each step in the execution will take one clock cycle. Each instruction will take different numbers of clock cycles. • Allow a functional unit to be used more than once per instruction => hardware share.

Page  157

© NTK 2009

High-level view of multicycle datapath

 Key elements: – A shared memory unit for both instructions & data – A single ALU – Require additional registers: IR, Memory data register, A,B, ALUout – Require additional multiplexers© NTK 2009

Page  158

Multicycle implementation  At the end of a clock cycle, all data used in subsequent clock cycle must be stored in a state element: – Data used by subsequent instructions in a later clock cycle is stored in one of the programmer-visible state elements: register file, PC, memory – Data used by the same instruction in a later clock cycle is stored in one of additional registers.

 One clock cycle can accommodate at most one of the following operations: A memory access, A register file access (two reads and one write), An ALU operation => Any data produced by these three functional units must be saved into a temporary register for use in later cycle. If not saved, timing race. Page  159

© NTK 2009

Additional registers  Instruction Register (IR) and Memory data register (MDR) are added to save the output of memory for instruction read and a data read. Two separate registers are used since both values are needed during the same clock.  A,B registers are used to hold values read from register file.  ALUout register holds the output of ALU.

Page  160

© NTK 2009

Additional multiplexers  Replacing three ALUs of single-cycle datapath by a single ALU requires: – An additional multiplexer: {A,PC} => the first ALU input. – An additional multiplexer: {B,4,Sign extend, shift left 2} => the second ALU input

Page  161

© NTK 2009

Multicycle datapath with control lines

Page  162

© NTK 2009

Branch and jump instructions  With jump and branch instructions, three possible sources of values to be written into PC: – The output of ALU, PC + 4 => PC directly – Register ALUout – the address of branch target after it is computed – Address of jump target = Lower 26 bit of IR << 2 concatenated with 4 upper bits of the incremented PC

Page  163

© NTK 2009

Complete datapath for multicycle implementation

Page  164

© NTK 2009

R

I

J

31

31

31

Page  165

op

25

rs

20

rt

15

6 bits

5 bits

5 bits

Opcode

Source register 1

Source register 2

op

25

rs

20

rt

rd

10

5 bits Destination register 15

sh

fn

5

5 bits

6 bits

Shift amount

Opcode extension

operand / offset

6 bits

5 bits

5 bits

16 bits

Opcode

Source or base

Destination or data

Immediate operand or address offset

op

25

jump target address

0

0

0

6 bits

1 0 0 0 0 0 0 0 0 0 0 0 26 0 bits 0 0 0 0 0 0 0 1 1 1 1 0 1

Opcode

Memory word address (byte address divided by 4)

© NTK 2009

ftp address to download materials

ftp://dce.hut.edu.vn/kiennt/

Page  166

© NTK 2009

Action of the 1-bit control signals

Page  167

© NTK 2009

Action of the 2-bit control signals

Page  168

© NTK 2009

Breaking instruction execution into clock cycles  Each MIPS instruction needs from three to five of these steps – 1. Instruction fetch step. – 2. Instruction decode and register fetch step.

Same for all instructions

– 3. Execution, memory address computation or branch completion. – 4. Memory access or R-type instruction completion step. – 5. Memory read completion step.

Page  169

© NTK 2009

1. Instruction fetch step  Fetch the instruction from memory and compute the address of the next sequential instructions: – IR <= Memory[PC]; • Assert the control signals MemRead and IRWrite and set IorD = 0 to select PC as the source of address. – PC <= PC + 4; • ALUSrcA = 0 (sending PC to ALU input) • ALUSrcB = 01 (sending 4 to ALU input) • ALUOp = 00 (to make ALU add) • PCSource = 00 (to select output of ALU addition as source) • assert PCWrite to write back to PC Page  170

© NTK 2009

2. Instruction decode and register fetch step  Read two registers in rs and rt instruction fields and compute the branch target address with ALU – A <= Reg[IR[25:21]]

Are these operations necessary for all – B <= Reg[IR[20:16]] instructions? – ALUOut <= PC + (sign-extend (IR[15:0])<<2 • ALUSrcA = 0 to select PC as ALU input • ALUSrcB = 11 to select sign-extended and shifted offset field as ALU input • ALUOp = 00 to make ALU add.

Page  171

© NTK 2009

3. Execution, memory address computation or branch completion step

 This is the first cycle during which the datapath operation is determined by the instruction class – Memory reference • ALUOut <= A + sign-extend (IR[15:0]) – Arithmetic-logical instruction (R-type) • ALUOut <= A op B – Branch • if (A==B) PC <= ALUOut – Jump • PC <= {PC[31:28], (IR[25:0],2’b00)};

Page  172

© NTK 2009

4. Memory access or R-type instruction completion  During this step, a load or store instruction accesses memory and an arithmetic-logical instruction writes its result. – Memory reference: • MDR <= Memory [ALUOut]; • or Memory [ALUOut] <= B; – Arithmetic-logical instruction: • Reg[ IR[15:11] ] <= ALUOut;

Page  173

© NTK 2009

5. Memory read completion step  During this step, loads complete by writing back the value from memory – Reg [IR[20:16]] <= MDR;

Page  174

© NTK 2009

Summary of steps taken to execute any instruction class

Page  175

© NTK 2009

Defining the control  We use state machine to describe the operation of MIPS

Page  176

© NTK 2009

Figure 5.32. FSM for instruction fetch and decode

Page  177

© NTK 2009

Figure 5.33. FSM for controlling memory-reference instructions

Page  178

© NTK 2009

Figure 5.34. FSM for R-type instruction

Page  179

© NTK 2009

Figure 5.35. FSM for branch instruction

Page  180

© NTK 2009

Figure 5.36. FSM for jump instruction

Page  181

© NTK 2009

Complete FSM

Page  182

© NTK 2009

4. CPU Organization  4.1. Building a datapath  4.2. Single cycle implementation  4.3. Multicycle implementation  4.4. Exceptions

Page  183

© NTK 2009

4.4. Exceptions  Exception: – Is an unexpected event from within the processor – Ex: arithmetic overflow, using an undefined instruction...

 Interrupt: – Is an event that causes an unexpected change in control flow but comes from outside of the processor. – Interrupts are used by IO devices to communicate with the processor. – Ex: timer interrupt...

Page  184

© NTK 2009

How exceptions are handled?  Two types of exceptions which our current implementation can generate: – Execution of an undefined instruction – Execution of an arithmetic overflow

 Basic action of machine when an exception occurs: – Save address of offending instruction in the Exception Program Counter (EPC). – Transfer control to the OS at some specific address. • The OS can then take appropriate action which may involve providing some service to the user program, taking predefined action in response to and overflow, or stopping the execution of program and reporting errors. • Then the OS can terminate the program or may continue its execution, using EPC to determine the address to restart the program.

Page  185

© NTK 2009

How exceptions are handled?  We can implement the processing required for exception by adding a few extra registers and control signals to our basic implementation and by slightly extending the FSM. – Two additional registers: • EPC: 32-bit register used to hold address of the affected instruction • Cause: 32-bit register used to record the cause of exception – Cause register = 0: undefined instruction – Cause register = 1: arithmetic overflow – Two control signals used to cause EPC and Cause register to be written: EPCWrite, CauseWrite. – One-bit signal to set value 0/1 to Cause register. – Write exception address of handling code to PC (8000 0180(16)) Page  186

© NTK 2009

The multicycle datapath with exception handling

Page  187

© NTK 2009

How control checks for exceptions?  Undefined instruction exception: – This is detected when no next state is defined from state 1 for op value – We handle this exception by defining the next-state value for all op values rather than lw, sw, 0 (R-type), j and beq as state 10.

 Arithmetic overflow exception: – The ALU designed includes logic to detect overflow and a signal called Overflow is provided as an output from ALU. This signal is used to specify additional possible next state (state 11) for state 7.

Page  188

© NTK 2009

FSM with exception detection

Page  189

© NTK 2009

Midterm exam (70’)  Using multicycle datapath implementation of MIPS, explain the operation of the following instructions: – a. load/store t1,15(s0) – b. add/sub/and/or/slt s0,t1,t2 – c. beq t1,t2,Label – d. jump Label

Page  190

© NTK 2009

4. CPU Organization  4.1. Building a datapath  4.2. Single cycle implementation  4.3. Multicycle implementation  4.4. Exceptions  4.5. Pipelining

Page  191

© NTK 2009

4.5. Pipelining  Introduction  Pipelined Datapath  Pipelined Control  Pipelined Hazards

Page  192

© NTK 2009

4.5. Pipelining  Introduction  Pipelined Datapath  Pipelined Control  Pipelined Hazards

Page  193

© NTK 2009

Introduction  Pipelining is an implementation technique used to enhance performance in which multiple instructions are overlapped in execution.  The idea of pipelining is: Divide an instruction into smaller steps which can be executed concurrently.

Page  194

© NTK 2009

Introduction  MIPS instructions classically take five steps: – 1. Fetch instruction from memory. – 2. Read registers while decoding the instruction. The format of MIPS instruction allow reading and decoding to occur simultaneously. – 3. Execute the operation or calculate the address. – 4. Access an operand in memory. – 5. Write the result into a register.

Page  195

© NTK 2009

Pipelining

Inst 1 Inst 2 Inst 3 Inst 4 Inst 5 Inst 6 Inst 7

Page  196

1

2

3

4

FI

DI

EX

FO

5

6

7

8

WR

© NTK 2009

9

10

11

12

Single-cycle versus Pipelined performance  In this example, we limit our attention to 8 instructions: lw, sw, add, sub, and, or, slt, beq.  Total time for each instruction:

Page  197

© NTK 2009

Single-cycle versus Pipelined performance

Page  198

© NTK 2009

Pipelined performance  If the stages are perfectly balanced, then the time between instructions in pipelined processor when the number of instruction is large is equal to:

Page  199

© NTK 2009

4.5. Pipelining  Introduction  Pipelined Datapath  Pipelined Control  Pipelined Hazards

Page  200

© NTK 2009

A pipelined datapath  The division of an instruction into five stages means a fivestage pipeline, which in turn means five instructions will be in execution during any single clock cycle:

Page  201

© NTK 2009

Pipelined execution in single-cycle datapath of MIPS

 Because each resource is used during only one of the five stages of an instruction, allowing it to be shared by other instructions during the other four stages. => To retain the value of an individual instruction for its other four © NTK 2009 stages, the value must be saved in a register.

Page  202

Pipelined datapath of MIPS

Registers Registers must be wide enough to store all data corresponding to the lines Page  203 © NTK 2009 that go through them. IF/ID:64, ID/EX:128, EX/MEM:97,MEM/WB:64

How are portions of datapath used during an instruction? – Example of a load instruction

Page  204

© NTK 2009

IF: first stage of an instruction

Page  205

© NTK 2009

ID: second stage of an instruction

Page  206

© NTK 2009

EX: third stage of an instruction

Page  207

© NTK 2009

MEM: forth stage of an instruction

Page  208

© NTK 2009

WB: fifth stage of an instruction

Page  209

© NTK 2009

Write register number needed to be saved for WB stage

Corrected datapath to handle a load instruction

Page  210

© NTK 2009

4.5. Pipelining  Introduction  Pipelined Datapath  Pipelined Control  Pipelined Hazards

Page  211

© NTK 2009

Pipelined datapath of MIPS with control  For pipelined datapath, we can divide control lines into five groups according to the pipeline stage: – IF: The control signals to read instruction memory and to write the PC are always asserted, so there is nothing special to control in this pipeline stage. – ID: As in the previous stage, the same thing happens at every clock cycle, so there are no optional control lines to set. – EX: The signals to be set are RegDst, ALUOp, ALUSrc. The signals select Result register, the ALU operation, and either read data 2 or a sign-extended immediate for ALU. – MEM: The control lines set in this stage are Branch, MemRead, MemWrite. These signals are set by the branch equal, load, store instructions. – WB: two control lines are MemtoReg, which decides between sending the ALU result or the memory value to the register file, and RegWrite, which writes the chosen value.

Page  212

© NTK 2009

Pipelined datapath of MIPS with control

Page  213

© NTK 2009

9 control lines for final three stages

 Four control lines for EX stage  Three control lines for MEM stage  214 © NTK 2009 PageTwo control lines for WB stage

The pipelined datapath with control signals connected

Page  215

© NTK 2009

Designing Instruction sets for pipelining  The design of MIPS was designed for pipeline execution: – 1. All MIPS instructions are the same length. » Easy to fetch instruction in 1st stage and decode it in 2nd stage.

– 2. MIPS only has a few instruction formats, with the source register fields being located in the same place in each instruction. » In 2nd stage, register file can be read at the same time that hardware is deciding what type of instruction was fetched. – 3. Memory operands only appear in load and store in MIPS. – 4. Operands are aligned in memory. – RISC – Reduced Instruction Set Computer – CISC – Complex Instruction Set Computer

Page  216

© NTK 2009

4.5. Pipelining  Introduction  Pipelined Datapath  Pipelined Control  Pipelined Hazards

Page  217

© NTK 2009

Pipeline Hazards  There are situations in pipelining when the next instruction can not execute in the following clock cycle => hazards.  Three types of hazards: – Structural Hazards – Data Hazards • Forwarding • Stall – Control Hazards

Page  218

© NTK 2009

Hazards  Structural Hazards  Data Hazards – Forwarding – Stall

 Control Hazards

Page  219

© NTK 2009

Structural Hazards  Structural Hazards occur when the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.  MIPS is designed to avoid structural harzards when designing a pipeline. – Ex: suppose we have a single memory instead of two mem (instruction memory + data memory)

Conflict when both read memory

Page  220

© NTK 2009

Hazards  Structural Hazards  Data Hazards – Forwarding – Stall

 Control Hazards

Page  221

© NTK 2009

Data Hazards  Data Hazards occur when the pipeline must be stalled because one step must wait for another to complete. – In a computer pipeline, data hazards arise from the dependence of one instruction on an ealier one that is still in pipeline. add s0, t0, t1 sub t2, s0, t3 # add instruction doesn’t write its result to s0 until fifth stage # sub instruction read s0 register in second stage

 Solution: – Observation: we don’t need to wait for instruction to complete before trying to resolve the data hazards. For code above, as soon as the ALU creates sum for addition, we can supply it as input for substraction. – Adding extra hardware to to retrieve missing item early from the internal Page  222 © NTK 2009 resources is called forwading or bypassing.

Representation of instruction pipeline

 IF: Instruction fetch  ID: Instruction decode + register file read  EX: Execution  MEM: Memory access  WB: Write back  The shading indicates the element is used by instruction – MEM: white => add doesn’t access memory – Shading on the right half: Read Page © NTK 2009 –  223 Shading on the left half: Write

Hazards  Structural Hazards  Data Hazards – Forwarding – Stall

 Control Hazards

Page  224

© NTK 2009

Data hazards and forwarding

 Instead of waiting until the fifth stage of add instruction for the result in s0 register, forwarding uses extra hardware to write the output of EX stage of add instruction to the input of EX stage for sub instruction. (extra hardware to create connection) Page  225

© NTK 2009

Data hazards and forwarding  Forwarding paths are valid if only the destination stage is later in time than the source stage. – Ex:

lw

s0,20(t1)

sub t2,s0,t3 there cannot be a valid forwarding path from the output of memory access stage in the first instruction to the input of the execution stage of the following , since that would mean going backward in time.

pipeline stall Page  226

© NTK 2009

Data hazards and forwarding

pipeline stall

 Pipeline stall or bubble is added to solve data hazards when an R-format instruction following a load tries to use the data.

Page  227

© NTK 2009

Reordering code to avoid pipeline stalls

Page  228

© NTK 2009

Reordering code to avoid pipeline stalls

Page  229

© NTK 2009

Forwarding  Forwarding yields another insight into the MIPS architecture. Each MIPS instruction writes at most one result and does so near the end of the pipeline. Forwarding is harder if there are multiple results to forward per instruction or they need to write a result early on in instruction execution.  Notes: – The name “forwarding” comes from the idea that the result is passed forward from an earlier instruction to a later instruction. “Bypassing” comes from passing the result by register file to the desired unit.

Page  230

© NTK 2009

ALU and pipeline registers without forwarding

Page  231

© NTK 2009

ALU and pipeline registers with forwarding

Page  232

© NTK 2009

Datapath modified to resolve hazards via forwarding

Page  233

© NTK 2009

Hazards  Structural Hazards  Data Hazards – Forwarding – Stall

 Control Hazards

Page  234

© NTK 2009

Data hazards and Stalls

Forwarding can not resolve this program because the destination stage is ealier in time than the source stage.

?

Page  235

© NTK 2009

Stalls are inserted into pipeline

Page  236

© NTK 2009

Harzards detection unit used for stalls

Page  237

© NTK 2009

Harzards detection unit used for stalls  Harzards detection unit is used to solve hazard when an instruction tries to read a register following a load instruction that writes the same register.  Checking for load instruction: load instruction or not?

– if (ID/EX.MemRead and (( ID/EX.RegisterRt = IF/ID.RegisterRs) or ( ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline

Page  238

© NTK 2009

the destination register field of load instruction in EX stage matches either source register of the instruction in ID stage

Hazards  Structural Hazards  Data Hazards – Forwarding – Stall

 Control Hazards

Page  239

© NTK 2009

Control Hazards (Branch Hazards)  Control hazards or branch hazards occur when the proper instruction can not execute in the proper clock cycle because the instruction that was fetched is not the one that is needed, that is the flow of instruction addresses is not what the pipeline expected.  In branch instruction: – We need to fetch the instruction following the branch on the next clock cycle to allow pipeline. – But the pipeline cannot possibly know what the next instruction should be, since it only just received the branch instruction from memory.

Page  240

© NTK 2009

Impact of pipeline on the branch instruction

if branch is taken, we need to discard (flush) these instructions

Page  241

© NTK 2009

Solution 1: Stall

 If we can move branch decision up earlier, we have less delay.  Assume that we put in enough extra hardware so that we can test registers, calculate the branch address, and update PC during the second stage of pipeline. Even with this highly costed extra hardware, we still have stall: – lw instruction, executed if the branch fails, is stalled one extra 200ps clock cycle before starting. Page  242

The cost of ©extra NTK 2009hw for most computer is too high

Solution 2: Branch Prediction  To solve branch hazards, we use branch prediction: – One simple approach is to always predict that branches will be undertaken. • When branches are correctly undertaken, the pipeline proceeds at full speed. • When branches are taken, we have some stalls.

Page  243

© NTK 2009

Branch prediction is solution to control hazards

Page  244

© NTK 2009

Branch prediction  A more sophisticated version of branch prediction would have some branches predicted as undertaken, and some as taken: – For usual branch like previous example, we predict branch as undertaken. – For branches at the loops, branches are usually jump back to the top of loop. So branches are predicted as taken. – Dynamic hardware predictor: • May change predictions for a branch over the life of a program. • Keeping a history table for each branch as taken or not taken, the using the past behavior to predict the future.

Page  245

© NTK 2009

Solution 3: Delayed decision (used by MIPS)  The delayed branch always executes the next sequential instruction, with branch taking place after that one instruction delay.  Compiler and assembler try to place an instruction that always executes after the branch in the branch delay slot. add t1,t2,t4 beq s0,s1,LABEL or t5,t6,t7 LABEL: ....

Page  246

beq s0,s1,LABEL add t1,t2,t4 or t5,t6,t7 LABEL: ....

© NTK 2009

Scheduling the branch delay slot

Page  247

© NTK 2009

Pipeline summary  We have seen three models of execution: single cycle, multicycle, and pipelined.  Pipelined control strives for 1 clock cycle per instruction, like single cycle, but also for a fast clock cycle, like multicycle.

Page  248

© NTK 2009

FINISH

Page  249

© NTK 2009

Related Documents


More Documents from "Kuldeep Sahu"

May 2020 3
May 2020 1
May 2020 4
Vhdl
May 2020 12