Enhancing Performance With Pipelining

Uploaded by: api-26072581
0
0

July 2020
PDF

Download

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA

Overview

Download & View Enhancing Performance With Pipelining as PDF for free.

More details

Words: 1,587
Pages: 71

Preview
Full text

CSE 313

Enhancing Performance with Pipelining

Pipeline motivation 

Need both low CPI and high frequency for best performance  Want

CPI

a multicycle for high frequency, but need better



Idea behind pipelining is to have a multicycle implementation that operates like a factory assembly line



Each “worker” in the pipeline performs a particular task, hands off to the next “worker”, while getting new work

Pipeline motivation 

Tasks should take about the same time – if one “worker” is much slower than the rest, then other “workers” will stand idle



Once the assembly line is full, a new “product” (instruction) comes out of the back-end of the line each time period



In a computer assembly line (pipeline), each task is called a stage and the time period is one clock cycle

MIPS 5-stage pipeline



Like single cycle datapath but with registers separating each stage

MIPS 5-stage pipeline 

5 stages for each instruction  IF:

instruction fetch  ID: instruction decode and register file read  EX: instruction execution or effective address calculation  MEM: memory access for load and store  WB: write back results to register file 

Delays of all 5 stages are relatively the same



Staging registers are used to hold data and control as instructions pass between stages



All instructions pass through all 5 stages



As an instruction leaves a stage in a particular clock period, the next instruction enters it

Pipeline operation for lw 

Stage 1: Instruction fetch

Pipeline operation for lw 

Stage 2: Instruction decode and register file read



What happens to the instruction info in IF/ID?

Pipeline operation for lw 

Stage 3: Effective address calculation

Pipeline operation for lw 

Stage 4: Memory access

Pipeline operation for lw 

Stage 5: Write back



Instruction info in IF/ID is gone – won’t work

Modified pipeline with write back fix 

Write register bits from the instruction must be carried through the pipeline with the instruction

Pipeline operation for lw 

Pipeline usage in each stage for lw

Pipeline operation for sw 

Stage 3: Effective address calculation

Pipeline operation for sw 

Stage 4: Memory access

Pipeline operation for sw 

Stage 5: Write back (nothing)

Pipeline operation for lw, sub sequence

Pipeline operation for lw, sub sequence

Pipeline operation for lw, sub sequence

Pipeline operation for lw, sub sequence

Pipeline operation for lw, sub sequence

Pipeline operation for lw, sub sequence

Graphical pipeline representation 

Represent overlap of pipelined instructions as multiple pipelines skewed by a cycle

Another useful shorthand form

Pipeline control 

Basic pipeline control is similar to the single cycle implementation

Pipeline control 

Control for an instruction is generated in ID and travels with the instruction and data through the pipeline



When an instruction enters a stage, it’s control signals set the operation of that stage

Pipeline control

Multiple instruction example 

For the following code fragment lw sub and or add

$10, 20($1) $11, $2, $3 $12, $4, $5 $13, $6, $7 $14, $8, $9

show the datapath and control usage as the instruction sequence travels down the pipeline

Multiple instruction example

Multiple instruction example

Multiple instruction example

Multiple instruction example

Multiple instruction example

Multiple instruction example

Multiple instruction example

Multiple instruction example

Multiple instruction example

How the MIPS ISA simplifies pipelining 

Fixed length instruction simplifies  Fetch

– just get the next 32 bits  Decode – single step; don’t have to decode opcode before figuring out where to get the rest of the fields 

Source register fields always in same location  Can



read source registers during decode

Load/store architecture  ALU

can be used for both arithmetic and EA calculation  Memory instruction require about same amount of work as arithmetic ones, easing pipelining of the two together 

Memory data must be aligned  Read

or write accesses can be done in one cycle

Pipeline hazards 

A hazard is a conflict, regarding data, control, or hardware resources



Data hazards are conflicts for register values



Control hazards occur due to the delay to execute branch and jump instruction



Structural hazards are conflicts for hardware resources, such as A

single memory for instructions and data  A multi-cycle, non-pipelined functional unit (such as a divider)

Data dependences 



A read after write (RAW) dependence occurs when the register written by an instruction is a source register of a subsequent instruction lw

$10, 20($1)

sub

$11, $10, $3

and

$12, $4, $11

or

$13, $11, $4

add

$14, $13, $9

Also have write after read (WAR) and write after write (WAW) data dependences (later)

Pipelining and RAW dependences 

RAW dependences that are close by may cause data hazards in the pipeline



Consider the following code sequence: sub and or add sw



$2, $1, $3 $12, $2, $6 $13, $6, $2 $14, $2, $2 $15, 100($2)

What are the RAW dependences?

Pipelining and RAW dependences 

Data hazards with first three instructions

hazard hazard ok ok

Forwarding 

Most RAW hazards can be eliminated by forwarding results between pipe stages

at this point, result of sub is available

Detecting forwarding 

Rd of the instruction in MEM or WB must match Rs and/or Rt of the instruction in EX



The instruction in MEM or WB must have RegWrite=1 (why?)



Rd must not be $0 (why?)

Detecting forwarding from MEM to EX 

To the upper ALU input (ALUupper)  EX/MEM.RegWrite

=1  EX/MEM.RegisterRd not equal 0  EX/MEM.RegisterRd = ID/EX.RegisterRs



To the lower ALU input (ALUlower)  EX/MEM.RegWrite

=1  EX/MEM.RegisterRd not equal 0  EX/MEM.RegisterRd = ID/EX.RegisterRt

Forwarding datapaths 

Bypass paths feed data from MEM and WB back to MUXes at the EX ALU inputs



Do we still have to write the register file in WB?

Detecting forwarding from WB to EX 

To the upper ALU input  MEM/WB.RegWrite

=1  MEM/WB.RegisterRd not equal 0  MEM/WB.RegisterRd = ID/EX.RegisterRs  The value is not being forwarded from MEM (why?)



To the lower ALU input  MEM/WB.RegWrite

=1  MEM/WB.RegisterRd not equal 0  MEM/WB.RegisterRd = ID/EX.RegisterRt  The value is not being forwarded from MEM

Forwarding control 

Control is handled by the forwarding unit

Forwarding example 

Show forwarding for the code sequence: sub

$2,

$1,

$3

and

$4,

$2,

$5

or

$4,

$4,

$2

add

$9,

$4,

$2

Forwarding example 

sub produces result in EX

Forwarding example 

sub forwards result from MEM to ALUupper

Forwarding example 

sub forwards result from WB to ALUlower



and forwards result from MEM to ALUupper

Forwarding example 

or forwards result from MEM to ALUupper

RAW hazards involving loads 

Loads produce results in MEM – can’t forward to an immediately following R-type instruction



Called a load-use hazard

RAW hazards involving loads 

Solution: stall the stages behind the load for one cycle, after which the result can be forwarded

Detecting load-use hazards 

Instruction in EX is a load  ID/EX.MemRead



=1

Instruction in ID has a source register that matches the load destination register  ID/EX.RegisterRt

= IF/ID.RegisterRs OR ID/EX.RegisterRt = IF/ID.RegisterRt

Stalling the stages behind the load 

Force nop (“no operation”) instruction into EX stage on next clock cycle  Force

ID/EX.MemWrite input to zero  Force ID/EX.RegWrite input to zero



Hold instructions in ID and IF stages for one clock cycle  Hold

the contents of PC  Hold the contents of IF/ID

Control for load-use hazards 

Control is handled by the hazard detection unit

Load-use stall example 

Code sequence: lw

$2,

20($1)

and

$4,

$2,

$5

or

$4,

$4,

$2

add

$9,

$4,

$2

Load-use stall example 

lw enters ID

Load-use stall example 

Load-use hazard detected

Load-use stall example 

Force nop into EX and hold ID and IF stages

Load-use stall example 

lw result in WB forwarded to and in EX



or reads operand $2 from register file

Load-use stall example 

Pipeline advances normally

Control hazards 

Taken branches and jumps change the PC to the target address from which the next instruction is to be fetched



In our pipeline, the PC is changed when the taken beq instruction is in the MEM stage



This creates a control hazard in which sequential instructions in earlier stages must be discarded

beq instruction that is taken instri+3



instri+2

instri+1

instri+1 , instri+2 , instri+3 must be discarded

beq $2,$3,7

beq instruction that is taken



In this example, the branch delay is three

Reducing the branch delay 

Reducing the branch delay reduces the number of instructions that have to be discarded on a taken branch



We can reduce the branch delay to one for beq by moving both the equality test and the branch target address calculation into ID



We need to insert a nop between the beq and the correctly fetched instruction

Reducing the branch delay

beq with one branch delay 

Register equality test done in ID by a exclusive ORing the register values and NORing the result



Instruction in ID forced to nop by zeroing the IF/ID register



Next fetched instruction will be from PC+4 or branch target depending on the beq outcome

beq with one branch delay 

beq in ID; next sequential instruction (and) in IF

beq with one branch delay 

bubble in ID; lw (from taken address) in IF

Enhancing Performance With Pipelining

Overview

More details

Related Documents

Enhancing Performance With Pipelining

Chapter 6 Enhancing Performance With Pipelining

Pipelining

Enhancing Heat Exchange Performance

Mechanisms Involved In Enhancing Human Performance By

Problems With Performance Appraisal