Parallel processing: Pipelining
Pipeline Processing
• It is a technique of decomposing a sequential process (task) into suboperations, with each subprocess (subtask) being executed in a special dedicated hardware stage that operates concurrently with all other stages in the pipeline. This is also called Overlapped Parallelism.
Linear Pipelining
• Cascade of processing stages.
• Stages are pure combinational circuits performing arithmetic or logic operations over the data stream flowing through the pipe.
• Stages are separated by high-speed interfaces called latches.
• A common clock is applied to all latches.
• Clock Period:
  – Each stage S_i has a time delay t_i.
  – Each interface latch has a time delay t_l.
  – The clock period of a linear pipeline is therefore t = max{ t_i : 1 ≤ i ≤ k } + t_l = t_m + t_l, where t_m is the delay of the slowest stage.
• Speedup:
  – Speedup of a k-stage linear pipeline processor over an equivalent non-pipelined processor: S_k = T_1 / T_k = n·k / (k + (n − 1)).
  – Maximum speedup: S_k → k for n >> k.
Pipeline Vs Non-Pipeline
• A pipeline executing n tasks needs T_k = k + (n − 1) clock periods, where k is the number of stages: k cycles to fill the pipe with the first task, after which it outputs one result per clock cycle.
• A non-pipelined processor executing the same n tasks needs T_1 = n·k clock periods.
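As a quick check of these formulas, here is a minimal Python sketch; the stage delays, latch delay, and task counts are assumed values chosen for illustration, not figures from the text.

```python
# Timing of a k-stage linear pipeline vs. an equivalent non-pipelined unit.

stage_delays_ns = [30, 25, 40, 20]          # t_i for each stage S_i (assumed values)
latch_delay_ns = 5                          # t_l (assumed value)

k = len(stage_delays_ns)
t = max(stage_delays_ns) + latch_delay_ns   # clock period: t = max{t_i} + t_l
print(f"k = {k} stages, clock period t = {t} ns")

for n in (4, 100, 10_000):                  # number of tasks (assumed values)
    t_pipe = k + (n - 1)                    # T_k: clock periods, pipelined
    t_nonpipe = k * n                       # T_1: clock periods, non-pipelined
    s_k = t_nonpipe / t_pipe                # S_k = n*k / (k + (n - 1))
    print(f"n = {n:>6}: T_k = {t_pipe:>6}, T_1 = {t_nonpipe:>6}, S_k = {s_k:.2f}")
# S_k approaches k (= 4 here) as n grows much larger than k.
```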
Pipelined CPU
• The CPU can be partitioned into 3 sections:
  – Instruction unit
  – Instruction queue
  – Execution unit
• The instruction unit consists of pipeline stages for instruction fetch, operand address calculation, and operand fetch, if needed.
• The instruction queue is a FIFO storage area for decoded instructions and fetched operands.
• The execution unit contains multiple functional pipelines for arithmetic and logic functions.
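To make the three-section partition concrete, the sketch below pushes decoded instructions from an instruction unit through a FIFO queue into an execution unit. The class names, the decoded-instruction format, and the tiny program are illustrative assumptions, not a real CPU design.

```python
from collections import deque

class InstructionUnit:
    """Stands in for the fetch / address-calculation / operand-fetch stages."""
    def __init__(self, program):
        self.program = iter(program)

    def fetch_and_decode(self):
        try:
            op, *operands = next(self.program)
            return {"op": op, "operands": operands}   # decoded instruction (assumed format)
        except StopIteration:
            return None                               # program exhausted

class ExecutionUnit:
    """Stands in for the functional pipelines that consume decoded instructions."""
    def execute(self, instr):
        print(f"executing {instr['op']} {instr['operands']}")

instruction_queue = deque()                           # FIFO buffer between the two units
iu = InstructionUnit([("LOAD", "R1", 100), ("ADD", "R1", "R2"), ("STORE", "R1", 104)])
eu = ExecutionUnit()

while True:
    decoded = iu.fetch_and_decode()                   # instruction unit keeps the queue filled
    if decoded is not None:
        instruction_queue.append(decoded)
    if instruction_queue:
        eu.execute(instruction_queue.popleft())       # execution unit drains the queue
    elif decoded is None:
        break
```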
Classification of Pipeline Processors
• Instruction Pipeline
• Arithmetic Pipeline
• Processor Pipeline
Instruction Pipeline
• Processing each instruction requires the following sequence of steps:
  – Instruction fetch from memory (IF)
  – Instruction decode and effective address calculation (ID)
  – Operand fetch from memory (OF)
  – Execution of the instruction (EX)
• An instruction pipeline executes a stream of instructions by overlapping the execution of the current instruction with the fetch, decode, and operand fetch of subsequent instructions. This is also called instruction lookahead (see the space-time sketch after this list).
• Three difficulties cause the instruction pipeline to deviate from its normal operation:
  1. Resource Conflicts
  2. Data Dependency
  3. Instruction Branching
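The overlap is easiest to see in a space-time table. The sketch below prints which stage each instruction occupies in every clock cycle, assuming an ideal IF-ID-OF-EX pipeline with no conflicts; the six instruction labels are placeholders.

```python
STAGES = ["IF", "ID", "OF", "EX"]                 # the four steps listed above
instructions = [f"I{i}" for i in range(1, 7)]     # placeholder instruction stream

n, k = len(instructions), len(STAGES)
total_cycles = k + (n - 1)                        # T_k = k + (n - 1)

print("cycle " + " ".join(f"{name:>4}" for name in instructions))
for cycle in range(1, total_cycles + 1):
    row = []
    for i in range(n):
        stage = cycle - 1 - i                     # instruction i enters IF in cycle i + 1
        row.append(STAGES[stage] if 0 <= stage < k else "")
    print(f"{cycle:>5} " + " ".join(f"{s:>4}" for s in row))
# Once the pipe is full (cycle 4), one instruction completes EX every cycle.
```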
1. Resource Conflicts: arise when two pipeline segments access memory at the same time; they can be resolved by using separate instruction and data memories.
2. Data Dependency:
  – Collision of data or address:
    » Data dependency
    » Address dependency
  – Resolution: use of hardware interlocks, which hold back the dependent instruction until its operand is available (see the sketch below).
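A minimal sketch of a hardware interlock follows: issue is delayed whenever a source register is the destination of an instruction whose result is not yet available. The three-instruction program, register names, and the assumed result latency are all illustrative.

```python
program = [
    ("ADD", "R1", ("R2", "R3")),   # R1 <- R2 + R3
    ("SUB", "R4", ("R1", "R5")),   # reads R1: data dependency on the ADD
    ("MUL", "R6", ("R7", "R8")),   # independent of both
]

RESULT_LATENCY = 3                 # cycles from issue until a result can be read (assumed)

issue_cycle = {}                   # instruction index -> cycle in which it was issued
cycle = 1
for idx, (op, dest, srcs) in enumerate(program):
    # Interlock: if a source register is produced by an instruction still in
    # the pipe, insert stall cycles until that result is ready.
    for prev_idx in range(idx):
        prev_dest = program[prev_idx][1]
        if prev_dest in srcs:
            ready = issue_cycle[prev_idx] + RESULT_LATENCY
            while cycle < ready:
                print(f"cycle {cycle}: stall (waiting for {prev_dest})")
                cycle += 1
    issue_cycle[idx] = cycle
    print(f"cycle {cycle}: issue {op} {dest}")
    cycle += 1
```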
3. Instruction Branching:
  – A branch instruction may transfer control out of sequence, breaking the normal execution of the instruction stream.
  – In such a case:
    • the pipeline must be emptied, and
    • all instructions that have been read from memory after the branch instruction must be discarded (a flush is sketched below the example).
  – Resolution: prefetch the target instruction in addition to the instruction following the branch, so that either path is available once the branch is resolved.

Example: 4-segment Instruction Pipeline
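As a rough illustration of the cost of branching in such a 4-segment pipeline, the sketch below advances a stream of placeholder instructions one stage per cycle and empties the pipe when a taken branch reaches EX; the stage names, labels, and branch target are assumptions.

```python
STAGES = ["IF", "ID", "OF", "EX"]

# Sequential stream; the instruction at index 2 is a taken branch whose
# target sits at index 6 (all labels are placeholders).
stream = ["I1", "I2", "BR->I7", "I3", "I4", "I5", "I7", "I8"]
branch_target = 6

pipe = [None] * len(STAGES)          # pipe[j] = instruction currently in stage j
fetch_ptr, cycle = 0, 0
while fetch_ptr < len(stream) or any(pipe):
    cycle += 1
    pipe = [None] + pipe[:-1]        # every instruction advances one stage
    if fetch_ptr < len(stream):
        pipe[0] = stream[fetch_ptr]  # fetch the next sequential instruction
        fetch_ptr += 1
    print(f"cycle {cycle:>2}: " +
          "  ".join(f"{s}:{i or '--':<7}" for s, i in zip(STAGES, pipe)))
    if pipe[-1] == "BR->I7":
        # Branch resolves in EX: the pipe is emptied and the instructions
        # fetched after the branch (I3, I4, I5) are discarded.
        print("          branch taken -> flush pipe, refetch from the target")
        pipe = [None] * len(STAGES)
        fetch_ptr = branch_target
```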
Arithmetic Pipeline
• Used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations.
• The various operations are decomposed into suboperations.
Example of an arithmetic pipeline: floating-point addition and subtraction, where A and B are the fractional parts of the two operands and x and y are their exponents.
Suboperations (a sketch follows the list):
1. Compare the exponents.
2. Align the fractional parts.
3. Add or subtract the fractional parts.
4. Normalize the result.
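A minimal sketch of these four suboperations, using base-10 exponents and Python floats purely for readability (real hardware would operate on binary fractions in fixed-width registers); the operand values are illustrative.

```python
def fp_add(a, x, b, y):
    """Add A*10^x and B*10^y using the four pipeline suboperations."""
    # 1. Compare the exponents (swap so that x >= y).
    if x < y:
        a, x, b, y = b, y, a, x
    # 2. Align the fractional parts: shift the smaller operand right.
    b = b / (10 ** (x - y))
    # 3. Add (or subtract) the fractional parts.
    frac = a + b
    # 4. Normalize the result so the fraction lies in [0.1, 1).
    exp = x
    while abs(frac) >= 1:
        frac, exp = frac / 10, exp + 1
    while frac != 0 and abs(frac) < 0.1:
        frac, exp = frac * 10, exp - 1
    return frac, exp

print(fp_add(0.9504, 3, 0.8200, 2))   # 0.9504*10^3 + 0.0820*10^3 -> about (0.10324, 4)
```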
Unifunction Vs Multifunction Pipelines
• A unifunction pipeline is one with a fixed and dedicated function.
  – Example: a floating-point adder.
• A multifunction pipeline may perform different functions, either at different times or at the same time, by interconnecting different subsets of stages in the pipeline.
  – Example: the TI-ASC has 4 multifunction pipeline processors, each of which is reconfigurable for a variety of arithmetic and logic functions at different times.
Static Vs Dynamic Pipelines
• A static pipeline may assume only one functional configuration at a time.
  – It can be unifunction or multifunction.
  – Instructions of the same type are executed continuously.
• A dynamic pipeline processor permits several functional configurations to exist simultaneously.
  – It must be multifunction.