Consider the following sequence of instructions, where each instruction consists of an opcode followed by the destination register and then one or two source operands (registers or immediate values).
0 ADD R3, R1, R2
1 LOAD R6, [R3]
2 AND R7, R5, 3
3 ADD R1, R6, R7
4 SRL R7, R0, 8
5 OR R2, R4, R7
6 SUB R5, R3, R4
7 ADD R0, R1, R10
8 LOAD R6, [R5]
9 SUB R2, R1, R6
10 AND R3, R7, 15
Assume the use of a four-stage pipeline: fetch, decode/issue, execute, and write back. Assume that all pipeline stages take one clock cycle except the execute stage. For simple integer arithmetic and logical instructions, the execute stage takes one cycle, but for a LOAD from memory, the execute stage takes five cycles. If we have a simple scalar pipeline but allow out-of-order execution, we can construct the following table showing the execution of the first seven instructions.
Instruction   Fetch   Decode   Execute   Write back
0             0       1        2         3
1             1       2        4         9
2             2       3        5         6
3             3       4        10        11
4             4       5        6         7
5             5       6        8         10
6             6       7        9         12
The entries under the four pipeline stages indicate the clock cycle at which each instruction begins each phase. In this program, the second ADD instruction (instruction 3) depends on the LOAD (instruction 1) for one of its operands, R6. Because the LOAD instruction takes five clock cycles, and the issue logic encounters the dependent ADD instruction after two clocks, the issue logic must delay the ADD instruction for three clock cycles. With out-of-order capability, the processor can stall instruction 3 at clock cycle 4 and then move on to issue the following three independent instructions, which enter execution at clocks 6, 8, and 9. The LOAD finishes execution at clock 9, so the dependent ADD can be launched into execution on clock 10.
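The dependency reasoning above can be checked mechanically. The following is a minimal Python sketch, not part of the original problem, that scans the instruction listing and reports each true (read-after-write) data dependency. The names `PROGRAM`, `parse`, and `raw_dependencies` are illustrative assumptions, and the sketch assumes the first register operand is always the destination, as the syntax description states.

```python
import re

# The eleven instructions from the listing, without their index numbers.
PROGRAM = """\
ADD R3, R1, R2
LOAD R6, [R3]
AND R7, R5, 3
ADD R1, R6, R7
SRL R7, R0, 8
OR R2, R4, R7
SUB R5, R3, R4
ADD R0, R1, R10
LOAD R6, [R5]
SUB R2, R1, R6
AND R3, R7, 15
"""


def parse(line):
    """Split one instruction into (opcode, destination, source registers)."""
    opcode, operands = line.split(None, 1)
    # Only register names are collected; immediates such as 3 or 15 are skipped.
    regs = re.findall(r"R\d+", operands)
    return opcode, regs[0], regs[1:]


def raw_dependencies(program):
    """Return (consumer, producer, register) triples for read-after-write dependencies."""
    last_writer = {}  # register name -> index of the latest instruction that writes it
    deps = []
    for i, line in enumerate(program.strip().splitlines()):
        _, dest, sources = parse(line)
        for reg in sources:
            if reg in last_writer:
                deps.append((i, last_writer[reg], reg))
        last_writer[dest] = i  # record the write only after the sources are checked
    return deps


if __name__ == "__main__":
    for consumer, producer, reg in raw_dependencies(PROGRAM):
        print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Among the triples it returns is `(3, 1, 'R6')`, which is the dependency of instruction 3 on the LOAD described above. The sketch only identifies dependencies; it does not model issue timing or the write-back stage.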
Questions to answer:
a) Complete the preceding table.
b) Redo the table, assuming no out-of-order capability. What are the savings from using the capability?
c) Redo the table assuming a superscalar implementation that can handle two instructions at a time at each stage.