1. Given the following code and the assembler equivalent to the right:
for (i=999; i>=0; i--) x[i]=x[i]+y[i];
LOOP: LD F0, 0(R2) ;get x[i]
LD F1, 0(R3) ;get y[i]
DADD F2, F0, F1 ;add x[i] + y[i]
SD F2, 0(R2) ;store back
DADDUI R2,R2,#-8
DADDUI R3,R3,#-8
BNE R2, R4, LOOP
a. Using Figure A-3, indicate the number of stalls that would occur between the lines of code, as seen in the book and in our in-class example. Write them between the lines to the right.
b. Unroll the loop so that two iterations are shown and rearrange the code to minimize the number of stalls while preserving the correctness of the code.
c. Show how the unrolled code might be executed in a VLIW processor with the units below, given the same latencies in Figure A-3.
Load/Store Unit | Load/Store Unit | FP Unit | FP Unit | Integer Unit
                |                 |         |         |
                |                 |         |         |
                |                 |         |         |
                |                 |         |         |
                |                 |         |         |
                |                 |         |         |
2. Given the code below
BGE F1, F0, SKIP ; check whether F1 is too small (near 0); F0 holds the small threshold value
LD F1, 0(R4) ; number was too small, load a fixed value for divide
SKIP: DDIV F2, F2, F1 ; divide by F1; if F1 is too small, a division overflow error could occur
We see that an error could occur if the branch were incorrectly taken while F1 held a number small enough to cause overflow. (BGE means branch on greater than or equal: if F1 >= F0, then branch.) Suppose a branch target buffer predicts the branch as taken when F1 holds a value that is too small (thus causing an error). Explain how both the Tomasulo algorithm and the Tomasulo algorithm with a reorder buffer would preserve exception behavior.
3. In Figure A-1, fill out the issue of the 4 instructions in the Tomasulo (without ROB) algorithm.