Problem
Consider the following code and answer the questions below. Note that register F2 holds a scalar constant that must not be changed during the computation (see the MUL.D instruction).
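        DADDI R3,R0,8        ; R3 = 8, the size of one double-precision element
        DADDI R1,R0,1024     ; R1 = 1024, pointer used by the first L.D
        DADDI R2,R0,1024     ; R2 = 1024, pointer used by the second L.D and by S.D
Loop:   L.D   F0,0(R1)       ; F0 = M[R1]
        MUL.D F0,F0,F2       ; F0 = F0 * F2 (F2 is the scalar constant)
        L.D   F4,0(R2)       ; F4 = M[R2]
        ADD.D F0,F0,F4       ; F0 = F0 + F4
        S.D   F0,0(R2)       ; M[R2] = F0
        DSUB  R1,R1,R3       ; R1 = R1 - 8
        DSUB  R2,R2,R3       ; R2 = R2 - 8
        BNEZ  R1,Loop        ; repeat while R1 != 0 (128 iterations)
        HALT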
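To pin down what "the same output" means in the questions below, here is a minimal Python model of the loop's effect on memory. It assumes memory is a dictionary from byte address to one 8-byte double per address, and the sample data and F2 value are arbitrary; the function name `run_original` is hypothetical. Because R1 and R2 both start at 1024, the two loads read the same word, so each iteration computes M[a] = M[a]*F2 + M[a] for a = 1024, 1016, ..., 8 (128 iterations); address 0 is never touched.

```python
def run_original(mem, f2):
    """Model of the original loop; returns a new memory dict after execution."""
    mem = dict(mem)           # do not modify the caller's memory
    r3 = 8                    # DADDI R3,R0,8
    r1 = 1024                 # DADDI R1,R0,1024
    r2 = 1024                 # DADDI R2,R0,1024
    while True:               # Loop:
        f0 = mem[r1]          # L.D   F0,0(R1)
        f0 = f0 * f2          # MUL.D F0,F0,F2
        f4 = mem[r2]          # L.D   F4,0(R2)
        f0 = f0 + f4          # ADD.D F0,F0,F4
        mem[r2] = f0          # S.D   F0,0(R2)
        r1 -= r3              # DSUB  R1,R1,R3
        r2 -= r3              # DSUB  R2,R2,R3
        if r1 == 0:           # BNEZ  R1,Loop (falls through when R1 == 0)
            break
    return mem


# Usage: fill the 129 words at addresses 0..1024 with sample data.
initial = {addr: float(addr) for addr in range(0, 1025, 8)}
final = run_original(initial, 2.0)
print(final[1024], final[8], final[0])  # 3072.0 24.0 0.0
```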
(a) Rearrange (reschedule) the loop without unrolling it. You may move individual instructions, but the output of this dummy loop must remain exactly the same; that is, adjust the offsets of the memory instructions (loads/stores) where needed. Show the rearranged loop. Can you reduce the number of stalls in this code?
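A sketch of one possible rearrangement for (a), checked in Python rather than on a pipeline simulator: the second L.D and both DSUB instructions are hoisted above the ADD.D, so the store executes after R2 has already been decremented and its offset changes from 0(R2) to 8(R2). The model assumes the same byte-addressed memory of 8-byte doubles as the original loop and places the store just before BNEZ (on a MIPS pipeline with a branch delay slot it could instead go in the slot). The function `run` and its `rearranged` flag are hypothetical. It only confirms that memory ends up identical; how many stalls the reordering actually removes depends on FP and load latencies that the problem does not specify.

```python
def run(mem, f2, rearranged):
    """Execute the loop in either the original or a rearranged instruction order."""
    mem = dict(mem)                   # work on a copy of memory
    r3, r1, r2 = 8, 1024, 1024        # the three DADDI instructions
    while True:
        if not rearranged:
            f0 = mem[r1]              # L.D   F0,0(R1)
            f0 = f0 * f2              # MUL.D F0,F0,F2
            f4 = mem[r2]              # L.D   F4,0(R2)
            f0 = f0 + f4              # ADD.D F0,F0,F4
            mem[r2] = f0              # S.D   F0,0(R2)
            r1 -= r3                  # DSUB  R1,R1,R3
            r2 -= r3                  # DSUB  R2,R2,R3
        else:
            f0 = mem[r1]              # L.D   F0,0(R1)
            f4 = mem[r2]              # L.D   F4,0(R2)   (hoisted above MUL.D)
            f0 = f0 * f2              # MUL.D F0,F0,F2
            r1 -= r3                  # DSUB  R1,R1,R3   (hoisted)
            r2 -= r3                  # DSUB  R2,R2,R3   (hoisted)
            f0 = f0 + f4              # ADD.D F0,F0,F4
            mem[r2 + 8] = f0          # S.D   F0,8(R2): +8 since R2 was already decremented
        if r1 == 0:                   # BNEZ  R1,Loop
            break
    return mem


# Usage: both orders must leave memory identical.
initial = {addr: 0.25 * addr + 1.0 for addr in range(0, 1025, 8)}
assert run(initial, 1.5, rearranged=False) == run(initial, 1.5, rearranged=True)
```

The offset adjustment is the only change needed for correctness here, because no instruction between the hoisted DSUB R2 and the store reads R2 expecting its old value.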
(b) Now unroll the loop and reschedule the instructions to minimize stalls.
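A similar Python check for (b), assuming an unroll factor of 4 (the problem does not fix one; 4 works because the 128 iterations divide evenly). Each unrolled iteration loads the four words at offsets 0, -8, -16 and -24 into distinct (renamed) registers, performs the four multiplies and the four adds, decrements each pointer once by 32, and stores with offsets shifted by +32 to compensate. The names `UNROLL`, `run_original` and `run_unrolled` are hypothetical, and the model verifies only that the final memory matches the original loop, not the cycle count.

```python
UNROLL = 4  # assumed unroll factor; 128 iterations divide evenly by 4


def run_original(mem, f2):
    """Reference: the original loop, one element per iteration."""
    mem = dict(mem)
    r1 = r2 = 1024
    while True:
        mem[r2] = mem[r1] * f2 + mem[r2]  # L.D, MUL.D, L.D, ADD.D, S.D
        r1 -= 8                           # DSUB R1,R1,R3
        r2 -= 8                           # DSUB R2,R2,R3
        if r1 == 0:                       # BNEZ R1,Loop
            break
    return mem


def run_unrolled(mem, f2):
    """Unrolled by UNROLL: all loads first, then MUL.Ds, ADD.Ds and stores."""
    mem = dict(mem)
    r1 = r2 = 1024
    offsets = [-8 * k for k in range(UNROLL)]      # 0, -8, -16, -24
    while True:
        xs = [mem[r1 + off] for off in offsets]    # four L.D from R1
        ys = [mem[r2 + off] for off in offsets]    # four L.D from R2
        prods = [x * f2 for x in xs]               # four MUL.D
        r1 -= 8 * UNROLL                           # one DSUB R1 per unrolled iteration
        r2 -= 8 * UNROLL                           # one DSUB R2 per unrolled iteration
        sums = [p + y for p, y in zip(prods, ys)]  # four ADD.D
        for off, s in zip(offsets, sums):
            mem[r2 + 8 * UNROLL + off] = s         # S.D with offsets 32, 24, 16, 8
        if r1 == 0:                                # BNEZ R1,Loop
            break
    return mem


# Usage: the unrolled loop must produce the same memory as the original.
initial = {addr: addr / 3.0 for addr in range(0, 1025, 8)}
assert run_unrolled(initial, 1.75) == run_original(initial, 1.75)
```

Grouping the independent loads and multiplies this way is what gives a scheduler room to hide latencies; the exact instruction order that minimizes stalls still depends on the pipeline's latencies.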