1. Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has a clock rate of 100 MHz. The average number of cycles for each instruction class and their frequencies (for a typical program) are as follows:
Instruction Class | Machine M1 (Cycles/Instruction Class) | Machine M2 (Cycles/Instruction Class) | Frequency
A | 1 | 2 | 60%
B | 2 | 3 | 30%
C | 4 | 4 | 10%
(a) Calculate the average CPI for each machine, M1 and M2.
(b) Calculate the average MIPS ratings for each machine, M1 and M2.
(c) Which machine has the smaller MIPS rating? Which single instruction class CPI would you need to change, and by how much, for this machine to match or exceed the performance of the machine with the higher MIPS rating? (You may change the CPI of only one instruction class on the slower machine.)
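The quantities in parts (a) and (b) of problem 1 can be checked with a short script. This is a minimal sketch, assuming the usual definitions (average CPI as the frequency-weighted sum of per-class cycles, and MIPS as clock rate in MHz divided by average CPI). The names CLOCK_MHZ, CLASS_CPI, FREQUENCY, average_cpi, and mips_rating are illustrative and not part of the problem statement.

```python
# Clock rates (MHz) and per-class cycle counts copied from the table in problem 1.
CLOCK_MHZ = {"M1": 80.0, "M2": 100.0}
CLASS_CPI = {
    "M1": {"A": 1, "B": 2, "C": 4},
    "M2": {"A": 2, "B": 3, "C": 4},
}
FREQUENCY = {"A": 0.60, "B": 0.30, "C": 0.10}


def average_cpi(machine):
    """Frequency-weighted average of the per-class cycle counts."""
    return sum(FREQUENCY[c] * CLASS_CPI[machine][c] for c in FREQUENCY)


def mips_rating(machine):
    """Assumed definition: MIPS = clock rate in MHz / average CPI."""
    return CLOCK_MHZ[machine] / average_cpi(machine)


for m in ("M1", "M2"):
    print(f"{m}: CPI = {average_cpi(m):.2f}, MIPS = {mips_rating(m):.1f}")
```

For part (c), one way to experiment is to edit a single entry of CLASS_CPI for the slower machine and rerun the script until the two MIPS ratings meet.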
2. Suppose that we can improve the floating-point instruction performance of a machine by a factor of 15 (the same floating-point instructions run 15 times faster on this new machine). What percent of the instructions must be floating point to achieve a speedup of at least 4?
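Problem 2 is an application of Amdahl's law. The sketch below assumes that "percent of the instructions" is read as the fraction of original execution time spent in floating-point instructions, which is the quantity Amdahl's law works with. The function names are illustrative.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of execution time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def required_fraction(target_speedup, factor):
    """Amdahl's law solved for the fraction: f = (1 - 1/S) / (1 - 1/k)."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / factor)


f = required_fraction(target_speedup=4, factor=15)
print(f"required floating-point fraction: {f:.4f}")
print(f"speedup at that fraction: {amdahl_speedup(f, 15):.4f}")
```

Any fraction at or above the printed value meets the speedup target; smaller fractions fall short.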
3. In the snippet of MIPS assembler code below, how many times is instruction memory accessed? How many times is data memory accessed? (Count only accesses to memory, not registers.)
lw $v1, 0($a0)
addi $v0, $v0, 1
sw $v1, 0($a1)
addi $a0, $a0, 1
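The counting rule behind problem 3 can be written out as a small sketch. It assumes that every instruction costs one instruction-memory fetch and that only loads and stores (lw, sw) touch data memory. The names SNIPPET and DATA_MEMORY_OPS are illustrative.

```python
# The four instructions from problem 3 as (mnemonic, operands) pairs.
SNIPPET = [
    ("lw", "$v1, 0($a0)"),
    ("addi", "$v0, $v0, 1"),
    ("sw", "$v1, 0($a1)"),
    ("addi", "$a0, $a0, 1"),
]

# Assumption: only these mnemonics access data memory.
DATA_MEMORY_OPS = {"lw", "sw"}

# One instruction-memory access per instruction fetched.
instruction_accesses = len(SNIPPET)
# One data-memory access per load or store.
data_accesses = sum(1 for op, _ in SNIPPET if op in DATA_MEMORY_OPS)

print("instruction memory accesses:", instruction_accesses)
print("data memory accesses:", data_accesses)
```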
4. Use the register and memory values in the table below for this question. Assume a 32-bit machine. Assume each of the following questions starts from the table values; that is, DO NOT use value changes from one question as propagating into future parts of the question.
Register | Value | Memory Location | Value
R1 | 12 | 12 | 16
R2 | 16 | 16 | 20
R3 | 20 | 20 | 24
R4 | 24 | 24 | 28
a) Give the values of R1, R2, and R3 after this instruction: add R3, R2, R1
b) What values will be in R1 and R3 after this instruction is executed: load R3, 12(R1)
c) What values will be in the registers after this instruction is executed: addi R2, R3, #16
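The three parts of problem 4 can be checked with a tiny interpreter. This is a sketch under two assumptions that the problem does not state explicitly: the destination register is written first in each instruction (MIPS-style), and `load R3, 12(R1)` reads the memory word at address 12 + R1. Each call starts from the table values, as the problem requires. The tuple encoding and the name run are illustrative.

```python
# Initial state copied from the table in problem 4.
INITIAL_REGS = {"R1": 12, "R2": 16, "R3": 20, "R4": 24}
INITIAL_MEM = {12: 16, 16: 20, 20: 24, 24: 28}


def run(instruction):
    """Execute one instruction on a fresh copy of the table values; return the registers."""
    regs = dict(INITIAL_REGS)
    op = instruction[0]
    if op == "add":            # ("add", dest, src1, src2)
        _, dest, src1, src2 = instruction
        regs[dest] = regs[src1] + regs[src2]
    elif op == "addi":         # ("addi", dest, src, immediate)
        _, dest, src, imm = instruction
        regs[dest] = regs[src] + imm
    elif op == "load":         # ("load", dest, offset, base)
        _, dest, offset, base = instruction
        regs[dest] = INITIAL_MEM[offset + regs[base]]
    else:
        raise ValueError(f"unsupported operation: {op}")
    return regs


print(run(("add", "R3", "R2", "R1")))
print(run(("load", "R3", 12, "R1")))
print(run(("addi", "R2", "R3", 16)))
```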
5. This problem covers floating-point IEEE format.
(a) List four floating-point operations that cause NaN (Not a Number) to be created.
(b) Assuming single-precision IEEE 754 format, what decimal number is represented by this word:
1 01111101 00100000000000000000000
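For part (b), the decoding steps for a normalized single-precision value (sign, biased exponent minus 127, implicit leading 1 plus fraction) can be sketched as follows. The manual decoder handles normalized numbers only; zeros, denormals, infinities, and NaNs are deliberately left out. The second function cross-checks the result with Python's standard struct module. Function names are illustrative.

```python
import struct

# The word from part (b): sign, exponent, and fraction fields separated by spaces.
WORD = "1 01111101 00100000000000000000000"


def decode_single(word):
    """Decode a normalized IEEE 754 single-precision value from its three bit fields."""
    sign_bits, exponent_bits, fraction_bits = word.split()
    sign = -1.0 if sign_bits == "1" else 1.0
    exponent = int(exponent_bits, 2) - 127               # remove the bias
    fraction = int(fraction_bits, 2) / 2 ** len(fraction_bits)
    return sign * (1.0 + fraction) * 2.0 ** exponent     # implicit leading 1


def decode_with_struct(word):
    """Cross-check: reinterpret the 32-bit pattern as a float using struct."""
    raw = int(word.replace(" ", ""), 2)
    return struct.unpack(">f", struct.pack(">I", raw))[0]


print(decode_single(WORD))
print(decode_with_struct(WORD))
```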
6. Perform the following operations by converting the operands to 2's complement binary numbers and then doing the addition or subtraction shown. Please show all work in binary, operating on 16-bit numbers.
Please follow the format in the given example: 3 + 12
0000 0000 0000 0011 (3)
0000 0000 0000 1100 (12)
0000 0000 0000 1111
(a) 13 - 2
(b) 5 - 6
(c) -7 - (-7)
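Hand-worked answers to problem 6 can be verified with a short script. It is a sketch that treats a subtraction a - b as the addition a + (-b), masks every result to 16 bits (so any carry out of bit 15 is discarded), and prints the pattern in groups of four bits to match the example format. The helper names are illustrative.

```python
BITS = 16
MASK = (1 << BITS) - 1


def to_twos_complement(value):
    """16-bit two's complement pattern of `value`, grouped in nibbles."""
    bits = format(value & MASK, f"0{BITS}b")
    return " ".join(bits[i:i + 4] for i in range(0, BITS, 4))


def from_twos_complement(pattern):
    """Signed integer represented by a 16-bit pattern (spaces allowed)."""
    raw = int(pattern.replace(" ", ""), 2)
    return raw - (1 << BITS) if raw & (1 << (BITS - 1)) else raw


def add_16bit(a, b):
    """Add two operands as 16-bit two's complement numbers, dropping the carry out."""
    return to_twos_complement((a & MASK) + (b & MASK))


# Each subtraction is rewritten as adding the negated second operand.
for a, b in [(13, -2), (5, -6), (-7, 7)]:
    print(to_twos_complement(a), f"({a})")
    print(to_twos_complement(b), f"({b})")
    total = add_16bit(a, b)
    print(total, f"({from_twos_complement(total)})")
    print()
```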
7. Consider the following assembly language code:
I0: ADD R4 = R1 + R0;
I1: SUB R9 = R3 - R4;
I2: ADD R4 = R5 + R6;
I3: LDW R2 = MEM[R3 + 100];
I4: LDW R2 = MEM[R2 + 0];
I5: STW MEM[R4 + 100] = R2;
I6: AND R2 = R2 & R1;
I7: BEQ R9 == R1, Target;
I8: AND R9 = R9 & R1;
Consider a pipeline with forwarding, hazard detection, and 1 delay slot for branches. The pipeline is the typical 5-stage IF, ID, EX, MEM, WB MIPS design. For the above code, complete the pipeline diagram below (instructions on the left, cycles on top). Write the stage names IF, ID, EX, MEM, WB for each instruction in the boxes. Assume that there are two levels of bypassing, that the second half of the decode stage reads the source registers, and that the first half of the write-back stage writes to the register file.
Label all data stalls (Draw an X in the box). Label all data forwards that the forwarding unit detects (arrow between the stages handing off the data and the stages receiving the data). What is the final execution time of the code?
8. This question covers your understanding of dependences between instructions. Using the code below, list all of the dependence types (RAW, WAR, WAW). List the dependences in the respective table (example INST-X to INST-Y) by writing in the instruction numbers involved with the dependence.
I0: A = B + C;
I1: C = A - B;
I2: D = A + C;
I3: A = B * C * D;
I4: C = F / D;
I5: F = A ^ G;
I6: G = F + D;
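The dependence search in problem 8 can be mechanized. The sketch below encodes each statement as a destination plus a set of sources and reports only the nearest dependences: a RAW from the most recent writer of a name, a WAR from each read to the next write of that name, and a WAW between consecutive writes of a name. That "nearest only" convention is an assumption; some courses list every ordered pair that shares a name, which would yield a longer list. The names PROGRAM and find_dependences are illustrative.

```python
# Statements I0..I6 from problem 8 as (destination, sources).
PROGRAM = [
    ("A", {"B", "C"}),        # I0: A = B + C
    ("C", {"A", "B"}),        # I1: C = A - B
    ("D", {"A", "C"}),        # I2: D = A + C
    ("A", {"B", "C", "D"}),   # I3: A = B * C * D
    ("C", {"F", "D"}),        # I4: C = F / D
    ("F", {"A", "G"}),        # I5: F = A ^ G
    ("G", {"F", "D"}),        # I6: G = F + D
]


def find_dependences(program):
    """Return (raw, war, waw) as sets of (earlier, later) instruction index pairs."""
    raw, war, waw = set(), set(), set()
    last_writer = {}   # name -> index of the most recent instruction that wrote it
    readers = {}       # name -> indices that read it since its most recent write
    for j, (dest, sources) in enumerate(program):
        for name in sources:
            if name in last_writer:
                raw.add((last_writer[name], j))       # read after write
            readers.setdefault(name, []).append(j)
        for i in readers.get(dest, []):
            if i != j:
                war.add((i, j))                       # write after read
        if dest in last_writer:
            waw.add((last_writer[dest], j))           # write after write
        last_writer[dest] = j
        readers[dest] = []                            # later reads see the new value
    return raw, war, waw


raw, war, waw = find_dependences(PROGRAM)
for label, pairs in (("RAW", raw), ("WAR", war), ("WAW", waw)):
    print(label, ", ".join(f"I{i} to I{j}" for i, j in sorted(pairs)))
```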
9. A two-part question. (Part A) Assume the following 10-bit address sequence generated by the microprocessor:
Time | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Access | 10001101 | 10110010 | 10111111 | 10001100 | 10011100 | 11101001 | 11111110 | 11101001
TAG | | | | | | | |
SET | | | | | | | |
INDEX | | | | | | | |
The cache uses 4 bytes per block. Assume a 2-way set associative cache design that uses the LRU algorithm (with a cache that can hold a total of 4 blocks). Assume that the cache is initially empty. First determine the TAG, SET, BYTE OFFSET fields and fill in the table above. In the figure below, clearly mark for each access the TAG, Least Recently Used (LRU), and HIT/MISS information for each access.
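The cache behaviour described above can be simulated to check a hand-filled table. This is a sketch under stated assumptions: the addresses are used exactly as listed (8 bits each, with any higher address bits taken as zero), 4-byte blocks give 2 byte-offset bits, and 4 blocks in a 2-way design give 2 sets and therefore 1 set bit, with the remaining upper bits forming the tag. All names in the code are illustrative.

```python
ADDRESSES = [
    "10001101", "10110010", "10111111", "10001100",
    "10011100", "11101001", "11111110", "11101001",
]

BLOCK_BYTES = 4
NUM_BLOCKS = 4
WAYS = 2
NUM_SETS = NUM_BLOCKS // WAYS                 # 2 sets
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1    # log2(4) = 2
SET_BITS = NUM_SETS.bit_length() - 1          # log2(2) = 1


def split_address(bits):
    """Split a binary address string into (tag, set index, byte offset)."""
    addr = int(bits, 2)
    offset = addr & (BLOCK_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset


def simulate(addresses):
    """Run the access sequence through an initially empty 2-way LRU cache."""
    sets = [[] for _ in range(NUM_SETS)]   # each list is ordered LRU first, MRU last
    results = []
    for bits in addresses:
        tag, set_index, _ = split_address(bits)
        ways = sets[set_index]
        if tag in ways:
            ways.remove(tag)               # re-appended below as most recently used
            results.append("HIT")
        else:
            if len(ways) == WAYS:
                ways.pop(0)                # evict the least recently used tag
            results.append("MISS")
        ways.append(tag)
    return results


for time, (bits, outcome) in enumerate(zip(ADDRESSES, simulate(ADDRESSES))):
    tag, set_index, offset = split_address(bits)
    print(f"{time}: tag={tag:05b} set={set_index} offset={offset:02b} {outcome}")
```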