Throughout, be sure to type your homework; you may want to use the electronic version of the homework as you can then just insert your answers where they need to be. Show your work on the problems so you can get partial credit for incorrect answers (and tips from the grader about where you went wrong). Writing your units in the calculations will help ensure you wind up with a correct value. When done with a problem, ask yourself: "Does the answer make sense?" Please remember that the homework in here requires some time to complete-it is best if you work on it several times between now and the due date.
1. Three processors P1, P2, and P3 have the same ISA. Their clock rates and average CPI are as follows:
|
P1
|
P2
|
P3
|
Clock Rate
|
3 GHz
|
2.5 GHz
|
4.0 GHz
|
Average CPI
|
1.5
|
1.0
|
2.2
|
a. Which processor has the highest performance?
b. How much faster is the fastest processor than the slowest processor?
c. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions for each processor.
|
P1
|
P2
|
P3
|
Number of cycles:
|
|
|
|
Number of instructions:
|
|
|
|
d. For processor P1, we want to reduce the execution time of the reference program from 10 seconds to 7 seconds, but the change leads to an increase of 20% in the CPI. What clock rate is required to reach the target reduction in execution time?
2. The following table shows the number of instructions for programs A and B on a processor:
|
Arith
|
Store
|
Load
|
Branch
|
Total
|
Prog A
|
650
|
100
|
600
|
50
|
1400
|
Prog B
|
750
|
250
|
500
|
500
|
2000
|
Assume that arithmetic instructions take 1 clock cycle, load and store each take 5 cycles, and branches require 2 cycles.
a. What is the execution time of each program in a 2 GHz processor?
b. Find the average CPI for each program.
c. If the number of load instructions can be reduced by one half (thus also reducing the total number of instructions), calculate the new average CPI of each respective program over its original version?
d. What is the speedup of each revised program in the previous problem compared with the original versions?
3. Consider two implementations of the same ISA, P1 and P2, with five classes of instructions (A through E) in the instruction set. P1 has a 4 GHz clock rate, while P2 has a 6 GHz clock rate. The average number of cycles for each instruction class are in the following table:
|
CPI Classes:
|
|
A
|
B
|
C
|
D
|
E
|
P1
|
1
|
2
|
3
|
4
|
5
|
P2
|
3
|
3
|
3
|
5
|
5
|
a. Assume that peak performance is defined as the fastest rate that a computer can execute any instruction sequence (ie: the instructions can be chosen to maximize performance). What are the peak performances of P1 and P2 expressed in instructions per second?
b. If the number of instructions in a program has equal numbers of all instruction classes except for class A, which occurs twice as often as the others, how much faster is P2 than P1?
c. Using the same instruction mix, what clock frequency for P1 will give it the same performance as P2?
4. Consider a CPU manufacturing process using 300mm wafers.
a. If the dimensions of the die are 1.2cm x 0.8cm, what is the approximate number of dies produced?
b. Assuming a defect density of 0.55/cm2, what is the die yield?
c. If the wafer costs $300, what is the price of a die?
d. If the chip price is 22% more than the cost per die, what is the chip price?
5. Consider a program comprising 62% arithmetic instructions, 16% load instruction, 8% store instructions, and 14% branch instructions. Assume the CPI for all instructions is 2, except for branches which have a CPI of 3.
a. Suppose we consider two alternate improvements to a processor. P1 will execute arithmetic instructions twice as fast. P2 will execute both load and store instructions 3 times faster. In each case, other instructions are unaffected by the changes. Which is faster, P1 or P2, and by how much?
b. Consider running the program on a machine with a large graphics card. When we run the program on this machine, the arithmetic instructions only can be run in parallel on the card, everything else is run sequentially. As the number of stream processors on the GPU goes toward infinity, what is the maximum speedup obtainable on this program?
c. In the previous problem, the GPU improvement was applied toward arithmetic instructions. Assuming that you could apply the GPU to any one instruction class, this is still the smartest choice. What "Great Idea in Computer Architecture" is this an example of?
d. Assuming we can apply the GPU improvement to both load and store instructions. With infinitely many streaming GPU processors, is it possible to make the program run two times faster? Explain.
6. Compilers can dramatically affect performance of some applications. Assume compiler A results in a dynamic instruction count of 1.0E9 with an execution time of 1.1 seconds. Compiler B results in a dynamic IC of 1.2E9 and an execution time of 1.5 seconds.
a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.
b. Assume P1 executes compilers A's code, and P2 executes compiler B's code, and the execution times are the same. How much faster is the clock on P1 versus P2?
c. A third compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. Calculate the speedup using this new compiler relative to both compiler A and compiler B.
7. Consider a reference program. Running on processor P1, with a clock rate of 4 GHz and an average CPI of 0.9, it requires execution of 5.0E9 instructions. Running on processor P2, with a clock rate of 3 GHz and an average CPI of 0.75, it requires execution of 1.0E9 instructions to complete.
a. One fallacy is that the fastest clock rate equates to the best performance. Calculate the execution times. Is the chip with the fastest clock the one with the best performance?
b. Another fallacy is to use MIPS (millions of instructions per second) to compare performance of two processors. Calculate the MIPS for each chip. Does the chip with the larger MIPS on this program also the faster one?
c. Another fallacy is to use MFLOPS (millions of floating-point operations) as a performance metric. MFLOPS is calculated as:
MFLOPS=(# floating pt operations)/(execution time · 10^6 )
d. Assume 35% of the instructions on both processors are floating-point operations. Find the MFLOPS figures for both programs. Is the chip with the highest MFLOPS also the fastest on this program?
8. Suppose we run a program on various parallel architectures. 75% of the program can be parallelized. What is the speedup for 2, 8, 32 and 128 processor machines? What is the maximum speedup achievable on this program?
9. Computationally hard problems are an important part of computer science. In such problems, the execution time of a solution program often grows exponentially as a size of the input. If I need to be able to substantially increase the size of the input I can handle on such a problem in order to solve a problem instance of interest, why is parallelizing the code likely to have little effect?