Question: Suppose we have a two-processor distributed memory system in which floating point arithmetic proceeds at R flops per second. Assume that when one processor sends or receives a message of k floating point numbers, α + βk seconds are required. Proc(1) houses an n-by-n matrix A, and each processor houses a copy of an n-vector x. The goal is to store the vector y = Ax in Proc(1)'s local memory. You may assume that n is even.
(1) How long would this take if Proc(1) handles the entire computation itself?
(2) Describe how the two processors can share the computation. Indicate the data that must flow between the two processors and what they must each calculate. You do not have to write formal node programs. Clear concise English will do.
(3) Does it follow that if n is large enough, then it is more efficient to distribute the computation? Justify your answer.
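
The cost model stated in the question can be written down as two small helpers, which may be handy for plugging in numbers while working through parts (1) to (3). This is only a sketch of the model as given: the function names `message_time` and `compute_time` and the sample parameter values are hypothetical, and the flop and message counts for each strategy are left for the reader to supply.

```python
def message_time(k, alpha, beta):
    """Seconds to send or receive a message of k floating point numbers.

    Follows the model in the question: alpha + beta * k.
    """
    return alpha + beta * k


def compute_time(flops, R):
    """Seconds to perform `flops` floating point operations at R flops/second."""
    return flops / R


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the question).
    alpha, beta, R = 1e-4, 1e-6, 1e6
    print(message_time(10, alpha, beta))   # cost of one 10-number message
    print(compute_time(2_000_000, R))      # cost of 2,000,000 flops
```

Summing `compute_time` and `message_time` over the steps of a given strategy gives its total time under this model, so the single-processor and two-processor approaches can be compared for different values of n, α, β, and R.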