Question 1) a) Write an MPI program that multiplies two n x n matrices, A and B, with each processor producing a row-band of matrix C. P0 sends the row-bands of A and all of B to the slaves.
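The row-band split and each process's share of the multiplication can be sketched in C as below. This is a minimal sketch, not a complete solution. The names band_start, band_counts and multiply_band are hypothetical, the matrices are assumed to be row-major arrays of double, and the MPI communication is left out so the code compiles without an MPI installation. The counts and offsets are given in doubles rather than rows, which is the form MPI_Scatterv and MPI_Gatherv expect if collectives are used instead of point-to-point MPI_Send/MPI_Recv from P0.

```c
#include <assert.h>

/* First row of band `rank` when n rows are split over p processes.
   Calling it with rank == p gives n, the end of the last band. */
int band_start(int rank, int n, int p)
{
    assert(p > 0 && rank >= 0 && rank <= p);
    return (int)((long)rank * n / p);
}

/* Element counts and offsets (in doubles, not rows) of each process's
   row-band of a row-major n x n matrix. */
void band_counts(int n, int p, int counts[], int displs[])
{
    for (int r = 0; r < p; r++) {
        int first = band_start(r, n, p);
        int last = band_start(r + 1, n, p);
        counts[r] = (last - first) * n;
        displs[r] = first * n;
    }
}

/* Work done by each process: multiply its rows x n band of A by the whole
   n x n matrix B, producing a rows x n band of C (all row-major). */
void multiply_band(int rows, int n, const double *a_band, const double *b,
                   double *c_band)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a_band[i * n + k] * b[k * n + j];
            c_band[i * n + j] = sum;
        }
}
```

P0 would send counts[r] doubles starting at A + displs[r] to process r, send all n * n doubles of B (for example with MPI_Bcast), and each process would call multiply_band on its band before returning its rows of C. Splitting at rank * n / p keeps band sizes within one row of each other, so n does not have to be a multiple of p.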
b) Folder: MPI multiplication program - column bands of B rotated in a ring topology
Instructions: B's column bands are distributed among the processors and rotated in a ring topology. Produce timing plots for p = 1, 2, ..., 8 and n = 50, 100, 200, 500, etc.
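One way to see the ring schedule is a serial simulation, sketched in C below under stated assumptions: process r keeps row-band r of A, starts with column-band r of B, and after each step passes its column band to process (r - 1 + p) % p. The function name ring_multiply and the MAX_P limit are hypothetical, and rotating the band index stands in for actually moving the column-band data, so every simulated process reads B directly.

```c
#include <assert.h>

#define MAX_P 16   /* hypothetical limit on simulated processes */

/* Start index of band i when n rows (or columns) are split into p bands. */
static int band_start(int i, int n, int p)
{
    return (int)((long)i * n / p);
}

/* Serial simulation of the ring algorithm for row-major n x n A, B, C.
   Process r owns row-band r of A and initially column-band r of B. */
void ring_multiply(int n, int p, const double *A, const double *B, double *C)
{
    int held[MAX_P];   /* index of the column band each process holds now */

    assert(p > 0 && p <= MAX_P);
    for (int r = 0; r < p; r++)
        held[r] = r;

    for (int step = 0; step < p; step++) {
        /* In MPI, these p iterations run concurrently, one per process. */
        for (int r = 0; r < p; r++) {
            int cb = held[r];
            for (int i = band_start(r, n, p); i < band_start(r + 1, n, p); i++)
                for (int j = band_start(cb, n, p); j < band_start(cb + 1, n, p); j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
        }
        /* Ring shift: each process passes its band to r - 1 and receives
           the band held by r + 1 (process p - 1 receives from process 0). */
        int first = held[0];
        for (int r = 0; r < p - 1; r++)
            held[r] = held[r + 1];
        held[p - 1] = first;
    }
}
```

After p steps each process has visited every column band, so its whole row-band of C is filled. In an MPI version, each rotation could be a single MPI_Sendrecv_replace on the column-band buffer with destination (rank - 1 + p) % p and source (rank + 1) % p; that call is simplest when p divides n so every band has the same size. For the timing plots, reading MPI_Wtime before and after the multiplication is a common choice.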
c) Folder: Pthread matrix multiplication and reduction programs
(a) Re-implement your multiplication program with each process/thread computing a band of matrix C. No synchronization is needed here because all three matrices can be allocated in shared memory (i.e., at global scope).
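A minimal Pthreads sketch of the band multiplication, assuming fixed-size global arrays. MAX_N and MAX_THREADS are hypothetical limits, and band_worker and parallel_matmul are hypothetical names. Each thread only reads A and B and only writes its own rows of C, which is why no locks or barriers are needed; pthread_join is used only to wait for the threads to finish.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_N 256        /* hypothetical size limit for the global arrays */
#define MAX_THREADS 64   /* hypothetical thread limit */

/* All three matrices live at global scope, so every thread shares them. */
double A[MAX_N][MAX_N], B[MAX_N][MAX_N], C[MAX_N][MAX_N];
int n;                   /* current matrix size, 0 <= n <= MAX_N */

static int num_threads;
static int ids[MAX_THREADS];

/* Thread body: compute the row-band of C selected by the thread's id.
   Threads only read A and B and write disjoint rows of C, so no locks. */
static void *band_worker(void *arg)
{
    int id = *(const int *)arg;
    int first = (int)((long)id * n / num_threads);
    int last = (int)((long)(id + 1) * n / num_threads);

    for (int i = first; i < last; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    return NULL;
}

/* C = A * B using `threads` threads; returns 0 on success, -1 if a thread
   could not be created (threads already started are still joined). */
int parallel_matmul(int threads)
{
    pthread_t tid[MAX_THREADS];
    int created = 0, rc = 0;

    assert(threads > 0 && threads <= MAX_THREADS && n >= 0 && n <= MAX_N);
    num_threads = threads;
    for (int t = 0; t < threads; t++) {
        ids[t] = t;
        if (pthread_create(&tid[t], NULL, band_worker, &ids[t]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Compile with gcc -pthread. Setting n, filling A and B, then calling parallel_matmul(p) leaves the product in C; with more threads than rows, some bands are simply empty.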
(b) Use a ring pattern to perform a reduction over matrix A. Assume the same processor-to-row-band allocation as above, and compute the row sums of A in parallel, storing each row's sum in the first column of A. Then have P0 compute the sum of the first column. You will need a barrier.
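A sketch of the row-sum step with a barrier, assuming the same global-array layout and row-band split as the multiplication program; the names and limits are hypothetical. The ring pattern is not described in more detail, so this sketch covers only the literal steps: each thread writes its rows' sums into column 0, all threads meet at a pthread_barrier_t, and thread 0 then adds up column 0. pthread_barrier_t is an optional POSIX feature; it is available on Linux but not on macOS.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes pthread_barrier_t in strict modes */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_N 256        /* hypothetical size limit */
#define MAX_THREADS 64   /* hypothetical thread limit */

double A[MAX_N][MAX_N];  /* shared; row sums overwrite column 0 */
int n;                   /* current matrix size, 0 <= n <= MAX_N */
double total;            /* sum of column 0, written by thread 0 only */

static int num_threads;
static int ids[MAX_THREADS];
static pthread_barrier_t barrier;

static void *rowsum_worker(void *arg)
{
    int id = *(const int *)arg;
    int first = (int)((long)id * n / num_threads);
    int last = (int)((long)(id + 1) * n / num_threads);

    /* Phase 1: sum each row of this thread's band into a local first,
       so the original A[i][0] is counted before it is overwritten. */
    for (int i = first; i < last; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[i][j];
        A[i][0] = sum;
    }

    /* Wait until every thread has stored its row sums in column 0. */
    pthread_barrier_wait(&barrier);

    /* Phase 2: thread 0 (P0) adds up the first column. */
    if (id == 0) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += A[i][0];
        total = s;
    }
    return NULL;
}

/* Runs both phases with `threads` threads; returns 0 on success. */
int parallel_rowsum(int threads)
{
    pthread_t tid[MAX_THREADS];

    assert(threads > 0 && threads <= MAX_THREADS && n >= 0 && n <= MAX_N);
    num_threads = threads;
    if (pthread_barrier_init(&barrier, NULL, (unsigned)threads) != 0)
        return -1;
    for (int t = 0; t < threads; t++) {
        ids[t] = t;
        /* A missing thread would leave the others stuck at the barrier. */
        if (pthread_create(&tid[t], NULL, rowsum_worker, &ids[t]) != 0)
            abort();
    }
    for (int t = 0; t < threads; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

The barrier is what makes the final sum safe: without it, thread 0 could read A[i][0] for a row that another thread has not finished summing.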