Implementing parallel loops using nested parallelism
Consider the following multithreaded algorithm for performing pairwise addition on n-element arrays A[1 .. n] and B[1.. n], storing the sums in C[1.. n]:
SUM-ARRAYS(A,B,C)
1 parallel for i = 1 to A. length
2 C[i] = A[i] + B[i]
a. Rewrite the parallel loop in SUM-ARRAYS using nested parallelism (spawn and sync) in the manner of MAT-VEC-MAIN-LOOP.
Analyze the parallelism of your implementation.
Consider the following alternative implementation of the parallel loop, which contains a value grain-size to be specified:
SUM-ARRAYS'(A,B,C)
1 n = A. length
2 grain-size = ? // to be determined
3 r = [n/grain-size]
4 for k = 0 to r - 1
5 spawn ADD-SUBARRAY(A,B,C, k . grain-size + 1, min((k + 1) . grain-size, n))
6 sync
ADD-SUBARRAY(A,B,C, I, j )
1 for k = i to j
2 C[k]= A[k]+ B[k]|
b. Suppose that we set grain-size = 1. What is the parallelism of this implementation?
c. Give a formula for the span of SUM-ARRAYS0 in terms of n and grain-size.
Derive the best value for grain-size to maximize parallelism.