Assignment: Introduction to High Performance Computing
1. Answer the following questions
i. Try to write pseudo-code for the tree-structured global sum illustrated in Figure 1. Assume the number of cores is a power of two (1, 2, 4, 8, . . . ).
FIGURE 1: Multiple cores forming a global sum
Hints: Use a variable divisor to determine whether a core should send its sum or receive and add. The divisor should start with the value 2 and be doubled after each iteration. Also use a variable core difference to determine which core should be partnered with the current core. It should start with the value 1 and also be doubled after each iteration. For example, in the ?rst iteration 0 % divisor = 0 and 1 % divisor = 1, so 0 receives and adds, while 1 sends. Also in the ?rst iteration 0 + core difference = 1 and 1 core difference = 0, so 0 and 1 are paired in the ?rst iteration.
ii. As an alternative to the approach outlined in the preceding problem, we can use C's bitwise operators to implement the tree-structured global sum. In order to see how this works, it helps to write down the binary (base 2) representation of each of the core ranks, and note the pairings during each stage:
Cores
|
Stages
|
1
|
2
|
3
|
010 = 0002
|
110 = 0012
|
210 = 0102
|
410 = 1002
|
110 = 0012
|
010 = 0002
|
X
|
x
|
210 = 0102
|
310 = 0112
|
010 = 0002
|
x
|
310 = 0112
|
210 = 0102
|
X
|
x
|
410 = 1002
|
510 = 1012
|
610 = 1102
|
010 = 0002
|
510 = 1012
|
410 = 1002
|
X
|
x
|
610 = 1102
|
710 = 1112
|
410 = 1002
|
x
|
710 = 1112
|
610 = 1102
|
x
|
x
|
From the table we see that during the ?rst stage each core is paired with the core whose rank differs in the rightmost or ?rst bit. During the second stage cores that continue are paired with the core whose rank differs in the second bit, and during the third stage cores are paired with the core whose rank differs in the third bit. Thus, if we have a binary value bitmask that is 0012 for the ?rst stage, 0102 for the second, and 1002 for the third, we can get the rank of the core we're paired with by "inverting" the bit in our rank that is nonzero in bitmask. This can be done using the bitwise exclusive or ^ operator.
Implement this algorithm in pseudo-code using the bitwise exclusive or and the left-shift operator.
iii. What happens if your pseudo-code in problem i or ii is run when the number of cores is not a power of two (e.g., 3, 5, 6, 7)? Can you modify the pseudo-code so that it will work correctly regardless of the number of cores?
iv. Suppose the faculty are going to have a party for the students in the department.
a. Identify tasks that can be assigned to the faculty members that will allow them to use task-parallelism when they prepare for the party. Work out a schedule that shows when the various tasks can be performed.
b. We might hope that one of the tasks in the preceding part is cleaning the house where the party will be held. How can we use data-parallelism to partition the work of cleaning the house among the faculty?
c. Use a combination of task- and data-parallelism to prepare for the party. (If there's too much work for the faculty, you can use TAs to pick up the slack.)
2. Write a research paper on High Performance Computing. Address the items given below:
a. How can you use High Performance Computing in your research or life? Give two examples.
b. Why do we need to write parallel programs? (read pages 1-3 from your book)
c. Discuss the local and global impact of high performance computing on individuals, organizations, and society.