Problem
Consider a scalable multiprocessor with p processing nodes and distributed shared memory. Let R be the rate of each processing node generating a request to access remote memory through the interconnection network. Let L be the average latency for remote memory access. Derive expressions for the processor efficiency E under each of the following conditions:
(a) The processor is single-threaded, uses only a private cache, and has no other latency-hiding mechanisms. Express E as a function of R and L.
(b) Suppose a coherent cache is supported by hardware with proper data sharing, and h is the probability that a remote request can be satisfied by a local cache. Express E as a function of R, L, and h.
(c) Now assume each processor is multithreaded to handle N contexts simultaneously. Assume a context-switching overhead of C. Express E as a function of N, R, L, h, and C.
(d) Now consider the use of a 2-D r x r torus with r^2 = p and bidirectional links. Let t_d be the time delay between adjacent nodes and t_m be the local memory-access time. Assume that the network is fast enough to respond to each request without buffering. Express the latency L as a function of p, t_d, and t_m. Then express the efficiency E as a function of N, R, h, C, p, t_d, and t_m.
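
As a numerical aid for checking derived expressions, the following is a minimal Python sketch of one commonly used analytical model, not necessarily the intended solution. It rests on several assumptions that the problem does not state: a processor computes for 1/R time units between requests; only the fraction (1 - h) of requests reaches the network; a multithreaded processor is either in a linear region or saturated; and the average distance in the torus is about sqrt(p)/2 hops, traversed twice per remote access. All function names are hypothetical.

```python
import math


def efficiency_single(R, L):
    """(a) Assumed model: 1/R busy time, then a stall of L per request."""
    return 1.0 / (1.0 + R * L)


def efficiency_cached(R, L, h):
    """(b) Assumed model: only the fraction (1 - h) of requests goes remote."""
    return 1.0 / (1.0 + (1.0 - h) * R * L)


def efficiency_multithreaded(N, R, L, h, C):
    """(c) Assumed two-region model with N contexts and switch overhead C."""
    remote_rate = (1.0 - h) * R
    if remote_rate == 0.0:
        return 1.0  # no remote requests, so the processor never stalls
    run = 1.0 / remote_rate  # average run length between remote requests
    linear = N * run / (run + C + L)  # too few contexts to hide all of L
    saturated = run / (run + C)  # latency fully hidden, only C is lost
    return min(linear, saturated)


def torus_latency(p, t_d, t_m):
    """(d) Assumed round trip over sqrt(p)/2 hops plus one memory access."""
    average_hops = math.sqrt(p) / 2.0  # about r/4 hops per dimension
    return 2.0 * average_hops * t_d + t_m


def efficiency_torus(N, R, h, C, p, t_d, t_m):
    """(d) Efficiency with L replaced by the assumed torus latency."""
    return efficiency_multithreaded(N, R, torus_latency(p, t_d, t_m), h, C)


if __name__ == "__main__":
    print(efficiency_single(R=0.1, L=10.0))
    print(efficiency_cached(R=0.1, L=10.0, h=0.5))
    print(efficiency_torus(N=4, R=0.1, h=0.5, C=1.0, p=16, t_d=1.0, t_m=2.0))
```

Taking the minimum of the two regions works because, in this model, the linear expression stays below the saturated one exactly while N is at most 1 + L/(1/R' + C), where R' = (1 - h)R. With N = 1 and C = 0 the multithreaded expression reduces to the cached single-threaded one, which gives a quick consistency check.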