Question
Consider two multiprocessor machines with the following configurations:
(a) Machine 1, a NUMA machine with 2 processors, each with local memory of 512 MB with local memory access latency of 20 cycles per word and remote memory access latency of 60 cycles per word.
(b) Machine 2, a UMA machine with 2 processors, with a shared memory of 1 GB with access latency of 40 cycles per word.
Assume an application has two threads running on the 2 processors, each of which needs to access an entire array of 4096 words. Is it possible to partition this array across the local memories of the NUMA machine so that the application runs faster on it than on the UMA machine? If so, specify the partition. If not, by how many cycles would the UMA memory latency have to be worsened for a partitioning on the NUMA machine to run faster than the UMA machine? Assume that memory operations dominate execution time.
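One way to explore the question numerically is a brute-force sweep over every possible split of the array. This is only a sketch under assumptions the question does not state explicitly: the two threads run in parallel, the run time is that of the slower thread, each thread reads every word exactly once, the array is partitioned rather than replicated, and there is no contention on either memory. The names `numa_time`, `uma_time`, and `extra` are illustrative.

```python
# Latencies in cycles per word and array size, taken from the question.
LOCAL, REMOTE, UMA = 20, 60, 40
WORDS = 4096

def numa_time(x):
    """Run time on the NUMA machine when x words sit in processor 1's
    local memory and the remaining WORDS - x in processor 2's.
    Assumption: threads run in parallel, so the slower thread decides."""
    t1 = LOCAL * x + REMOTE * (WORDS - x)   # thread on processor 1
    t2 = REMOTE * x + LOCAL * (WORDS - x)   # thread on processor 2
    return max(t1, t2)

def uma_time(latency=UMA):
    """Run time on the UMA machine: every access costs the same latency."""
    return latency * WORDS

# Try every split and keep the one with the smallest run time.
best_x = min(range(WORDS + 1), key=numa_time)
best_numa = numa_time(best_x)

# Smallest whole-cycle increase in UMA latency that makes the best
# NUMA partition strictly faster than the UMA machine.
extra = 0
while uma_time(UMA + extra) <= best_numa:
    extra += 1

print(best_x, best_numa, uma_time(), extra)
```

Under these assumptions, the sweep shows whether any split beats the UMA machine and, if none does, how much worse the UMA latency would have to be. A different cost model, for example summing the two threads' times instead of taking the maximum, would need a different `numa_time`.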