1. Discuss the pros and cons of shared vs. private L2 caches for single-threaded, multi-threaded, and multiprogrammed workloads, and reconsider them for a system that also has on-chip L3 caches.
2. Assume both benchmarks have a base CPI of 1 (ideal L2 cache). If a non-blocking cache raises the average number of concurrent L2 misses from 1 to 2, how much performance improvement does this provide over a shared L2 cache? How much improvement can be achieved over private L2 caches?
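
One possible way to set up the arithmetic for question 2 is sketched below. It assumes a simple first-order model in which the L2 miss stall cycles per instruction are divided by the average number of concurrent misses. The function names and the values of `misses_per_instr` and `miss_penalty_cycles` are hypothetical placeholders, not data from the exercise; substitute the miss rates and latencies given for each benchmark and cache organization.

```python
def effective_cpi(base_cpi, misses_per_instr, miss_penalty_cycles, concurrent_misses=1.0):
    """First-order CPI model (an assumption, not the exercise's official method):
    stall cycles per instruction shrink in proportion to the average number
    of L2 misses that overlap."""
    if concurrent_misses <= 0:
        raise ValueError("concurrent_misses must be positive")
    stall_cpi = misses_per_instr * miss_penalty_cycles / concurrent_misses
    return base_cpi + stall_cpi


def speedup(cpi_before, cpi_after):
    """Performance improvement as a ratio of CPIs (same clock, same instruction count)."""
    return cpi_before / cpi_after


# Hypothetical placeholder values; replace with the exercise's numbers.
base_cpi = 1.0               # given: base CPI with an ideal L2
misses_per_instr = 0.005     # assumed L2 misses per instruction
miss_penalty_cycles = 200    # assumed L2 miss penalty in cycles

cpi_blocking = effective_cpi(base_cpi, misses_per_instr, miss_penalty_cycles, 1.0)
cpi_nonblocking = effective_cpi(base_cpi, misses_per_instr, miss_penalty_cycles, 2.0)

print(f"blocking CPI:     {cpi_blocking:.3f}")
print(f"non-blocking CPI: {cpi_nonblocking:.3f}")
print(f"speedup:          {speedup(cpi_blocking, cpi_nonblocking):.3f}")
```

Running the same two calls once with the shared-L2 parameters and once with the private-L2 parameters gives the two improvements the question asks for. Under this model the gain is larger for whichever configuration spends a larger share of its CPI on L2 miss stalls.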