Question: Suppose a CPU implements four parallel fetch-execute pipelines for superscalar processing.
Show the performance improvement over scalar pipeline processing and no-pipeline processing, assuming an instruction cycle similar to figure 4.1 in the commentary, i.e.:
- a one clock cycle fetch
- a one clock cycle decode
- a two clock cycle execute
and a 60-instruction sequence:
No pipelining would require __?__ clock cycles:
A scalar pipeline would require __?__ clock cycles:
A superscalar pipeline with four parallel units would require __?__ clock cycles:
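
One way to check the three figures is with the usual idealized pipeline timing model, sketched below in Python. This is a sketch under stated assumptions, not necessarily the model figure 4.1 intends: it assumes no stalls or hazards, that the two-cycle execute is itself split into two one-cycle stages (so a new instruction can enter the pipeline every clock cycle), and that the four superscalar units always issue four independent instructions together. The function names and the `issue_interval` parameter are hypothetical, introduced only for illustration.

```python
import math

# Stage lengths in clock cycles, taken from the question: fetch, decode, execute.
STAGES = (1, 1, 2)
N_INSTRUCTIONS = 60


def cycles_no_pipeline(n, stage_cycles):
    """Each instruction runs start to finish before the next one begins."""
    return n * sum(stage_cycles)


def cycles_pipelined(n, stage_cycles, width=1, issue_interval=1):
    """Idealized pipeline: the first group takes the full instruction latency,
    and every later group completes issue_interval cycles after the previous one.

    width          -- instructions issued together (1 = scalar, 4 = superscalar)
    issue_interval -- ASSUMPTION: 1 means execute is sub-pipelined;
                      use 2 if the execute unit is busy for both cycles.
    """
    latency = sum(stage_cycles)
    groups = math.ceil(n / width)
    return latency + (groups - 1) * issue_interval


no_pipe = cycles_no_pipeline(N_INSTRUCTIONS, STAGES)
scalar = cycles_pipelined(N_INSTRUCTIONS, STAGES)
superscalar = cycles_pipelined(N_INSTRUCTIONS, STAGES, width=4)

print("no pipelining:", no_pipe)
print("scalar pipeline:", scalar)
print("4-way superscalar:", superscalar)
print("speedup, superscalar vs scalar:", round(scalar / superscalar, 2))
print("speedup, superscalar vs no pipelining:", round(no_pipe / superscalar, 2))
```

Under these assumptions the model gives 60 × 4 = 240 cycles without pipelining, 4 + 59 = 63 cycles for the scalar pipeline, and 4 + 14 = 18 cycles for the four-way superscalar pipeline. If the execute unit is instead assumed to be occupied for both of its cycles, passing `issue_interval=2` gives 122 and 32 cycles for the two pipelined cases, so the expected answer depends on how figure 4.1 draws the overlap.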