Clustered Pipeline Architecture:
One benefit of a clustered pipeline architecture is that each cluster has a smaller register file, and smaller register files translate into faster register access. Additionally, splitting the physical register file into multiple smaller pieces allows more registers to be accessed in parallel for minimal additional power and area overhead. In architectures with a high degree of parallelism (an 8-way superscalar, for example), many registers are accessed at the same time, so it is important to reduce register file (RF) access time. One unwanted side effect is the extra communication overhead between clusters caused by intercluster dependencies.
For more information about this architecture, its benefits, and its drawbacks, see:
https://www.hpl.hp.com/techreports/98/HPL-98-204.pdf
Review (Dispatch Bound VS Issue Bound):
Dispatch Bound: In a dispatch-bound design, the issue queue contains reservation stations (memory elements that hold the values of an instruction's source operands once they are ready). When an instruction is decoded, any source operands that are already available are read from the register file and stored with the instruction; when the instruction moves to the functional unit, the values are sent along with it.
If the source operands are not ready, they are forwarded to the issue queue and stored in the reservation entities once the producing instruction completes.
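The operand capture described above can be sketched in Python. This is only an illustrative model, not the design from the lecture: the names `Operand`, `ReservationEntry`, `dispatch`, and `broadcast`, and the `regfile`/`valid` dictionaries, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operand:
    tag: int                      # physical register that produces the value
    value: Optional[int] = None   # None until the value has been captured


@dataclass
class ReservationEntry:
    opcode: str
    operands: List[Operand]

    def ready(self) -> bool:
        # The instruction can issue once every source value is held locally.
        return all(op.value is not None for op in self.operands)


def dispatch(opcode, source_tags, regfile, valid):
    """Read each source that is already valid; leave the others pending."""
    operands = [Operand(t, regfile[t] if valid[t] else None) for t in source_tags]
    return ReservationEntry(opcode, operands)


def broadcast(queue, tag, value):
    """Forward a completed result to every waiting operand with a matching tag."""
    for entry in queue:
        for op in entry.operands:
            if op.tag == tag and op.value is None:
                op.value = value


# Usage: P1 is ready at dispatch, P2 arrives later through forwarding.
regfile = {1: 5, 2: 0}
valid = {1: True, 2: False}
entry = dispatch("ADD", [1, 2], regfile, valid)
broadcast([entry], tag=2, value=10)
```

Because the values travel with the entry, the functional unit does not need to read the register file again at issue time in this model.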
Question 1
MOVC R1 #5
MOVC R2 #10
MOVC R3 #15
MOVC R4 #91
MOVC R5 #20
ADD R1 R1 #100
ADD R6 R1 R2
MULT R7 R6 R2
LOAD R8 R5 #500
DIV R9 R8 R4
STORE R7 #100 #200
STORE R9 #200 #300
Assumption: You have an instruction "CLUSTER #NO" that directs all subsequent instructions to the cluster determined by the instruction operand. For example, the following instruction sequence will send the ADD to cluster 1 and MULT to cluster 2.
CLUSTER #1
ADD R1 R1 R1
CLUSTER #2
MULT R2 R2 R3
Please re-order the provided instruction sequence utilizing the cluster command to minimize intercluster communication.
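To compare candidate orderings, a small checker along the following lines may help. It is a rough sketch under stated assumptions: the first register operand of every instruction except STORE is taken as the destination, every register operand of STORE is a source, and one "communication" is counted each time an instruction reads a register last written in a different cluster. The function names `parse` and `intercluster_reads` are hypothetical, and the sample program is a made-up two-instruction case, not the Question 1 sequence.

```python
def parse(program):
    """Return a list of (cluster, dest, sources) tuples, one per instruction."""
    cluster = None
    result = []
    for line in program.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].upper() == "CLUSTER":
            # CLUSTER #n redirects all subsequent instructions to cluster n.
            cluster = int(tokens[1].lstrip("#"))
            continue
        opcode = tokens[0].upper()
        regs = [t for t in tokens[1:] if t.startswith("R")]
        if opcode == "STORE":
            dest, sources = None, regs           # assumption: STORE writes no register
        else:
            dest, sources = regs[0], regs[1:]    # assumption: first register is the destination
        result.append((cluster, dest, sources))
    return result


def intercluster_reads(program):
    """Count reads of a register whose latest producer ran in another cluster."""
    producer = {}   # register name -> cluster that last wrote it
    count = 0
    for cluster, dest, sources in parse(program):
        for reg in set(sources):
            if reg in producer and producer[reg] != cluster:
                count += 1
        if dest is not None:
            producer[dest] = cluster
    return count


split = """
CLUSTER #1
MOVC R1 #5
CLUSTER #2
ADD R2 R1 R1
"""
together = """
CLUSTER #1
MOVC R1 #5
ADD R2 R1 R1
"""
print(intercluster_reads(split), intercluster_reads(together))
```

The count ignores cluster capacity and load balance, so a lower number is not automatically a better schedule.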
Question 2
Please justify your answer to Question 1.
Question 3
For the purpose of this question, you are allowed to add new data structures or modify existing structures inside the decode stage. Also assume that each cluster has limited capacity in terms of the number of instructions it can hold at a time. Propose any necessary changes so that dependent instructions, as much as possible, are executed in the same cluster. Briefly describe logic changes to the decode stage.
Note: Assume FRAT belongs to the decode stage.
Question 4
While it is true that executing all dependent instructions on one cluster will reduce intercluster dependencies, this may limit parallelism and the utilization of all clusters. Please suggest a smarter mechanism that achieves an acceptable tradeoff between reducing intercluster dependencies and maintaining a good level of parallelism.
Question 5
Describe one scenario where dispatch-bound would be more efficient than issue-bound.
Question 6
This question is similar to variation 2 of the architecture described in Lecture Slides 3, slide number 104. However, in this new proposed variation, each RoB entry holds a copy of the instruction result.
Given the following:
1. N_R: Number of physical registers
2. N_S: Number of reservation entities
3. W_Matrix: Its entries are available for use by the decode stage.
4. RAT: RAT entries always point to a physical register, never to an architectural register.
5. Each RoB entry contains the destination physical register number, the destination architectural register number, and a memory element that holds the result of the instruction once it is ready. Assume the variables Ph_NO, AR_NO, and INST_VALUE, respectively.
6. Free_List: A list of free physical registers
7. There is no Renamed[] vector.
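As a starting point, the structures listed above could be modeled roughly as follows. This sketch only sets up the state and the renaming step; it deliberately leaves out W_Matrix and the freeing logic the question asks for. The class names `RobEntry` and `RenameState`, the `allocate` method, the `num_arch` parameter, and the initial identity mapping are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    Ph_NO: int                        # destination physical register number
    AR_NO: int                        # destination architectural register number
    INST_VALUE: Optional[int] = None  # result copy, filled in once ready


class RenameState:
    def __init__(self, num_arch: int, N_R: int):
        # Assumption: architectural register i initially maps to physical register i.
        self.RAT = list(range(num_arch))
        # The remaining physical registers start out free.
        self.Free_List = deque(range(num_arch, N_R))
        self.RoB = deque()

    def allocate(self, R_Dest: int):
        """Rename R_Dest to a fresh physical register and append a RoB entry.

        Returns (new_physical, old_physical). Whether old_physical can be
        freed at this point is the subject of the question and is not
        decided here.
        """
        new = self.Free_List.popleft()
        old = self.RAT[R_Dest]
        self.RAT[R_Dest] = new
        self.RoB.append(RobEntry(Ph_NO=new, AR_NO=R_Dest))
        return new, old


# Usage: 4 architectural registers, N_R = 8 physical registers.
state = RenameState(num_arch=4, N_R=8)
new_reg, old_reg = state.allocate(R_Dest=2)
```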
At the decode stage, allocating a new register and performing renaming may make it possible to free a physical register. Assuming the destination register number is R_Dest, write pseudocode that frees a physical register when necessary.
Question 7
During instruction execution, there are other places where a physical register might be freed. Briefly describe where else the freeing logic is needed and explain why.