Computer Architecture and Organization Homework
Clustered Pipeline Architecture:
One of the benefits of a clustered pipeline architecture is that each cluster has a smaller register file. Smaller register files translate into faster register access. Additionally, splitting the physical register file into multiple smaller pieces allows more registers to be accessed in parallel with minimal power and area overhead. In architectures with a high degree of parallelism (an 8-way superscalar, for example), many registers are accessed at the same time, so it is important to reduce register file (RF) access time. One unwanted side effect is the extra communication overhead between clusters caused by intercluster dependencies.
For more information about this architecture and its benefits and drawbacks, see: https://www.hpl.hp.com/techreports/98/HPL-98-204.pdf
Review (Dispatch Bound vs. Issue Bound):
Dispatch Bound: In a dispatch-bound architecture, issue queue (IQ) entries contain fields that store instruction input values. When an instruction is decoded, each source register that is ready is read from the register file, and its value is sent along with the instruction when it moves to the IQ. If a source register is not ready, dependency information (the source register number) is forwarded to the IQ instead, and the value is captured into the IQ data field later, once the producing instruction completes.
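The operand-capture behavior described above can be illustrated with a minimal Python sketch. This is not a definitive model of any particular processor: the names Operand, IQEntry, dispatch, and capture, as well as the use of a dictionary for the register file and a separate ready map, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Operand:
    """One source operand slot in an IQ entry (hypothetical layout)."""
    reg: int                      # source register number, used as the tag
    value: Optional[int] = None   # filled at dispatch, or captured later

    @property
    def ready(self) -> bool:
        return self.value is not None


@dataclass
class IQEntry:
    """Dispatch-bound IQ entry: it stores operand values, not just tags."""
    opcode: str
    dest: int
    sources: List[Operand]

    def is_ready(self) -> bool:
        # The instruction can issue once every operand value is present.
        return all(src.ready for src in self.sources)

    def capture(self, reg: int, value: int) -> None:
        # Called when a producer completes: waiting slots whose tag
        # matches the produced register capture the value.
        for src in self.sources:
            if not src.ready and src.reg == reg:
                src.value = value


def dispatch(opcode: str, dest: int, src_regs: List[int],
             regfile: Dict[int, int], reg_ready: Dict[int, bool]) -> IQEntry:
    # Ready sources are read from the register file now; for the others
    # only the register number travels with the instruction.
    sources = [Operand(r, regfile[r] if reg_ready[r] else None)
               for r in src_regs]
    return IQEntry(opcode, dest, sources)
```

For example, dispatching an ADD whose first source is still being produced yields an entry that holds one value and one tag; calling capture with that register number and its result makes the entry ready to issue.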
Question 1:
MOVC R1 #5
MOVC R2 #10
MOVC R3 #15
MOVC R4 #91
MOVC R5 #20
ADD R1 R1 #100
ADD R6 R1 R2
MULT R7 R6 R2
LOAD R8 R5 #500
DIV R9 R8 R4
STORE R7 #100 #200
STORE R9 #200 #300
Assumption: You have a "CLUSTER #NO" instruction that directs all subsequent instructions to the cluster specified by its operand. For example, the following instruction sequence sends the ADD to cluster 1 and the MULT to cluster 2.
Cluster #1
ADD R1 R1 R1
Cluster #2
MULT R2 R2 R3
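The semantics of the cluster directive in this example can be sketched in Python as follows. This only shows how each instruction inherits the most recent directive; it does not choose a placement. The function name assign_clusters and the default_cluster parameter are hypothetical, and the behavior before the first directive is an assumption, since the assignment does not specify it.

```python
from typing import List, Tuple


def assign_clusters(lines: List[str],
                    default_cluster: int = 1) -> List[Tuple[int, str]]:
    """Pair each instruction with the cluster chosen by the latest directive."""
    current = default_cluster  # assumption: cluster used before any directive
    assigned = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        parts = text.split()
        if parts[0].upper() == "CLUSTER":
            # "Cluster #2" -> 2; the directive itself is not executed
            current = int(parts[1].lstrip("#"))
            continue
        assigned.append((current, text))
    return assigned


program = ["Cluster #1", "ADD R1 R1 R1", "Cluster #2", "MULT R2 R2 R3"]
for cluster, instruction in assign_clusters(program):
    print(cluster, instruction)
```

Running it on the sequence above pairs the ADD with cluster 1 and the MULT with cluster 2.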
Please reorganize the provided instruction sequence utilizing the cluster command to minimize intercluster communication.
Question 2:
Please justify your answer to question 1.
Question 3:
For the purpose of this question, you are allowed to add new data structures or modify existing structures inside the decode stage. Also assume that each cluster has limited capacity in terms of the number of instructions its IQ can hold at a time. Propose any necessary changes so that dependent instructions, as much as possible, are executed in the same cluster. Briefly describe logic changes to the decode stage.
Note: Assume the FRAT belongs to the decode stage.
Question 4:
While it is true that executing all dependent instructions on one cluster will reduce intercluster dependencies, this may limit parallelism and the utilization of all clusters. Please suggest a smarter mechanism that achieves an acceptable tradeoff: reducing intercluster dependencies while maintaining a good level of parallelism.
Question 5:
Describe one scenario where dispatch-bound would be more efficient than issue-bound for this architecture.
Question 6:
This question is similar to variation 2 of the architecture described in Lecture Slides 3, slide 104. However, in this new proposed variation, each RoB entry holds a copy of the instruction result.
Given the following:
1. N_R: Number of physical registers
2. N_S: Number of IQ entries
3. W_Matrix -- Its entries are ready for you to use by the decode stage
4. RAT -- RAT entries will always point to a physical register. They will never point to an architectural register.
5. Each RoB Entry contains:
a. Destination Physical Register Number (P_Dest)
b. Destination Architectural Register Number (A_Dest)
c. A field to hold the result of the instruction once ready (INST_VALUE)
6. Free_List: A list of free physical registers
7. There is no Renamed[] vector.
At the decode stage, upon allocating a new register and performing renaming, there is a chance of freeing a physical register. Assuming the destination register number is P_Dest, write pseudocode that frees a physical register when necessary.
Question 7:
During instruction execution, there are other parts of the CPU that can determine when a physical register might be freed. Briefly describe where else we can put freeing logic and explain why.