Problem: Explore the cache statistics for two versions of matrix multiplication.
Description: Implement matrix multiplication in two versions:
- Version 1: Standard matrix multiplication.
- Version 2: Matrix multiplication with the J and K loops interchanged.
Version 1:
For I = 0 to N-1
    For J = 0 to N-1
        Sum = 0
        For K = 0 to N-1
            Sum += A[I][K] * B[K][J]
        C[I][J] = Sum
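As a reference point, Version 1 might look like the following in C. This is only a sketch: the function name matmul_ijk, the int element type, and the fixed N = 10 are assumptions made for illustration, and the code would still have to be translated to ARM assembly (by hand or with a cross-compiler) before it can run in ARMSim.

```c
#define N 10  /* assumed size; the experiment also uses 100 and 1000 */

/* Global arrays have static storage, so C zero-initializes them. */
int A[N][N], B[N][N], C[N][N];

/* Version 1 (hypothetical name): I-J-K loop order.
 * The inner K loop reads A along a row and B down a column. */
void matmul_ijk(void)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    }
}
```

Because C stores arrays in row-major order, consecutive B[K][J] accesses in the inner loop are N elements apart in memory, which is the access pattern the cache statistics are meant to expose.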
Version 2: J and K loops interchanged.
// Initialize C[I][J] to 0
For I = 0 to N-1
    For K = 0 to N-1
        For J = 0 to N-1
            C[I][J] += A[I][K] * B[K][J]
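A possible C rendering of Version 2 is sketched below. The function name matmul_ikj, the int element type, and N = 10 are illustrative assumptions, and the code would need to be translated to ARM assembly before ARMSim can run it.

```c
#include <string.h>

#define N 10  /* assumed size; the experiment also uses 100 and 1000 */

/* Global arrays have static storage, so C zero-initializes them. */
int A[N][N], B[N][N], C[N][N];

/* Version 2 (hypothetical name): I-K-J loop order.
 * The inner J loop walks both B and C along a row. */
void matmul_ikj(void)
{
    memset(C, 0, sizeof C);  /* initialize C[I][J] to 0 */
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
}
```

With row-major storage, the inner loop now touches adjacent elements of B and C, so several consecutive accesses can fall within the same 32-byte block.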
Find the cache statistics for both versions for N = 10, 100, and 1000. The following two cache configurations can be used for these experiments:
Configuration 1: 4KB, block size of 32 bytes, direct-mapped.
Configuration 2: 4KB, block size of 32 bytes, set associative with 2 sets.
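To get a feel for these numbers before running the experiments, the arithmetic can be sketched in C. The helper names below are hypothetical, and the 4-byte element size and row-major layout are assumptions; only the 4KB cache size and 32-byte block size come from the configurations above.

```c
enum { CACHE_BYTES = 4096, BLOCK_BYTES = 32, ELEM_BYTES = 4 /* assumed */ };

/* Number of blocks the cache holds: 4096 / 32 = 128. */
unsigned num_blocks(void)
{
    return CACHE_BYTES / BLOCK_BYTES;
}

/* Direct-mapped case: the block index that a byte address maps to. */
unsigned direct_mapped_index(unsigned long addr)
{
    return (unsigned)((addr / BLOCK_BYTES) % num_blocks());
}

/* Bytes occupied by one N x N matrix of ELEM_BYTES-sized elements. */
unsigned long matrix_bytes(unsigned long n)
{
    return n * n * ELEM_BYTES;
}

/* Distance in bytes between B[k][j] and B[k+1][j] in row-major order. */
unsigned long column_stride_bytes(unsigned long n)
{
    return n * ELEM_BYTES;
}
```

Under these assumptions, one matrix takes 400 bytes for N = 10, 40,000 bytes for N = 100, and 4,000,000 bytes for N = 1000, so only the N = 10 case fits within the 4KB cache.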
The cache can be configured through the preferences in ARMSim. To get the cache statistics, first put a breakpoint on the SWI 0x11 instruction, and then note the statistics from cache -> statistics.
To simplify the experiment, you may leave the data arrays uninitialized, so that the matrices are 0 by default.