Hardware performance counters
- a set of special-purpose registers built into modern microprocessors that store counts of hardware-related activities within the computer system
- low overhead compared to software-based methods
- types and meanings of hardware counters vary from one architecture to another because the hardware organization differs
Overflow handling
- generate an overflow signal each time threshold events have been counted
- each counter has to be registered separately
- the value of each registered hardware counter is maintained separately
- (LONG_)LONG_MAX:
- 32 bit: 2,147,483,647
- 64 bit: 9,223,372,036,854,775,807
- overflow_handler(): user-defined function to process overflow events.
- function will be called by the PAPI library every time the threshold is reached
- overflow_vector: a bit-array that can be processed to determine which event(s) caused the overflow
- e.g. using PAPI_get_overflow_event_index()
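The bit-array idea behind overflow_vector can be sketched in plain C. This is only an illustration: the helper set_bit_positions() and the tally array overflow_hits are hypothetical names, and the handler merely mirrors the parameter list of a PAPI overflow handler without calling into PAPI. In real code, PAPI_get_overflow_event_index() should be used, since it also maps the bits to indices within the event set, which this sketch does not attempt.

```c
#include <stddef.h>

#define MAX_TRACKED_BITS 64  /* hypothetical limit: one slot per bit of the vector */

/* Hypothetical tally: how often each bit position signalled an overflow. */
long overflow_hits[MAX_TRACKED_BITS];

/* Collect the positions of all set bits in overflow_vector.
 * Writes at most `capacity` positions and returns how many were written. */
int set_bit_positions(long long overflow_vector, int *positions, int capacity)
{
    unsigned long long bits = (unsigned long long)overflow_vector;
    int count = 0;

    for (int pos = 0; bits != 0 && pos < MAX_TRACKED_BITS && count < capacity;
         ++pos, bits >>= 1) {
        if (bits & 1ULL)
            positions[count++] = pos;
    }
    return count;
}

/* Same parameter list as a PAPI overflow handler; it would be registered
 * through PAPI_overflow(). Here it only updates the tally. */
void overflow_handler(int event_set, void *address,
                      long long overflow_vector, void *context)
{
    int positions[MAX_TRACKED_BITS];
    int n;

    (void)event_set;   /* unused in this sketch */
    (void)address;     /* program counter at the time of the overflow */
    (void)context;     /* platform-specific context, unused here */

    n = set_bit_positions(overflow_vector, positions, MAX_TRACKED_BITS);
    for (int i = 0; i < n; ++i)
        overflow_hits[positions[i]]++;
}
```

Calling overflow_handler(0, NULL, 0x5LL, NULL) by hand increments the tallies for bit positions 0 and 2, which is a convenient way to check the decoding logic before attaching the handler to a real event set.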
- Software vs. hardware overflows:
- if the processor does not support hardware overflow, software emulates it by periodically checking the counter values
- software overflow handling is less accurate and more expensive than hardware handling
- often implemented using a zero-crossing algorithm
- value of counter is set to -threshold and increased accordingly; an overflow is signalled when the value crosses zero
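A minimal sketch of the zero-crossing idea described above, as a pure simulation that does not use PAPI. The type soft_counter and both functions are hypothetical names chosen for illustration; they do not show how PAPI implements software overflow internally.

```c
#include <assert.h>

/* Hypothetical software-emulated counter using a zero-crossing scheme. */
typedef struct {
    long long value;      /* starts at -threshold and counts upwards */
    long long threshold;  /* number of events between two overflows */
    long long overflows;  /* total number of overflows signalled so far */
} soft_counter;

/* Arm the counter: the next overflow is due after `threshold` events. */
void soft_counter_init(soft_counter *c, long long threshold)
{
    assert(threshold > 0);  /* a non-positive threshold would never re-arm */
    c->threshold = threshold;
    c->value = -threshold;
    c->overflows = 0;
}

/* Account for `events` events observed since the last periodic check.
 * Returns the number of overflows detected by this check. */
long long soft_counter_add(soft_counter *c, long long events)
{
    long long fired = 0;

    c->value += events;
    while (c->value >= 0) {        /* crossed zero: threshold was reached */
        c->value -= c->threshold;  /* re-arm, keeping the surplus events */
        ++fired;
    }
    c->overflows += fired;
    return fired;
}
```

With a threshold of 1000, checks that observe 400, 700 and 2500 events report 0, 1 and 2 overflows respectively. The second check only notices the overflow 100 events late, which illustrates why periodic software checking is less accurate than a hardware interrupt.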
1st Assignment
- Each student should deliver
- Please: no .o files and no executables!
- Documentation (.pdf, .doc, .tex or .txt file)
- In case of questions:
About the Project
- Given: the source code for a matrix-multiply operation (file hw-matmul.c).
- The code contains a trivial implementation of the matrix multiply operation and a blocked implementation
- The blocked implementation is called with block sizes of 16, 32, 64 and 128
- You can compile the C file, e.g. with
cc -O3 hw-matmul.c -o hw-matmul
- Once you have added the PAPI functions:
cc -O3 hw-matmul.c -o hw-matmul \
   -I/opt/papi/4.2.0/include -L/opt/papi/4.2.0/lib \
   -lpapi -lperfctr
- Run:
- allocate a node (see later in the lecture)
- type: ./hw-matmul
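For orientation, the difference between a trivial and a blocked matrix multiply can be sketched as follows. This is not the code from hw-matmul.c: the function names, the row-major layout and the loop order are assumptions made for illustration only.

```c
#include <stddef.h>

/* Trivial version: C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked version: works on bs x bs tiles so that the tiles of A, B and C
 * currently in use are more likely to stay in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    if (bs == 0 || bs > n)
        bs = n;  /* degenerate block size: fall back to a single block */

    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs) {
        for (size_t kk = 0; kk < n; kk += bs) {
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = min_size(ii + bs, n);
                size_t k_end = min_size(kk + bs, n);
                size_t j_end = min_size(jj + bs, n);

                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; what changes with the block size is the memory access pattern, which is the kind of effect that cache-related hardware counters are expected to make visible.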
Notes
- The PAPI version installed on shark is 4.2.0
- On the front-end node you can find tons of examples in C and
Fortran on how to use PAPI in
/opt/papi/4.2.0/share/examples/ctests.
E.g.
- all_events.c -> how to check on a processor whether a counter
is available
- low-level.c -> how to use the low-level API of PAPI
- memory.c -> how to extract information about the memory
subsystem (e.g. cache sizes)
- overflow_index.c -> how to handle overflow correctly
- If he doesn't know the answer, he will ask me.
- Ask early, not the day before the submission is due