Performance events can be used to detect when the instruction blocks start.
However, not every event is suitable for detection.
Some events might never occur for our code (e.g. `SW_PREFETCH_ACCESS:PREFETCHW`), others reflect the execution of code, and only a few show the preparation of instruction blocks (fetching and decoding).
Good candidates include events related to the activity of:
- Instruction cache (e.g. `L2_RQSTS:CODE_RD_MISS`)
- Translation lookaside buffer for instructions [iTLB] (e.g. `ITLB_MISSES:MISS_CAUSES_A_WALK`)
- Front-end and instruction decoding (e.g. `FRONTEND_RETIRED:L1I_MISS`)

To find out and validate which performance events can detect instruction blocks, we test the performance events available on the Skylake/Coffee Lake microarchitectures.
We start with the Intel document `skylake_core_v46.tsv`, from which we select the performance events available on both microarchitectures and correct/update their names to match the real-world processors.
Then, we run this simple expression `sqrt(A(1:LEN_1D)) + B .* C(1:LEN_1D)` and observe which performance events match the beginning of instruction blocks.
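
As a rough illustration of the workload, the expression above can be written out in plain Python as three element-wise steps. This is only a sketch for readability: the function name `expression`, the argument `n` (standing in for `LEN_1D`), and the assumption that `B` is a vector of length `n` rather than a scalar are not taken from the original setup.

```python
import math

def expression(A, B, C, n):
    """Plain-Python counterpart of sqrt(A(1:n)) + B .* C(1:n).

    Assumes B holds n elements (the source does not state its shape)
    and that A contains only non-negative values.
    """
    root = [math.sqrt(a) for a in A[:n]]          # element-wise square root
    product = [b * c for b, c in zip(B, C[:n])]   # element-wise multiplication
    return [r + p for r, p in zip(root, product)] # element-wise addition

# Tiny hand-checkable input (synthetic values, not experiment data).
print(expression([4.0, 9.0, 16.0], [1.0, 2.0], [3.0, 5.0, 7.0], 2))
```

Note that the 1-based range `1:n` of the original expression becomes the 0-based slice `[:n]` here.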
What we want to observe is a spike of activity that is visible only at the beginning of an instruction block. A performance event that produces varying levels of activity, with or without spikes, is not well suited, because two different instruction blocks might show the same level of activity for that event. Therefore, we are only interested in the transitions in activity, that is, the spikes.
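
The selection criterion can be sketched as a small check over a sampled trace. This is a minimal illustration, not the procedure used in the experiment: the trace format (one event count per sampling interval), the function name `spikes_only_at_starts`, and the `window` and `threshold` parameters are all assumptions made for the example.

```python
from typing import Sequence

def spikes_only_at_starts(counts: Sequence[int], starts: Sequence[int],
                          window: int = 1, threshold: int = 0) -> bool:
    """Return True if activity above `threshold` shows up at every block
    start and nowhere else.

    counts    -- event count per sampling interval (hypothetical format)
    starts    -- indices of intervals in which an instruction block begins
    window    -- number of intervals from a start in which a spike may fall
    threshold -- counts at or below this value are treated as noise
    """
    def span(start: int) -> range:
        # Intervals that still count as "the beginning" of a block.
        return range(start, min(start + window, len(counts)))

    near_start = {i for s in starts for i in span(s)}
    active = {i for i, c in enumerate(counts) if c > threshold}

    every_start_spikes = all(any(i in active for i in span(s)) for s in starts)
    no_stray_activity = active <= near_start  # no activity between starts
    return every_start_spikes and no_stray_activity

# Synthetic traces with block starts at intervals 0, 4 and 8.
starts = [0, 4, 8]
spiky = [7, 0, 0, 0, 5, 0, 0, 0, 6, 0, 0, 0]  # activity only at the starts
level = [3, 3, 3, 3, 9, 9, 9, 9, 3, 3, 3, 3]  # sustained activity levels
print(spikes_only_at_starts(spiky, starts))  # True
print(spikes_only_at_starts(level, starts))  # False
```

The second trace is rejected even though its level changes at interval 4, because the first and third blocks show the same level and the activity never returns to zero between starts.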
The final experiment was conducted on the Intel® Core™ i7-8700 Processor @ 3.20GHz of the Coffee Lake microarchitecture.
The table below lists the tested performance events and indicates which of them can detect instruction blocks.
Performance event | Can detect instruction block? |
---|---|
ARITH:DIVIDER_ACTIVE | 🚫 |
BACLEARS:ANY | ✔️ |
BR_INST_RETIRED:ALL_BRANCHES | 🚫 |
BR_INST_RETIRED:CONDITIONAL | 🚫 |
BR_INST_RETIRED:FAR_BRANCH | 🚫 |
BR_INST_RETIRED:NEAR_CALL | ✔️ |
BR_INST_RETIRED:NEAR_RETURN | ✔️ |
BR_INST_RETIRED:NEAR_TAKEN | 🚫 |
BR_INST_RETIRED:NOT_TAKEN | 🚫 |
BR_MISP_RETIRED:ALL_BRANCHES | ✔️ |
BR_MISP_RETIRED:CONDITIONAL | 🚫 |
BR_MISP_RETIRED:NEAR_CALL | ✔️ |
BR_MISP_RETIRED:NEAR_TAKEN | ✔️ |
CPU_CLK_THREAD_UNHALTED:ONE_THREAD_ACTIVE | 🚫 |
CPU_CLK_THREAD_UNHALTED:REF_XCLK | 🚫 |
CPU_CLK_THREAD_UNHALTED:REF_XCLK_ANY | 🚫 |
CPU_CLK_THREAD_UNHALTED:RING0_TRANS | 🚫 |
CPU_CLK_THREAD_UNHALTED:THREAD_P | 🚫 |
CYCLE_ACTIVITY:CYCLES_L1D_MISS | 🚫 |
CYCLE_ACTIVITY:CYCLES_L2_MISS | 🚫 |
CYCLE_ACTIVITY:CYCLES_L3_MISS | 🚫 |
CYCLE_ACTIVITY:CYCLES_MEM_ANY | 🚫 |
CYCLE_ACTIVITY:STALLS_L1D_MISS | 🚫 |
CYCLE_ACTIVITY:STALLS_L2_MISS | 🚫 |
CYCLE_ACTIVITY:STALLS_L3_MISS | 🚫 |
CYCLE_ACTIVITY:STALLS_MEM_ANY | 🚫 |
CYCLE_ACTIVITY:STALLS_TOTAL | 🚫 |
DSB2MITE_SWITCHES:PENALTY_CYCLES | 🚫 |
DTLB_LOAD_MISSES:MISS_CAUSES_A_WALK | 🚫 |
DTLB_LOAD_MISSES:STLB_HIT | 🚫 |
DTLB_LOAD_MISSES:WALK_ACTIVE | 🚫 |
DTLB_LOAD_MISSES:WALK_COMPLETED | 🚫 |
DTLB_LOAD_MISSES:WALK_COMPLETED_1G | 🚫 |
DTLB_LOAD_MISSES:WALK_COMPLETED_2M_4M | 🚫 |
DTLB_LOAD_MISSES:WALK_COMPLETED_4K | 🚫 |
DTLB_LOAD_MISSES:WALK_PENDING | 🚫 |
DTLB_STORE_MISSES:MISS_CAUSES_A_WALK | 🚫 |
DTLB_STORE_MISSES:STLB_HIT | 🚫 |
DTLB_STORE_MISSES:WALK_ACTIVE | 🚫 |
DTLB_STORE_MISSES:WALK_COMPLETED | 🚫 |
DTLB_STORE_MISSES:WALK_COMPLETED_1G | 🚫 |
DTLB_STORE_MISSES:WALK_COMPLETED_2M_4M | 🚫 |
DTLB_STORE_MISSES:WALK_COMPLETED_4K | 🚫 |
DTLB_STORE_MISSES:WALK_PENDING | 🚫 |
EPT:WALK_PENDING | 🚫 |
EXE_ACTIVITY:1_PORTS_UTIL | 🚫 |
EXE_ACTIVITY:2_PORTS_UTIL | 🚫 |
EXE_ACTIVITY:3_PORTS_UTIL | 🚫 |
EXE_ACTIVITY:4_PORTS_UTIL | 🚫 |
EXE_ACTIVITY:BOUND_ON_STORES | 🚫 |
EXE_ACTIVITY:EXE_BOUND_0_PORTS | 🚫 |
FP_ARITH_INST_RETIRED:128B_PACKED_DOUBLE | 🚫 |
FP_ARITH_INST_RETIRED:128B_PACKED_SINGLE | 🚫 |
FP_ARITH_INST_RETIRED:256B_PACKED_DOUBLE | 🚫 |
FP_ARITH_INST_RETIRED:256B_PACKED_SINGLE | 🚫 |
FP_ARITH_INST_RETIRED:SCALAR_DOUBLE | 🚫 |
FP_ARITH_INST_RETIRED:SCALAR_SINGLE | 🚫 |
FP_ASSIST:ANY | 🚫 |
FRONTEND_RETIRED:DSB_MISS | 🚫 |
FRONTEND_RETIRED:ITLB_MISS | ✔️ |
FRONTEND_RETIRED:L1I_MISS | ✔️ |
FRONTEND_RETIRED:L2_MISS | ✔️ |
FRONTEND_RETIRED:STLB_MISS | ✔️ |
HLE_RETIRED:ABORTED | 🚫 |
HLE_RETIRED:ABORTED_EVENTS | 🚫 |
HLE_RETIRED:ABORTED_MEM | 🚫 |
HLE_RETIRED:ABORTED_MEMTYPE | 🚫 |
HLE_RETIRED:ABORTED_UNFRIENDLY | 🚫 |
HLE_RETIRED:COMMIT | 🚫 |
HLE_RETIRED:START | 🚫 |
HW_INTERRUPTS:RECEIVED | 🚫 |
ICACHE_16B:IFDATA_STALL | ✔️ |
ICACHE_64B:IFTAG_HIT | 🚫 |
ICACHE_64B:IFTAG_MISS | ✔️ |
ICACHE_64B:IFTAG_STALL | ✔️ |
IDQ:ALL_DSB_CYCLES_4_UOPS | 🚫 |
IDQ:ALL_DSB_CYCLES_ANY_UOPS | 🚫 |
IDQ:ALL_MITE_CYCLES_4_UOPS | 🚫 |
IDQ:ALL_MITE_CYCLES_ANY_UOPS | 🚫 |
IDQ:DSB_UOPS | 🚫 |
IDQ:DSB_UOPS_CYCLES | 🚫 |
IDQ:MITE_UOPS | 🚫 |
IDQ:MITE_UOPS_CYCLES | 🚫 |
IDQ:MS_DSB_UOPS_CYCLES | 🚫 |
IDQ:MS_MITE_UOPS | 🚫 |
IDQ:MS_SWITCHES | 🚫 |
IDQ:MS_UOPS | 🚫 |
IDQ:MS_UOPS_CYCLES | 🚫 |
IDQ_UOPS_NOT_DELIVERED:CORE | 🚫 |
IDQ_UOPS_NOT_DELIVERED:CYCLES_0_UOPS_DELIV_CORE | ✔️ |
IDQ_UOPS_NOT_DELIVERED:CYCLES_FE_WAS_OK | 🚫 |
IDQ_UOPS_NOT_DELIVERED:CYCLES_LE_1_UOPS_DELIV_CORE | 🚫 |
IDQ_UOPS_NOT_DELIVERED:CYCLES_LE_2_UOPS_DELIV_CORE | 🚫 |
IDQ_UOPS_NOT_DELIVERED:CYCLES_LE_3_UOPS_DELIV_CORE | 🚫 |
ILD_STALL:LCP | ✔️ |
INT_MISC:CLEAR_RESTEER_CYCLES | ✔️ |
INT_MISC:RECOVERY_CYCLES | ✔️ |
INT_MISC:RECOVERY_CYCLES_ANY | ✔️ |
ITLB:ITLB_FLUSH | 🚫 |
ITLB_MISSES:MISS_CAUSES_A_WALK | ✔️ |
ITLB_MISSES:STLB_HIT | 🚫 |
ITLB_MISSES:WALK_COMPLETED | ✔️ |
ITLB_MISSES:WALK_COMPLETED_1G | 🚫 |
ITLB_MISSES:WALK_COMPLETED_2M_4M | 🚫 |
ITLB_MISSES:WALK_COMPLETED_4K | ✔️ |
ITLB_MISSES:WALK_PENDING | ✔️ |
L1D:REPLACEMENT | 🚫 |
L1D_PEND_MISS:FB_FULL | 🚫 |
L1D_PEND_MISS:PENDING | 🚫 |
L1D_PEND_MISS:PENDING_CYCLES | 🚫 |
L1D_PEND_MISS:PENDING_CYCLES_ANY | 🚫 |
L2_LINES_IN:ALL | 🚫 |
L2_LINES_OUT:NON_SILENT | 🚫 |
L2_LINES_OUT:SILENT | 🚫 |
L2_LINES_OUT:USELESS_HWPF | 🚫 |
L2_LINES_OUT:USELESS_HWPREF | 🚫 |
L2_RQSTS:ALL_CODE_RD | ✔️ |
L2_RQSTS:ALL_DEMAND_DATA_RD | 🚫 |
L2_RQSTS:ALL_DEMAND_MISS | 🚫 |
L2_RQSTS:ALL_DEMAND_REFERENCES | 🚫 |
L2_RQSTS:ALL_PF | 🚫 |
L2_RQSTS:ALL_RFO | 🚫 |
L2_RQSTS:CODE_RD_HIT | 🚫 |
L2_RQSTS:CODE_RD_MISS | ✔️ |
L2_RQSTS:DEMAND_DATA_RD_HIT | 🚫 |
L2_RQSTS:DEMAND_DATA_RD_MISS | 🚫 |
L2_RQSTS:MISS | 🚫 |
L2_RQSTS:PF_HIT | 🚫 |
L2_RQSTS:PF_MISS | 🚫 |
L2_RQSTS:REFERENCES | 🚫 |
L2_RQSTS:RFO_HIT | 🚫 |
L2_RQSTS:RFO_MISS | 🚫 |
L2_TRANS:L2_WB | 🚫 |
LD_BLOCKS:NO_SR | 🚫 |
LD_BLOCKS:STORE_FORWARD | 🚫 |
LD_BLOCKS_PARTIAL:ADDRESS_ALIAS | 🚫 |
LOAD_HIT_PRE:SW_PF | 🚫 |
LONGEST_LAT_CACHE:MISS | 🚫 |
LONGEST_LAT_CACHE:REFERENCE | 🚫 |
LSD:CYCLES_4_UOPS | 🚫 |
LSD:CYCLES_ACTIVE | 🚫 |
LSD:UOPS | 🚫 |
MACHINE_CLEARS:COUNT | 🚫 |
MACHINE_CLEARS:MEMORY_ORDERING | 🚫 |
MACHINE_CLEARS:SMC | 🚫 |
MEM_INST_RETIRED:ALL_LOADS | 🚫 |
MEM_INST_RETIRED:ALL_STORES | 🚫 |
MEM_INST_RETIRED:LOCK_LOADS | 🚫 |
MEM_INST_RETIRED:SPLIT_LOADS | 🚫 |
MEM_INST_RETIRED:SPLIT_STORES | 🚫 |
MEM_INST_RETIRED:STLB_MISS_LOADS | 🚫 |
MEM_INST_RETIRED:STLB_MISS_STORES | 🚫 |
MEM_LOAD_L3_HIT_RETIRED:XSNP_HIT | 🚫 |
MEM_LOAD_L3_HIT_RETIRED:XSNP_HITM | 🚫 |
MEM_LOAD_L3_HIT_RETIRED:XSNP_MISS | 🚫 |
MEM_LOAD_L3_HIT_RETIRED:XSNP_NONE | 🚫 |
MEM_LOAD_MISC_RETIRED:UC | 🚫 |
MEM_LOAD_RETIRED:FB_HIT | 🚫 |
MEM_LOAD_RETIRED:L1_HIT | 🚫 |
MEM_LOAD_RETIRED:L1_MISS | 🚫 |
MEM_LOAD_RETIRED:L2_HIT | 🚫 |
MEM_LOAD_RETIRED:L2_MISS | 🚫 |
MEM_LOAD_RETIRED:L3_HIT | 🚫 |
MEM_LOAD_RETIRED:L3_MISS | 🚫 |
OFFCORE_REQUESTS:ALL_DATA_RD | 🚫 |
OFFCORE_REQUESTS:ALL_REQUESTS | 🚫 |
OFFCORE_REQUESTS:DEMAND_CODE_RD | 🚫 |
OFFCORE_REQUESTS:DEMAND_DATA_RD | 🚫 |
OFFCORE_REQUESTS:DEMAND_RFO | 🚫 |
OFFCORE_REQUESTS:L3_MISS_DEMAND_DATA_RD | 🚫 |
OFFCORE_REQUESTS_BUFFER:SQ_FULL | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:ALL_DATA_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:CYCLES_WITH_DEMAND_CODE_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:CYCLES_WITH_DEMAND_DATA_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:CYCLES_WITH_DEMAND_RFO | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:CYCLES_WITH_L3_MISS_DEMAND_DATA_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:DEMAND_CODE_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:DEMAND_DATA_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:DEMAND_DATA_RD_GE_6 | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:DEMAND_RFO | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:L3_MISS_DEMAND_DATA_RD | 🚫 |
OFFCORE_REQUESTS_OUTSTANDING:L3_MISS_DEMAND_DATA_RD_GE_6 | 🚫 |
OTHER_ASSISTS:ANY | 🚫 |
PARTIAL_RAT_STALLS:SCOREBOARD | 🚫 |
RESOURCE_STALLS:ANY | 🚫 |
RESOURCE_STALLS:SB | 🚫 |
ROB_MISC_EVENTS:LBR_INSERTS | 🚫 |
ROB_MISC_EVENTS:PAUSE_INST | 🚫 |
RS_EVENTS:EMPTY_CYCLES | 🚫 |
RS_EVENTS:EMPTY_END | 🚫 |
RTM_RETIRED:ABORTED | 🚫 |
RTM_RETIRED:ABORTED_EVENTS | 🚫 |
RTM_RETIRED:ABORTED_MEM | 🚫 |
RTM_RETIRED:ABORTED_MEMTYPE | 🚫 |
RTM_RETIRED:ABORTED_UNFRIENDLY | 🚫 |
RTM_RETIRED:COMMIT | 🚫 |
RTM_RETIRED:START | 🚫 |
SQ_MISC:SPLIT_LOCK | 🚫 |
SW_PREFETCH_ACCESS:NTA | 🚫 |
SW_PREFETCH_ACCESS:PREFETCHW | 🚫 |
SW_PREFETCH_ACCESS:T0 | 🚫 |
SW_PREFETCH_ACCESS:T1_T2 | 🚫 |
TLB_FLUSH:DTLB_THREAD | 🚫 |
TLB_FLUSH:STLB_ANY | 🚫 |
TX_EXEC:MISC1 | 🚫 |
TX_EXEC:MISC2 | 🚫 |
TX_EXEC:MISC3 | 🚫 |
TX_EXEC:MISC4 | 🚫 |
TX_EXEC:MISC5 | 🚫 |
TX_MEM:ABORT_CAPACITY | 🚫 |
TX_MEM:ABORT_CONFLICT | 🚫 |
TX_MEM:ABORT_HLE_ELISION_BUFFER_FULL | 🚫 |
TX_MEM:ABORT_HLE_ELISION_BUFFER_MISMATCH | 🚫 |
TX_MEM:ABORT_HLE_ELISION_BUFFER_NOT_EMPTY | 🚫 |
TX_MEM:ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT | 🚫 |
TX_MEM:ABORT_HLE_STORE_TO_ELIDED_LOCK | 🚫 |
UOPS_DISPATCHED_PORT:PORT_0 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_1 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_2 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_3 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_4 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_5 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_6 | 🚫 |
UOPS_DISPATCHED_PORT:PORT_7 | 🚫 |
UOPS_EXECUTED:CORE | 🚫 |
UOPS_EXECUTED:CORE_CYCLES_GE_1 | 🚫 |
UOPS_EXECUTED:CORE_CYCLES_GE_2 | 🚫 |
UOPS_EXECUTED:CORE_CYCLES_GE_3 | 🚫 |
UOPS_EXECUTED:CORE_CYCLES_GE_4 | 🚫 |
UOPS_EXECUTED:CORE_CYCLES_NONE | 🚫 |
UOPS_EXECUTED:STALL_CYCLES | 🚫 |
UOPS_EXECUTED:THREAD | 🚫 |
UOPS_EXECUTED:THREAD_CYCLES_GE_1 | 🚫 |
UOPS_EXECUTED:THREAD_CYCLES_GE_2 | 🚫 |
UOPS_EXECUTED:THREAD_CYCLES_GE_3 | 🚫 |
UOPS_EXECUTED:THREAD_CYCLES_GE_4 | 🚫 |
UOPS_EXECUTED:X87 | 🚫 |
UOPS_ISSUED:ANY | 🚫 |
UOPS_ISSUED:SLOW_LEA | 🚫 |
UOPS_ISSUED:STALL_CYCLES | 🚫 |
UOPS_ISSUED:VECTOR_WIDTH_MISMATCH | 🚫 |
UOPS_RETIRED:RETIRE_SLOTS | 🚫 |
UOPS_RETIRED:STALL_CYCLES | 🚫 |
UOPS_RETIRED:TOTAL_CYCLES | 🚫 |