I'm using perf (version 5.4.73) to record the cache-misses event, and perf report shows something like:

Samples: 285k of 'cache-misses'
...
0.37   vmovq   (%rcx), %xmm5
71.09  mov     0x30(%rbx), %ecx
2.15   cmp     %eax,%ecx  //both operands are registers, why cache miss here?
0.14   jne     242

My question: why is a cache-miss event (2.15%) attributed to `cmp %eax,%ecx` even though both operands are registers?

konchy
  • My first guess would be a miss on the instruction itself. – MSalters Jul 19 '22 at 07:12
  • @MSalters I tried recording the `L1-dcache-load-misses` event and the result is similar: it shows dcache misses on instructions whose operands are both registers. – konchy Jul 19 '22 at 07:34
  • Sounds like a case of skew, where sometimes the event spills onto the next instruction. (Although more often *most* of the counts get skewed). Without PEBS (Intel) for precise events, an interrupt has to fire to record a sample. Maybe try with `-e mem_load_retired.l1_miss:p` to force a precise event. (That's an event on my Skylake CPU; IDK what CPU architecture you're using.) – Peter Cordes Jul 19 '22 at 13:17
  • @PeterCordes Could you explain what "skew" is and when/why it happens? I tried `-e mem_load_retired.l1_miss:p` and the results look more reasonable, but I still see things like `99.92 test %esi,%esi` and `0.07 cmp %eax,%ecx`. I'm using an Intel Cascade Lake CPU, and I've also tested the cache-misses event on an AMD EPYC (Milan). – konchy Jul 20 '22 at 01:25
  • https://easyperf.net/blog/2018/08/29/Understanding-performance-events-skid / [What does skew mean in the context of performance engineering](https://stackoverflow.com/q/15169480) / [How does perf record (or other profilers) pick which instruction to count as costing time?](https://stackoverflow.com/q/69351189). IDK why you'd be getting any counts for a mem_load_retired event on an ALU instruction with no memory operands, though. `cmp`/`test` can macro-fuse, but only with a `jcc`, not with a memory-indirect jmp (which doesn't read FLAGS). – Peter Cordes Jul 20 '22 at 02:46
  • Also related: [What does this sentence mean in the context of perf tool: "Supports address when precise (Precise event)"?](https://stackoverflow.com/q/71629004) / [Perf shows L1-dcache-load-misses in a block with no memory access](https://stackoverflow.com/q/63251365) – Peter Cordes Jul 20 '22 at 02:50
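
The skid effect described in the comments can be illustrated with a toy model. This is only a sketch under made-up assumptions, not a description of how the PMU works: every miss is caused by the `mov 0x30(%rbx), %ecx` load, and with a hypothetical probability `skid_prob` the sample's recorded instruction pointer lands one instruction later. Real skid depends on the microarchitecture and can span more than one instruction; the function name and the 3% figure are invented for illustration.

```python
import random
from collections import Counter

# Instruction stream taken from the perf output in the question.
INSTRUCTIONS = [
    "vmovq (%rcx), %xmm5",
    "mov 0x30(%rbx), %ecx",
    "cmp %eax,%ecx",
    "jne 242",
]


def attribute_samples(n_samples, miss_index, skid_prob, seed=0):
    """Toy model: all misses come from INSTRUCTIONS[miss_index], but each
    sample slips to the next instruction with probability skid_prob
    (an assumed value, not a measured one)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        idx = miss_index
        # The sample is taken late, so the recorded IP is one instruction on.
        if rng.random() < skid_prob and idx + 1 < len(INSTRUCTIONS):
            idx += 1
        counts[INSTRUCTIONS[idx]] += 1
    return counts


if __name__ == "__main__":
    total = 10_000
    counts = attribute_samples(n_samples=total, miss_index=1, skid_prob=0.03)
    for insn in INSTRUCTIONS:
        print(f"{100 * counts[insn] / total:6.2f}  {insn}")
```

In this model the register-only `cmp` receives a small share of the samples even though it never touches memory, which is the same shape as the report in the question.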
