Why doesn't perf report cache misses?

Posted 2023-03-28

Question (originally posted 2013-01-18 10:16:09):

According to the perf tutorials, perf stat should report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:

[joel@panda goog]$ perf stat ./hash

 Performance counter stats for './hash':

    869.447863 task-clock                #    0.997 CPUs utilized          
            92 context-switches          #    0.106 K/sec                  
             4 cpu-migrations            #    0.005 K/sec                  
         1,041 page-faults               #    0.001 M/sec                  
 2,628,646,296 cycles                    #    3.023 GHz                    
   819,269,992 stalled-cycles-frontend   #   31.17% frontend cycles idle   
   132,355,435 stalled-cycles-backend    #    5.04% backend  cycles idle   
 4,515,152,198 instructions              #    1.72  insns per cycle        
                                         #    0.18  stalled cycles per insn
 1,060,739,808 branches                  # 1220.015 M/sec                  
     2,653,157 branch-misses             #    0.25% of all branches        

   0.871766141 seconds time elapsed

What am I missing? I've searched the man pages and the web but found nothing obvious.

Edit: my CPU is an Intel i5 2300K, if that matters.

Comments on the question:

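It depends on your hardware counters. I've never used perf, but I have used PAPI (icl.cs.utk.edu/PAPI), which can list the available hardware counters so you can see what your CPU offers.

Try perf stat -d; it reports some cache events. Also check the new perf mem tool for recording and reporting memory events, documented in linuxtag.org/2013/fileadmin/www.linuxtag.org/slides/… (slide 10) and man7.org/linux/man-pages/man1/perf-mem.1.html.

osgx, perf stat -d turns on event multiplexing and can sometimes report wrong rates. It is better to choose no more than 5-7 hardware events by hand per run, and to use perf stat -d only to find the names of such events. Another option on Intel: try toplev.py from github.com/andikleen/pmu-tools.

Answer 1: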

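I spent a few minutes trying to understand perf. I found the cache misses by first recording the data and then reporting it (two separate perf subcommands, record and report).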

To see the list of available events:

perf list

For example, to check last-level cache load misses, use the event LLC-loads-misses like this:

perf record -e LLC-loads-misses ./your_program

Then report the results:

perf report -v
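Named events such as LLC-loads-misses are generic hardware cache events. Per the perf_event_open(2) man page, they use type PERF_TYPE_HW_CACHE with config = cache id | (op << 8) | (result << 16), whereas the generic cache-misses event is PERF_TYPE_HARDWARE with PERF_COUNT_HW_CACHE_MISSES, which the man page says usually indicates last-level cache misses. The sketch below only builds the attribute for last-level cache load (read) misses; the helper names hw_cache_config and init_llc_load_miss_attr are hypothetical, and actually counting would still require a perf_event_open call.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a generic cache event as perf_event_open(2)
 * describes for PERF_TYPE_HW_CACHE: id | (op << 8) | (result << 16). */
static uint64_t hw_cache_config(uint64_t cache_id, uint64_t op, uint64_t result)
{
    return cache_id | (op << 8) | (result << 16);
}

/* Hypothetical helper: describe "last-level cache, read (load) accesses,
 * misses". The attribute is only filled in here, not opened. */
static void init_llc_load_miss_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HW_CACHE;
    attr->config = hw_cache_config(PERF_COUNT_HW_CACHE_LL,
                                   PERF_COUNT_HW_CACHE_OP_READ,
                                   PERF_COUNT_HW_CACHE_RESULT_MISS);
    attr->disabled = 1;       /* create stopped; enable later via ioctl */
    attr->exclude_kernel = 1; /* count user-space activity only */
}
```

With the current header values the resulting config is 0x10002 (LL = 2, read = 0, miss = 1).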

Comments:

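What is the difference between the perf events cache-misses and LLC-loads-misses?

I haven't looked at this in a long time, but I think cache-misses counts misses at all cache levels (usually 3), whereas LLC probably applies only to the last level, L3, because it is the most critical one: if it misses, the access goes to memory.

Answer 2: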

On my system, an Intel Xeon X5570 @ 2.93 GHz, I was able to get perf stat to report cache references and misses by requesting these events explicitly, like this:

perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations sleep 5
Performance counter stats for 'sleep 5':

         10573 cache-references                                            
          1949 cache-misses              #   18.434 % of all cache refs    
       1077328 cycles                    #    0.000 GHz                    
        715248 instructions              #    0.66  insns per cycle        
        151188 branches                                                    
           154 faults                                                      
             0 migrations                                                  

   5.002776842 seconds time elapsed
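The annotation perf prints next to cache-misses is just misses divided by references, and "insns per cycle" is instructions divided by cycles. A small sketch that reproduces those two annotations from raw counts; ratio and print_derived are hypothetical helper names, and returning 0.0 for a zero denominator is an assumed convention.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: num / den, or 0.0 when the denominator is zero
 * (for example, an event that was not counted). */
static double ratio(uint64_t num, uint64_t den)
{
    return den ? (double)num / (double)den : 0.0;
}

/* Print the two derived values in the same shape perf stat uses. */
static void print_derived(uint64_t cache_refs, uint64_t cache_misses,
                          uint64_t cycles, uint64_t instructions)
{
    printf("%.3f %% of all cache refs\n", 100.0 * ratio(cache_misses, cache_refs));
    printf("%.2f insns per cycle\n", ratio(instructions, cycles));
}
```

For example, print_derived(10573, 1949, 1077328, 715248) prints 18.434 % and 0.66, matching the output above.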

The default event set does not include cache events, which matches your results; I don't know why:

perf stat -B sleep 5

Performance counter stats for 'sleep 5':

      0.344308 task-clock                #    0.000 CPUs utilized          
             1 context-switches          #    0.003 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
           154 page-faults               #    0.447 M/sec                  
        977183 cycles                    #    2.838 GHz                    
        586878 stalled-cycles-frontend   #   60.06% frontend cycles idle   
        430497 stalled-cycles-backend    #   44.05% backend  cycles idle   
        720815 instructions              #    0.74  insns per cycle        
                                         #    0.81  stalled cycles per insn
        152217 branches                  #  442.095 M/sec                  
          7646 branch-misses             #    5.02% of all branches        

   5.002763199 seconds time elapsed

【讨论】：

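Thanks, this is helpful. I guess they must have changed the default set of captured events. Good; I thought it was odd to always have to record the data. This approach is faster :)

A question within the question: what does the faults count in the perf output mean?

@ElvisTeixeira faults is an alias for page-faults; run perf list for a list of all events.

Thanks @amdn. Now, what are page-faults?

Answer 3: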

In the latest source code, the default events no longer include cache-misses and cache-references:

struct perf_event_attr default_attrs[] = {

  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK              },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES        },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS          },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS             },

  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES              },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND  },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS            },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS     },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES           },

};

So, as of now, the man page and most of the web are out of date.
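The entries in default_attrs are plain struct perf_event_attr values, so events dropped from the default set, such as cache-misses, can still be requested directly through the perf_event_open(2) system call that perf itself is built on. A minimal sketch, assuming Linux and a PMU that is visible to the process (perf_event_paranoid, containers, or VMs can block it); count_hw_event, walk_buffer and struct buffer are hypothetical names, and the 64-byte cache line size is an assumption.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so issue the raw system call. */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical workload: touch one byte per (assumed) 64-byte cache line. */
struct buffer {
    volatile unsigned char *data;
    size_t len;
};

static void walk_buffer(void *arg)
{
    struct buffer *b = arg;
    for (size_t i = 0; i < b->len; i += 64)
        b->data[i]++;
}

/* Count one PERF_TYPE_HARDWARE event (e.g. PERF_COUNT_HW_CACHE_MISSES) while
 * fn(arg) runs in the calling thread. Returns -1 if the event cannot be
 * opened or read (unsupported PMU, VM/container, restrictive paranoid level). */
static long long count_hw_event(uint64_t config, void (*fn)(void *), void *arg)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = 1;       /* created stopped; enabled only around fn */
    attr.exclude_kernel = 1; /* user space only */
    attr.exclude_hv = 1;

    /* pid 0 = this thread, cpu -1 = any CPU, no group leader, no flags */
    int fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t value = 0; /* default read_format: a single u64 count */
    ssize_t n = read(fd, &value, sizeof value);
    close(fd);
    return n == (ssize_t)sizeof value ? (long long)value : -1;
}
```

For example, with static unsigned char mem[1 << 24] and struct buffer b = { mem, sizeof mem }, calling count_hw_event(PERF_COUNT_HW_CACHE_MISSES, walk_buffer, &b) and then count_hw_event(PERF_COUNT_HW_CACHE_REFERENCES, walk_buffer, &b) yields the two numbers behind perf's miss percentage. They come from separate runs of the workload; opening both events as one group (via group_fd) would measure a single run.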
