gcc -Ofast -pg 错误的 gprof 结果

Posted

技术标签:

【中文标题】gcc -Ofast -pg 错误的 gprof 结果【英文标题】:wrong gprof results with gcc -Ofast -pg 【发布时间】:2016-08-16 03:54:50 【问题描述】:

我看到 gcc -pggprof 的分析结果很奇怪。

我找不到更好/更小的重现方法,所以我将实际代码链接到我看到问题的地方。

我正在使用代码 here,使用 make prof 构建 gprof 分析结果

我注意到几个错误,主要是:

    一个根本不应该被调用的函数 shape_stream_test 显示为被 ai_best_move_rec 大量调用,并且在自身时间中占用了总运行时间的 15% ai_best_move_rec 合法调用的其他函数,例如 grid_block_removegrid_cpygrid_new 甚至不会作为子函数出现

实际教授统计数据:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 53.92      4.00     4.00  4231936     0.00     0.00  grid_eval
 13.62      5.01     1.01  4467368     0.00     0.00  shape_stream_test
  6.88      5.52     0.51  4690742     0.00     0.00  grid_block_center_top
  6.07      5.97     0.45  4521016     0.00     0.00  grid_block_valid
  5.80      6.40     0.43  4467368     0.00     0.00  grid_block_add
  3.37      6.65     0.25  4467368     0.00     0.00  grid_block_drop
  2.02      6.80     0.15       35     4.29   209.20  ai_best_move_rec
  1.95      6.95     0.15   235469     0.00     0.00  grid_init
  1.75      7.08     0.13  9212461     0.00     0.00  block_extreme
  1.48      7.19     0.11  4467402     0.00     0.00  block_move
  0.94      7.26     0.07  1654899     0.00     0.00  block_get
  0.54      7.30     0.04                             block_crust_get
  0.40      7.33     0.03                             grid_block_set_color
  0.27      7.35     0.02  4467368     0.00     0.00  grid_block_remove
  0.27      7.37     0.02   235501     0.00     0.00  block_new
  0.27      7.39     0.02                             grid_block_intersects
  0.20      7.40     0.02   235469     0.00     0.00  grid_new
  0.13      7.41     0.01   235467     0.00     0.00  shape_stream_peek
  0.13      7.42     0.01                             drop_amount
  0.00      7.42     0.00      621     0.00     0.00  grid_clear_lines
  0.00      7.42     0.00      587     0.00     0.00  grid_cpy
  0.00      7.42     0.00       35     0.00     0.00  grid_print
  0.00      7.42     0.00       34     0.00     0.00  block_init
  0.00      7.42     0.00       34     0.00     0.00  block_print
  0.00      7.42     0.00       34     0.00     0.00  game_move_print
  0.00      7.42     0.00       34     0.00     0.00  grid_apply_moves
  0.00      7.42     0.00       34     0.00     0.00  shape_stream_pop
  0.00      7.42     0.00        7     0.00     0.00  shape_new
  0.00      7.42     0.00        1     0.00     0.00  shape_stream_new

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
       else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
       function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
       the function in the gprof listing. If the index is
       in parenthesis it shows where it would appear in
       the gprof listing if it were to be printed.

Copyright (C) 2012-2014 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

             Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.13% of 7.42 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     98.7    0.00    7.32                 ai_test [1]
                0.15    7.17      35/35          ai_best_move_rec [2]
                0.00    0.00      34/34          grid_apply_moves [21]
                0.00    0.00       2/235469      grid_new [9]
                0.00    0.00      35/35          grid_print [24]
                0.00    0.00      34/34          game_move_print [27]
                0.00    0.00      34/34          shape_stream_pop [28]
                0.00    0.00       1/1           shape_stream_new [30]
-----------------------------------------------
                              235432             ai_best_move_rec [2]
                0.15    7.17      35/35          ai_test [1]
[2]     98.7    0.15    7.17      35+235432  ai_best_move_rec [2]
                4.00    0.00 4231936/4231936     grid_eval [3]
                1.01    0.00 4467368/4467368     shape_stream_test [4]
                0.51    0.11 4690742/4690742     grid_block_center_top [5]
                0.45    0.09 4521016/4521016     grid_block_valid [6]
                0.43    0.00 4467368/4467368     grid_block_add [7]
                0.25    0.00 4467368/4467368     grid_block_drop [8]
                0.02    0.15  235467/235469      grid_new [9]
                0.11    0.00 4467368/4467402     block_move [12]
                0.02    0.00 4467368/4467368     grid_block_remove [16]
                0.02    0.00  235467/235501      block_new [17]
                0.01    0.00  235467/235467      shape_stream_peek [19]
                0.00    0.00     587/587         grid_cpy [23]
                0.00    0.00     587/621         grid_clear_lines [22]
                              235432             ai_best_move_rec [2]
-----------------------------------------------
                4.00    0.00 4231936/4231936     ai_best_move_rec [2]
[3]     53.9    4.00    0.00 4231936         grid_eval [3]
-----------------------------------------------
                1.01    0.00 4467368/4467368     ai_best_move_rec [2]
[4]     13.6    1.01    0.00 4467368         shape_stream_test [4]
-----------------------------------------------
                0.51    0.11 4690742/4690742     ai_best_move_rec [2]
[5]      8.4    0.51    0.11 4690742         grid_block_center_top [5]
                0.07    0.00 4690742/9212461     block_extreme [11]
                0.04    0.00 1056160/1654899     block_get [13]
-----------------------------------------------
                0.45    0.09 4521016/4521016     ai_best_move_rec [2]
[6]      7.3    0.45    0.09 4521016         grid_block_valid [6]
                0.06    0.00 4521016/9212461     block_extreme [11]
                0.03    0.00  598739/1654899     block_get [13]
-----------------------------------------------
                0.43    0.00 4467368/4467368     ai_best_move_rec [2]
[7]      5.8    0.43    0.00 4467368         grid_block_add [7]
-----------------------------------------------
                0.25    0.00 4467368/4467368     ai_best_move_rec [2]
[8]      3.4    0.25    0.00 4467368         grid_block_drop [8]
                0.00    0.00     635/9212461     block_extreme [11]
-----------------------------------------------
                0.00    0.00       2/235469      ai_test [1]
                0.02    0.15  235467/235469      ai_best_move_rec [2]
[9]      2.2    0.02    0.15  235469         grid_new [9]
                0.15    0.00  235469/235469      grid_init [10]
-----------------------------------------------
                0.15    0.00  235469/235469      grid_new [9]
[10]     2.0    0.15    0.00  235469         grid_init [10]
-----------------------------------------------
                0.00    0.00      68/9212461     grid_apply_moves [21]
                0.00    0.00     635/9212461     grid_block_drop [8]
                0.06    0.00 4521016/9212461     grid_block_valid [6]
                0.07    0.00 4690742/9212461     grid_block_center_top [5]
[11]     1.8    0.13    0.00 9212461         block_extreme [11]
-----------------------------------------------
                0.00    0.00      34/4467402     grid_apply_moves [21]
                0.11    0.00 4467368/4467402     ai_best_move_rec [2]
[12]     1.5    0.11    0.00 4467402         block_move [12]
-----------------------------------------------
                0.03    0.00  598739/1654899     grid_block_valid [6]
                0.04    0.00 1056160/1654899     grid_block_center_top [5]
[13]     0.9    0.07    0.00 1654899         block_get [13]
-----------------------------------------------
                                                 <spontaneous>
[14]     0.5    0.04    0.00                 block_crust_get [14]
-----------------------------------------------
                                                 <spontaneous>
[15]     0.4    0.03    0.00                 grid_block_set_color [15]
-----------------------------------------------
                0.02    0.00 4467368/4467368     ai_best_move_rec [2]
[16]     0.3    0.02    0.00 4467368         grid_block_remove [16]
-----------------------------------------------
                0.00    0.00      34/235501      grid_apply_moves [21]
                0.02    0.00  235467/235501      ai_best_move_rec [2]
[17]     0.3    0.02    0.00  235501         block_new [17]
-----------------------------------------------
                                                 <spontaneous>
[18]     0.3    0.02    0.00                 grid_block_intersects [18]
-----------------------------------------------
                0.01    0.00  235467/235467      ai_best_move_rec [2]
[19]     0.1    0.01    0.00  235467         shape_stream_peek [19]
-----------------------------------------------
                                                 <spontaneous>
[20]     0.1    0.01    0.00                 drop_amount [20]
-----------------------------------------------
                0.00    0.00      34/34          ai_test [1]
[21]     0.0    0.00    0.00      34         grid_apply_moves [21]
                0.00    0.00      34/235501      block_new [17]
                0.00    0.00      68/9212461     block_extreme [11]
                0.00    0.00      34/4467402     block_move [12]
                0.00    0.00      34/34          block_init [25]
                0.00    0.00      34/621         grid_clear_lines [22]
-----------------------------------------------
                0.00    0.00      34/621         grid_apply_moves [21]
                0.00    0.00     587/621         ai_best_move_rec [2]
[22]     0.0    0.00    0.00     621         grid_clear_lines [22]
-----------------------------------------------
                0.00    0.00     587/587         ai_best_move_rec [2]
[23]     0.0    0.00    0.00     587         grid_cpy [23]
-----------------------------------------------
                0.00    0.00      35/35          ai_test [1]
[24]     0.0    0.00    0.00      35         grid_print [24]
-----------------------------------------------
                0.00    0.00      34/34          grid_apply_moves [21]
[25]     0.0    0.00    0.00      34         block_init [25]
-----------------------------------------------
                0.00    0.00      34/34          game_move_print [27]
[26]     0.0    0.00    0.00      34         block_print [26]
-----------------------------------------------
                0.00    0.00      34/34          ai_test [1]
[27]     0.0    0.00    0.00      34         game_move_print [27]
                0.00    0.00      34/34          block_print [26]
-----------------------------------------------
                0.00    0.00      34/34          ai_test [1]
[28]     0.0    0.00    0.00      34         shape_stream_pop [28]
-----------------------------------------------
                0.00    0.00       7/7           shapes_read [68]
[29]     0.0    0.00    0.00       7         shape_new [29]
-----------------------------------------------
                0.00    0.00       1/1           ai_test [1]
[30]     0.0    0.00    0.00       1         shape_stream_new [30]
-----------------------------------------------

 This table describes the call tree of the program, and was sorted by
 the total amount of time spent in each function and its children.

 Each entry in this table consists of several lines.  The line with the
 index number at the left hand margin lists the current function.
 The lines above it list the functions that called this function,
 and the lines below it list the functions this one called.
 This line lists:
     index  A unique number given to each element of the table.
        Index numbers are sorted numerically.
        The index number is printed next to every function name so
        it is easier to look up where the function is in the table.

     % time This is the percentage of the `total' time that was spent
        in this function and its children.  Note that due to
        different viewpoints, functions excluded by options, etc,
        these numbers will NOT add up to 100%.

     self   This is the total amount of time spent in this function.

     children   This is the total amount of time propagated into this
        function by its children.

     called This is the number of times the function was called.
        If the function called itself recursively, the number
        only includes non-recursive calls, and is followed by
        a `+' and the number of recursive calls.

     name   The name of the current function.  The index number is
        printed after it.  If the function is a member of a
        cycle, the cycle number is printed between the
        function's name and the index number.


 For the function's parents, the fields have the following meanings:

     self   This is the amount of time that was propagated directly
        from the function into this parent.

     children   This is the amount of time that was propagated from
        the function's children into this parent.

     called This is the number of times this parent called the
        function `/' the total number of times the function
        was called.  Recursive calls to the function are not
        included in the number after the `/'.

     name   This is the name of the parent.  The parent's index
        number is printed after it.  If the parent is a
        member of a cycle, the cycle number is printed between
        the name and the index number.

 If the parents of the function cannot be determined, the word
 `<spontaneous>' is printed in the `name' field, and all the other
 fields are blank.

 For the function's children, the fields have the following meanings:

     self   This is the amount of time that was propagated directly
        from the child into the function.

     children   This is the amount of time that was propagated from the
        child's children to the function.

     called This is the number of times the function called
        this child `/' the total number of times the child
        was called.  Recursive calls by the child are not
        listed in the number after the `/'.

     name   This is the name of the child.  The child's index
        number is printed after it.  If the child is a
        member of a cycle, the cycle number is printed
        between the name and the index number.

 If there are any cycles (circles) in the call graph, there is an
 entry for the cycle-as-a-whole.  This entry shows who called the
 cycle (as parents) and the members of the cycle (as children.)
 The `+' recursive calls entry shows the number of function calls that
 were internal to the cycle, and the calls entry for each member shows,
 for that member, how many times it was called from other members of
 the cycle.

Copyright (C) 2012-2014 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

Index by function name

   [2] ai_best_move_rec       [21] grid_apply_moves        [3] grid_eval
  [14] block_crust_get         [7] grid_block_add         [10] grid_init
  [11] block_extreme           [5] grid_block_center_top   [9] grid_new
  [13] block_get               [8] grid_block_drop        [24] grid_print
  [25] block_init             [18] grid_block_intersects  [29] shape_new
  [12] block_move             [16] grid_block_remove      [30] shape_stream_new
  [17] block_new              [15] grid_block_set_color   [19] shape_stream_peek
  [26] block_print             [6] grid_block_valid       [28] shape_stream_pop
  [20] drop_amount            [22] grid_clear_lines        [4] shape_stream_test
  [27] game_move_print        [23] grid_cpy

后来我发现我正在组合 -Ofast 和 -pg 标志,删除 -Ofast 可以解决上述问题。

我的问题是:

    这两个标志不兼容吗?我在文档中找不到任何此类警告

    如果它们不是不兼容的,这是 gcc 错误吗?

    如何解决此问题以查看与最激进的优化匹配的分析结果,以免浪费时间优化错误的代码?

【问题讨论】:

【参考方案1】:

你想要速度,对吧? -Ofast 仅在调用堆栈的底部很重要。它不能解决只有你能解决的问题。它只会让他们更难找到。首先关闭优化器并修复您可以修复的内容。 Here's an example of how some people do it. 当你尽可能地使用它时,打开优化器并让它发挥它的魔力。

Small Flame :) 很多人说分析未优化的代码是浪费时间,但他们从不说原因。 他们在演讲厅听到它,在博客上或从 *** 上阅读它,从某个在该主题上具有明确权威的人无需给出理由。

我认为这是一厢情愿的情况:我的代码基本上是完美的,因此可以加速它的唯一方法是运行编译器的优化器。然后当分析器显示无法加快速度时 - 这显然是正确的,我为自己欢呼!

为回应 cmets 添加:让我离题一点(抱歉)。我知道人们可以仔细检查代码,在数据表示上投入精力,内联一些函数,进行其他高级优化,并测量这样做的加速。伟大的。 但是,如果您每次更改代码的原因是查看并只是考虑代码,那么尽管您的想法受过教育,但它仍然是一个猜测。 猜测可能是正确的,但您应该问的问题是“我错过了什么?”和“我怎么找到它?”

我和很多人使用的方法是random pausing。 它需要的唯一工具是调试器,如 GDB 或任何 IDE。 它与分析的不同之处在于它直接定位不必要的计算,而不是进行测量并依靠您来破译它们、过滤掉噪音、查看特定的例程并在其中搜索。 它会告诉您具体的代码行和特定的数据、程序在做什么以及为什么要这样做。 然后你可以问是否有一种不那么浪费的方法来做到这一点,而且很有可能。 它做的是告诉你精确的时间分数——你只能得到一个非常粗略的测量结果。它所做 所做的是识别问题。 您可能认为您当然不能依赖这种不准确的方法,如果您只在 one 示例中看到问题,那是真的。 但是,如果您在多个样本上看到它,您就会知道它很大,而且您为多次查看它而获取的整体样本越少,它就越大。 Here are the statistics behind it. 关键在于:这将发现任何分析器会发现的任何问题,等等。

然后,当您发现并解决了一个问题时,您可以重新进行所有操作,因为解决一个问题会放大剩余的问题。 通过这种方式,您可以“阶梯式”加速,直到无法再加速为止。 这就是你如何获得最大速度as in this example。 然后,一定要使用-Ofast

【讨论】:

我认为“只有你可以解决的事情”是指更高级别的优化。但在这种情况下,我对手动进行低级优化很感兴趣。这就是为什么没有 -Ofast 我认为我无法准确表示代码的哪些部分值得我关注。我也很想了解为什么这个不相关的、未被调用的函数出现在统计数据中,标志是否不兼容或者这是否是一个错误。 @erjoalgo:天哪,多点。首先,即使 gprof 没有错误,it has many problems,所以人们可能想知道您为什么要费心与它搏斗。其次,如果速度是您的目标,为什么高级优化不值得您关注?第三,你可以做的任何事情来加速低级代码在未优化的代码中是显而易见的。如果编译器优化了一些低级代码,那不会使 you 可以做的事情更加明显(即使它确实 显示了所有函数)。如果您愿意,我会尝试更详细地解释这一点。 在这个特定的项目中,我确实在数据表示和算法上投入了足够的精力,以最大限度地降低复杂性。所以我已经在高级优化上花费了很多精力,我想抓住低垂的果实。例如,通过手动内联一些计算、远离排序库等,我已经获得了显着的加速。 我同意“你可以做的任何事情来加速低级代码在未优化的代码中是显而易见的”,但我不认为反过来是正确的,优化代码的分析器可能会声明函数 X真正优化好的时候才值得关注,不值得关注。例如,未优化程序图表中的grid_block_add 花费的总程序时间几乎是优化版本的两倍(7% 对 13%)。 我愿意接受您的回答,以便继续进行此操作。你介意添加你将使用的东西而不是 gprof 来分析积极优化的代码吗?

以上是关于gcc -Ofast -pg 错误的 gprof 结果的主要内容,如果未能解决你的问题,请参考以下文章

GCC 编译选项语法

GPROF 平面轮廓空结果

如何在 Linux 中使用 gprof?

gprof 不为需要合理时间执行的程序产生任何输出

gprof 错误:配置文件的版本不受支持

GCOV/LCOV,GPROF