FDO - Feedback directed optimization with GCC and Perf

Posted rtoax

https://blog.wnohang.net/index.php/2015/04/29/feedback-directed-optimization-with-gcc-and-perf/

GCC 5.0 added support for a form of FDO (AutoFDO) that uses perf to generate the profile. It is documented in the GCC manual, to quote:

-fauto-profile=path
Enable sampling-based feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer, -ftree-vectorize,
-finline-functions, -fipa-cp, -fipa-cp-clone, -fpredictive-commoning, -funswitch-loops, -fgcse-after-reload, and -ftree-loop-distribute-patterns.
path is the name of a file containing AutoFDO profile information. If omitted, it defaults to fbdata.afdo in the current directory.
Producing an AutoFDO profile data file requires running your program with the perf utility on a supported GNU/Linux target system. For more information, see .
E.g.
perf record -e br_inst_retired:near_taken -b -o perf.data \
-- your_program
Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC. You must also supply the unstripped binary for your program to this tool. See .
E.g.
create_gcov --binary=your_program.unstripped --profile=perf.data \
--gcov=profile.afdo

However, this skims over a few details:

  • br_inst_retired:near_taken is not available as shown there. See this gcc thread for details.

    I recorded with the raw event encoding instead:

    perf record \
    -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp \
    -p ...  -b -o perf.data

    You can use ocperf from pmu-tools to look up the correct event encoding for your CPU (with ocperf.py list).

  • create_gcov is not shipped with GCC; it is only available as part of Google's autofdo project.

  • However, autofdo can be incompatible with the perf.data format written by recent perf releases. I am using the perf that ships with Linux 4.0. You can apply the patches here.

    • I also have a GitHub branch with the patches applied here.
  • Finally, you can also run into a gcov version incompatibility when GCC reads the profile:

    AutoFDO profile version 875575082 does match 1.

    To avoid it, pass gcov_version explicitly to create_gcov:

    create_gcov --binary=/pxc56/bin/mysqld \
    --profile=perf.data -gcov_version 1 \
    --gcov=perf.ado
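
Two of the numbers in the list above can be unpacked with a little shell arithmetic. The sketch below assumes the Intel x86 event layout, where the umask occupies the byte above the event select code, and it treats the version number in the error message as four packed ASCII bytes. That second reading is an interpretation rather than something the tools' documentation is quoted on here. The variable and function names are made up for the example.

```shell
#!/bin/sh
# 1) Raw perf event code: on Intel x86 the umask sits in the byte above
#    the event select code, so config = (umask << 8) | event.
event=0xc4    # event= value from the perf command above
umask=0x20    # umask= value from the perf command above
raw_event=$(printf 'r%x' $(( ($umask << 8) | $event )))
echo "raw event: $raw_event"

# 2) Profile version from the GCC error, split into four bytes and
#    printed as ASCII characters, most significant byte first.
decode_version() {
    decoded=""
    for s in 24 16 8 0; do
        byte=$(( ($1 >> $s) & 255 ))
        # Turn the byte into an octal escape such as \064 and print it.
        decoded="$decoded$(printf "\\$(printf '%03o' "$byte")")"
    done
    printf '%s\n' "$decoded"
}
echo "version tag: $(decode_version 875575082)"
```

This prints r20c4 for the event and 407* for the version. The latter looks like a gcov-style version stamp for GCC 4.7, which would fit create_gcov defaulting to an older profile format than the version 1 that GCC 5 expects.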

Now, with all tools in place, all you need to do is:

  1. Build the program. In my case, I built percona-xtradb-cluster with the RelWithDebInfo profile. The debug symbols are required.
  2. Run it against a representative workload. I used sysbench oltp for this.

    sysbench --test=/pxc56/db/oltp.lua --db-driver=mysql \
    --mysql-engine-trx=yes --mysql-table-engine=innodb \
    --mysql-user=root --mysql-password=test --oltp-table-size=100000 \
    --num-threads=4 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=60 \
    --max-requests=100000 --oltp-tables-count=5 run
  3. While the workload is running, run perf concurrently.
    perf record -e \
     cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp \
    -p $(pidof mysqld)  -b -o perf.data
  4. After sysbench ends, stop perf and then convert perf.data to the gcov format.
    create_gcov --binary=/pxc56/bin/mysqld \
    --profile=perf.data -gcov_version 1 --gcov=perf.ado
  5. Now rebuild the program, this time with the profile passed to the compiler (the path assumes perf.ado was written to /tmp). Note the space before the flag, so that it is not glued onto whatever CFLAGS already contains:
    export CFLAGS="$CFLAGS -fauto-profile=/tmp/perf.ado"
    export CXXFLAGS="$CXXFLAGS -fauto-profile=/tmp/perf.ado"
    The binary produced by this rebuild is the one optimized with hints/feedback from the profile captured by perf.
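
The recording and conversion steps above can be tied together in one script. This is only a sketch under some assumptions: the function name profile_workload and the PERF and CREATE_GCOV override variables are invented for the example, and it assumes perf finalizes perf.data when it receives SIGTERM, the same way it does on Ctrl-C.

```shell
#!/bin/sh
# Sketch: sample a running process while a workload runs, then convert
# the samples for GCC. PERF and CREATE_GCOV are hypothetical overrides
# so the tools can be swapped out; they are not options of either tool.
PERF=${PERF:-perf}
CREATE_GCOV=${CREATE_GCOV:-create_gcov}
EVENT='cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp'

# usage: profile_workload PID BINARY OUTPUT WORKLOAD [ARGS...]
profile_workload() {
    pid=$1 binary=$2 output=$3
    shift 3

    # Start sampling the target process in the background.
    "$PERF" record -e "$EVENT" -p "$pid" -b -o perf.data &
    perf_pid=$!

    # Run the representative workload to completion.
    "$@"

    # Ask perf to stop, then wait for it to finish writing perf.data.
    kill -TERM "$perf_pid"
    wait "$perf_pid"

    # Convert the raw samples into the profile format GCC reads.
    "$CREATE_GCOV" --binary="$binary" --profile=perf.data \
        -gcov_version 1 --gcov="$output"
}
```

It could be called as profile_workload "$(pidof mysqld)" /pxc56/bin/mysqld /tmp/perf.ado followed by the sysbench command line. The workload starts immediately after perf is launched, so the first few samples may be missed; for a 60 second run that is unlikely to matter.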

I have skipped the results for now; they belong in another post, with actual benchmarking in place and a more representative workload.

To conclude, even though GCC has had gcov-based profiling before, it wasn't that convenient to use. perf is a good low-overhead profiler that is already in use in various environments, so being able to feed its profile to the compiler makes profile-based optimization much easier.
