FDO - Feedback directed optimization with GCC and Perf
Posted rtoax
https://blog.wnohang.net/index.php/2015/04/29/feedback-directed-optimization-with-gcc-and-perf/
GCC 5.0 added support for FDO that uses perf to generate the profile. The GCC manual documents this; to quote:
-fauto-profile=path
Enable sampling-based feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer, -ftree-vectorize, -finline-functions, -fipa-cp, -fipa-cp-clone, -fpredictive-commoning, -funswitch-loops, -fgcse-after-reload, and -ftree-loop-distribute-patterns.
path is the name of a file containing AutoFDO profile information. If omitted, it defaults to fbdata.afdo in the current directory.
Producing an AutoFDO profile data file requires running your program with the perf utility on a supported GNU/Linux target system. For more information, see .
E.g.
perf record -e br_inst_retired:near_taken -b -o perf.data \
    -- your_program
Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC. You must also supply the unstripped binary for your program to this tool. See .
E.g.
create_gcov --binary=your_program.unstripped --profile=perf.data \
    --gcov=profile.afdo
However, this skims over a few details:
- br_inst_retired:near_taken is not available as shown there. See this gcc thread for details.
  I recorded with a raw event instead:
  perf record \
      -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp \
      -p ... -b -o perf.data
  You can use ocperf from pmu-tools to look up the correct event (with ocperf.py list).
- create_gcov is not packaged with gcc and is only available with autofdo from Google.
- You can also run into an incompatibility between autofdo and the latest perf. I am using the perf that ships with Linux 4.0. Patches for this exist, and I also keep a GitHub branch with them applied.
- Finally, you can also run into a gcov version incompatibility:
  AutoFDO profile version 875575082 does match 1.
  You need to explicitly provide the gcov_version for this:
  create_gcov --binary=/pxc56/bin/mysqld \
      --profile=perf.data -gcov_version 1 \
      --gcov=perf.ado
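The odd-looking version number in that error message becomes easier to read once it is split into bytes. The sketch below is my own illustration, not part of GCC or autofdo. The helper name decode_version_word is hypothetical. It assumes the number is a 32-bit word holding four ASCII characters, most significant byte first, which is how gcov version tags were traditionally laid out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a 32-bit version word into its four bytes
# (most significant first) and print them as ASCII characters.
decode_version_word() {
  local word=$1 out="" bits byte
  for bits in 24 16 8 0; do
    byte=$(( (word >> bits) & 0xff ))
    # Build an octal escape such as \064 and let printf turn it into a character.
    out+=$(printf "\\$(printf '%03o' "$byte")")
  done
  printf '%s\n' "$out"
}

decode_version_word 875575082   # prints: 407*
```

Read this way, 875575082 looks like a "407*" gcov-style tag rather than the plain integer 1 that the compiler expects, which would explain why passing -gcov_version 1 to create_gcov resolves the mismatch. This interpretation is an inference from the byte pattern, not something the error message states.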
Now, with all tools in place, all you need to do is:
- Build the program. In my case, I built percona-xtradb-cluster with the RelWithDebInfo build type, since the debug symbols are required.
- Run it against a representative workload. I used sysbench oltp for this.
  sysbench --test=/pxc56/db/oltp.lua --db-driver=mysql \
      --mysql-engine-trx=yes --mysql-table-engine=innodb \
      --mysql-user=root --mysql-password=test --oltp-table-size=100000 \
      --num-threads=4 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=60 \
      --max-requests=100000 --oltp-tables-count=5 run
- While the workload is running, run perf concurrently.
  perf record -e \
      cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp \
      -p $(pidof mysqld) -b -o perf.data
- After sysbench ends, stop perf and then convert perf.data to gcov format.
  create_gcov --binary=/pxc56/bin/mysqld \
      --profile=perf.data -gcov_version 1 --gcov=perf.ado
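- Now rebuild the program, this time with the following (this assumes perf.ado was written to /tmp; the leading space keeps the new flag separate from any existing flags):
  export CFLAGS+=" -fauto-profile=/tmp/perf.ado"
  export CXXFLAGS+=" -fauto-profile=/tmp/perf.ado"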
- The binary produced by this second build is the one optimized with feedback from the profile captured by perf.
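The steps above can be stitched into one small driver script. This is only a sketch under assumptions. The function names, the DRY_RUN switch and the placeholder PID are mine. The paths and the raw event are the ones used in this post and will differ on other machines and CPUs. By default it only prints the commands, so it can be read and checked without perf or create_gcov installed.

```shell
#!/usr/bin/env bash
# Sketch of the AutoFDO round trip. DRY_RUN=1 (the default here) only
# prints each command; set DRY_RUN=0 to actually execute them.
set -u

BINARY=${BINARY:-/pxc56/bin/mysqld}   # unstripped binary, path from the post
PROFILE=${PROFILE:-/tmp/perf.ado}     # profile path used for the rebuild
PERF_DATA=${PERF_DATA:-perf.data}
# Raw event from the post; other CPUs may need a different encoding.
EVENT='cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp'
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

record_profile() {   # $1 = PID of the program under load
  run perf record -e "$EVENT" -p "$1" -b -o "$PERF_DATA"
}

convert_profile() {
  run create_gcov --binary="$BINARY" --profile="$PERF_DATA" \
      -gcov_version 1 --gcov="$PROFILE"
}

rebuild_flag() {
  # Flag to append to CFLAGS/CXXFLAGS for the second build.
  printf -- '-fauto-profile=%s\n' "$PROFILE"
}

record_profile 1234   # 1234 is a placeholder PID
convert_profile
rebuild_flag
```

With DRY_RUN=0, record_profile blocks until perf is interrupted, so in practice it would run in a second terminal while the workload is active, as in the steps above.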
I have skipped the results for now; they belong in another post with actual benchmarking and a more representative workload.
To conclude, even though gcc has had gcov-based profiling before, it wasn't that convenient to use. perf is a good low-overhead profiler already in use in various environments, so being able to feed its profile to the compiler makes profile-based optimization much easier.