Embedded GCC optimization magic

Posted 2023-02-22

【Title】: Embedded GCC optimization magic 【Published】: 2020-01-24 15:28:24 【Question】:

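I have a project in which I am building firmware for a microcontroller, and I want better control over the optimization flags that are used. Instead of using the -O<number> flags, I would like to specify the individual optimization flags one by one. Unfortunately, the -O flags seem to trigger some optimization magic that I cannot reproduce with the individual flags, and I do not understand why.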

Here is what I tried and what does not work:

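I know that I can compile the project with -O1. So I used the -Q and --help flags to print the flags that are active when -O1 is enabled. I then used this information to specify the flags by hand in my build process. Compilation works, but the link step fails because the .bss section no longer fits into my RAM (I only have 384 kByte available).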

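When I increase the RAM size in the linker script, linking succeeds, but the end of the .bss section is placed at 416 kByte, and the binary image is 75% larger than when I use -O1 directly.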

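When I compare the flags and parameters reported by gcc, there is no difference between the two builds, yet the one built without -O1 is still much larger.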

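According to the GCC documentation (GCC Manual), the -O flags only activate specific optimization flags, so it should be possible to do the same thing manually (or is it not?)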

Here are my gcc commands:

GCC call with individual optimization flags

gcc -std=c99 -msoft-float -fno-inline -fdata-sections -ffunction-sections -Wall -Wextra \
-faggressive-loop-optimizations -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments \
-fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdelete-null-pointer-checks \
-fdse -fearly-inlining -ffast-math -fforward-propagate -ffp-contract=fast -ffp-int-builtin-inexact \
-ffunction-cse -fgcse-lm -fguess-branch-probability -fhandle-exceptions -fif-conversion -fif-conversion2 \
-finline-atomics -finline-functions-called-once -fipa-profile -fipa-pure-const -fipa-reference \
-fira-algorithm=CB -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots -fivopts \
-fjump-tables -flifetime-dse -flifetime-dse=2 -fmath-errno -fmove-loop-invariants -fomit-frame-pointer \
-fpeephole -fplt -fprefetch-loop-arrays -fprintf-return-value -frename-registers -freorder-blocks \
-frtti -fsched-critical-path-heuristic -fsched-dep-count-heuristic -fsched-group-heuristic \
-fsched-interblock -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic \
-fsched-stalled-insns-dep -fschedule-fusion -fshort-enums -fshrink-wrap -fshrink-wrap-separate \
-fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types -fssa-backprop -fssa-phiopt -fstack-reuse=all \
-fstdarg-opt -fstrict-volatile-bitfields -fno-threadsafe-statics -ftrapping-math -ftree-bit-ccp \
-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-cselim \
-ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im \
-ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=1 -ftree-phiprop -ftree-pta \
-ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -fvar-tracking -fvar-tracking-assignments \
-fweb -fmerge-constants -fno-associative-math -fno-cx-limited-range -fno-exceptions -fno-finite-math-only \
-fno-reciprocal-math -fno-unsafe-math-optimizations -fexcess-precision=standard -qbsp=leon2 -DCPU_FREQ=CPU_FREQ_125MHz \
-fno-builtin-strtok -c -o timer.o timer.c

GCC call with -O1

gcc -O1 -std=c99 -msoft-float -qbsp=leon2 -DCPU_FREQ=CPU_FREQ_125MHz -fno-builtin-strtok -c -o timer.o timer.c

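If needed, I can also provide the gcc output that shows which flags are active in both cases. The only difference I found is that -fexcess-precision is set to "default" with -O1. I tried both alternatives (fast and standard), but neither made any difference.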

Does anyone know what other magic the -O option activates that I am overlooking?

【Comments】:

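-O activates optimizations that cannot be specified with flags. You can disable or enable the optimizations that have flags. Some optimizations cannot be disabled.

What microcontroller is it? Different vendors do different strange things in their "gcc compliant" compilers.

@S.S.Anne Thanks for this information. Do you know whether there is any way to find out what the compiler does only with the -O flag? I also just found this in the manual linked above: "Most optimizations are completely disabled [...] if an -O level is not set on the command line, even if individual optimization flags are specified." That would explain the behavior.

@yhyrcanus I use a Leon2 microcontroller with the Sparc V8 architecture. I use the compiler provided by gaisler.

Hmm, yes, that too. Not off the top of my head, but you can browse the GCC source code (google it) to find out.

【Answer 1】: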

According to the GCC manual

Most optimizations are only enabled if an -O level is set on the command line.
Otherwise they are disabled, even if individual optimization flags are specified.

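So specifying the optimization flags alone is not enough. For example, here you can see that some analysis is enabled only if both -O and -fweb are enabled: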

class pass_web : public rtl_opt_pass
{
  ...
  virtual bool gate (function *) { return (optimize > 0 && flag_web); }

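Even specifying -O1 and selectively enabling optimizations from higher optimization levels will not work reliably, because some passes explicitly depend on the -O value. E.g. here you can see that part of the CSE optimization is disabled at -O1: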

else if (tem == 1 || optimize > 1)
  cse_cfg_altered |= cleanup_cfg (0);
