Which gcc O2 flag may cause failure in fp calculation?

Posted 2023-02-24


【Title】: Which gcc O2 flag may cause failure in fp calculation? 【Posted】: 2013-08-11 10:34:57 【Question】:

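I compiled the paranoia floating-point test suite on a pc386 system using GCC with O2-level optimization and got several failures. I then compiled it with the same GCC without optimization and got correct results. I read about the flags that O2 enables, but none of them looks problematic. What could be the cause? The paranoia code can be found here, and this is the output with O2 optimization: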

*** PARANOIA TEST ***
paranoia version 1.1 [cygnus]
Program is now RUNNING tests on small integers:
TEST: 0+0 != 0, 1-1 != 0, 1 <= 0, or 1+1 != 2
PASS: 0+0 != 0, 1-1 != 0, 1 <= 0, or 1+1 != 2
TEST: 3 != 2+1, 4 != 3+1, 4+2*(-2) != 0, or 4-3-1 != 0
PASS: 3 != 2+1, 4 != 3+1, 4+2*(-2) != 0, or 4-3-1 != 0
TEST: -1+1 != 0, (-1)+abs(1) != 0, or -1+(-1)*(-1) != 0
PASS: -1+1 != 0, (-1)+abs(1) != 0, or -1+(-1)*(-1) != 0
TEST: 1/2 + (-1) + 1/2 != 0
PASS: 1/2 + (-1) + 1/2 != 0
TEST: 9 != 3*3, 27 != 9*3, 32 != 8*4, or 32-27-4-1 != 0
PASS: 9 != 3*3, 27 != 9*3, 32 != 8*4, or 32-27-4-1 != 0
TEST: 5 != 4+1, 240/3 != 80, 240/4 != 60, or 240/5 != 48
PASS: 5 != 4+1, 240/3 != 80, 240/4 != 60, or 240/5 != 48
-1, 0, 1/2, 1, 2, 3, 4, 5, 9, 27, 32 & 240 are O.K.

Searching for Radix and Precision.
Radix = 2.000000 .
Closest relative separation found is U1 = 5.4210109e-20 .

Recalculating radix and precision
 confirms closest relative separation U1 .
Radix confirmed.
TEST: Radix is too big: roundoff problems
PASS: Radix is too big: roundoff problems
TEST: Radix is not as good as 2 or 10
PASS: Radix is not as good as 2 or 10
TEST: (1-U1)-1/2 < 1/2 is FALSE, prog. fails?
ERROR: Severity: FAILURE:  (1-U1)-1/2 < 1/2 is FALSE, prog. fails?.
PASS: (1-U1)-1/2 < 1/2 is FALSE, prog. fails?
TEST: Comparison is fuzzy,X=1 but X-1/2-1/2 != 0
PASS: Comparison is fuzzy,X=1 but X-1/2-1/2 != 0
The number of significant digits of the Radix is 64.000000 .
TEST: Precision worse than 5 decimal figures  
PASS: Precision worse than 5 decimal figures  
TEST: Subtraction is not normalized X=Y,X+Z != Y+Z!
PASS: Subtraction is not normalized X=Y,X+Z != Y+Z!
Subtraction appears to be normalized, as it should be.
Checking for guard digit in *, /, and -.
TEST: * gets too many final digits wrong.

PASS: * gets too many final digits wrong.

TEST: Division lacks a Guard Digit, so error can exceed 1 ulp
or  1/3  and  3/9  and  9/27 may disagree
PASS: Division lacks a Guard Digit, so error can exceed 1 ulp
or  1/3  and  3/9  and  9/27 may disagree
TEST: Computed value of 1/1.000..1 >= 1
PASS: Computed value of 1/1.000..1 >= 1
TEST: * and/or / gets too many last digits wrong
PASS: * and/or / gets too many last digits wrong
TEST: - lacks Guard Digit, so cancellation is obscured
ERROR: Severity: SERIOUS DEFECT:  - lacks Guard Digit, so cancellation is obscured.
PASS: - lacks Guard Digit, so cancellation is obscured
Checking rounding on multiply, divide and add/subtract.
TEST: X * (1/X) differs from 1
PASS: X * (1/X) differs from 1
* is neither chopped nor correctly rounded.
/ is neither chopped nor correctly rounded.
TEST: Radix * ( 1 / Radix ) differs from 1
PASS: Radix * ( 1 / Radix ) differs from 1
TEST: Incomplete carry-propagation in Addition
PASS: Incomplete carry-propagation in Addition
Addition/Subtraction neither rounds nor chops.
Sticky bit used incorrectly or not at all.
TEST: lack(s) of guard digits or failure(s) to correctly round or chop
(noted above) count as one flaw in the final tally below
ERROR: Severity: FLAW:  lack(s) of guard digits or failure(s) to correctly round or chop
(noted above) count as one flaw in the final tally below.
PASS: lack(s) of guard digits or failure(s) to correctly round or chop
(noted above) count as one flaw in the final tally below

Does Multiplication commute?  Testing on 20 random pairs.
     No failures found in 20 integer pairs.

Running test of square root(x).
TEST: Square root of 0.0, -0.0 or 1.0 wrong
PASS: Square root of 0.0, -0.0 or 1.0 wrong
Testing if sqrt(X * X) == X for 20 Integers X.
Test for sqrt monotonicity.
ERROR: Severity: DEFECT:  sqrt(X) is non-monotonic for X near 2.0000000e+00 .
Testing whether sqrt is rounded or chopped.
Square root is neither chopped nor correctly rounded.
Observed errors run from -5.5000000e+00 to 5.0000000e-01 ulps.
TEST: sqrt gets too many last digits wrong
ERROR: Severity: SERIOUS DEFECT:  sqrt gets too many last digits wrong.
PASS: sqrt gets too many last digits wrong
Testing powers Z^i for small Integers Z and i.
ERROR: Severity: DEFECT:  computing
        (1.30000000000000000e+01) ^ (1.70000000000000000e+01)
        yielded 8.65041591938133811e+18;
        which compared unequal to correct 8.65041591938133914e+18 ;
                they differ by -1.02400000000000000e+03 .
Errors like this may invalidate financial calculations
        involving interest rates.
Similar discrepancies have occurred 5 times.
Seeking Underflow thresholds UfThold and E0.
ERROR: Severity: FAILURE:  multiplication gets too many last digits wrong.
Smallest strictly positive number found is E0 = 0 .
ERROR: Severity: FAILURE:  Either accuracy deteriorates as numbers
approach a threshold = 0.00000000000000000e+00
 coming down from 0.00000000000000000e+00
 or else multiplication gets too many last digits wrong.

The Underflow threshold is 0.00000000000000000e+00,  below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold
UfThold = (2.00000000000000000e+00) ^ (-inf)
only underflow should afflict the expression
        (2.00000000000000000e+00) ^ (-inf);
actually calculating yields: 0.00000000000000000e+00 .
This computed value is O.K.

Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065041e+00 as X -> 1.
ERROR: Severity: DEFECT:  Calculated 1.00000000000000000e+00 for
        (1 + (0.00000000000000000e+00) ^ (inf);
        differs from correct value by -6.38905609893065041e+00 .
        This much error may spoil financial
        calculations involving tiny interest rates.
Testing powers Z^Q at four nearly extreme values.
 ... no discrepancies found.

Searching for Overflow threshold:
This may generate an error.
Can `Z = -Y' overflow?
Trying it on Y = -inf .
finds a ERROR: Severity: FLAW:  -(-Y) differs from Y.
Overflow threshold is V  = -inf .
Overflow saturates at V0 = inf .
No Overflow should be signaled for V * 1 = -inf
                           nor for V / 1 = -inf .
Any overflow signal separating this * from the one
above is a DEFECT.
ERROR: Severity: FAILURE:  Comparisons involving +--inf, +-inf
and +-0 are confused by Overflow.
ERROR: Severity: SERIOUS DEFECT:    X / X differs from 1 when X = 1.00000000000000000e+00
  instead, X / X - 1/2 - 1/2 = 1.08420217248550443e-19 .
ERROR: Severity: SERIOUS DEFECT:    X / X differs from 1 when X = -inf
  instead, X / X - 1/2 - 1/2 = nan .
ERROR: Severity: SERIOUS DEFECT:    X / X differs from 1 when X = 0.00000000000000000e+00
  instead, X / X - 1/2 - 1/2 = nan .

What message and/or values does Division by Zero produce?
    Trying to compute 1 / 0 produces ...  inf .

    Trying to compute 0 / 0 produces ...  nan .

The number of  FAILUREs  encountered =       4.
The number of  SERIOUS DEFECTs  discovered = 5.
The number of  DEFECTs  discovered =         3.
The number of  FLAWs  discovered =           2.

The arithmetic diagnosed has unacceptable Serious Defects.
Potentially fatal FAILURE may have spoiled this program's subsequent diagnoses.
END OF TEST.
*** END OF PARANOIA TEST ***

EXECUTIVE SHUTDOWN! Any key to reboot...

【Comments】:

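I think this would be a better question if you provided details of the failures and the relevant code (as far as is practical). It wouldn't hurt to know the gcc version number either. And of course, if you can reduce one or more of the failures to an SSCCE (sscce.org), so much the better.

A recent version of GCC on modern hardware can use the options -msse2 -mfpmath=sse to generate assembly that computes each expression to exactly the precision of its type. It may also be helpful to read Paranoia's source code in light of this description: gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html. If a recent GCC does not already generate strict IEEE 754 code for floating-point, -std=c99 or -fexcess-precision=standard makes the generated assembly conform to the interpretation laid out by Joseph S. Myers.

【Answer 1】: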

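Optimization and -O2 are not the culprit here. The test suite you are running could also fail in a C implementation that uses other optimization settings. The main problem in this case appears to be that the paranoia test checks whether floating-point arithmetic is consistent and has various properties, but the floating-point arithmetic in the C implementation you are using is not consistent: sometimes it uses 80-bit arithmetic and sometimes it uses 64-bit arithmetic (or an approximation of it, such as using 80-bit arithmetic but rounding results to 64-bit floating-point).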

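Initially, the test finds a number U1 such that 1-U1 differs from 1 and there is no representable value between 1-U1 and 1. That is, U1 is the step size from 1 down to the next representable value in the floating-point format. In your case, the test found U1 to be about 5.4210109e-20. This U1 is exactly 2^-64. The Intel processor you are running on has an 80-bit floating-point format in which the significand (the fraction portion of the floating-point representation) has 64 bits. This 64-bit significand width produces a step size of 2^-64, which is why U1 is 2^-64.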
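The step size described here can also be read from `<float.h>` rather than searched for. The following is a minimal sketch, assuming a radix-2 format; the function names are hypothetical and do not come from paranoia.c. `LDBL_EPSILON` is the step from 1 up to the next value, so dividing it by `FLT_RADIX` gives the step from 1 down. Where `LDBL_MANT_DIG` is 64, as with the x87 80-bit format, this is 2^-64, about 5.42e-20.

```c
#include <float.h>

/* Hypothetical helpers, not part of paranoia.c.
   U1 = FLT_RADIX^(-p), where p is the significand width of the type. */
long double u1_long_double(void) { return LDBL_EPSILON / FLT_RADIX; }
double      u1_double(void)      { return DBL_EPSILON / FLT_RADIX; }

/* Return 1 if 1 - u, once stored in the named type, is still below 1.
   The volatile store forces rounding to that type, so any excess
   precision kept in registers does not affect the answer. */
int below_one_long_double(long double u) {
    volatile long double x = 1.0L - u;
    return x < 1.0L;
}

int below_one_double(double u) {
    volatile double x = 1.0 - u;
    return x < 1.0;
}
```

Calling `below_one_double(u1_double())` should return 1, while halving the argument should return 0, because 1 minus half a step rounds back to 1.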

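Later, the test evaluates (1-U1)-1/2 and compares it to 1/2. Since 1-U1 is less than 1, subtracting 1/2 should produce a result less than 1/2. In this case, however, your C implementation evaluates 1-U1 using 64-bit arithmetic, which has a 53-bit significand. With a 53-bit significand, 1-U1 cannot be represented exactly. Because it is so close to 1, the mathematical value of 1-U1 is rounded to 1 in the 64-bit format. Subtracting 1/2 from this 1 then yields 1/2. This 1/2 is not less than 1/2, so the comparison fails and the program reports an error.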
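The effect can be imitated deliberately. The sketch below is an illustration under assumptions, not the code paranoia.c runs: the function names are hypothetical, and the explicit `volatile double` store stands in for whatever rounding to 64 bits the compiler happened to introduce.

```c
#include <float.h>

/* Hypothetical illustration, not taken from paranoia.c. */

/* Evaluate (1 - u1) - 1/2 < 1/2 with every intermediate kept in long double. */
int check_extended(long double u1) {
    volatile long double t = 1.0L - u1;
    volatile long double d = t - 0.5L;
    return d < 0.5L;
}

/* Same test, but 1 - u1 is first rounded to double, imitating a value
   written to a 64-bit slot and read back. */
int check_with_double_rounding(long double u1) {
    volatile double t = (double)(1.0L - u1);
    volatile long double d = (long double)t - 0.5L;
    return d < 0.5L;
}
```

With `u1 = LDBL_EPSILON / FLT_RADIX`, `check_extended` returns 1. On a target where long double has a wider significand than double, `check_with_double_rounding` returns 0, which is the situation the test reports as a FAILURE.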

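This is a defect in your C implementation. In effect, it evaluates 1-U1 differently in one place than in another. It uses 80-bit arithmetic in one place and 64-bit arithmetic in another, and it does not provide a good way to control this. (There may be switches to use only 64-bit arithmetic; I do not know about your version of GCC.)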

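Although this is a defect by the standards of anybody who wants good floating-point arithmetic, it is not a defect according to the C standard. The C standard permits this behavior.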
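C99 does give a program a way to ask what the implementation claims to do: the `FLT_EVAL_METHOD` macro in `<float.h>`. The sketch below maps its value to the meaning given in the standard; the function name is hypothetical. Whether an older compiler honours the reported value in every code path is a separate question.

```c
#include <float.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value (C99, <float.h>). */
const char *eval_method_description(int method) {
    switch (method) {
    case 0:
        return "each operation is evaluated in the range and precision of its type";
    case 1:
        return "float and double are evaluated as double, long double as long double";
    case 2:
        return "all operations are evaluated as long double";
    default:
        /* -1 means indeterminable; other negative values are implementation-defined. */
        return "indeterminable or implementation-defined";
    }
}
```

Printing `FLT_EVAL_METHOD` together with `eval_method_description(FLT_EVAL_METHOD)` from a small test program shows which case applies. GCC targeting x87 typically reports 2, and typically reports 0 with -msse2 -mfpmath=sse, though this should be checked on the compiler in question.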

I have not examined the failures reported after the first one. They likely stem from similar causes.

【Comments】:

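The GCC version is 4.4.7. Does optimization change floating-point precision? If not, why does compiling without optimization produce correct answers?

@ArdalanPouyabahar Older GCC versions do not guarantee that floating-point computations produce the same results with and without optimization. Simple computations may be done at compile time using semantics that differ from the run-time semantics. For a fairly complete report on the difficulty of predicting floating-point behavior with such compilers, see arxiv.org/abs/cs/0701192, or blog.frama-c.com/index.php?post/2013/07/06/… and blog.frama-c.com/index.php?post/2013/07/24/…

@PascalCuoq The links are helpful, but when I checked, version 4.4.7 is from 2012. Does it have this problem too?

@ArdalanPouyabahar: Floating-point precision changes at the whim of the compiler. Consider that when the compiler is evaluating an expression and finds it has run out of registers, it temporarily saves a value by rounding it from the 80-bit floating-point format to the 64-bit format and writing it to the stack. This temporary save may or may not happen whenever the expression is reorganized, the compiler version changes, or the optimization switches change. It is uncontrolled. For what it is worth, I compiled paranoia.c with Apple Clang 4.0 and ran it on a MacPro4,1, using -pedantic -std=c99 with both -O0 and -O3, and both runs reported no errors.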
