String of float to IEEE 754 binary128 in Python

Posted 2023-02-16

【Title】: String of float to IEEE 754 binary128 in Python 【Posted】: 2021-08-10 12:51:42 【Question】:

I would like a function like this:

>>> binary128_to_hex("1.0")
'3fff0000000000000000000000000000'

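I currently do this with C and qemu-aarch64 on my x86 laptop. How can I implement such a function "natively"? I found that numpy.float128 and the struct package do not help. Also, thanks to Mark Dickinson's answer, I worked out how to do the reverse conversion (though it only handles normalized values):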

import decimal


def hex_to_binary128(x: str):
    with decimal.localcontext() as context:
        context.prec = 34
        x = int(x, 16)
        significand_mask = (1 << 112) - 1
        exponent_mask = (1 << 127) - (1 << 112)
        trailing_significand = x & significand_mask
        significand = 1 + decimal.Decimal(trailing_significand) / (1 << 112)
        biased_exponent = (x & exponent_mask) >> 112
        exponent = biased_exponent - 16383
        f = significand * decimal.Decimal(2) ** exponent
        return f


if __name__ == "__main__":
    print(hex_to_binary128("0000ffffffffffffffffffffffffffff"))
    # 3.362103143112093506262677817321752E-4932
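
The reverse conversion above assumes an implicit leading 1, so it covers normal values only. As a minimal sketch of one way to extend it to subnormals, zeros and the sign bit, the version below uses exact Fraction arithmetic instead of decimal. This is a swapped-in technique rather than the asker's method, and the name hex_to_fraction is hypothetical.

```python
from fractions import Fraction


def hex_to_fraction(x: str) -> Fraction:
    """Exact value of a finite binary128 bit pattern given as a hex string.

    Hypothetical helper for illustration; infinities and NaNs are rejected.
    """
    bits = int(x, 16)
    sign = -1 if bits >> 127 else 1
    biased_exponent = (bits >> 112) & 0x7FFF
    trailing_significand = bits & ((1 << 112) - 1)

    if biased_exponent == 0x7FFF:
        raise ValueError("infinity or NaN has no finite value")
    if biased_exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at emin.
        significand = trailing_significand
        exponent = -16382
    else:
        # Normal: restore the implicit leading 1.
        significand = trailing_significand | (1 << 112)
        exponent = biased_exponent - 16383

    # The significand is an integer scaled by 2**-112.
    # Note: a Fraction cannot carry the sign of zero, so -0.0 becomes 0.
    return sign * significand * Fraction(2) ** (exponent - 112)


print(hex_to_fraction("3fff0000000000000000000000000000"))  # 1
```

If a decimal approximation is wanted, the numerator and denominator of the result can be divided as decimal.Decimal values under a context with the desired precision.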

【Comments】:

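There are two parts to this: first finding a way to represent IEEE 754 binary128 values in Python (and to create instances of those values), and then generating hex output for such a value. Which of these two parts are you stuck on, and how do you want to represent binary128 instances in Python? And yes, numpy.float128 won't help, because it is not a representation of the IEEE 754 float128 type on any common platform. See ***.com/questions/9062562/….

【Answer 1】: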

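There is a range of possible solutions here, depending on how much complexity you are willing to allow, what kind of performance you need, how far you are prepared to rely on external libraries, and to what extent you need to handle the IEEE 754 special cases (overflow, subnormals, signed zeros, and so on).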

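Here is some working code that I hope gives a reasonable compromise. It (a) is fairly simple, (b) does not rely on anything outside the standard library, (c) handles subnormals, signed zeros and overflow fairly well (but does not attempt to interpret strings like "inf" or "nan"), and (d) probably has terrible performance. But if you are only interested in casual use, it may be good enough.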

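The idea of the code is to sidestep all the parsing difficulties by using the fractions.Fraction parser to parse the string input into a Fraction object. We can then pick apart that Fraction object and construct the information we need.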
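
As a small illustration of what the Fraction parser gives us, the sketch below checks a few decimal strings. The helper name parses is hypothetical. Note that the sign of "-0.0" is lost in the Fraction, and that "inf" and "nan" are rejected.

```python
from fractions import Fraction

# Decimal strings, with or without an exponent, parse to exact rationals.
assert Fraction("1.2") == Fraction(6, 5)
assert Fraction("-0.13e-123") == Fraction(-13, 10**125)

# A Fraction has no signed zero, so the sign of "-0.0" must be
# recovered from the string itself.
assert Fraction("-0.0") == 0


def parses(s):
    """Hypothetical helper: True if Fraction accepts the string s."""
    try:
        Fraction(s)
        return True
    except ValueError:
        return False


assert parses("3.14")
assert not parses("inf")
assert not parses("nan")
```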

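I'll present the solution in three parts. First, one of the basic tools we need is the ability to compute the binary exponent of a positive Fraction (in other words, the floor of its base-2 logarithm). Here is the code: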

def exponent(f):
    """
    Binary exponent (IEEE 754 style) of a positive Fraction f.

    Returns the unique integer e such that 2**e <= f < 2**(e + 1). Results
    for negative or zero f are not defined.
    """
    n, d = f.numerator, f.denominator
    e = n.bit_length() - d.bit_length()
    if e >= 0:
        adjust = (n >> e) < d     # n / d < 2**e  <=>  floor(n / 2**e) < d
    else:
        adjust = (-d >> -e) < -n  # n / d < 2**e  <=>  floor(-d / 2**-e) < -n
    return e - adjust

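This is fairly straightforward: the difference e of the bit lengths of the numerator and denominator either gives us the correct exponent, or is one larger than it should be. To find out which, we have to compare the value of the fraction with 2**e, where e is our test exponent. We could do this directly, computing 2**e as a Fraction and then comparing, but it is more efficient to use some bit shifting, so that is what we do.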
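
To make the equivalence concrete, here is a small sketch that checks the shift-based function against the direct comparison with 2**e described above. The name exponent_direct and the sample inputs are illustrative assumptions.

```python
from fractions import Fraction


def exponent(f):
    """Shift-based binary exponent of a positive Fraction (as above)."""
    n, d = f.numerator, f.denominator
    e = n.bit_length() - d.bit_length()
    if e >= 0:
        adjust = (n >> e) < d
    else:
        adjust = (-d >> -e) < -n
    return e - adjust


def exponent_direct(f):
    """Hypothetical reference version: compare f with 2**e as a Fraction."""
    e = f.numerator.bit_length() - f.denominator.bit_length()
    return e - (f < Fraction(2) ** e)


for text in ["1", "3/2", "2", "1/3", "0.1", "1e-30", "12345678901234567890"]:
    f = Fraction(text)
    e = exponent(f)
    assert e == exponent_direct(f)
    # The defining property: 2**e <= f < 2**(e + 1).
    assert Fraction(2) ** e <= f < Fraction(2) ** (e + 1)
```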

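Next we define some basic and derived constants that describe the IEEE 754 binary128 format. (This makes it easy to test the code below, by replacing these constants with those for the binary64 format and checking that the results are as expected.) The bit width of the format is 128 and the precision is 113; everything else can be derived from these two values.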

# Basic and derived constants for the format.
WIDTH = 128
PRECISION = 113
EMAX = (1 << WIDTH - PRECISION - 1) - 1
EMIN = 1 - EMAX
INF = (1 << WIDTH - 1) - (1 << PRECISION - 1)
SIGN_BIT = 1 << WIDTH - 1

Most of this should be self-explanatory. The constant INF is the bit representation of positive infinity, which we will use to handle overflow.
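
As a quick sanity check of the derivation, the sketch below wraps the same formulas in a function (the name format_constants is hypothetical) and evaluates them for binary64 and binary128, whose parameters are well known.

```python
def format_constants(width, precision):
    """Derive (EMAX, EMIN, INF, SIGN_BIT) from the width and precision.

    Hypothetical helper; the formulas are the ones given above.
    """
    emax = (1 << width - precision - 1) - 1
    emin = 1 - emax
    inf = (1 << width - 1) - (1 << precision - 1)
    sign_bit = 1 << width - 1
    return emax, emin, inf, sign_bit


# binary64: 11 exponent bits, bias 1023.
assert format_constants(64, 53) == (
    1023, -1022, 0x7FF0000000000000, 0x8000000000000000)

# binary128: 15 exponent bits, bias 16383.
emax, emin, inf, sign_bit = format_constants(128, 113)
assert (emax, emin) == (16383, -16382)
assert f"{inf:032x}" == "7fff0000000000000000000000000000"
assert f"{sign_bit:032x}" == "80000000000000000000000000000000"
```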

Finally, here is the main function:

from fractions import Fraction as F

def binary128_to_hex(s):
    """
    Convert a decimal numeric string to its binary128 representation.

    Given a decimal string 's' (for example "1.2", or "-0.13e-123"), find the
    closest representable IEEE 754 binary128 float to the value represented
    by that string, and return a hexadecimal representation of the bits
    of that float in the corresponding IEEE 754 interchange format.
    """
    # Convert absolute value to a Fraction. Capture the sign separately.
    f, negative = abs(F(s)), s.lstrip().startswith("-")

    # Find the bits representing the significand and exponent of the result.
    if f == 0:
        bits = 0  # Handle special case of zero.
    else:
        # Find exponent; adjust for possible subnormal.
        exp = max(exponent(f), EMIN)
        if exp > EMAX:
            bits = INF  # Overflow to infinity
        else:
            significand = round(f / F(2) ** (exp + 1 - PRECISION))
            bits = (exp - EMIN << PRECISION - 1) + significand

    # Merge sign bit if necessary, then format as a hex string.
    if negative:
        bits |= SIGN_BIT
    return f'{bits:0{WIDTH//4}x}'

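There are two sneaky tricks above that deserve special mention. First, when constructing bits from the exponent and significand using the expression (exp - EMIN << PRECISION - 1) + significand, we do nothing special to handle subnormals. The code nevertheless handles subnormals correctly: for the normal case, the exponent value exp - EMIN is actually one smaller than it should be, but the most significant bit of the significand ends up incrementing the exponent field when we perform the addition. (So it is important that we use + rather than | to combine the exponent part with the significand.)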
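
The effect of the addition can be seen on binary64, where struct gives an independent reference. This is a minimal sketch; the helper names combine and float_bits are hypothetical, and the constants are the binary64 ones rather than those used in the answer.

```python
import struct

PRECISION, EMIN = 53, -1022  # binary64 values, assumed for this demo


def combine(exp, significand):
    """Same field-combining expression as in the main function."""
    return (exp - EMIN << PRECISION - 1) + significand


def float_bits(x):
    """Bit pattern of a Python float (binary64) as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# Normal value 2.0: exp = 1 and the significand is 2**52 (leading bit set).
# The leading bit carries into the exponent field and fixes the bias.
assert combine(1, 1 << 52) == float_bits(2.0) == 0x4000000000000000

# With | instead of + the carry is lost and we get the bits of 1.0.
assert ((1 - EMIN << PRECISION - 1) | (1 << 52)) == float_bits(1.0)

# Smallest subnormal: exp is clamped to EMIN and the significand is 1,
# so the exponent field stays at zero, as it should.
assert combine(EMIN, 1) == float_bits(5e-324) == 1
```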

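The other observation is that while the choice of exp ensures that the argument to the round call is strictly less than 2**PRECISION, it is possible for the result of the round call to be exactly 2**PRECISION. At that point you might expect that we would have to test for this case and adjust the exponent and significand accordingly. But again, there is no need to handle this case specially: when the fields are combined using (exp - EMIN << PRECISION - 1) + significand, we get an extra increment of the exponent field and everything works out, even in the corner case where we end up overflowing to infinity. The elegant design of the IEEE 754 binary interchange formats makes this kind of trickery possible.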
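
A worked binary64 example of this rounding carry, as a sketch under the same assumptions as the main function. The helper name pack and the two test values are illustrative; the exponents are supplied by hand to keep the example short.

```python
import struct
from fractions import Fraction

PRECISION, EMIN = 53, -1022  # binary64 values, assumed for this demo


def pack(f, exp):
    """Round f to PRECISION bits at exponent exp and combine the fields."""
    significand = round(f / Fraction(2) ** (exp + 1 - PRECISION))
    return significand, (exp - EMIN << PRECISION - 1) + significand


# Just below 2: the exponent is 0, but rounding gives exactly 2**PRECISION.
f = 2 - Fraction(1, 2**60)
significand, bits = pack(f, 0)
assert significand == 1 << PRECISION
# The extra carry bumps the exponent field, giving the bits of 2.0,
# which is also what correctly rounded float conversion produces.
assert bits == 0x4000000000000000
assert bits == struct.unpack("<Q", struct.pack("<d", float(f)))[0]

# Just below 2**1024: the same carry lands exactly on +infinity.
g = Fraction(2) ** 1024 - Fraction(2) ** 960
significand, bits = pack(g, 1023)
assert significand == 1 << PRECISION
assert bits == 0x7FF0000000000000
```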

Here are the results of testing the above code on a few examples:

>>> binary128_to_hex("1.0")
'3fff0000000000000000000000000000'
>>> binary128_to_hex("-1.0")
'bfff0000000000000000000000000000'
>>> binary128_to_hex("-0.0")
'80000000000000000000000000000000'
>>> binary128_to_hex("3.362103143112093506262677817321752E-4932")
'0000ffffffffffffffffffffffffffff'
>>> binary128_to_hex("1.1897314953572317650857593266280071308E+4932")
'7fff0000000000000000000000000000'
>>> binary128_to_hex("1.1897314953572317650857593266280070162E+4932")
'7ffeffffffffffffffffffffffffffff'
>>> binary128_to_hex("1.2345E-4950")  # subnormal value
'00000000000000000006c5f6731b03b8'
>>> binary128_to_hex("3.14159265358979323846264338327950288")
'4000921fb54442d18469898cc51701b8'
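
The binary64 check suggested earlier can be sketched as follows. The function decimal_to_hex is a hypothetical parameterised variant of binary128_to_hex, written here only so that the same logic can be compared against Python's own float parsing via struct; the sample strings are arbitrary.

```python
import struct
from fractions import Fraction


def exponent(f):
    """Binary exponent of a positive Fraction (as defined earlier)."""
    n, d = f.numerator, f.denominator
    e = n.bit_length() - d.bit_length()
    if e >= 0:
        adjust = (n >> e) < d
    else:
        adjust = (-d >> -e) < -n
    return e - adjust


def decimal_to_hex(s, width, precision):
    """Hypothetical variant of binary128_to_hex for any binary format."""
    emax = (1 << width - precision - 1) - 1
    emin = 1 - emax
    inf = (1 << width - 1) - (1 << precision - 1)
    f, negative = abs(Fraction(s)), s.lstrip().startswith("-")
    if f == 0:
        bits = 0
    else:
        exp = max(exponent(f), emin)
        if exp > emax:
            bits = inf  # overflow to infinity
        else:
            significand = round(f / Fraction(2) ** (exp + 1 - precision))
            bits = (exp - emin << precision - 1) + significand
    if negative:
        bits |= 1 << width - 1
    return f"{bits:0{width // 4}x}"


def float_hex(s):
    """Reference: big-endian binary64 bits of float(s) as hex."""
    return struct.pack(">d", float(s)).hex()


samples = ["1.0", "-1.0", "-0.0", "0.1", "3.141592653589793",
           "1e-320", "4.9e-324", "1.7976931348623157e308", "1e400"]
for s in samples:
    assert decimal_to_hex(s, 64, 53) == float_hex(s), s

# With the binary128 parameters it reproduces the first example above.
assert decimal_to_hex("1.0", 128, 113) == "3fff0000000000000000000000000000"
```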
