Converting int to bytes in Python 3

Posted 2023-02-15

【Title】: Converting int to bytes in Python 3 【Posted】: 2014-01-27 20:31:08 【Question】:

I was trying to build this bytes object in Python 3:

b'3\r\n'

So I tried the obvious (to me) approach and found a strange behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

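Reading the documentation, I have been unable to find any pointers on why the bytes conversion works this way. However, I did find some surprising messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):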

http://bugs.python.org/issue3982

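This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior, which I never have, I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)

Can someone explain where this behaviour comes from?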

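【Comments on the question】:

Related to title: 3 .to_bytes

It is unclear from your question whether you want the integer value 3, or the value of the ASCII character representing the digit 3 (integer value 51). The first is bytes([3]) == b'\x03'. The latter is bytes([ord('3')]) == b'3'.

What's wrong with: ("3" + "\r\n").encode()?

【Answer 1】:

That's the way it was designed, and it makes sense because you would usually call bytes on an iterable rather than on a single integer: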

>>> bytes([3])
b'\x03'

The docs state this, as does the docstring for bytes:

 >>> help(bytes)
 ...
 bytes(int) -> bytes object of size given by the parameter initialized with null bytes
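To make the distinction concrete, here is a minimal sketch of the three things "bytes from 3" can mean in Python 3. The variable names are only illustrative.

```python
# Three different meanings of "bytes from 3" in Python 3.
n = 3

zero_filled = bytes(n)                  # n null bytes: b'\x00\x00\x00'
single_byte = bytes([n])                # one byte with value 3: b'\x03'
ascii_digits = str(n).encode('ascii')   # decimal text of n: b'3'

print(zero_filled, single_byte, ascii_digits)
```

Only the last form gives the b'3' that the question asks for; the first is the constructor behaviour described in the docstring.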

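【Comments】:

Beware that the above works only with python 3. In python 2, bytes is just an alias for str, which means bytes([3]) gives you '[3]'.

In Python 3, note that bytes([n]) only works for int n from 0 to 255. For anything else it raises ValueError.

@A-B-B: Not really surprising, since a byte can only store values between 0 and 255.

It should also be noted that bytes([3]) is still different from what the OP wanted, namely the byte value used to encode the digit "3" in ASCII, i.e. bytes([51]), which is b'3', not b'\x03'.

bytes(500) creates a bytestring with len == 500. It does not create a bytestring that encodes the integer 500. And I agree that bytes([500]) can't work, which is why that's the wrong answer too. Probably the right answer is int.to_bytes() for versions >= 3.2.

【Answer 2】:

From the bytes docs:

Accordingly, constructor arguments are interpreted as for bytearray().

Then, from the bytearray docs:

The optional source parameter can be used to initialize the array in a few different ways:
If it is an integer, the array will have that size and will be initialized with null bytes.

Note that this differs from the 2.x (where x >= 6) behavior, where bytes is simply str: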

>>> bytes is str
True

PEP 3112:

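The 2.6 str differs from 3.0's bytes type in various ways; most notably, the constructor is completely different.

【Comments】:

【Answer 3】:

The behaviour comes from the fact that in Python prior to version 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray: a completely new type, not backwards compatible.

【Comments】:

【Answer 4】:

The documentation says: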

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

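The sequence:

b'3\r\n'

It is the character '3' (decimal 51), the character '\r' (13) and the character '\n' (10).

Therefore, you can build it in any of these ways, for example: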

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

Tested on IPython 1.1.0 and Python 3.2.3
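The str-then-encode route above can be wrapped in a small helper. This is only a sketch: the name int_to_ascii_line and its terminator parameter are made up for illustration and are not part of any library.

```python
def int_to_ascii_line(n: int, terminator: bytes = b'\r\n') -> bytes:
    """Return the decimal digits of n as ASCII bytes, followed by terminator."""
    return str(n).encode('ascii') + terminator

print(int_to_ascii_line(3))             # b'3\r\n'
print(int_to_ascii_line(-42))           # b'-42\r\n'
print(int_to_ascii_line(1024, b'\n'))   # b'1024\n'
```

Because str(n) for an int only ever produces ASCII digits and an optional minus sign, 'ascii' and 'utf8' give the same bytes here.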

【Comments】:

I ended up doing bytes(str(n), 'ascii') + b'\r\n' or str(n).encode('ascii') + b'\r\n'. Thanks! :)

@Juanlu001, there is also "{}\r\n".format(n).encode(). I don't think there is any harm in using the default utf8 encoding.

【Answer 5】:

You can use struct's pack:

In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'

The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:

In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'

In [13]: struct.pack("B", 1)
Out[13]: '\x01'

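This works the same on both python 2 and python 3 (in Python 3 the results are bytes objects and are displayed with a b prefix, e.g. b'\x01').

Note: the inverse operation (bytes to int) can be done with unpack.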
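A short round-trip sketch under Python 3, using only struct.pack, struct.unpack and struct.error from the standard library. The variable names are illustrative.

```python
import struct

packed = struct.pack('>I', 1)            # 4-byte big-endian unsigned int
(value,) = struct.unpack('>I', packed)   # unpack always returns a tuple
print(packed, value)

# Fixed-size formats reject values that do not fit in the field.
overflow_raised = False
try:
    struct.pack('B', 256)                # 'B' holds 0..255 only
except struct.error as exc:
    overflow_raised = True
    print('struct.error:', exc)
```

The format string fixes the width, so this approach suits protocols with fixed-size fields rather than integers of arbitrary size.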

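【Comments】:

@AndyHayden To clarify, since a struct has a standard size irrespective of the input, I, H, and B work up to 2**k - 1, where k is 32, 16, and 8 respectively. For larger inputs they raise struct.error.

Presumably down-voted because it doesn't answer the question: the OP wants to know how to generate b'3\r\n', i.e. a byte-string containing the ASCII character "3", not the ASCII character "\x03".

@DaveJones What makes you think that is what the OP wants? The accepted answer returns \x03, and the solution if you just want b'3' is trivial. The reason cited by A-B-B is much more plausible... or at least understandable.

@DaveJones Also, the reason I added this answer is that Google takes you here when you search for this. So that's why it's here.

Not only does this work the same in 2 and 3, it is also faster than both the bytes([x]) and (x).to_bytes() methods in Python 3.5. That was unexpected.

【Answer 6】:

From python 3.2 you can do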

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

https://docs.python.org/3/library/stdtypes.html#int.to_bytes

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
    
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

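Accordingly, x == int_from_bytes(int_to_bytes(x)). Note that the above encoding works only for unsigned (non-negative) integers.

For signed integers, the bit length is a bit more tricky to calculate: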

def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)
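As a quick check of the signed length formula, the sketch below round-trips values around the one-byte and two-byte boundaries. The functions are renamed signed_to_bytes and signed_from_bytes here only so the block is self-contained.

```python
def signed_to_bytes(number: int) -> bytes:
    # Same length formula as above: room for the value bits plus a sign bit.
    length = (8 + (number + (number < 0)).bit_length()) // 8
    return number.to_bytes(length, byteorder='big', signed=True)

def signed_from_bytes(data: bytes) -> int:
    return int.from_bytes(data, byteorder='big', signed=True)

for n in (-32769, -32768, -129, -128, -1, 0, 1, 127, 128, 32767, 32768):
    encoded = signed_to_bytes(n)
    assert signed_from_bytes(encoded) == n   # every value survives the round trip
    print(n, encoded, len(encoded))
```

Note that -128 still fits in one byte while 128 needs two, which is what the (number < 0) adjustment is for.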

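【Comments】:

While this answer is good, it works only for unsigned (non-negative) integers. I have adapted it to write an answer that also works for signed integers.

That doesn't help with getting b"3" from 3, as the question asks. (It will give b"\x03".)

It might be worth pointing out that both to_bytes and from_bytes support a signed argument. This allows storing both positive and negative numbers, at the cost of an additional bit.

(***.com/a/64502258/5267751 explains what the +7 is for.)

Why are the parentheses needed, and where can I find documentation on them?

【Answer 7】:

The ASCIIfication of 3 is "\x33", not "\x03"!

That is what python does for str(3), but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.

The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. You can convert bigger integers with int.to_bytes, for example (1024).to_bytes(2, "little").

Initializing bytes with a given length makes sense and is the most useful behaviour, as they are often used to create some kind of buffer for which you need memory of a given size allocated. I often use this when initializing arrays or when expanding a file by writing zeros to it.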
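A small sketch of the zero-filled use case, assuming a fixed record size of 8 bytes chosen purely for illustration. bytes(n) serves as an immutable source of zeros, while bytearray(n) gives a mutable buffer of the same size.

```python
# bytes(n) yields an immutable run of zeros, handy as a data source.
padding = bytes(4)

# bytearray(n) yields a mutable zero-filled buffer of the same size.
buffer = bytearray(4)
buffer[0] = 0x33          # write the ASCII code of '3' into the first slot
buffer[1:3] = b'\r\n'     # slice assignment with two bytes

record = b'abc'
padded_record = record + bytes(8 - len(record))   # pad the record to 8 bytes

print(padding, bytes(buffer), padded_record)
```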

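【Comments】:

There are several problems with this answer: (a) The escape notation of b'3' is b'\x33', not b'\x32'. (b) (3) is not a tuple; you have to add a comma. (c) The scenario of initializing a sequence with zeroes does not apply to bytes objects, as they are immutable (it does make sense for bytearrays, though).

Thank you for your comment. I fixed those two obvious mistakes. In the case of bytes and bytearray, I think it is mostly a matter of consistency. But it is also useful if you want to push some zeros into a buffer or file, in which case it is only used as a data source.

【Answer 8】: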

Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes:

>>> b'%d\r\n' % 3
b'3\r\n'

See PEP 0461 -- Adding % formatting to bytes and bytearray.

On earlier versions, you could use str and .encode('ascii') the result:

>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'

Note: this is different from what int.to_bytes produces:

>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != b'\x03'
True
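A few more %-interpolation forms on bytes, as a sketch that requires Python 3.5 or later. The numeric codes and flags behave as they do for str formatting.

```python
n = 255

decimal_line = b'%d\r\n' % n   # decimal digits: b'255\r\n'
hex_line = b'%x\r\n' % n       # lowercase hex digits: b'ff\r\n'
padded = b'%05d' % n           # zero-padded to width 5: b'00255'

print(decimal_line, hex_line, padded)
```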

【Comments】:

【Answer 9】:

An int (including Python 2's long) can be converted to bytes using the following function:

import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

The reverse conversion can be done by another function:

import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)

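Both functions work on both Python 2 and Python 3.

【Comments】:

'hex_value = '%x' % i' won't work under Python 3.4. You get a TypeError, so you'd have to use hex() instead.

@bjmc Replaced with str.format. This should work on Python 2.6+.

Thanks, @renskiy. You may want to use 'hex_codec' instead of 'hex', because it seems the 'hex' alias isn't available on all Python 3 releases, see ***.com/a/12917604/845210

@bjmc Fixed. Thanks.

This fails on negative integers on python 3.6.

【Answer 10】:

I was curious about the performance of the various methods for a single int in the range [0, 255], so I decided to do some timing tests.

Based on the timings below, and on the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, then bytes, with str.encode (unsurprisingly) being the slowest. Note that the results show somewhat more variation than is represented here, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.

Results in CPython 3.7 on Windows: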

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

Test module (named int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://***.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://***.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://***.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://***.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921

    Similar to g10guang's answer:
    https://***.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://***.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))
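One further option for the [0, 255] case is to precompute all 256 one-byte objects and index into the table. The sketch below is an assumption-laden illustration, not part of the measured results above: BYTE_TABLE and table_lookup are made-up names, and no timing figures are claimed for it.

```python
import timeit

# Build every single-byte value once; this setup cost is not timed.
BYTE_TABLE = [bytes([i]) for i in range(256)]

def table_lookup(i):
    """Return the one-byte bytes object for i in [0, 255] by indexing."""
    return BYTE_TABLE[i]

# The lookup must agree with bytes([i]) over the whole range.
assert all(table_lookup(i) == bytes([i]) for i in range(256))

elapsed = timeit.timeit('table_lookup(63)', globals=globals(), number=100000)
print(f'table_lookup: {elapsed:.4f} s for 100000 calls')
```

One caveat of this design: a negative index such as -1 silently returns b'\xff' instead of raising, so callers need to validate the range themselves.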

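【Comments】:

@A-B-B As mentioned in my first sentence, I am only measuring this for a single int in the range [0, 255]. I assume by "wrong indicator" you mean my measurements weren't general enough to fit most situations? Or was my measuring methodology poor? If the latter, I'd be interested to hear what you have to say, but if the former, I never claimed my measurements were generic to all use cases. For my (perhaps niche) situation, I'm only dealing with ints in the range [0, 255], and that is the audience I intended to address with this answer. Was my answer unclear? I can edit it for clarity...

What about the technique of just indexing a precomputed encoding for the range? The precomputation wouldn't be subject to timing, only the indexing would be.

@A-B-B That's a good idea. It sounds like it would be faster than anything else. I'll do some timing and add it to this answer when I have some time.

If you really want to time the bytes-from-iterable approach, you should use bytes((i,)) instead of bytes([i]), because lists are more complex, use more memory and take longer to initialize. In this case, for nothing.

【Answer 11】:

While the earlier answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.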

def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)

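For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.

Credit: CervEd for fixing a minor inefficiency.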
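To see the difference at the boundary values, the sketch below compares a length computed from i.bit_length() directly with the adjusted formula, for the signed case only. The helper names naive_length and adjusted_length are illustrative.

```python
def naive_length(i: int) -> int:
    # Uses i.bit_length() directly and reserves one sign bit.
    return (i.bit_length() + 7 + 1) // 8

def adjusted_length(i: int) -> int:
    # Shifts negative values by one before measuring, as in the answer above.
    return ((i + (i < 0)).bit_length() + 7 + 1) // 8

for i in (-129, -128, -127, 127, 128):
    print(i, naive_length(i), adjusted_length(i))

# -128 fits in a single signed byte, which only the adjusted formula detects.
assert (-128).to_bytes(adjusted_length(-128), 'big', signed=True) == b'\x80'
```

The two formulas agree everywhere except at exact negative powers of two such as -128 and -32768, where the naive one spends an extra byte.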

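【Comments】:

int_to_bytes(-128, signed=True) == (-128).to_bytes(1, byteorder="big", signed=True) is False

You are not using length 2; you compute the bit length of the signed integer, add 7, and add 1 if it is a signed integer. Finally you convert that into a length in bytes. This yields unexpected results for -128, -32768, etc.

Let us continue this discussion in chat.

This is how you fix it: (i+(signed*i<0)).bit_length()

【Answer 12】:

Some answers don't work with large numbers.

Convert the integer to its hex representation, then convert that to bytes: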

def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

Result:

>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
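For non-negative integers, the hex route gives the same bytes as int.to_bytes with a computed length. The sketch below checks that on a few values; int_to_bytes_hex repeats the function above, and int_to_bytes_direct is an illustrative name for the to_bytes equivalent.

```python
def int_to_bytes_hex(number: int) -> bytes:
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

def int_to_bytes_direct(number: int) -> bytes:
    # Minimum number of whole bytes, but at least one so 0 becomes b'\x00'.
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, 'big')

for n in (0, 1, 255, 256, 2**64, 2**256 - 1):
    assert int_to_bytes_hex(n) == int_to_bytes_direct(n)
```

Neither version accepts negative integers: the hex route fails in bytes.fromhex because of the minus sign, and the unsigned to_bytes call raises OverflowError.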

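【Comments】:

"All other methods don't work with large numbers." That's not true, int.to_bytes works with any integer.

@juanpa.arrivillaga Yes, my bad. I've edited my answer.

【Answer 13】:

If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is: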

>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5
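With a fixed length, the byte order and the value range both matter. The sketch below shows the two byte orders and the OverflowError raised when a value does not fit or is negative without signed=True.

```python
i = 5
big = i.to_bytes(2, 'big')         # most significant byte first: b'\x00\x05'
little = i.to_bytes(2, 'little')   # least significant byte first: b'\x05\x00'
print(big, little)

try:
    (70000).to_bytes(2, 'big')     # 70000 needs three bytes
except OverflowError as exc:
    print('OverflowError:', exc)

try:
    (-5).to_bytes(2, 'big')        # negative values need signed=True
except OverflowError as exc:
    print('OverflowError:', exc)

negative = (-5).to_bytes(2, 'big', signed=True)   # two's complement: b'\xff\xfb'
print(negative)
```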

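More information on these methods here:

https://docs.python.org/3.8/library/stdtypes.html#int.to_bytes
https://docs.python.org/3.8/library/stdtypes.html#int.from_bytes

【Comments】:

How is this different from brunsgaard's answer, which was posted 5 years earlier and is currently the highest voted answer?

【Answer 14】:

As you want to deal with the binary representation, the best option is to use ctypes.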

import ctypes
x = ctypes.c_int(1234)
bytes(x)

You must use a specific integer representation (signed/unsigned and the number of bits: c_uint8, c_int8, c_uint16, ...).
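A brief sketch with fixed-width ctypes types. Two caveats worth knowing: the byte order follows the host machine, and ctypes integer types do no overflow checking, so out-of-range values wrap around.

```python
import ctypes
import sys

one_byte = bytes(ctypes.c_uint8(200))       # single byte: b'\xc8'
four_bytes = bytes(ctypes.c_uint32(1234))   # 4 bytes in native byte order

# The native layout matches int.to_bytes with the host byte order.
assert four_bytes == (1234).to_bytes(4, sys.byteorder)

# Values that do not fit wrap around silently instead of raising.
wrapped = bytes(ctypes.c_uint8(259))        # 259 % 256 == 3, so b'\x03'

print(one_byte, four_bytes, wrapped)
```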

【Comments】:

【Answer 15】:

I think you can convert the int to str first, and then convert that to bytes. That should produce the format you want.

bytes(str(your_number),'UTF-8') + b'\r\n'

It worked for me in py3.8.

【Comments】:
