How to extract individual fields from byte array (which is in BIG-ENDIAN) in C++

Posted 2023-02-24

【Title】: How to extract individual fields from byte array (which is in BIG-ENDIAN) in C++ 【Posted】: 2013-10-15 22:12:36 【Question】:

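I am trying to read a few bytes from byteData, as shown in my C++ code below. The actual value in byteData is a binary blob (a byte array) in BIG-ENDIAN byte order, so I can't simply "cast" the byte array into a String.

The byteData byte array is made up of the following fields: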

First is `schemaId` which is of two bytes (short datatype in Java)
Second is `lastModifiedDate` which is of eight bytes (long datatype in Java)
Third is the length of actual `byteArray` within `byteData` which we need from `byteData`.
Fourth is the actual value of that `byteArray` in `byteData`.

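Now I am trying to extract the above information from byteData in C++. I am somehow able to extract schemaId, but the value that comes out is wrong, and I don't know how to extract the other fields.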

uint16_t schemaId;
uint64_t lastModifiedDate;
uint16_t attributeLength;
const char* actual_binary_value;

while (result.next()) {
    for (size_t i = 0; i < result.column_count(); ++i) {
        cql::cql_byte_t* byteData = NULL;
        cql::cql_int_t size = 0;
        result.get_data(i, &byteData, size);

        if (!flag) {

            // I cannot just "cast" the byte array into a String
            // value = reinterpret_cast<char*>(byteData);

            // now how to retrieve schemaId, lastModifiedDate and actual_binary_value from byteData?

            schemaId = *reinterpret_cast<uint16_t*>(byteData);

            flag = false;
        }
    }
}

// this prints out 65407 somehow but it should be printing out 32767
cout << schemaId << endl;

If anyone needs to see my Java code, here it is:

    byte[] avroBinaryValue = text.getBytes();

    long lastModifiedDate = 1289811105109L;
    short schemaId = 32767;

    int size = 2 + 8 + 4 + avroBinaryValue.length; // short is 2 bytes, long 8 and int 4

    ByteBuffer bbuf = ByteBuffer.allocate(size); 
    bbuf.order(ByteOrder.BIG_ENDIAN);

    bbuf.putShort(schemaId);
    bbuf.putLong(lastModifiedDate);
    bbuf.putInt(avroBinaryValue.length);
    bbuf.put(avroBinaryValue);

    // merge everything into one bytearray.
    byte[] bytesToStore = bbuf.array();

            Hex.encodeHexString(bytesToStore)

Can anyone help me with what I am doing wrong in my C++ code, and why I can't correctly extract schemaId and the other fields from it?

Update:

After using:

schemaId = ntohs(*reinterpret_cast<uint16_t*>(data));

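I started getting the correct value of schemaId back.

But now, how do I extract the other things, such as lastModifiedDate, the length of the actual byteArray within byteData, and the actual value of that byteArray in byteData?

I am using this for lastModifiedDate, but it does not work: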

std::copy(reinterpret_cast<uint8_t*>(byteData + 2), reinterpret_cast<uint8_t*>(byteData + 10), lastModifiedDate);

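【Comments】:

Should I be using ntohl here?

ntohs is probably the way to go. Additional benefit: it constructs the proper value irrespective of the endianness of the system the code is running on.

@IInspectable: Thanks, that makes sense now. Any idea how I should extract the other fields mentioned in my question?

If all fields are stored in big-endian format, you need to apply the conversion to all fields using the respective ntoh<type> variant. For an 8-byte field you would have to use ntohll. I don't know if this is a standard C++ function, though.

@IInspectable: Thanks for the suggestion. I have tried lots of ways to extract lastModifiedDate and the other things from the byte array, but every time I get the wrong result. Can you help me out with this?

【Answer 1】: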

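32767 is 0x7fff. 65407 is 0xff7f. Note that the high-order and low-order bytes are swapped. You need to swap those bytes to restore the number to its original value. Fortunately, there is a macro or function called ntohs (network to host short) that does exactly what you want. Whether it is a macro or a function, and which header defines it, depends on your system. But the name of the macro/function is always ntohs, whether you are using Windows, Linux, Sun, or Mac.

On a little-endian machine, this macro or function swaps the two bytes that make up a 16-bit integer. On a big-endian machine, it does nothing (which is exactly what we want). Note that most home computers these days are little-endian.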
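The byte order issue described above can also be handled by assembling the value from individual bytes, which avoids both the platform-specific header for ntohs and the unaligned pointer cast. The following is only a minimal sketch; `read_be16` and `read_le16` are hypothetical helper names introduced for illustration, not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: interprets two bytes as a big-endian 16-bit value.
// Works the same on little-endian and big-endian hosts.
inline std::uint16_t read_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: interprets the same two bytes as little-endian.
// This is the interpretation a plain pointer cast produces on a
// little-endian host, which is how 0x7fff turned into 0xff7f.
inline std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

With the bytes `{0x7f, 0xff}` that Java writes for 32767, `read_be16` yields 32767 and `read_le16` yields 65407, matching the two numbers seen in the question.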

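【Comments】:

Why do I need to swap the bytes? I am storing it in BIG-ENDIAN byte order on the Java side, right? So shouldn't it work with ntohl? Please correct me if my understanding is wrong.

You need to swap the bytes in your C++ code because your machine is little-endian and your Java code stores the number in big-endian order. It won't work with ntohl. The trailing "l" means "long", which is 32 bits. You are storing a short, which is 16 bits. For that you use ntohs ("s" for "short").

Thanks, that makes more sense to me now. Apart from that, how do I determine whether my machine is BIG-ENDIAN or LITTLE-ENDIAN?

You don't need to know. On a little-endian machine, ntohs and its cousins (htons, ntohl, htonl) perform the byte swap needed to convert between host order and network order. On a big-endian machine, these functions simply return the input value as the output. These functions are portable; they do the right thing regardless of the host architecture.

Sure, thanks a lot for your help. Now back to my second question: how do I extract lastModifiedDate and the other things? I tried doing this: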

std::copy(reinterpret_cast<uint8_t*>(byteData + 2), reinterpret_cast<uint8_t*>(byteData + 10), lastModifiedDate)

But it doesn't work for me. Can you see what I am doing wrong?
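One likely reason the std::copy attempt fails: the third argument of std::copy must be an output iterator (for example a pointer to the destination), not the uint64_t variable itself, and even a corrected byte-for-byte copy would not convert from big-endian to host order. A possible approach is sketched below, assuming the layout written by the Java code in the question (2-byte short, 8-byte long, 4-byte int length, then the payload) and assuming `cql::cql_byte_t` is compatible with `unsigned char`. The names `Record`, `read_be`, and `parse_record` are hypothetical and introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical container for the decoded fields.
struct Record {
    std::uint16_t schemaId;
    std::uint64_t lastModifiedDate;
    std::string   value;  // raw payload bytes; may contain '\0'
};

// Reads sizeof(T) bytes as a big-endian unsigned integer,
// independent of the host's byte order.
template <typename T>
T read_be(const unsigned char* p) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        v = static_cast<T>((v << 8) | p[i]);
    }
    return v;
}

// Decodes: short schemaId, long lastModifiedDate, int length, payload.
Record parse_record(const unsigned char* data, std::size_t size) {
    const std::size_t kHeader = 2 + 8 + 4;  // short + long + int
    if (data == nullptr || size < kHeader) {
        throw std::runtime_error("buffer too small for header");
    }
    Record r;
    r.schemaId         = read_be<std::uint16_t>(data);
    r.lastModifiedDate = read_be<std::uint64_t>(data + 2);
    const std::uint32_t length = read_be<std::uint32_t>(data + 10);
    if (length > size - kHeader) {
        throw std::runtime_error("payload length exceeds buffer");
    }
    r.value.assign(reinterpret_cast<const char*>(data + kHeader), length);
    return r;
}
```

Inside the loop from the question, this might be called as `Record r = parse_record(byteData, size);`, under the same assumption about `cql::cql_byte_t`. Note that the length field is 4 bytes because the Java side uses `putInt`, so a `uint16_t` would be too small for it.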
