How to parse an unsigned char array into numerical data

Posted 2023-02-16

【Title】: how to parse unsigned char array to numerical data
【Posted】: 2014-08-22 19:49:33
【Question】:

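My problem setup is as follows:

I have a source sending UDP packets to my receiving computer. The receiving computer receives each UDP packet into an unsigned char *message.

I can print the packet byte by byte using: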

// message_len is the number of bytes received; sizeof(message) would only give the pointer size
for (size_t i = 0; i < message_len; i++)
    printf("0x%02x\n", message[i]);

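That is where I am now! Next I want to start parsing the bytes I receive into shorts, ints, longs, and strings.

I have written a family of functions such as: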

short unsignedShortToInt(const unsigned char c[]) {
    short i = 0;
    i |= c[1] & 0xff;  // high byte
    i <<= 8;
    i |= c[0] & 0xff;  // low byte, so the input is treated as little-endian
    return i;
}

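to parse the bytes and convert them to ints, longs, and shorts. I can use sprintf() to create strings from the byte array.

My question is: what is the best way to get substrings out of my large UDP packet? The packet is over 100 characters long, so I would like an easy way to pass message[0:6] or message[20:22] to these various utility functions.

Possible options:

I could use strcpy() to create a temporary array for each function call, but that seems a bit messy.

I could turn the whole packet into a string and use std::string::substr. This looks nice, but I am worried that converting unsigned chars to signed chars (part of the string conversion process) might cause some errors (maybe this worry is unfounded?).

Maybe another way?

So I ask you, ***, to recommend a clean, concise way to accomplish this task!

Thanks!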

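【Comments】:

I see that in your "family of functions" you introduce the variable c as the function input but use the variable b in the function body; is that just a typo in this question?

You need to know the sending (application) protocol. What is the layout of the message? Once you know that, you can declare a corresponding struct and convert accordingly.

And your compiler had better flag return i; as an error. The function unsignedShortToInt returns void (that may be another typo).

I think you can use std::string without any problem. If you want to keep unsigned chars you can use std::basic_string<unsigned char> instead, but I don't think it will affect you.

The method you use to convert bytes to integers is platform-dependent. The functions usually used to make sure the byte order is right for the platform are htons() etc... beej.us/guide/bgnet/output/html/multipage/htonsman.html

【Answer 1】: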

Why not use proper serialization?

i.e. MsgPack

You need a scheme to tell messages apart. For example, you could make them self-describing, e.g.:

struct my_message {
  string protocol;
  string data;
};

and dispatch the decoding based on the protocol.
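As a rough sketch of what dispatching on the protocol field could look like, assuming the message has already been deserialized into my_message: the protocol names ("ping", "echo"), the decoder functions, and dispatch() itself are all hypothetical and only illustrate the idea.

```cpp
#include <functional>
#include <map>
#include <string>

// Self-describing message, mirroring the struct above.
struct my_message {
    std::string protocol;  // tells the receiver how to decode `data`
    std::string data;      // raw payload bytes
};

// Hypothetical decoders; each returns a readable result for illustration.
std::string decode_ping(const std::string& data) {
    return "ping:" + std::to_string(data.size());
}
std::string decode_echo(const std::string& data) {
    return "echo:" + data;
}

// Look up the decoder by protocol name. Unknown protocols are rejected
// rather than parsed blindly.
std::string dispatch(const my_message& msg) {
    static const std::map<std::string,
                          std::function<std::string(const std::string&)>>
        handlers = {
            {"ping", decode_ping},
            {"echo", decode_echo},
        };
    auto it = handlers.find(msg.protocol);
    if (it == handlers.end()) return "error:unknown protocol";
    return it->second(msg.data);
}
```

Keeping the handlers in a table means a new message type only needs a new entry, and anything unrecognized falls through to a single error path.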

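You are better off using a tested serialization library than discovering that your system is vulnerable to buffer overflow attacks and crashes.

【Comments】:

【Answer 2】: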

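I think you have two problems to solve. First, after extracting integer data from the character buffer, you need to make sure it is correctly aligned in memory. Next, you need to make sure the integer data is in the correct byte order after extraction.

The alignment problem can be solved with a union that overlays the integral data type on a character array of the correct size. The network byte order problem can be solved with the standard ntohs() and ntohl() functions. This only works if the sending software also uses the standard byte order produced by the inverses of these functions.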
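To make the byte-order step concrete, here is a small sketch, assuming a POSIX system where ntohs() and ntohl() are available from <arpa/inet.h>. The helper names read_net_u16 and read_net_u32 are made up for this example, and it copies the bytes with std::memcpy rather than through a union, which is another common way to sidestep the alignment issue.

```cpp
#include <arpa/inet.h>  // ntohs, ntohl (POSIX)
#include <cstdint>
#include <cstring>

// Read a 16-bit value stored in network (big-endian) order at p.
// memcpy places no alignment requirement on p.
inline std::uint16_t read_net_u16(const unsigned char* p) {
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return ntohs(v);  // network order -> host order
}

// Same idea for a 32-bit value.
inline std::uint32_t read_net_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```

For a buffer holding the bytes 0x12 0x34, read_net_u16 yields 0x1234 on both little-endian and big-endian hosts, provided the sender wrote the value in network order.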

See: http://www.beej.us/guide/bgnet/output/html/multipage/htonsman.html

Here are a couple of untested functions you may find useful. I think they should do what you are after.

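#include <netinet/in.h>  // ntohs, ntohl
#include <cstddef>       // size_t
#include <cstdint>       // uint16_t, uint32_t
#include <cstring>       // std::strlen
#include <string>

/**
 * General routine to extract aligned integral types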
 * from the UDP packet.
 *
 * @param data Pointer into the UDP packet data
 * @param type Integral type to extract
 *
 * @return data pointer advanced to next position after extracted integral.
 */
template<typename Type>
unsigned char const* extract(unsigned char const* data, Type& type)
{
    // This union will ensure the integral data type is correctly aligned.
    // Note: reading tdata after writing cdata relies on compiler support for
    // union type punning; std::memcpy is the strictly portable alternative.
    union tx_t
    {
        unsigned char cdata[sizeof(Type)];
        Type tdata;
    } tx;

    for(size_t i(0); i < sizeof(Type); ++i)
        tx.cdata[i] = data[i];

    type = tx.tdata;

    return data + sizeof(Type);
}

/**
 * If strings are null terminated in the buffer then this could be used to extract them.
 *
 * @param data Pointer into the UDP packet data
 * @param s std::string type to extract
 *
 * @return data pointer advanced to next position after extracted std::string.
 */
unsigned char const* extract(unsigned char const* data, std::string& s)
{
    s.assign((char const*)data, std::strlen((char const*)data));
    return data + s.size() + 1; // + 1 steps over the null terminator
}

/**
 *  Function to parse entire UDP packet
 *
 * @param data The entire UDP packet data
 */
void read_data(unsigned char const* const data)
{
    uint16_t i1;
    std::string s1;
    uint32_t i2;
    std::string s2;

    unsigned char const* p = data;

    p = extract(p, i1); // p contains next position to read
    i1 = ntohs(i1);

    p = extract(p, s1);

    p = extract(p, i2);
    i2 = ntohl(i2);

    p = extract(p, s2);
}

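Hope that helps.

Edit:

I have edited the example to include strings. It depends very much on how the strings are stored in the stream. This example assumes the strings are null-terminated C strings.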
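Because a UDP payload comes from the network, it may be safer not to assume the terminator is really there. The following is a sketch of a bounds-checked variant, assuming the caller knows where the packet ends; the function name extract_cstring and the end parameter are not part of the code above and are introduced only for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Extract a null-terminated string from the range [data, end).
// Returns the position just past the terminator, or nullptr if no
// terminator is found before `end` (malformed or truncated packet).
// `out` is left untouched on failure.
inline const unsigned char* extract_cstring(const unsigned char* data,
                                            const unsigned char* end,
                                            std::string& out) {
    const std::size_t remaining = static_cast<std::size_t>(end - data);
    const void* nul = std::memchr(data, 0, remaining);
    if (nul == nullptr) return nullptr;
    const unsigned char* term = static_cast<const unsigned char*>(nul);
    out.assign(reinterpret_cast<const char*>(data),
               static_cast<std::size_t>(term - data));
    return term + 1;  // step over the terminator
}
```

Unlike std::strlen, std::memchr never reads past the given length, so a packet with a missing terminator produces an error instead of an out-of-bounds read.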

EDIT2:

Oops, changed the code to accept unsigned chars as per the question.

【Comments】:

【Answer 3】:

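If the array is only 100 characters long, just create a char buffer[100] and a queue so that you do not miss processing any messages.

Next, you can index into that buffer as you described; if you know the structure of the message, then you know the index points.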
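Indexing by known offsets can be wrapped in a small non-owning view so that no temporary copies are needed. This is only a sketch: PacketView and its member names are invented here, the offsets in the usage note are placeholders for whatever the real message layout is, and the 16-bit read assumes the little-endian order used by the function in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Non-owning view over a received packet; nothing is copied until a
// field is actually read.
struct PacketView {
    const unsigned char* data;
    std::size_t size;

    // Throws if [offset, offset + len) does not fit inside the packet.
    const unsigned char* at(std::size_t offset, std::size_t len) const {
        if (offset > size || len > size - offset)
            throw std::out_of_range("field outside packet");
        return data + offset;
    }

    // 16-bit little-endian value at the given offset.
    std::uint16_t u16_le(std::size_t offset) const {
        const unsigned char* p = at(offset, 2);
        return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    }

    // Fixed-length text field copied out as a std::string.
    std::string text(std::size_t offset, std::size_t len) const {
        return std::string(reinterpret_cast<const char*>(at(offset, len)), len);
    }
};
```

With a view constructed as PacketView v{message, length}, a call such as v.text(0, 6) plays the role of message[0:6] and v.u16_le(20) reads the two bytes at message[20:22], with a bounds check on each access.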

Next you can use a union of types, i.e.

union myType {
    char buf[4];
    int x;
};

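to give you the value as an int from the chars, if you need it.

【Comments】:

This only works if the sending machine has the same endianness as the receiving machine, which is not guaranteed. Both ends should use the hton* and ntoh* functions.

Yes, but if you know the packet payload structure, you can simply check for that on receipt.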
