Unexpected result when calling toString on a buffer in Node

Posted 2023-02-24


【Title】: Unexpected result when calling toString on a buffer in Node 【Posted】: 2021-08-12 11:10:01 【Question】:

I need to recover the original data of a buffer on which toString has already been called. For example:

const buffer // I need this, or equivalent
const bufferString = buffer.toString() // This is all I have

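The Node documentation says that .toString() defaults to 'utf8' encoding, so I tried to restore the buffer with Buffer.from(bufferString, 'utf8'), but that doesn't work: I get different data. (Perhaps some data is lost when converting to a string, although the documentation doesn't seem to mention this.)

Does anyone know why this happens or how to fix it?

Here is the data I have to reproduce it: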

const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr) // The buffer I want!
const bufferString = buffer.toString() // The string I have!, note .toString() and .toString('utf8') are equivalent
const differentBuffer = Buffer.from(bufferString, 'utf8')

You can get the initial intArr back from a buffer by doing the following:

JSON.parse(JSON.stringify(Buffer.from(buffer)))['data']
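The JSON round trip above works because a Buffer serializes to an object of the form { type: 'Buffer', data: [...] }. As a small sketch (the four-byte sample and the variable names are illustrative, not from the question), the same array can also be obtained by iterating the buffer directly:

```javascript
// Illustrative sample: the first four bytes of the question's data.
const sampleBuffer = Buffer.from([31, 139, 8, 0]);

// The approach from the question: go through the JSON form of the buffer.
const viaJson = JSON.parse(JSON.stringify(sampleBuffer)).data;

// A Buffer is a Uint8Array, so it can be spread or passed to Array.from.
const viaSpread = [...sampleBuffer];
const viaArrayFrom = Array.from(sampleBuffer);

console.log(viaJson, viaSpread, viaArrayFrom);
```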

Edit: interestingly, calling .toString() on differentBuffer gives the same initial string.

【Question comments】:

【Answer 1】:

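I think the important part of the documentation you linked is: "When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD � will be used to represent those errors." When you convert your buffer to a UTF-8 string, not all of its bytes form valid UTF-8; as you can see from console.log(bufferString);, almost the entire output is gibberish. So converting the buffer to a UTF-8 string loses data irrecoverably, and that data cannot be retrieved when converting back to a buffer.

In your example, if you use utf16 instead of utf8, you don't lose information, so your buffer is identical after converting back. That is: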
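A minimal sketch of that loss, using only the first four bytes of the question's data (the variable names are illustrative). Byte 139 (0x8B) is a UTF-8 continuation byte with no lead byte before it, so it cannot be decoded and is replaced by U+FFFD:

```javascript
// First four bytes of the question's data; 0x8B is not valid UTF-8 here.
const original = Buffer.from([31, 139, 8, 0]);

// Decoding replaces the invalid byte with U+FFFD.
const decoded = original.toString('utf8');
console.log(decoded.includes('\uFFFD')); // true

// Re-encoding writes U+FFFD as the three bytes EF BF BD, not the original 8B.
const reencoded = Buffer.from(decoded, 'utf8');
console.log(reencoded); // <Buffer 1f ef bf bd 08 00>
console.log(original.equals(reencoded)); // false

// The re-encoded buffer is valid UTF-8, so decoding it again yields the same string.
console.log(reencoded.toString('utf8') === decoded); // true
```

The last line also accounts for the observation in the question's edit: the second buffer decodes to the same string because the replacement characters are themselves valid UTF-8.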

const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr);
const bufferString = buffer.toString('utf16le');
const differentBuffer = Buffer.from(bufferString, 'utf16le');
console.log(buffer); // same as the below log
console.log(differentBuffer); // same as the above log
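Note that this round trip is not a general guarantee. One case that can be checked directly is a buffer with an odd number of bytes: 'utf16le' consumes two bytes per code unit, so a trailing single byte has nowhere to go. A small sketch with an illustrative three-byte buffer (the names below are not from the original answer):

```javascript
// Three bytes: one complete UTF-16LE code unit (0x0061, 'a') plus one stray byte.
const oddBuffer = Buffer.from([0x61, 0x00, 0x62]);

// Only complete two-byte units are decoded; the stray byte is dropped.
const oddString = oddBuffer.toString('utf16le');
const oddRestored = Buffer.from(oddString, 'utf16le');

console.log(oddBuffer.length, oddRestored.length); // 3 2
console.log(oddBuffer.equals(oddRestored)); // false
```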

【Comments】:

Hmm, no, that won't work. Just as with UTF-8, some byte sequences are not valid UTF-16.

【Answer 2】:

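Use the 'latin1' or 'binary' encoding with both Buffer.toString and Buffer.from. The two encodings are identical ('binary' is an alias for 'latin1') and map each byte to one of the Unicode characters U+0000 to U+00FF.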
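A minimal sketch of that round trip, checked over every possible byte value (the variable names are illustrative):

```javascript
// A buffer containing each byte value 0..255 exactly once.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// 'latin1' maps each byte to exactly one character, U+0000 to U+00FF.
const latin1String = allBytes.toString('latin1');
console.log(latin1String.length); // 256

// Encoding with the same encoding restores the original bytes.
const roundTripped = Buffer.from(latin1String, 'latin1');
console.log(allBytes.equals(roundTripped)); // true
```

This only helps if you control the toString call and can pass 'latin1' there. A string that was already produced with the default 'utf8' decoding has lost the invalid bytes and cannot be restored this way.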

【Comments】:
