Generated blob doesn't contain full dataset

【Title】: Generated blob doesn't contain full dataset 【Posted】: 2021-06-02 04:57:42 【Question】:

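I'm trying to piece together a binary package (a mix of strings, ints, and a file) in JavaScript and pass it to a C# WebSocket server, but I can't get the whole dataset into the blob. What I'm seeing is that the final output blob is far smaller than the sum of its parts.

For example, the data file alone is 8890 bytes, but finalBlob.size is only 119.

Ultimately, all I need is a neat little binary package of mixed data, assembled in JavaScript, that I can run through a C# MemoryStream without having to convert everything to UTF8 text.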

fetch(request).then(data => AssemblePackage(data)).catch(error => {
    console.error("Unable to retrieve asset: " + error);
});
    
async function AssemblePackage(data) {
    //Example Data
    var token = "3F73B3E6-7351-416F-ACA3-AE639A3F587F";
    var requestKey = 3;
    var requestId = "6BFF91D6-4346-424C-AE01-62E5C7F3C9BC";
    var assetName = "MyAssetFile.dat";

    let parts = [];
    //36 UTF8 chars (Includes guid '-' chars)
    parts.push(token);

    //2 UTF8 bytes max '00'-'99' (workaround for blob not storing an int as
    //an int but a UTF8 string...)
    let chars = requestKey.toString().split('');
    while (chars.length < 2) {
        chars.unshift('0');
    }

    parts.push(chars.join(''));

    //36 UTF8 chars (Includes guid '-' chars)
    parts.push(requestId);

    //3 UTF8 bytes max '000'-'999' (workaround for blob not storing an int as
    //an int but a UTF8 string...)
    chars = assetName.length.toString().split('');
    while (chars.length < 3) {
        chars.unshift('0');
    }

    parts.push(chars.join(''));

    //1-999 character string
    parts.push(assetName);

    //Fetched asset, read as raw bytes
    let fileBlob = await data.blob();

    parts.push(await ReadFileBlobAsUint8(fileBlob));

    var finalBlob = new Blob([parts], { type: "application/octet-stream" });
}


function ReadFileBlobAsUint8(fileBlob) {
    return new Promise((resolve, reject) => {
        let rdr = new FileReader();

        rdr.onerror = () => {
            rdr.abort();
            reject(rdr.error);
        };

        rdr.onload = () => {
            resolve(new Uint8Array(rdr.result));
        };
        rdr.readAsArrayBuffer(fileBlob);
    });
}

【Comments on the question】:
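
One detail in the question's code may be worth checking: `parts` is already an array, yet it is passed as `[parts]`, so the `Blob` constructor receives a single part that is itself an array. Parts that are not a `Blob`, an `ArrayBuffer`, or a typed array/`DataView` are converted to strings, so a nested array is joined with commas and any `Uint8Array` inside it is written out as decimal text rather than raw bytes. The sketch below illustrates this with made-up sample values (not the original data). Whether this accounts for the exact 119-byte size reported above is not something the post confirms.

```javascript
// Made-up sample values for illustration (not the question's data).
const header = "ab";                          // 2 UTF-8 bytes
const payload = new Uint8Array([1, 2, 3]);    // 3 raw bytes
const parts = [header, payload];

// Nested array: the single part is an Array, so it is stringified
// to "ab,1,2,3" and stored as 8 bytes of text.
const nested = new Blob([parts], { type: "application/octet-stream" });

// Flat array: each part is stored as-is, 2 + 3 = 5 bytes.
const flat = new Blob(parts, { type: "application/octet-stream" });

console.log(nested.size, flat.size);
```

Passing the array directly keeps the typed array as binary data, which is usually what a binary packet needs.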

【Answer 1】:

This is what I ended up having to do. It isn't elegant, but it gets the job done.

Fetch the request.

fetch(request).then(data => AssemblePackage(data)).catch(error => {
    console.error("Unable to retrieve asset: " + error);
});

Assemble the data.

async function AssemblePackage(data) {
    var token = "3F73B3E6-7351-416F-ACA3-AE639A3F587F";
    var requestKey = 3;
    var requestId = "6BFF91D6-4346-424C-AE01-62E5C7F3C9BC";
    var assetName = "MyAssetFile.dat";

    if (data.ok) {
        let textEnc = new TextEncoder();

        //36 char UTF8 GUID (Includes guid '-' chars)
        let rawToken = textEnc.encode(token);

        //Single byte (uint8) holding the request key
        let rawKey = new Uint8Array(1);
        rawKey[0] = requestKey;

        //36 char UTF8 GUID (Includes guid '-' chars)
        let rawId = textEnc.encode(requestId);

        //UTF8 string representing asset name, max length: 255 bytes
        let rawName = textEnc.encode(assetName);

        //Single byte (uint8) holding the byte length of the asset name
        let nameSize = new Uint8Array(1);
        nameSize[0] = rawName.byteLength;

        //byte[] representing the fetched asset
        let asset = new Uint8Array(await data.arrayBuffer());

        //Calculate combined size
        let size = rawToken.byteLength + rawKey.byteLength + rawId.byteLength;
        size += nameSize.byteLength + rawName.byteLength + asset.byteLength;

        //Allocate new buffer
        let result = new Uint8Array(size);

        //Build the new array, copying each field at the running offset
        let offset = 0;

        result.set(rawToken, offset);
        offset += rawToken.byteLength;

        result.set(rawKey, offset);
        offset += rawKey.byteLength;

        result.set(rawId, offset);
        offset += rawId.byteLength;

        result.set(nameSize, offset);
        offset += nameSize.byteLength;

        result.set(rawName, offset);
        offset += rawName.byteLength;

        result.set(asset, offset);
        offset += asset.byteLength;

        //Do something with your byte[] (SendData is not shown in the post)
        SendData(result.buffer);
    } else {
        //Text Message
        console.error("Asset fetch failed: " + data.status + " " + data.statusText);
    }
}

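The offset bookkeeping in the answer can be folded into a small helper. The following is a minimal sketch, not the author's code: `concatBytes` and `buildPacket` are hypothetical names, the 4-byte asset is a stand-in for the fetched file, and the field order (36-byte token, 1-byte key, 36-byte id, 1-byte name length, name, asset) is taken from the answer. It also adds range checks, since a value above 255 assigned to a `Uint8Array` element wraps modulo 256 instead of raising an error.

```javascript
// Hypothetical helper: concatenates any number of Uint8Array chunks.
function concatBytes(...chunks) {
    const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
        out.set(chunk, offset);
        offset += chunk.byteLength;
    }
    return out;
}

// Hypothetical wrapper using the same field order as the answer.
function buildPacket(token, requestKey, requestId, assetName, asset) {
    const encoder = new TextEncoder();
    const rawName = encoder.encode(assetName);
    if (!Number.isInteger(requestKey) || requestKey < 0 || requestKey > 255) {
        throw new RangeError("requestKey must fit in one byte");
    }
    if (rawName.byteLength > 255) {
        throw new RangeError("assetName must encode to at most 255 bytes");
    }
    return concatBytes(
        encoder.encode(token),
        Uint8Array.of(requestKey),
        encoder.encode(requestId),
        Uint8Array.of(rawName.byteLength),
        rawName,
        asset
    );
}

const packet = buildPacket(
    "3F73B3E6-7351-416F-ACA3-AE639A3F587F",
    3,
    "6BFF91D6-4346-424C-AE01-62E5C7F3C9BC",
    "MyAssetFile.dat",
    new Uint8Array([0xde, 0xad, 0xbe, 0xef]) // stand-in for the fetched asset
);
console.log(packet.byteLength); // 36 + 1 + 36 + 1 + 15 + 4 = 93
```

If a `Blob` is still wanted for sending, `new Blob([packet])` wraps these bytes unchanged, because a typed array is stored as binary data.
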
【Comments on the answer】:
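
To sanity-check the layout before wiring up the C# side, the packet can be parsed back in JavaScript. This is a sketch under the same layout assumptions as the answer (36-byte token, 1-byte key, 36-byte id, 1-byte name length, name, then the asset). `parsePacket` is a hypothetical name, and the sample packet with its 2-byte asset is built inline so the block runs on its own.

```javascript
// Hypothetical reader for the layout used in the answer:
// [36-byte token][1-byte key][36-byte id][1-byte name length][name][asset]
function parsePacket(bytes) {
    const decoder = new TextDecoder("utf-8");
    let offset = 0;
    // Returns the next n bytes and advances the read position.
    const take = (n) => {
        const slice = bytes.subarray(offset, offset + n);
        offset += n;
        return slice;
    };
    const token = decoder.decode(take(36));
    const requestKey = take(1)[0];
    const requestId = decoder.decode(take(36));
    const nameLength = take(1)[0];
    const assetName = decoder.decode(take(nameLength));
    const asset = bytes.subarray(offset); // everything that remains
    return { token, requestKey, requestId, assetName, asset };
}

// Build a small sample packet inline (the 2-byte asset is made up).
const sampleEncoder = new TextEncoder();
const sampleName = sampleEncoder.encode("MyAssetFile.dat");
const sample = new Uint8Array([
    ...sampleEncoder.encode("3F73B3E6-7351-416F-ACA3-AE639A3F587F"),
    3,
    ...sampleEncoder.encode("6BFF91D6-4346-424C-AE01-62E5C7F3C9BC"),
    sampleName.byteLength,
    ...sampleName,
    0x01, 0x02
]);

const parsed = parsePacket(sample);
console.log(parsed.requestKey, parsed.assetName, parsed.asset.byteLength);
```

A reader on the server side would follow the same order: fixed-width fields first, then the length-prefixed name, then the rest of the stream as the asset.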
