Decompressing archives with boost and filtering streams

Posted 2023-02-17

【Asked】: 2018-01-22 20:11:43 【Question】:

I am decompressing large files that contain blocks of data compressed in various ways. I wrote the following code:

// input_file - path to file
std::ifstream file(input_file, std::ios_base::in | std::ios_base::binary);
//move to begin of n-th data block, compressed by zlib
file.seekg(offset, std::ios_base::beg);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(file);
// write decompressed data to output file
boost::iostreams::copy(in, output);

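My understanding is that this line

boost::iostreams::copy(in, output);

will keep decompressing and copying data until the end of the file, which is not what I want in this case.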

Importantly, I know the correct offset and length of the compressed data.

The Boost documentation says:

A model of Source can be defined as follows:

struct Source {
    typedef char        char_type;
    typedef source_tag  category;
    std::streamsize read(char* s, std::streamsize n)
    {
        // Read up to n characters from the input
        // sequence into the buffer s, returning
        // the number of characters read, or -1
        // to indicate end-of-sequence.
    }
};

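I wanted to inherit from the ifstream class, override its read method, count the bytes read inside that method, and return -1 once the block has no more data. Unfortunately, that does not seem to work.

I wrote: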

class ifstream_t : public std::ifstream {
public:
    ifstream_t(const std::string& fp, std::ios_base::openmode mode = std::ios_base::in)
        : std::ifstream(fp, mode) {}

    std::streamsize read(char* s, std::streamsize n) {
        // calculate remaining bytes
        return -1;
    }
};

and used it like this:

ifstream_t file(this->fp, std::ios_base::in | std::ios_base::binary);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(file);
boost::iostreams::copy(in, output);

The read method of my class is never called.

【Comments】:

【Answer 1】:

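My understanding is that this line
 boost::iostreams::copy(in, output);
will keep decompressing and copying data until the end of the file, which is not what I want in this case.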

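I just tested this, and that is not the case. The decompressor correctly detects the end of the stream once the compressed data is finished.

I created a file that contains the program's own compressed source, with random data before and after it:¹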

(dd if=/dev/urandom bs=1 count=$((0x3214a)); cat main.cpp | zlib-flate -compress; dd if=/dev/urandom bs=1 count=$((0x3214a))) > input.txt

Running the following program, with a hard-coded offset, against that file:

Live On Coliru

#include <boost/iostreams/filter/zlib.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    static std::string const input_file = "input.txt";
    static size_t      const offset     = 0x3214a;
    std::ostream& output = std::cout;

    // input_file - path to file
    std::ifstream file(input_file, std::ios_base::in | std::ios_base::binary);

    // move to the beginning of the n-th data block, compressed by zlib
    file.seekg(offset, std::ios_base::beg);
    boost::iostreams::filtering_streambuf<boost::iostreams::input> in;

    in.push(boost::iostreams::zlib_decompressor());
    in.push(file);

    // write decompressed data to the output stream
    boost::iostreams::copy(in, output);
}

It happily reproduces its own source, as you can see on Coliru.

¹ zlib-flate is not available on Coliru, so I used Python:

python -c 'import zlib; import sys; sys.stdout.write(zlib.compress(sys.stdin.read()))'

【Comments】:
