Count the bytes written to file via BufferedWriter formed by GZIPOutputStream

Posted 2023-03-06


【Title】Count the bytes written to file via BufferedWriter formed by GZIPOutputStream 【Posted】2014-08-29 15:04:00 【Question】:

I have a BufferedWriter, as shown below:

BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        new GZIPOutputStream( hdfs.create(filepath, true ))));

String line = "text";
writer.write(line);

I want to find out how many bytes were written to the file without querying the file like this:

hdfs = FileSystem.get( new URI( "hdfs://localhost:8020" ), configuration );

filepath = new Path("path");
hdfs.getFileStatus(filepath).getLen();

because that adds overhead, which I want to avoid.

I also can't do this:

line.getBytes().length;

because it gives the size before compression.

【Comments】:

Sounds like you want some kind of Java tee.

【Answer 1】:

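You can use CountingOutputStream from the Apache Commons IO library.

Place it between the GZIPOutputStream and the file OutputStream (hdfs.create(..)).

After the content has been written to the file, you can read the number of bytes written from the CountingOutputStream instance.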


【Answer 2】:

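If it's not too late, and you are on Java 1.7+, and you don't want to pull in a whole library like Guava or Commons IO, you can extend GZIPOutputStream and get the numbers from the associated Deflater, as follows: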

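import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class MyGZIPOutputStream extends GZIPOutputStream {

  public MyGZIPOutputStream(OutputStream out) throws IOException {
      super(out);
  }

  // Uncompressed bytes fed into the Deflater so far.
  public long getBytesRead() {
      return def.getBytesRead();
  }

  // Compressed bytes produced by the Deflater so far.
  public long getBytesWritten() {
      return def.getBytesWritten();
  }

  public void setLevel(int level) {
      def.setLevel(level);
  }
}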
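A usage sketch for the Deflater-based approach, under a few assumptions: the class names DeflaterCountingGzip and DeflaterCountDemo are made up for illustration, and a ByteArrayOutputStream stands in for hdfs.create(...). It reads the counter after finish() but before close(), because close() releases the default Deflater. It also shows that the Deflater only reports the compressed payload; with the JDK's GZIPOutputStream the file additionally carries a 10-byte gzip header and an 8-byte trailer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical compact subclass: exposes the protected Deflater's output counter.
class DeflaterCountingGzip extends GZIPOutputStream {
    DeflaterCountingGzip(OutputStream out) throws IOException {
        super(out);
    }

    long deflaterBytesWritten() {
        return def.getBytesWritten(); // compressed payload only, no gzip framing
    }
}

class DeflaterCountDemo {
    // Returns {bytes reported by the Deflater, bytes that actually reached the sink}.
    static long[] measure(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for hdfs.create(...)
        DeflaterCountingGzip gzip = new DeflaterCountingGzip(sink);
        gzip.write(text.getBytes(StandardCharsets.UTF_8));
        gzip.finish();                                // emits remaining data and the trailer
        long reported = gzip.deflaterBytesWritten();  // read while the Deflater is still open
        gzip.close();
        return new long[] { reported, sink.size() };
    }

    public static void main(String[] args) throws IOException {
        long[] r = measure("text");
        System.out.println("deflater=" + r[0] + " file=" + r[1]);
    }
}
```

If the exact on-disk size matters, add the header and trailer to the Deflater's figure, or count at the underlying stream instead.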


【Answer 3】:

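You can write your own descendant of OutputStream and keep a running total in its write methods (add 1 for each write(int) call and len for each write(byte[], int, int) call).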
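A minimal sketch of that idea, assuming a hand-written wrapper (ByteCountingOutputStream and CountingPipelineDemo are hypothetical names) and a ByteArrayOutputStream standing in for hdfs.create(...). The wrapper sits below the GZIPOutputStream, so it sees the compressed bytes, and it delegates to the wrapped stream rather than calling super, so each byte is counted once.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical wrapper: forwards every write to the target and adds up the bytes.
class ByteCountingOutputStream extends OutputStream {
    private final OutputStream target;
    private long count;

    ByteCountingOutputStream(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        target.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        target.write(b, off, len); // delegate directly, so write(int) is not re-entered
        count += len;
    }

    @Override
    public void flush() throws IOException {
        target.flush();
    }

    @Override
    public void close() throws IOException {
        target.close();
    }

    long getCount() {
        return count;
    }
}

class CountingPipelineDemo {
    // Returns {bytes seen by the counter, bytes that reached the sink}.
    static long[] writeCompressed(String line) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for hdfs.create(...)
        ByteCountingOutputStream counter = new ByteCountingOutputStream(sink);
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(counter), StandardCharsets.UTF_8))) {
            writer.write(line);
        }
        // The count is only complete once the writer has been closed,
        // because both BufferedWriter and GZIPOutputStream buffer data.
        return new long[] { counter.getCount(), sink.size() };
    }
}
```

Calling CountingPipelineDemo.writeCompressed("text") should return two equal numbers, since every byte that reaches the sink passes through the counter.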


【Answer 4】:

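This is similar to Olaseni's answer, but I moved the counting into a BufferedOutputStream rather than the GZIPOutputStream. That is more robust, because def.getBytesRead() in Olaseni's answer is not available after the stream has been closed.

With the implementation below you can pass your own AtomicLong to the constructor, so you can create the CountingBufferedOutputStream in a try-with-resources block and still read the count after the block exits (i.e. after the file has been closed).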

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

public class CountingBufferedOutputStream extends BufferedOutputStream {
    private final AtomicLong bytesWritten;

    public CountingBufferedOutputStream(OutputStream out) {
        super(out);
        this.bytesWritten = new AtomicLong();
    }

    public CountingBufferedOutputStream(OutputStream out, int bufSize) {
        super(out, bufSize);
        this.bytesWritten = new AtomicLong();
    }

    // Lets the caller keep the counter and read it after the stream is closed.
    public CountingBufferedOutputStream(OutputStream out, int bufSize, AtomicLong bytesWritten) {
        super(out, bufSize);
        this.bytesWritten = bytesWritten;
    }

    // write(byte[]) is deliberately not overridden: the inherited
    // FilterOutputStream.write(byte[]) forwards to write(b, 0, b.length),
    // so counting there as well would count those bytes twice.

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        bytesWritten.addAndGet(len);
    }

    @Override
    public synchronized void write(int b) throws IOException {
        super.write(b);
        bytesWritten.incrementAndGet();
    }

    public long getBytesWritten() {
        return bytesWritten.get();
    }
}

【Comments】:

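I wonder whether we can work around the double-counting problem. Suppose some internal implementation calls write like this: write(byte[] b, int off, int len), followed by write(int b) for each byte in the array. Then we would count those bytes twice.

@kolboc No. In this case the write(byte[] b, int off, int len) call does not loop over write(int b) calls, so there is no double counting.

For this particular implementation it may be safe, but in general it is a possibility. That may be a good reason to write such a counting stream as a wrapper that uses an instance of the wrapped stream instead of calling super. E.g. class CountingOutputStream(val outputStream: OutputStream) : OutputStream() { override fun write(b: Int) { byteCount++; outputStream.write(b) } } That way you are safe no matter what happens behind the scenes.

Sorry, but it is simply not possible for this to miscount the bytes. The implementation I gave counts exactly the bytes written. There is no way to call it and have the number of bytes written differ from the number counted. The implementation you show is pretty much my write(int) method.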
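The double-counting concern can be checked directly. The sketch below assumes the documented JDK behaviour that FilterOutputStream.write(byte[]) forwards to the three-argument write, and that BufferedOutputStream does not override the one-argument overload. NaiveCountingBuffered and DoubleCountDemo are hypothetical names; the subclass counts in both array-writing overrides to show what happens when inheritance-based counting meets that forwarding.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical subclass that counts in both array-writing overrides.
class NaiveCountingBuffered extends BufferedOutputStream {
    long count;

    NaiveCountingBuffered(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b) throws IOException {
        super.write(b);     // FilterOutputStream forwards to write(b, 0, b.length)
        count += b.length;  // so these bytes are counted a second time here
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        count += len;
    }
}

class DoubleCountDemo {
    // Returns {bytes counted, bytes that reached the sink}.
    static long[] run() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        NaiveCountingBuffered out = new NaiveCountingBuffered(sink);
        out.write(new byte[10]); // the one-argument overload
        out.close();             // flushes the buffer into the sink
        return new long[] { out.count, sink.size() };
    }

    public static void main(String[] args) throws IOException {
        long[] r = run();
        System.out.println("counted=" + r[0] + " written=" + r[1]); // counted=20 written=10
    }
}
```

Counting in only one place along the call chain, or delegating to a wrapped stream instead of calling super, avoids the mismatch.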
