Most efficient way to create InputStream from OutputStream

Posted 2023-02-25

【Asked】: 2010-11-16 14:09:04

【Question】:

This page: http://blog.ostermiller.org/convert-java-outputstream-inputstream describes how to create an InputStream from an OutputStream:

new ByteArrayInputStream(out.toByteArray())

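The other alternative is to use PipedStreams and a new thread, which is cumbersome.

I don't like the idea of copying many megabytes into a new in-memory byte array. Is there a library that does this more efficiently?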
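
For reference, the copy-based approach from the linked page can be sketched as follows. This is a minimal illustration, not code from the linked page: the class name `CopyBasedConversion` and the `producedData` parameter are made up for the example, and `readAllBytes` assumes Java 9 or later. The point to notice is that `toByteArray()` allocates a second array of the same size as the internal buffer, which is the copy the question wants to avoid.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CopyBasedConversion {
    // Buffers everything the producer writes, then exposes it as an InputStream.
    static InputStream toInputStream(byte[] producedData) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(producedData);          // stands in for the real producer (hypothetical)
        byte[] copy = out.toByteArray();  // allocates a second array of the same size
        return new ByteArrayInputStream(copy);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = toInputStream(data)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```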

Edit:

Following Laurence Gonsalves's advice, I tried PipedStreams, and they turned out not to be that hard to deal with. Here is the sample code in Clojure:

(defn ^PipedInputStream create-pdf-stream [pdf-info]
  (let [in-stream (PipedInputStream.)
        out-stream (PipedOutputStream. in-stream)]
    (.start (Thread. (fn []
                       ;; Here you write into out-stream; close it when done
                       ;; so that the reader sees end of stream.
                       (.close out-stream))))
    in-stream))

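【Question comments】:

【Answer 1】:

If you don't want to copy all of the data into an in-memory buffer at once, then the code that uses the OutputStream (the producer) and the code that uses the InputStream (the consumer) must either alternate in the same thread or run concurrently in two separate threads. Running them in the same thread is probably much more complicated than using two threads and is much more error-prone: you have to make sure the consumer never blocks waiting for input, or you will effectively deadlock. It also forces the producer and the consumer to run in the same loop, which seems far too tightly coupled.

So use a second thread. It really isn't that complicated. The page you linked to has reasonable examples. Here is a somewhat modernized version, which also closes the streams: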

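try (PipedInputStream in = new PipedInputStream()) {
    // Connect the pipe before starting the thread: reading from an
    // unconnected PipedInputStream throws an IOException.
    PipedOutputStream pipeOut = new PipedOutputStream(in);
    new Thread(() -> {
        try (PipedOutputStream out = pipeOut) {
            writeDataToOutputStream(out);
        } catch (IOException iox) {
            // handle IOExceptions
        }
    }).start();
    processDataFromInputStream(in);
}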
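
To make the two-thread approach concrete, here is a self-contained sketch. The bodies of `writeDataToOutputStream` and `processDataFromInputStream` are placeholders invented for the example (the answer does not define them), as are the class name `PipedRoundTrip` and the `roundTrip` helper; `readAllBytes` assumes Java 9 or later. The pipe is connected before the writer thread starts so that the reader never sees an unconnected pipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class PipedRoundTrip {
    // Hypothetical producer: writes the text to the stream.
    static void writeDataToOutputStream(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical consumer: reads until end of stream.
    static String processDataFromInputStream(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    static String roundTrip(String text) throws IOException, InterruptedException {
        try (PipedInputStream in = new PipedInputStream()) {
            // Connect both ends up front, then hand the write end to the thread.
            PipedOutputStream pipeOut = new PipedOutputStream(in);
            Thread writer = new Thread(() -> {
                // Closing the write end is what signals end of stream to the reader.
                try (PipedOutputStream out = pipeOut) {
                    writeDataToOutputStream(out, text);
                } catch (IOException iox) {
                    iox.printStackTrace();
                }
            });
            writer.start();
            String result = processDataFromInputStream(in);
            writer.join();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello pipe"));
    }
}
```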

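【Comments】:

I think you also need to create a new PipedInputStream for each consumer thread. If you read from the pipe from another thread, it will give you an error.

Stephen: you can't read something until it has been written. So with only one thread you either need to write everything first (creating the large in-memory array that Vagif wanted to avoid), or you need to alternate between the two, being very careful that the reader never blocks waiting for input (because if it does, the writer never gets to run either).

Is this suggestion safe to use in a JEE environment, where the container is probably running a lot of its own threads?

@Toskan if new Thread is not appropriate in your container for some reason, then see whether there is a thread pool you can use.

@LaurenceGonsalves, if you don't close the PipedOutputStream, the reader will block forever or, if the writer thread dies, an exception will be thrown; see ***.com/a/29725367. The PipedInputStream should also be closed, to ensure that if (for some reason) the writer writes to the PipedOutputStream again, it will not block forever waiting for space when the buffer is full.

【Answer 2】:

There is another open-source library called EasyStream that deals with pipes and threads in a transparent way. None of this is really complicated as long as everything goes well. Problems arise when (looking at Laurence Gonsalves's example)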

class1.putDataOnOutputStream(out);

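throws an exception. In that example, the thread simply completes and the exception is lost, while the outer InputStream might be truncated.

EasyStream deals with exception propagation and other nasty problems that I have been debugging for about a year. (I'm the maintainer of the library: obviously my solution is the best one ;) ) Here is an example of how to use it: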

final InputStreamFromOutputStream<String> isos = new InputStreamFromOutputStream<String>() {
 @Override
 public String produce(final OutputStream dataSink) throws Exception {
   /*
    * Call your application function that produces the data here.
    * WARNING: we're in another thread here, so this method shouldn't
    * write any class field or make assumptions on the state of the outer class.
    */
   return produceMydata(dataSink);
 }
};

There is also a nice introduction that explains all the other ways to convert an OutputStream into an InputStream. It is worth a look.
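
The exception-loss problem described above can also be handled by hand with the standard library. The following is only a sketch of that idea and is not EasyStream's API: `PropagatingPipe`, `Producer` and `consume` are names invented for the example. The producer runs on an executor thread, and the consumer calls `Future.get()` after draining the pipe, so a producer failure surfaces as an `ExecutionException` instead of silently truncating the stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PropagatingPipe {
    /** Hypothetical producer callback; it may fail part-way through. */
    interface Producer {
        void writeTo(OutputStream out) throws Exception;
    }

    /** Drains the pipe and returns the byte count, rethrowing any producer failure. */
    static long consume(Producer producer)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (PipedInputStream in = new PipedInputStream()) {
            PipedOutputStream out = new PipedOutputStream(in);
            Future<Void> job = pool.submit(() -> {
                // Always close the write end so the reader reaches end of stream.
                try (PipedOutputStream o = out) {
                    producer.writeTo(o);
                }
                return null;
            });
            long total = 0;
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            job.get(); // throws ExecutionException if the producer failed
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller can then treat the data as valid only if `consume` returns normally; anything read before a failure should be discarded.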

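【Comments】:

A tutorial on using the class is available at code.google.com/p/io-tools/wiki/Tutorial_EasyStream

【Answer 3】:

A simple solution that avoids copying the buffer is to create a special-purpose ByteArrayOutputStream: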

public class CopyStream extends ByteArrayOutputStream {
    public CopyStream(int size) { super(size); }

    /**
     * Get an input stream based on the contents of this output stream.
     * Do not use the output stream after calling this method.
     * @return an {@link InputStream}
     */
    public InputStream toInputStream() {
        return new ByteArrayInputStream(this.buf, 0, this.count);
    }
}

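Write to the above output stream as needed, then call toInputStream to obtain an input stream over the underlying buffer. Treat the output stream as closed after that point.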
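
A short usage sketch of this idea follows. The wrapper class `CopyStreamDemo` and the `roundTrip` helper are invented for the example, the `CopyStream` class is repeated in condensed form so that the block compiles on its own, and `readAllBytes` assumes Java 9 or later. All data is still held in memory; what is saved is the second array that `toByteArray()` would allocate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CopyStreamDemo {
    // Same idea as the class above, repeated so the example compiles on its own.
    static class CopyStream extends ByteArrayOutputStream {
        CopyStream(int size) { super(size); }

        InputStream toInputStream() {
            // buf and count are protected fields of ByteArrayOutputStream,
            // so the input stream reads the internal array directly.
            return new ByteArrayInputStream(this.buf, 0, this.count);
        }
    }

    static String roundTrip(String text) throws IOException {
        CopyStream out = new CopyStream(16);
        out.write(text.getBytes(StandardCharsets.UTF_8));
        // From here on, the output stream must not be written to again.
        try (InputStream in = out.toInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("shared buffer, no second copy"));
    }
}
```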

【Comments】:

【Answer 4】:

I think the best way to connect an InputStream to an OutputStream is through piped streams, available in the java.io package, as follows:

// 1- Define stream buffer
private static final int PIPE_BUFFER = 2048;

// 2- Create PipedInputStream with the buffer
public PipedInputStream inPipe = new PipedInputStream(PIPE_BUFFER);

// 3- Create PipedOutputStream and bind it to the PipedInputStream object
public PipedOutputStream outPipe = new PipedOutputStream(inPipe);

// 4- PipedOutputStream is an OutputStream, so you can write data to it
// in any way suitable to your data. For example:
while (Condition) {
     outPipe.write(mByte);
}

/* Congratulations :D. Step 4 writes data to the PipedOutputStream,
which is bound to the PipedInputStream, so the data becomes available
in the inPipe object. Start reading it (from another thread) to clear
the buffer so that the PipedOutputStream can fill it again. */

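In my opinion, this code has two main advantages:

1 - There is no additional memory consumption beyond the buffer.

2 - You don't need to handle data queuing manually.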
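
The first advantage can be illustrated with a sketch that pushes far more data through the pipe than its buffer holds. The class name `BoundedPipeDemo`, the `pump` helper and the chunk sizes are assumptions made for the example. The writer runs in its own thread, since the `PipedInputStream` Javadoc warns that using both ends from a single thread may deadlock.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class BoundedPipeDemo {
    static final int PIPE_BUFFER = 2048;

    // Sends totalBytes through a 2048-byte pipe and returns how many bytes arrived.
    static long pump(int totalBytes) throws IOException, InterruptedException {
        PipedInputStream inPipe = new PipedInputStream(PIPE_BUFFER);
        PipedOutputStream outPipe = new PipedOutputStream(inPipe);

        Thread writer = new Thread(() -> {
            try (PipedOutputStream out = outPipe) {
                byte[] chunk = new byte[512];
                for (int written = 0; written < totalBytes; written += chunk.length) {
                    // write() blocks whenever the pipe buffer is full.
                    out.write(chunk, 0, Math.min(chunk.length, totalBytes - written));
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        long received = 0;
        byte[] buf = new byte[1024];
        try (PipedInputStream in = inPipe) {
            for (int n; (n = in.read(buf)) != -1; ) {
                received += n;
            }
        }
        writer.join();
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pump(100_000) + " bytes passed through a " + PIPE_BUFFER + "-byte pipe");
    }
}
```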

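【Comments】:

This is awesome, but the javadocs say that if you read and write to these in the same thread you might deadlock. I wish they had updated this with NIO!

【Answer 5】:

I usually try to avoid creating a separate thread, because it increases the chance of deadlock, makes the code harder to understand, and complicates exception handling.

Here is my proposed solution: a ProducerInputStream that creates content in chunks through repeated calls to produceChunk():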

public abstract class ProducerInputStream extends InputStream {

    private ByteArrayInputStream bin = new ByteArrayInputStream(new byte[0]);
    private ByteArrayOutputStream bout = new ByteArrayOutputStream();

    @Override
    public int read() throws IOException {
        int result = bin.read();
        while ((result == -1) && newChunk()) {
            result = bin.read();
        }
        return result;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int result = bin.read(b, off, len);
        while ((result == -1) && newChunk()) {
            result = bin.read(b, off, len);
        }
        return result;
    }

    private boolean newChunk() {
        bout.reset();
        produceChunk(bout);
        bin = new ByteArrayInputStream(bout.toByteArray());
        return (bout.size() > 0);
    }

    public abstract void produceChunk(OutputStream out);
}
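
A possible way to use such a class is sketched below. The wrapper class `ChunkedProducerDemo`, the `joinLines` helper and the line-producing subclass are invented for the example. The base class is repeated in condensed form so that the block compiles on its own, with one deliberate deviation: `produceChunk` is declared with `throws IOException` so that implementations can write to the stream without a try/catch. The example assumes Java 9 or later for `readAllBytes`. A chunk that writes nothing signals end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ChunkedProducerDemo {
    // Condensed copy of the class above (assumption: produceChunk may throw IOException).
    abstract static class ProducerInputStream extends InputStream {
        private ByteArrayInputStream bin = new ByteArrayInputStream(new byte[0]);
        private final ByteArrayOutputStream bout = new ByteArrayOutputStream();

        @Override
        public int read() throws IOException {
            int result = bin.read();
            while (result == -1 && newChunk()) {
                result = bin.read();
            }
            return result;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int result = bin.read(b, off, len);
            while (result == -1 && newChunk()) {
                result = bin.read(b, off, len);
            }
            return result;
        }

        // Asks the producer for the next chunk; an empty chunk means end of stream.
        private boolean newChunk() throws IOException {
            bout.reset();
            produceChunk(bout);
            bin = new ByteArrayInputStream(bout.toByteArray());
            return bout.size() > 0;
        }

        public abstract void produceChunk(OutputStream out) throws IOException;
    }

    // Hypothetical producer: emits one text line per chunk, lineCount lines in total.
    static String joinLines(int lineCount) throws IOException {
        InputStream in = new ProducerInputStream() {
            private int next = 1;

            @Override
            public void produceChunk(OutputStream out) throws IOException {
                if (next <= lineCount) {
                    out.write(("line " + next++ + "\n").getBytes(StandardCharsets.UTF_8));
                }
            }
        };
        try (in) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(joinLines(3));
    }
}
```

Only one chunk is held in memory at a time, and everything runs on the caller's thread.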

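【Comments】:

Interesting idea, but sadly it only works if you control the code that produces the data. If a third-party library writes gigabytes of data to the OutputStream without returning control, you might as well copy everything into memory, which defeats the point of this class.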
