File corrupted when I post it to the servlet using GZIPOutputStream

Posted 2023-03-06

【Asked】2013-09-17 18:29:59 【Question】:

I tried to modify @BalusC's excellent tutorial (here) to send gzip-compressed files. This is a working Java class:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPOutputStream;

public final class NetworkService {

    // *** EDIT THOSE AS APPROPRIATE
    private static final String FILENAME = "C:/Dropbox/TMP.txt";
    private static final String URL =
            "http://192.168.1.64:8080/DataCollectionServlet/";
    // *** END EDIT
    private static final CharSequence CRLF = "\r\n";
    private static boolean isServerGzip = true; // ***
    private static String charsetForMultipartHeaders = "UTF-8";

    public static void main(String[] args) {
        HttpURLConnection connection = null;
        OutputStream serverOutputStream = null;
        try {
            File file = new File(FILENAME);
            final String boundary = Long
                    .toHexString(System.currentTimeMillis());
            connection = connection(true, boundary);
            serverOutputStream = connection.getOutputStream();
            try {
                flushMultiPartData(file, serverOutputStream, boundary);
            } catch (IOException e) {}
            System.out.println(connection.getResponseCode()); // 200
        } catch (IOException e) {
            // Network unreachable : not connected
            // No route to host : probably on an encrypted network
            // Connection timed out : Server DOWN
        } finally {
            if (connection != null) connection.disconnect();
        }
    }

    private static HttpURLConnection connection(boolean isMultiPart,
            String boundary) throws MalformedURLException, IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(URL)
                .openConnection();
        connection.setDoOutput(true); // triggers POST
        connection.setUseCaches(false); // *** no difference
        connection.setRequestProperty("Connection", "Keep-Alive");
        connection.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) "
                + "Gecko/20100401"); // *** tried others no difference
        connection.setChunkedStreamingMode(1024); // *** no difference
        if (isMultiPart) {
            if (boundary == null || "".equals(boundary.trim()))
                throw new IllegalArgumentException("Boundary can't be "
                    + ((boundary == null) ? "null" : "empty"));
            connection.setRequestProperty("Content-Type",
                "multipart/form-data; boundary=" + boundary);
        }
        return connection;
    }

    // =========================================================================
    // Multipart
    // =========================================================================
    private static void flushMultiPartData(File file,
            OutputStream serverOutputStream, String boundary)
            throws IOException {
        PrintWriter writer = null;
        try {
            // true = autoFlush, important!
            writer = new PrintWriter(new OutputStreamWriter(serverOutputStream,
                    charsetForMultipartHeaders), true);
            appendBinary(file, boundary, writer, serverOutputStream);
            // End of multipart/form-data.
            writer.append("--" + boundary + "--").append(CRLF);
        } finally {
            if (writer != null) writer.close();
        }
    }

    private static void appendBinary(File file, String boundary,
            PrintWriter writer, OutputStream output)
            throws FileNotFoundException, IOException {
        // Send binary file.
        writer.append("--" + boundary).append(CRLF);
        writer.append(
            "Content-Disposition: form-data; name=\"binaryFile\"; filename=\""
                + file.getName() + "\"").append(CRLF);
        writer.append(
            "Content-Type: " // ***
                + ((isServerGzip) ? "application/gzip" : URLConnection
                        .guessContentTypeFromName(file.getName())))
                .append(CRLF);
        writer.append("Content-Transfer-Encoding: binary").append(CRLF);
        writer.append(CRLF).flush();
        InputStream input = null;
        OutputStream output2 = output;
        if (isServerGzip) {
            output2 = new GZIPOutputStream(output);
        }
        try {
            input = new FileInputStream(file);
            byte[] buffer = new byte[1024]; // *** tweaked, no difference
            for (int length = 0; (length = input.read(buffer)) > 0;) {
                output2.write(buffer, 0, length);
            }
            output2.flush(); // Important! Output cannot be closed. Close of
            // writer will close output as well.
        } finally {
            if (input != null) try {
                input.close();
            } catch (IOException logOrIgnore) {}
        }
        writer.append(CRLF).flush(); // CRLF is important! It indicates end of
        // binary boundary.
    }
}

You must edit the FILENAME and URL fields and set up a servlet at that URL. Its doPost() method is:

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
    Collection<Part> parts = req.getParts();
    for (Part part : parts) {
        File save = new File(uploadsDirName, getFilename(part) + "_"
            + System.currentTimeMillis() + ".zip");
        final String absolutePath = save.getAbsolutePath();
        log.debug(absolutePath);
        part.write(absolutePath);
        sc.getRequestDispatcher(DATA_COLLECTION_JSP).forward(req, resp);
    }
}

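Now, when the isServerGzip field is set to true, FILENAME is compressed and sent to the server, but when I try to extract it, it is corrupted. I use 7z on Windows: it opens the gzip file as an archive, but when I try to extract the file inside, it reports a corrupted gzip archive, although it does extract the (indeed corrupted) file. I tried various files: larger ones end up corrupted, while smaller ones extract as empty. The reported size of a larger file inside the archive is much bigger than its real size, while for smaller files it is 0. I marked the parts that need attention with // ***. I may be missing some connection configuration, or the way I compress the stream may be plain wrong, or...? Tweaking connection properties, buffers, caches and so on did not help.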

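【Comments on the question】:

Encoding issues?

@SotiriosDelimanolis: yes, you must be on a recent servlet container (I am on Tomcat 7.0.32); getParts() had its share of bugs. I do get the file and it is saved in my file system, it is just corrupted.

I was forgetting about @MultipartConfig.

【Answer 1】: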

You need to call

((GZIPOutputStream)output2).finish();

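before flushing. See the javadoc here. It states:

Finishes writing compressed data to the output stream without closing the underlying stream. Use this method when applying multiple filters in succession to the same output stream.

Which is what you are doing. So: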

for (int length = 0; (length = input.read(buffer)) > 0;) {
    output2.write(buffer, 0, length);
}
((GZIPOutputStream)output2).finish(); //Write the compressed parts
// obviously make sure output2 is truly GZIPOutputStream
output2.flush(); //
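The effect of the missing finish() call can be reproduced without any network. The following is a minimal sketch, not the asker's code: the class name GzipFinishDemo and its helper methods are made up for illustration, and a ByteArrayOutputStream stands in for the connection's output stream. It gzips the same bytes twice, once with finish() and once with flush() only, and checks whether the result decompresses back to the input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original question or answer.
final class GzipFinishDemo {

    /** Gzips data into an in-memory sink; calls finish() only when asked to. */
    static byte[] gzip(byte[] data, boolean callFinish) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(sink);
            gzip.write(data);
            if (callFinish) {
                // Emits the pending compressed data and the gzip trailer
                // (CRC and size) without closing the sink.
                gzip.finish();
            }
            gzip.flush(); // on its own this does not complete the gzip stream
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the bytes decompress cleanly back to the expected data. */
    static boolean roundTrips(byte[] gzipped, byte[] expected) {
        try (GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            for (int length; (length = in.read(buffer)) > 0;) {
                out.write(buffer, 0, length);
            }
            return Arrays.equals(out.toByteArray(), expected);
        } catch (IOException e) {
            return false; // truncated or otherwise corrupt gzip data
        }
    }

    public static void main(String[] args) {
        byte[] data = "some file content".getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrips(gzip(data, true), data));
        System.out.println(roundTrips(gzip(data, false), data));
    }
}
```

With finish() the data should round-trip; without it the stream lacks its final blocks and trailer, which is consistent with the "corrupted archive" symptom described in the question.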

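Regarding applying multiple filters in succession to the same output stream, here is how I understand it:

You have one OutputStream to the HTTP server, that is, a socket connection. HttpURLConnection writes the HTTP headers, and then you write the body directly. In this (multipart) case, you send the boundary and the part headers as uncompressed bytes, then the compressed file content, and then the boundary again. So the stream ends up looking like this: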

                            start writing with GZIPOutputStream
                                          v
    |---boundary---|---the part headers---|---gzip encoded file content bytes---|---boundary---|
    ^                                                                           ^
write directly with PrintWriter                                      use PrintWriter again

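So you can see how you are writing different parts in succession with different filters. Think of the PrintWriter as an unfiltered filter: anything you give it is written directly. GZIPOutputStream is a gzip filter that encodes (gzips) the bytes it is given.

As for the source code, look in your Java JDK installation. You should have a src.zip file that contains the public source code: java.lang.*, java.util.*, java.io.*, javax.*, and so on.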
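The layout in the diagram can be sketched in memory. This is only an illustration under assumptions: the class MultipartGzipSketch and its methods are hypothetical, the part headers are reduced to a single Content-Type line, and a ByteArrayOutputStream replaces the socket. It writes the part header through a Writer, the file bytes through a GZIPOutputStream that is finished but not closed, and the closing boundary through the Writer again, all onto the same underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; simplified part headers, in-memory stream instead of a socket.
final class MultipartGzipSketch {
    private static final String CRLF = "\r\n";

    static String partHeader(String boundary) {
        return "--" + boundary + CRLF
                + "Content-Type: application/gzip" + CRLF
                + CRLF;
    }

    static String closingBoundary(String boundary) {
        return CRLF + "--" + boundary + "--" + CRLF;
    }

    /** Builds one multipart body: plain header, gzipped content, plain closing boundary. */
    static byte[] buildBody(String boundary, byte[] fileBytes) {
        try {
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(output, StandardCharsets.UTF_8);

            // First "filter": plain text, flushed before raw bytes follow.
            writer.write(partHeader(boundary));
            writer.flush();

            // Second "filter": gzip, on the same underlying stream.
            GZIPOutputStream gzip = new GZIPOutputStream(output);
            gzip.write(fileBytes);
            gzip.finish(); // completes the gzip data, leaves output open

            // Back to the first "filter" for the closing boundary.
            writer.write(closingBoundary(boundary));
            writer.flush();
            return output.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Cuts the gzipped part out of a body produced by buildBody and decompresses it. */
    static byte[] gunzipFilePart(String boundary, byte[] body) {
        int start = partHeader(boundary).getBytes(StandardCharsets.UTF_8).length;
        int end = body.length
                - closingBoundary(boundary).getBytes(StandardCharsets.UTF_8).length;
        byte[] gzipped = Arrays.copyOfRange(body, start, end);
        try (GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            for (int length; (length = in.read(buffer)) > 0;) {
                out.write(buffer, 0, length);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = "some file content".getBytes(StandardCharsets.UTF_8);
        byte[] body = buildBody("abc123", file);
        System.out.println(Arrays.equals(gunzipFilePart("abc123", body), file));
    }
}
```

The slicing in gunzipFilePart only works because the sketch knows the exact header and trailer it wrote; a real servlet container parses the boundaries instead.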

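【Discussion】:

Will test it now. Meanwhile, could you elaborate on "applying multiple filters"? Where am I doing that? Also, is it important to call finish() before flush()?

You have one OutputStream, the connection's. I think that part of the javadoc means that you are not only writing gzipped bytes, but also writing unencoded bytes before and after the gzipped file content. I am not sure about the flush() order. I have not looked at the source yet.

Well, it seems the order does not matter (actually, a flush is performed when the wrapped stream (output) is closed, but better safe than sorry). Accepted. If you could elaborate on the "filters" part, it would be much appreciated ;) (a link to the source code too)

@Mr_and_Mrs_D See my latest edit about that javadoc. You may be interested in: developer.android.com/reference/java/util/zip/…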
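On the finish() versus close() point raised above, a small sketch may help. The class FinishVersusClose and its TrackingStream helper are hypothetical names introduced here; the sketch only records whether the wrapped stream was closed after either call on the GZIPOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original thread.
final class FinishVersusClose {

    /** In-memory stream that remembers whether close() was called on it. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Ends the gzip stream with close() or finish() and reports the wrapped stream's state. */
    static boolean underlyingClosedAfter(boolean useClose) {
        try {
            TrackingStream sink = new TrackingStream();
            GZIPOutputStream gzip = new GZIPOutputStream(sink);
            gzip.write(new byte[] {1, 2, 3});
            if (useClose) {
                gzip.close();  // finishes the gzip data and closes the wrapped stream
            } else {
                gzip.finish(); // finishes the gzip data only
            }
            return sink.closed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed after close():  " + underlyingClosedAfter(true));
        System.out.println("closed after finish(): " + underlyingClosedAfter(false));
    }
}
```

This is why finish() fits the multipart case: the closing boundary still has to be written to the same connection stream after the gzipped part.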
