Decompressing a large blob in Java on Google App Engine

Posted 2023-03-06

[Title]: Decompressing a large blob in Java on Google App Engine
[Asked]: 2011-09-21 02:12:26
[Question]:

I'm building something in Java (JDO) on Google App Engine. I use Deflater to compress a large byte[] programmatically and then store the compressed byte[] in the blobstore. This works well:

    import com.google.appengine.api.blobstore.BlobKey;
    import com.google.appengine.api.files.AppEngineFile;
    import com.google.appengine.api.files.FileService;
    import com.google.appengine.api.files.FileServiceFactory;
    import com.google.appengine.api.files.FileWriteChannel;

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public class Functions {

        public static byte[] compress(byte[] input) throws IOException {
            // The Deflater produces the compressed (zlib) byte stream
            Deflater df = new Deflater();
            df.setLevel(Deflater.BEST_COMPRESSION);
            df.setInput(input);
            df.finish();

            // Collect the compressed output here, 1024 bytes at a time
            ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
            byte[] buff = new byte[1024];
            while (!df.finished()) {
                int count = df.deflate(buff);   // number of compressed bytes placed in buff
                baos.write(buff, 0, count);     // write bytes 0..count of buff
            }
            df.end();                           // release the native zlib resources
            baos.close();

            return baos.toByteArray();
        }

        public static byte[] decompress(byte[] input) throws IOException, DataFormatException {
            Inflater decompressor = new Inflater();
            decompressor.setInput(input);

            // Create an expandable byte array to hold the decompressed data
            ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);

            // Decompress the data
            byte[] buf = new byte[1024];
            try {
                while (!decompressor.finished()) {
                    // A DataFormatException propagates to the caller; swallowing it
                    // inside the loop would leave finished() false forever
                    int count = decompressor.inflate(buf);
                    if (count == 0 && (decompressor.needsInput() || decompressor.needsDictionary())) {
                        throw new DataFormatException("Compressed data is truncated or needs a dictionary");
                    }
                    bos.write(buf, 0, count);
                }
            } finally {
                decompressor.end();
            }
            bos.close();

            // Get the decompressed data
            return bos.toByteArray();
        }

        public static BlobKey putInBlobStore(String contentType, byte[] filebytes) throws IOException {
            // Get a file service
            FileService fileService = FileServiceFactory.getFileService();
            AppEngineFile file = fileService.createNewBlobFile(contentType);

            // Open a channel to write to it
            boolean lock = true;
            FileWriteChannel writeChannel = fileService.openWriteChannel(file, lock);

            // Write the data in chunks of at most 0.5 MB
            int defaultBufferSize = 524288;
            for (int offset = 0; offset < filebytes.length; offset += defaultBufferSize) {
                int length = Math.min(defaultBufferSize, filebytes.length - offset);
                // Wrap only the bytes of this chunk; a freshly allocated array
                // would contain zeros instead of the data
                ByteBuffer bb = ByteBuffer.wrap(filebytes, offset, length);
                while (bb.hasRemaining()) {
                    writeChannel.write(bb);
                }
            }
            writeChannel.closeFinally();

            return fileService.getBlobKey(file);
        }
    }

Using the static compress() and putInBlobStore() functions in my Functions class, I can compress and store a byte[] like this:

BlobKey dataBlobKey =  Functions.putInBlobStore("MULTIPART_FORM_DATA", Functions.compress(orginalDataByteArray));
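
The chunked write inside putInBlobStore() can be exercised without App Engine. The sketch below is a minimal stand-in, assuming the blobstore write channel behaves like any java.nio WritableByteChannel (the write(ByteBuffer) call in the question suggests it does). The class name ChunkedChannelWriter and the method writeInChunks are hypothetical and not part of the original code. The detail it illustrates is that each ByteBuffer should wrap exactly the bytes belonging to the current chunk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the App Engine SDK.
class ChunkedChannelWriter {

    /** Writes data to the channel in slices of at most chunkSize bytes. */
    static void writeInChunks(byte[] data, WritableByteChannel channel, int chunkSize)
            throws IOException {
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset);
            // wrap(array, offset, length) is a view onto the existing bytes; nothing is copied
            ByteBuffer slice = ByteBuffer.wrap(data, offset, length);
            // A channel may accept fewer bytes than offered, so drain the slice fully
            while (slice.hasRemaining()) {
                channel.write(slice);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 1.2 MB of patterned data, so the last chunk is shorter than the chunk size
        byte[] data = new byte[1_200_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        // An in-memory channel stands in for the blobstore write channel
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(sink)) {
            writeInChunks(data, channel, 524288);
        }
        System.out.println("identical: " + Arrays.equals(data, sink.toByteArray()));
    }
}
```

Comparing the bytes that come out of the channel with the bytes that went in is a cheap way to rule out corruption at write time before looking at decompression.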

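Sweet. I'm really digging GAE.

But now, here's the problem:

I'm storing compressed HTML that I want to retrieve and decompress on the fly for display in an iframe inside a JSP page. Compression is fast, but decompression takes forever! Even when the compressed HTML is only 15k, decompression sometimes just dies.

Here is my decompression approach: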

    URL file = new URL("/blobserve?key=" + htmlBlobKey);
    URLConnection conn = file.openConnection();
    conn.setReadTimeout(30000);
    conn.setConnectTimeout(30000);
    InputStream inputStream = conn.getInputStream();
    byte[] data = IOUtils.toByteArray(inputStream);
    return new String(Functions.decompress(data));
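
One way an Inflater loop can appear to run forever, assuming the bytes that reach decompress() are not a complete zlib stream (for example a truncated read or a response that is not the stored blob): a loop on !finished() that ignores DataFormatException, or ignores a return value of 0, never terminates, because finished() never becomes true. Whether that is what happens here is an assumption, not something the question confirms. The sketch below shows a loop that fails fast instead. The class name GuardedInflate is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration; only java.util.zip is used.
class GuardedInflate {

    /** Inflates a zlib stream, throwing instead of spinning on bad input. */
    static byte[] inflate(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                // Invalid data makes inflate() throw; the exception is not caught here
                int count = inflater.inflate(buf);
                // Zero bytes produced while more input or a dictionary is needed
                // means the stream cannot make progress
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated stream or missing dictionary");
                }
                out.write(buf, 0, count);
            }
        } finally {
            inflater.end(); // release native resources
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Build a valid stream, then cut it in half to simulate an incomplete read
        byte[] text = new byte[10_000];
        Arrays.fill(text, (byte) 'a');
        Deflater deflater = new Deflater();
        deflater.setInput(text);
        deflater.finish();
        byte[] buf = new byte[1024];
        int len = deflater.deflate(buf);
        deflater.end();
        byte[] truncated = Arrays.copyOf(buf, len / 2);

        try {
            inflate(truncated);
            System.out.println("inflated without error");
        } catch (DataFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, a bad input surfaces as an exception in the request log rather than as a request that runs until the deadline.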

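Any ideas on how best to get the compressed HTML from the blobstore, decompress it, and display it? Even if I need to hand it off to a task queue and poll for completion while showing a progress bar, that would be fine. I really don't care, as long as it's efficient and ultimately works. I'd appreciate any guidance you can share with me here.

Thanks for your help.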

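[Comments]:

Is the delay definitely in the decompression? Have you checked whether simply outputting the retrieved compressed data is just as slow?

Why are you fetching the blob from yourself? Why not use the blob reading API directly? Also, why store the data compressed in the blobstore, given the extra latency it introduces?

Still working on implementing Sasha's suggestion below, but to Nick's question: this is an archive that will become huge (20TB or more per customer), and a customer might access only 5 to 10 items a month for legal testimony. So I'm happy to sacrifice speed for storage size. Nick, I saw the blobReader object in the Python docs, but what is the Java equivalent? Thanks

... Looks like BlobstoreInputStream is the equivalent. I'll look into that as well.

[Answer 1]:

You could look at RequestBuilder, which runs asynchronously: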

    // GWT client-side code; RequestBuilder and friends are in com.google.gwt.http.client
    RequestBuilder requestBuilder = new RequestBuilder(RequestBuilder.GET, "/blobserve?key=" + htmlBlobKey);
    try {
        requestBuilder.sendRequest(null, new RequestCallback() {
            public void onError(Request request, Throwable exception) {
                GWT.log(exception.getMessage());
            }

            public void onResponseReceived(Request request, Response response) {
                // Update your iframe and stop the progress indicator here
                doSomething(response.getText());
            }
        });
    } catch (RequestException ex) {
        GWT.log(ex.getMessage());
    }

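[Comments]:

Very cool. I'll give it a try tomorrow and update the post with my findings. Thanks Sasha!

[Answer 2]:

I took Nick Johnson's idea and read directly from the Blobstore instead of serving the blob. It's lightning fast now! Here is the code: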

    try {
        ChainedBlobstoreInputStream inputStream = new ChainedBlobstoreInputStream(this.getHtmlBlobKey());
        byte[] data = IOUtils.toByteArray(inputStream);
        // new String(byte[]) decodes with the platform default charset
        return new String(Functions.decompress(Encrypt.AESDecrypt(data)));
    } catch (Exception e) {
        return "No HTML Version";
    }

I got the ChainedBlobstoreInputStream class from here: Reading a BlobstoreInputStream >= 1MB in size
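
As a variation on the approach above, the blob can be decompressed while it is being read instead of being buffered first. This is a minimal sketch, assuming the stored bytes are a plain zlib stream as produced by Deflater (the AES step from the answer is left out) and assuming the HTML is UTF-8 encoded, which the thread does not state. The class name StreamingInflate is hypothetical. On App Engine the input would be whatever InputStream reads the blob, such as the BlobstoreInputStream mentioned in the comments. Here an in-memory stream stands in for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration; only the JDK is used.
class StreamingInflate {

    /** Reads a zlib-compressed stream to the end and decodes it as UTF-8 (assumed). */
    static String inflateToString(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        // InflaterInputStream decompresses as it reads and throws an IOException
        // subclass on corrupt or truncated input instead of looping
        try (InputStream in = new InflaterInputStream(compressed)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Compress a small document in memory to stand in for the stored blob
        byte[] html = "<html><body>deposition</body></html>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream blob = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(blob)) {
            deflate.write(html);
        }
        System.out.println(inflateToString(new ByteArrayInputStream(blob.toByteArray())));
    }
}
```

Naming the charset explicitly avoids depending on the platform default when the bytes are turned back into a String.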
