Converting Docx to an image with Docx4j and PdfBox causes OutOfMemoryError

Question

I am using docx4j and pdfbox to convert the first page of a docx file to an image in two steps, but I get an OutOfMemoryError every time.

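I have been able to determine that the exception is thrown in the last step of the process, during the call to convertToImage. However, I have been using that second step to convert PDFs for some time without any problems, so I am at a loss as to the cause, unless docx4j encodes the PDF in a way I have not yet tested or produces a corrupt file.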

I tried replacing the FileOutputStream with a ByteArrayOutputStream; the PDF appears to render correctly and is no larger than I would expect.

This is the code I am using:

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

((org.docx4j.convert.out.pdf.viaXSLFO.Conversion)c).setSaveFO(File.createTempFile("fonts", ".fo"));
ByteArrayOutputStream os = new ByteArrayOutputStream();
c.output(os, new PdfSettings());

byte[] bytes = os.toByteArray();
os.close();

ByteArrayInputStream is = new ByteArrayInputStream(bytes);

PDDocument document = PDDocument.load(is);

PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 96);

is.close();
document.close();
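
As a rough sanity check on the last step, the size of the raster that convertToImage has to allocate can be estimated by hand. The sketch below assumes a US Letter page (612 x 792 points, which is an assumption and not something stated in the question) and relies on TYPE_INT_RGB storing one 32-bit int per pixel. The class and method names are hypothetical helpers, not part of pdfbox.

```java
public class RasterEstimate {
    /** Converts a length in PDF points (1/72 inch) to pixels at the given DPI. */
    public static int pointsToPixels(double points, int dpi) {
        return (int) Math.round(points * dpi / 72.0);
    }

    /** Approximate bytes for a TYPE_INT_RGB raster: one 32-bit int per pixel. */
    public static long intRgbBytes(double widthPt, double heightPt, int dpi) {
        return 4L * pointsToPixels(widthPt, dpi) * pointsToPixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // Assumption: US Letter page, 612 x 792 points, rendered at 96 DPI.
        int w = pointsToPixels(612, 96);   // 816 pixels
        int h = pointsToPixels(792, 96);   // 1056 pixels
        System.out.println(w + " x " + h + " px, about "
                + intRgbBytes(612, 792, 96) + " bytes");
    }
}
```

Under those assumptions the pixel buffer is only a few megabytes, so the image itself is unlikely to exhaust the heap on its own. Parsing and rendering the page content may still need considerably more than this.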

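Edit: To give more context, this code runs inside a Grails web application. I have tried several variations of it, including setting everything that is no longer needed to null, using FileInputStream and FileOutputStream to try to conserve memory, and inspecting the output of both docx4j and pdfbox, each of which appears to work correctly.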

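I am using docx4j 2.8.1 and pdfbox 0.7.3. I also tried pdf-renderer, but I still get the OutOfMemoryError. My suspicion is that docx4j uses too much memory, but that the error does not surface until the PDF-to-image conversion.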
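
One way to test that suspicion is to log heap usage before and after each step. The following is a minimal sketch using only java.lang.Runtime. The HeapProbe class and its method names are hypothetical, and the byte array in main is merely a stand-in for the real conversion work.

```java
public class HeapProbe {
    /** Bytes of heap currently in use (allocated heap minus free space in it). */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a labelled snapshot for logging between conversion steps. */
    public static String snapshot(String label) {
        long mb = 1024L * 1024L;
        long max = Runtime.getRuntime().maxMemory();
        return String.format("%s: used=%d MB, max=%d MB",
                label, usedHeapBytes() / mb, max / mb);
    }

    public static void main(String[] args) {
        System.out.println(snapshot("before step"));
        // Stand-in for the docx4j or pdfbox call being measured.
        byte[] placeholder = new byte[16 * 1024 * 1024];
        System.out.println(snapshot("after step"));
        // Reference the array so it stays reachable while measuring.
        System.out.println("held " + placeholder.length + " bytes");
    }
}
```

Calling snapshot after WordprocessingMLPackage.load, after c.output and after PDDocument.load would show which step leaves the heap close to its maximum. The figures are approximate, since they include garbage that has not been collected yet.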

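I would gladly accept another way of converting a docx file to PDF, or directly to an image, as an answer. I am currently trying to replace jodconverter, which has been problematic to run on a server.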

Answer 1

Another answer