Converting PDF to HTML | WORD | EXCEL | PPT | PNG | TXT in Java: A Tutorial

Posted by 洛阳泰山

Getting Aspose.PDF from Maven

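Add the following configuration to pom.xml to use Aspose.PDF for Java directly in a Maven-based project. The <repository> element goes inside <repositories>, and the <dependency> element goes inside <dependencies>.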

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>22.4</version>
</dependency>

Core implementation (single class)

import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.devices.PngDevice;
import com.aspose.pdf.devices.Resolution;

import java.io.*;

public class PDFHelper3 {

    public static void main(String[] args) throws IOException {
        pdf2txt("C:\\Users\\liuya\\Desktop\\pdf\\示例文件.pdf");
    }

    // Convert to Word (.docx)
    public static void pdf2word(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String wordPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".docx";
            // try-with-resources closes the stream even if the conversion fails
            try (FileOutputStream os = new FileOutputStream(wordPath)) {
                Document doc = new Document(pdfPath);
                doc.save(os, SaveFormat.DocX);
            }
            // Elapsed time
            long now = System.currentTimeMillis();
            System.out.println("PDF to Word took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to Word conversion failed...");
            e.printStackTrace();
        }
    }

    // Convert to PowerPoint (.pptx)
    public static void pdf2ppt(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            // Output path of the PPTX file
            String pptPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".pptx";
            try (FileOutputStream os = new FileOutputStream(pptPath)) {
                // doc is the PDF document to be converted
                Document doc = new Document(pdfPath);
                doc.save(os, SaveFormat.Pptx);
            }
            // Elapsed time
            long now = System.currentTimeMillis();
            System.out.println("PDF to PPT took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to PPT conversion failed...");
            e.printStackTrace();
        }
    }

    // Convert to Excel (.xlsx)
    public static void pdf2excel(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String excelPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".xlsx";
            try (FileOutputStream os = new FileOutputStream(excelPath)) {
                Document doc = new Document(pdfPath);
                doc.save(os, SaveFormat.Excel);
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to Excel took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to Excel conversion failed...");
            e.printStackTrace();
        }
    }

    // Convert to HTML
    public static void pdf2Html(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String htmlPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".html";
            Document doc = new Document(pdfPath);
            doc.save(htmlPath, SaveFormat.Html);
            long now = System.currentTimeMillis();
            System.out.println("PDF to HTML took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to HTML conversion failed...");
            e.printStackTrace();
        }
    }

    // Convert each page to a PNG image
    public static void pdf2image(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            // Render at 300 DPI
            Resolution resolution = new Resolution(300);
            String dataDir = pdfPath.substring(0, pdfPath.lastIndexOf("."));
            // Images go into a sibling directory named <pdf name>_images
            File imageDir = new File(dataDir + "_images");
            if (!imageDir.exists()) {
                imageDir.mkdirs();
            }
            Document doc = new Document(pdfPath);
            PngDevice pngDevice = new PngDevice(resolution);
            // Pages are numbered from 1; each page is written to <page number>.png
            for (int pageCount = 1; pageCount <= doc.getPages().size(); pageCount++) {
                try (OutputStream imageStream = new FileOutputStream(new File(imageDir, pageCount + ".png"))) {
                    pngDevice.process(doc.getPages().get_Item(pageCount), imageStream);
                }
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to PNG took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to PNG conversion failed...");
            e.printStackTrace();
        }
    }

    // Extract the text to a .txt file
    public static void pdf2txt(String pdfPath) {
        long old = System.currentTimeMillis();
        Document pdfDocument = new Document(pdfPath);
        TextAbsorber ta = new TextAbsorber();
        ta.visit(pdfDocument);
        String txtPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".txt";
        try {
            // Write as UTF-8 so non-ASCII text does not depend on the platform default charset
            try (BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream(txtPath), "UTF-8"))) {
                writer.write(ta.getText());
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to TXT took " + ((now - old) / 1000.0) + " s");
        } catch (IOException e) {
            System.out.println("PDF to TXT conversion failed...");
            e.printStackTrace();
        }
    }
}

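Every method above derives its output path with pdfPath.substring(0, pdfPath.lastIndexOf(".")). That expression throws StringIndexOutOfBoundsException when the path contains no dot, and it cuts at the wrong place when the only dot is in a directory name. The sketch below shows one possible way to derive the path more defensively. OutputPaths and withExtension are hypothetical names introduced here for illustration; they are not part of Aspose.PDF or of the original class.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Aspose.PDF): derives an output path from an input path. */
final class OutputPaths {

    private OutputPaths() {
    }

    /**
     * Returns inputPath with the extension of its file name replaced by newExtension
     * (given without a leading dot). If the file name has no extension, the new one is appended.
     */
    static String withExtension(String inputPath, String newExtension) {
        Path path = Paths.get(inputPath);
        Path fileNamePath = path.getFileName();
        if (fileNamePath == null) {
            // Happens for a root path such as "/", which has no file name
            throw new IllegalArgumentException("Not a file path: " + inputPath);
        }
        String fileName = fileNamePath.toString();
        int dot = fileName.lastIndexOf('.');
        // dot <= 0 covers both "no dot at all" and names that start with a dot, e.g. ".profile"
        String baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
        // resolveSibling keeps the original directory and swaps only the file name
        return path.resolveSibling(baseName + "." + newExtension).toString();
    }

    public static void main(String[] args) {
        System.out.println(withExtension("report.pdf", "docx")); // prints report.docx
        System.out.println(withExtension("report", "txt"));      // prints report.txt
    }
}
```

Only the file name is inspected, so a dot in a directory name is left alone.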
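To run it, right-click the class in IDEA and choose Run. To build a web system on top of it, wrap the code in a web service and call these methods from there.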
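As a rough illustration of the web-service idea, the sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer. Everything specific in it is an assumption made for this example: the ConversionServer class, the /convert endpoint, and the format and path query parameters. The conversions are passed in as a map, so the sketch does not depend on Aspose.PDF itself.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: exposes conversion methods over HTTP at /convert?format=...&path=... */
final class ConversionServer {

    private ConversionServer() {
    }

    /** Starts a server on the given port (0 picks a free port) and returns it so the caller can stop it. */
    static HttpServer start(int port, Map<String, Consumer<String>> converters) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/convert", exchange -> handle(exchange, converters));
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange, Map<String, Consumer<String>> converters)
            throws IOException {
        String query = exchange.getRequestURI().getQuery();
        String format = queryParam(query, "format");
        String path = queryParam(query, "path");
        Consumer<String> converter = format == null ? null : converters.get(format);
        if (converter == null || path == null) {
            respond(exchange, 400, "expected ?format=<known format>&path=<pdf path>");
            return;
        }
        try {
            converter.accept(path);
            respond(exchange, 200, "converted " + path + " to " + format);
        } catch (RuntimeException e) {
            respond(exchange, 500, "conversion failed: " + e.getMessage());
        }
    }

    /** Deliberately minimal query parsing: returns the first value for name, or null if absent. */
    private static String queryParam(String query, String name) {
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    private static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

With the class from this article, the map entries would look like "word" mapped to PDFHelper3::pdf2word. Two caveats apply. The conversion methods catch their own exceptions and only print them, so the service would report success even when a conversion fails unless those methods are changed to rethrow. And a real service should accept uploaded files rather than arbitrary server-side paths.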

Conversion results

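For one test PDF (the original gives its size only as "14", without a unit), most conversions took 10-12 s; only the PPT conversion was slower, at about 20 s. Probably because the PDF content is not tabular, the layout of the converted Excel file differs noticeably from the source. For the other formats, the converted files keep the original layout.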
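The timings above come from a single run of each method, measured with System.currentTimeMillis(). A minimal sketch of a reusable timer follows, assuming a hypothetical ConversionTimer class that is not part of the original code. It uses System.nanoTime(), which is intended for measuring elapsed time, and takes the conversion as a Runnable so the timing logic does not have to be repeated in every method.

```java
/** Hypothetical helper: measures how long a task takes, in seconds. */
final class ConversionTimer {

    private ConversionTimer() {
    }

    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; with the article's class this could be
        // () -> PDFHelper3.pdf2ppt(somePath)
        double seconds = timeSeconds(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("Conversion took %.3f s%n", seconds);
    }
}
```

Running each conversion several times and comparing the results would show how much of a 10-20 s figure is one-time start-up cost rather than conversion work.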
