Java: Converting PDF to HTML | WORD | EXCEL | PPT | PNG | TXT (Tutorial)
Posted by 洛阳泰山
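Downloading Aspose.PDF from Maven
Add the following configuration to pom.xml to use Aspose.PDF for Java directly in a Maven-based project. The <repository> element belongs inside <repositories>, and the <dependency> element inside <dependencies>.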
<repository>
<id>AsposeJavaAPI</id>
<name>Aspose Java API</name>
<url>https://repository.aspose.com/repo/</url>
</repository>
<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-pdf</artifactId>
<version>22.4</version>
</dependency>
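Once Maven has resolved the dependency, a quick way to confirm that the library is actually on the classpath is to try loading one of its classes. The sketch below is only an illustration: the class AsposeClasspathCheck and its method isOnClasspath are hypothetical names that do not come from the original post, and com.aspose.pdf.Document is simply the class imported by the code later in this article.

```java
public class AsposeClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, AsposeClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String className = "com.aspose.pdf.Document";
        String status = isOnClasspath(className) ? "found" : "NOT found, check pom.xml";
        System.out.println(className + ": " + status);
    }
}
```

If the class is reported as not found, the usual suspects are a missing repository entry or a dependency that was placed outside <dependencies>.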
Core implementation (single class)
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.devices.PngDevice;
import com.aspose.pdf.devices.Resolution;
import java.io.*;

public class PDFHelper3 {

    public static void main(String[] args) throws IOException {
        pdf2txt("C:\\Users\\liuya\\Desktop\\pdf\\示例文件.pdf");
    }

    // Convert to Word (.docx)
    public static void pdf2word(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            // Output file: same location and name as the PDF, with a .docx extension
            String wordPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".docx";
            // doc is the PDF document to be converted
            Document doc = new Document(pdfPath);
            // try-with-resources closes the stream even if save() throws
            try (FileOutputStream os = new FileOutputStream(wordPath)) {
                doc.save(os, SaveFormat.DocX);
            }
            // Elapsed conversion time
            long now = System.currentTimeMillis();
            System.out.println("PDF to Word took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to Word failed...");
            e.printStackTrace();
        }
    }

    // Convert to PowerPoint (.pptx)
    public static void pdf2ppt(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            // Output file: same location and name as the PDF, with a .pptx extension
            String pptPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".pptx";
            // doc is the PDF document to be converted
            Document doc = new Document(pdfPath);
            try (FileOutputStream os = new FileOutputStream(pptPath)) {
                doc.save(os, SaveFormat.Pptx);
            }
            // Elapsed conversion time
            long now = System.currentTimeMillis();
            System.out.println("PDF to PPT took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to PPT failed...");
            e.printStackTrace();
        }
    }

    // Convert to Excel (.xlsx)
    public static void pdf2excel(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String excelPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".xlsx";
            Document doc = new Document(pdfPath);
            try (FileOutputStream os = new FileOutputStream(excelPath)) {
                doc.save(os, SaveFormat.Excel);
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to Excel took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to Excel failed...");
            e.printStackTrace();
        }
    }

    // Convert to HTML
    public static void pdf2Html(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String htmlPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".html";
            Document doc = new Document(pdfPath);
            doc.save(htmlPath, SaveFormat.Html);
            long now = System.currentTimeMillis();
            System.out.println("PDF to HTML took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to HTML failed...");
            e.printStackTrace();
        }
    }

    // Convert to images (one PNG per page)
    public static void pdf2image(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            // Render at 300 DPI
            Resolution resolution = new Resolution(300);
            // Images go into "<pdf name>_images" next to the PDF
            String dataDir = pdfPath.substring(0, pdfPath.lastIndexOf("."));
            File imageDir = new File(dataDir + "_images");
            if (!imageDir.exists()) {
                imageDir.mkdirs();
            }
            Document doc = new Document(pdfPath);
            PngDevice pngDevice = new PngDevice(resolution);
            // Page numbering in Aspose.PDF starts at 1
            for (int pageCount = 1; pageCount <= doc.getPages().size(); pageCount++) {
                try (OutputStream imageStream = new FileOutputStream(imageDir + "/" + pageCount + ".png")) {
                    pngDevice.process(doc.getPages().get_Item(pageCount), imageStream);
                }
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to PNG took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to PNG failed...");
            e.printStackTrace();
        }
    }

    // Convert to plain text
    public static void pdf2txt(String pdfPath) {
        long old = System.currentTimeMillis();
        Document pdfDocument = new Document(pdfPath);
        // TextAbsorber collects the text of every page it visits
        TextAbsorber ta = new TextAbsorber();
        ta.visit(pdfDocument);
        String txtPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".txt";
        // FileWriter uses the platform default charset
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(txtPath))) {
            writer.write(ta.getText());
            long now = System.currentTimeMillis();
            System.out.println("PDF to TXT took " + ((now - old) / 1000.0) + " s");
        } catch (IOException e) {
            System.out.println("PDF to TXT failed...");
            e.printStackTrace();
        }
    }
}
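Every method above derives its output path with pdfPath.substring(0, pdfPath.lastIndexOf(".")). That expression throws if the path contains no dot at all, and it cuts at the wrong place when only a directory name contains a dot. One possible way to make this more robust is sketched below; OutputPaths and replaceExtension are hypothetical helper names that are not part of the original code or of Aspose.PDF, and the sketch assumes the path names a file rather than a filesystem root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputPaths {

    /**
     * Returns the given path with the extension of its file name replaced by newExt
     * (for example ".docx"). Only the last path segment is inspected, so dots in
     * directory names are ignored; if the file name has no extension, newExt is appended.
     */
    static String replaceExtension(String path, String newExt) {
        Path p = Paths.get(path);
        String fileName = p.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // dot > 0 keeps names such as ".bashrc" intact instead of treating them as all extension
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return p.resolveSibling(base + newExt).toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceExtension("report.pdf", ".docx"));
        System.out.println(replaceExtension("README", ".txt"));
    }
}
```

With such a helper, a line like the one in pdf2word could read String wordPath = OutputPaths.replaceExtension(pdfPath, ".docx");.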
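To run it, right-click the class in IntelliJ IDEA and choose Run. To build a web system on top of it, wrap the code in a web service and call the conversion methods from there.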
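As a rough illustration of the wrapping step, a web endpoint would typically receive a target format and a file path, then pick the matching conversion method. The sketch below shows only that dispatch logic, independent of any web framework. ConversionDispatcher, register, and convert are hypothetical names introduced here, not part of the original post or of Aspose.PDF.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

public class ConversionDispatcher {

    // Maps a lower-cased format name (e.g. "docx") to the method that performs the conversion
    private final Map<String, Consumer<String>> converters = new LinkedHashMap<>();

    /** Registers a converter for a format name; returns this so calls can be chained. */
    public ConversionDispatcher register(String format, Consumer<String> converter) {
        converters.put(format.toLowerCase(Locale.ROOT), converter);
        return this;
    }

    /**
     * Runs the converter registered for the given format on pdfPath.
     * Returns false if no converter is registered for that format.
     */
    public boolean convert(String format, String pdfPath) {
        Consumer<String> converter = converters.get(format.toLowerCase(Locale.ROOT));
        if (converter == null) {
            return false;
        }
        converter.accept(pdfPath);
        return true;
    }

    // Possible wiring with the class from this article (requires PDFHelper3 on the classpath):
    //   new ConversionDispatcher()
    //       .register("docx", PDFHelper3::pdf2word)
    //       .register("pptx", PDFHelper3::pdf2ppt)
    //       .register("txt",  PDFHelper3::pdf2txt);
}
```

A request handler could then return an error response whenever convert returns false, instead of hard-coding one endpoint per format.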
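Conversion results
Taking one PDF file as an example (the original describes it only as "十四", fourteen, without stating a unit), most conversions took 10-12 s; only the PPT conversion was slower, at about 20 s. Probably because the PDF's content is not tabular, the converted Excel file differs noticeably in styling from the original. The other output formats kept the same appearance as the source PDF.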
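Timings like these can be collected with a small helper instead of repeating the System.currentTimeMillis() bookkeeping in every method. The following is a minimal sketch under that assumption; ConversionTimer and timeSeconds are hypothetical names, and the sleeping task in main merely stands in for a real conversion call.

```java
public class ConversionTimer {

    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    static double timeSeconds(Runnable task) {
        // nanoTime is monotonic, so it is not affected by system clock adjustments
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with e.g. () -> PDFHelper3.pdf2word(path)
        double seconds = timeSeconds(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Elapsed: " + seconds + " s");
    }
}
```

A single run includes JVM warm-up and library initialization, so the first conversion in a process will generally look slower than later ones.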