读取word文档,暂时不能读取图片,案例代码

Posted koaler

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了读取word文档,暂时不能读取图片,案例代码相关的知识,希望对你有一定的参考价值。

import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Wordfile {

    public static void main(String[] args) throws Exception {
        String path = "G:\样题.doc";
        String context = readWord(path);
        System.out.println(context);
    }


    public static String readWord(String path) {
        InputStream is = null;
        String content = "";
        String suffix = path.substring(path.lastIndexOf(".") + 1);
        try {
            if (suffix.equals("doc")) {
                 // word 2003: 图片不会被读取
                is = new FileInputStream(new File(path));
                WordExtractor ex = new WordExtractor(is);// is是WORD文件的InputStream
                content = ex.getText().trim()
            } else if (suffix.equals("docx")) {
               // word 2007 图片不会被读取, 表格中的数据会被放在字符串的最后
                OPCPackage opcPackage = POIXMLDocument.openPackage(path);
                POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
                content = (extractor).getText().trim();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return content;
    }
}
  <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.8</version>
        </dependency>



        <!--2010年EXCEL需要的包-->
        <dependency>
            <groupId>dom4j</groupId>
            <artifactId>dom4j</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml-schemas</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.xmlbeans</groupId>
            <artifactId>xmlbeans</artifactId>
            <version>2.6.0</version>
        </dependency>

  

以上是关于读取word文档,暂时不能读取图片,案例代码的主要内容,如果未能解决你的问题,请参考以下文章

阅读Microsoft Word文档时出现Android Apache POI错误:org.apache.xmlbeans.SchemaTypeLoaderException无法解析句柄的类型(代码片

Java读取word文件,字体,颜色

php 怎么实现读取word文档内容,显示到html上面?能给个案例最好了,谢谢!

java poi读取pdf word excel文档,读取pdf文字图片

java poi读取pdf word excel文档,读取pdf文字图片

java poi读取pdf word excel文档,读取pdf文字图片