Getting the layout of each page with pdfminer
Posted by greenseer
#! python2
# coding: utf-8
import sys

from pdfminer import pdfparser
from pdfminer import pdfdocument
from pdfminer import pdfinterp
from pdfminer import pdfpage
from pdfminer import converter
from pdfminer import layout

# Assumption: the original snippet never defines file_path;
# here it is taken from the first command-line argument.
file_path = sys.argv[1]

with open(file_path, 'rb') as fp:
    # Parse the file and build the document object.
    parser = pdfparser.PDFParser(fp)
    document = pdfdocument.PDFDocument(parser)
    if not document.is_extractable:
        raise pdfdocument.PDFTextExtractionNotAllowed

    # The aggregator device performs layout analysis on every page it is fed.
    rsrcmgr = pdfinterp.PDFResourceManager()
    laparams = layout.LAParams()
    device = converter.PDFPageAggregator(rsrcmgr, laparams=laparams)
    interpreter = pdfinterp.PDFPageInterpreter(rsrcmgr, device)

    pdf_pages = pdfpage.PDFPage.create_pages(document)
    for page in pdf_pages:
        interpreter.process_page(page)
        # The layout (an LTPage) of the page that was just processed.
        page_layout = device.get_result()
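The snippet above stops once it has page_layout for each page. As a minimal sketch of one possible next step, the helper below walks a page layout and collects the bounding box and text of each top-level text element. The name collect_text_boxes and the FakeBox / FakeRect stand-in classes are hypothetical and not part of pdfminer. The code only assumes that text elements expose get_text() and a bbox tuple (x0, y0, x1, y1), as pdfminer's text boxes do, and that non-text elements such as rectangles have no get_text(). Sorting by position is only a rough approximation of reading order.

```python
# Hypothetical helper, not part of pdfminer.
def collect_text_boxes(page_layout):
    """Return (bbox, text) pairs for the top-level text elements of a page.

    Assumes each text element has get_text() and a bbox tuple
    (x0, y0, x1, y1); elements without get_text() are skipped.
    """
    boxes = []
    for element in page_layout:
        if hasattr(element, "get_text"):
            boxes.append((element.bbox, element.get_text().strip()))
    # PDF coordinates start at the bottom-left corner, so sort by
    # descending top edge (y1), then by ascending left edge (x0).
    boxes.sort(key=lambda item: (-item[0][3], item[0][0]))
    return boxes


class FakeBox(object):
    """Stand-in for a pdfminer text box, for demonstration only."""

    def __init__(self, bbox, text):
        self.bbox = bbox
        self._text = text

    def get_text(self):
        return self._text


class FakeRect(object):
    """Stand-in for a non-text element: it has a bbox but no get_text()."""

    def __init__(self, bbox):
        self.bbox = bbox


# A fake page: two text boxes and one rectangle, listed out of order.
demo_page = [
    FakeBox((50, 100, 300, 120), "Second paragraph\n"),
    FakeRect((40, 90, 310, 710)),
    FakeBox((50, 680, 300, 700), "Title\n"),
]
demo_boxes = collect_text_boxes(demo_page)
```

Inside the loop over pdf_pages, the same helper could be called as collect_text_boxes(page_layout) right after device.get_result().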