python-docx: lxml.etree.XMLSyntaxError: AttValue length too long
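I'm writing a program to check whether certain words appear in a bunch of .docx files (we're talking about roughly 2,500 .docx files).
Here is the juicy part of the code: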
for filename in directorylist:
    if filename.endswith(".docx"):
        i = Document(filename)
        print(filename)
        for destination in destinationlist:
            for paragraph in i.paragraphs:
                if destination in paragraph.text:
                    destinationcount[destination] = 1
                    break
                else:
                    destinationcount[destination] = 0
                    continue
        for destination in destinationcount:
            destinationcountnobool[destination] += destinationcount[destination]
    else:
        continue
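Now, I know what you're thinking: this is a pile of nasty loops and generally bad programming. It's a quick-and-dirty job, though, so please bear with me.
Here is the error I get: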
Traceback (most recent call last):
  File "ICrunchMeSomeFiles.py", line 27, in <module>
    i = Document(filename)
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\api.py", line 25, in Document
    document_part = Package.open(docx).main_document_part
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\opc\package.py", line 130, in open
    Unmarshaller.unmarshal(pkg_reader, package, PartFactory)
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\opc\package.py", line 199, in unmarshal
    pkg_reader, package, part_factory
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\opc\package.py", line 216, in _unmarshal_parts
    partname, content_type, reltype, blob, package
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\opc\part.py", line 191, in __new__
    return PartClass.load(partname, content_type, blob, package)
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\opc\part.py", line 231, in load
    element = parse_xml(blob)
  File "C:\Users\User\Anaconda3\lib\site-packages\docx\oxml\__init__.py", line 28, in parse_xml
    root_element = etree.fromstring(xml, oxml_parser)
  File "src\lxml\etree.pyx", line 3236, in lxml.etree.fromstring
  File "src\lxml\parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src\lxml\parser.pxi", line 1764, in lxml.etree._parseDoc
  File "src\lxml\parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src\lxml\parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 2
lxml.etree.XMLSyntaxError: AttValue length too long, line 2, column 11011745
The program works on smaller samples, so I assume this is a memory problem. Any help would be appreciated.
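A note on the diagnosis: the message "AttValue length too long" comes from libxml2, the parser underneath lxml, which by default rejects text and attribute values longer than roughly 10 MB. The reported column, 11011745, sits just past that limit, so the failure most likely points to one oversized attribute in one part of one file rather than to memory pressure from the number of files. Below is a minimal sketch for finding which part of a suspect .docx is unusually large, using only the standard library. The function name `largest_xml_parts` and its `top` parameter are made up for illustration.

```python
import zipfile


def largest_xml_parts(docx_path, top=3):
    """Return the `top` largest XML parts of a .docx as (name, size_in_bytes).

    A .docx file is a zip archive; each XML part is a separate member.
    `file_size` is the uncompressed size recorded in the archive, so
    nothing is parsed or extracted here.
    """
    with zipfile.ZipFile(docx_path) as archive:
        parts = [
            (info.filename, info.file_size)
            for info in archive.infolist()
            # Hypothetical filter: only look at XML and relationship parts.
            if info.filename.endswith((".xml", ".rels"))
        ]
    return sorted(parts, key=lambda part: part[1], reverse=True)[:top]
```

Running this on the file printed just before the traceback should show whether a single part is well over 10 MB. It does not say which attribute is responsible, only where to look.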
Answer
I think this will solve your problem:
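The body of this answer did not survive on this page, so what follows is a hedged sketch of one possible workaround, not a reconstruction of the original answer. lxml lifts the libxml2 size limit when a parser is created with `huge_tree=True`, but the traceback shows python-docx passing its own module-level `oxml_parser`, so using that option would mean patching the library. The sketch below sidesteps lxml instead: it assumes only the paragraph text is needed and that the main document part is stored at `word/document.xml` (the usual location, though not guaranteed). The helper names `paragraph_texts` and `count_files_containing` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text run).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(docx_path):
    """Yield the text of every paragraph found in word/document.xml.

    Assumption: the main part lives at word/document.xml. Other parts
    (headers, custom XML, embedded objects) are never parsed.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for paragraph in root.iter(W_NS + "p"):
        yield "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))


def count_files_containing(docx_paths, words):
    """Count, for each word, how many of the given files contain it."""
    counts = {word: 0 for word in words}
    for path in docx_paths:
        try:
            text = "\n".join(paragraph_texts(path))
        except (KeyError, zipfile.BadZipFile, ET.ParseError) as error:
            # Report unreadable files and keep going instead of aborting the run.
            print(f"skipped {path}: {error}")
            continue
        for word in words:
            if word in text:
                counts[word] += 1
    return counts
```

Called as `count_files_containing(docx_files, destinationlist)`, where `docx_files` is the filtered list of .docx paths, this would stand in for the nested loops in the question. Two differences from `Document(...).paragraphs` are worth knowing: this version also sees paragraphs inside tables and text boxes, and it ignores tabs and line breaks within a paragraph. `xml.etree.ElementTree` is built on expat rather than libxml2, so the libxml2 limit should not apply here, and if the oversized attribute sits in a part other than `word/document.xml`, it is never read at all.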