Working with XML in Python: ignoring namespaces when parsing and modifying XML

Posted 2021-12-16 Recar


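When you work with XML directly, you will run into documents that use namespaces.
Normally, adding a node that belongs to a namespace means creating it like this: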

        # ElementTree expects namespaced names in the form {namespace-uri}localname
        header = Element(r'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}headerReference', {
            r"{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type": "first",
            r"{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id": "rId6"
        })

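Here http://schemas.openxmlformats.org/wordprocessingml/2006/main is a namespace; in the original XML the element is written <w:headerReference>.
Comparisons have the same problem, because the tag of a parsed element also takes this form: {http://schemas.openxmlformats.org/wordprocessingml/2006/main}headerReference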
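
To make the expanded tag form concrete, here is a minimal sketch that parses a small fragment and inspects the result. The fragment and the `qname` helper are assumptions made for illustration; they are not part of the original post.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hypothetical minimal fragment, modelled on a Word document's <w:sectPr>.
snippet = (
    f'<w:sectPr xmlns:w="{W_NS}" xmlns:r="{R_NS}">'
    '<w:headerReference r:id="rId6" w:type="first"/>'
    '</w:sectPr>'
)

root = ET.fromstring(snippet)
header = root[0]

# The parser replaces each prefix with "{namespace-uri}".
print(header.tag)
print(header.attrib)


def qname(ns_uri, local_name):
    """Hypothetical helper: build the "{uri}local" name ElementTree uses."""
    return f"{{{ns_uri}}}{local_name}"


# Comparisons have to use the expanded form, not "w:headerReference".
is_header = header.tag == qname(W_NS, "headerReference")
print(is_header)
```

A small helper like this keeps the long URI in one place, but every comparison still goes through the expanded name.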

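Handling it with namespaces
Registering the namespace with ElementTree makes writing less painful: serialized elements keep the w: prefix instead of an auto-generated one such as ns0.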

from xml.etree.ElementTree import ElementTree,Element, register_namespace
register_namespace('w', "http://schemas.openxmlformats.org/wordprocessingml/2006/main")
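
As a rough illustration of what the registration changes, the sketch below builds two elements and serializes them. Only the `w` namespace URI comes from the post; the element names were chosen for the example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Registration is global and only affects serialization.
ET.register_namespace("w", W_NS)

# Elements are still created with the expanded "{uri}local" name.
body = ET.Element(f"{{{W_NS}}}body")
ET.SubElement(body, f"{{{W_NS}}}p")

xml_text = ET.tostring(body, encoding="unicode")
print(xml_text)
```

The output uses the `w:` prefix, but the code that builds or searches the tree still needs the full namespace URI.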

Alternatively, the function below returns the tree with every namespace found in the file already registered:

import xml.etree.ElementTree as ET


def xml_parse(xml_file):
    """
    Parse an XML file, returns a tree of nodes and a dict of namespaces
    :param xml_file: the input XML file
    :returns: (doc, ns_map)
    """
    root = None
    ns_map = {}  # prefix -> ns_uri
    for event, elem in ET.iterparse(xml_file, ['start-ns', 'start', 'end']):
        if event == 'start-ns':
            # elem = (prefix, ns_uri)
            ns_map[elem[0]] = elem[1]
        elif event == 'start':
            if root is None:
                root = elem
    for prefix, uri in ns_map.items():
        ET.register_namespace(prefix, uri)

    return (ET.ElementTree(root), ns_map)
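
A prefix map collected this way can also be passed to `find` and `findall`, so queries can use short prefixes. The following is a sketch under assumptions: the in-memory document and the `urn:example:w` URI are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for a file on disk.
xml_bytes = (
    b'<w:document xmlns:w="urn:example:w">'
    b'<w:body><w:p/><w:p/></w:body>'
    b'</w:document>'
)

ns_map = {}  # prefix -> namespace URI
root = None
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), ["start-ns", "start"]):
    if event == "start-ns":
        prefix, uri = elem
        ns_map[prefix] = uri
    elif root is None:
        root = elem  # the first "start" event is the root element

# The second argument maps prefixes used in the path to namespace URIs.
paragraphs = root.findall("w:body/w:p", ns_map)
print(len(paragraphs))
```

The map only shortens the query strings; the tags stored on the elements remain in the expanded form.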

Complex XML
Consider a more complex XML document:

<?xml version="1.0" encoding="utf-8"?>

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se wp14">
  <w:body>
    <w:p w:rsidR="009A7561" w:rsidRDefault="00244BF9">
      <w:r>
        <w:rPr>
          <w:rFonts w:hint="eastAsia"/>
        </w:rPr>
        <w:t>1</w:t>
      </w:r>
      <w:r>
        <w:t>1111</w:t>
      </w:r>
      <w:bookmarkStart w:id="0" w:name="_GoBack"/>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:sectPr w:rsidR="009A7561">
      <w:pgSz w:h="16838" w:w="11906"/>
      <w:pgMar w:bottom="1440" w:footer="992" w:gutter="0" w:header="851" w:left="1800" w:right="1800" w:top="1440"/>
      <w:cols w:space="425"/>
      <w:docGrid w:linePitch="312" w:type="lines"/>
      <w:titlePg/>
      <w:headerReference r:id="rId6" w:type="first"/>
    </w:sectPr>
  </w:body>
</w:document>

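With this many namespaces, the document is awkward to handle.
The main pitfall is that a registered namespace is still not written back to the XML on output if no element or attribute in the tree uses it.
In that case, one way out is to bypass namespace processing entirely, as the DisableXmlNamespaces context manager does.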
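
The pitfall can be reproduced with a short sketch. The `urn:example:*` URIs below are placeholders, and the fragment is a cut-down stand-in for the document above: `w14` is declared and named inside `mc:Ignorable`, but no element or attribute uses it.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs; "w14" appears only inside an attribute value.
source = (
    '<w:document xmlns:w="urn:example:w" xmlns:w14="urn:example:w14" '
    'xmlns:mc="urn:example:mc" mc:Ignorable="w14">'
    '<w:body/>'
    '</w:document>'
)

for prefix, uri in [
    ("w", "urn:example:w"),
    ("w14", "urn:example:w14"),
    ("mc", "urn:example:mc"),
]:
    ET.register_namespace(prefix, uri)

root = ET.fromstring(source)
output = ET.tostring(root, encoding="unicode")
print(output)

# The declaration for w14 is gone, although mc:Ignorable still names it.
print("xmlns:w14" in output)
```

ElementTree only declares namespaces that appear in a tag or attribute name, so a prefix mentioned only in an attribute value loses its declaration on output.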

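import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element
from xml.parsers import expat


class DisableXmlNamespaces:
    """Temporarily make expat.ParserCreate ignore the namespace separator."""
    # Caveat: this only takes effect when ElementTree's pure-Python XMLParser is used;
    # the C accelerator that CPython 3 normally loads does not call expat.ParserCreate.
    def __enter__(self):
        self.oldcreate = expat.ParserCreate
        expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)

    def __exit__(self, type, value, traceback):
        expat.ParserCreate = self.oldcreate


with DisableXmlNamespaces():
    tree = ET.parse(document_path)
    # With namespaces disabled, prefixed names are used as plain strings
    title_pg = Element('w:titlePg')
    header = Element('w:headerReference', {
        "w:type": "first",
        "r:id": "rId6"
    })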
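
On CPython 3, ElementTree normally parses with a C-accelerated parser that does not go through `expat.ParserCreate`, so the monkeypatch may have no effect there. One way to get the same "prefixes stay as plain text" result without patching anything is to drive expat directly and feed an `ET.TreeBuilder`. This is only a sketch and not the approach from the original post; `parse_keeping_prefixes` is a hypothetical helper name, and the sample fragment and `urn:example:*` URIs are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_keeping_prefixes(xml_text):
    """Hypothetical helper: parse XML with namespace processing switched off."""
    builder = ET.TreeBuilder()
    # With no namespace_separator, expat reports "w:body" and "xmlns:w"
    # exactly as written instead of expanding prefixes to URIs.
    parser = expat.ParserCreate()
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)
    return builder.close()


sample = (
    '<w:sectPr xmlns:w="urn:example:w" xmlns:r="urn:example:r">'
    '<w:titlePg/>'
    '</w:sectPr>'
)
root = parse_keeping_prefixes(sample)

# Prefixed names are ordinary strings here, so new nodes use them directly.
root.append(ET.Element("w:headerReference", {"w:type": "first", "r:id": "rId6"}))

result = ET.tostring(root, encoding="unicode")
print(result)
```

Because the `xmlns:*` declarations are kept as ordinary attributes, they are all written back out, including ones that no element uses. The trade-off is that the tree is no longer namespace-aware, and comments and processing instructions are dropped in this sketch.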

References
https://stackoverflow.com/questions/8983041/saving-xml-files-using-elementtree
https://www.it-swarm.cn/zh/python/python-elementtree%E6%A8%A1%E5%9D%97%EF%BC%9A%E5%BD%93%E4%BD%BF%E7%94%A8find%E2%80%9D%EF%BC%8Cfindall%E2%80%9D%E6%96%B9%E6%B3%95%E6%97%B6%EF%BC%8C%E5%A6%82%E4%BD%95%E5%BF%BD%E7%95%A5xml%E6%96%87%E4%BB%B6%E7%9A%84%E5%91%BD%E5%90%8D%E7%A9%BA%E9%97%B4%E4%BB%A5%E6%89%BE%E5%88%B0%E5%8C%B9%E9%85%8D%E7%9A%84%E5%85%83%E7%B4%A0/1070619036/
