Parse an XML file with Python 3
【Posted】2022-01-11 21:51:38 【Question】I'm not familiar with XML files at all, but I'm trying to parse this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<modeling>
<generator>
<i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex parallel </i>
<i name="platform" type="string">LinuxIFC </i>
<i name="date" type="string">2019 07 11 </i>
<i name="time" type="string">11:56:12 </i>
</generator>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
<i type="int" name="ISPIN"> 2</i>
<i type="int" name="NELMDL"> -8</i>
<i type="int" name="IBRION"> 2</i>
<i name="EDIFF"> 0.00001000</i>
<i name="EDIFFG"> -0.01000000</i>
<i type="int" name="NSW"> 200</i>
<i type="int" name="ISIF"> 2</i>
<i type="int" name="ISYM"> 2</i>
<i name="ENCUT"> 750.00000000</i>
<i name="POTIM"> 0.30000000</i>
</incar>
So far, I've managed to write code that gets the elements:
#!/usr/bin/env python3
import xml.etree.ElementTree as ET
tree = ET.parse("vasprun.xml")
root = tree.getroot()
# Each direct child of <modeling> is an Element such as <generator> or <incar>
for child in root:
    print(child)
The output is:
<Element 'generator' at 0x7f342220ca90>
<Element 'incar' at 0x7f342220cd10>
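Printing an Element only shows its repr (tag plus a memory address). The actual data is in its `.tag`, `.attrib` and `.text` attributes. A minimal sketch, assuming a trimmed copy of the sample above with the closing `</modeling>` tag added (the name `SAMPLE` is only for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, with the closing </modeling> tag added.
SAMPLE = """<modeling>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
</incar>
</modeling>"""

root = ET.fromstring(SAMPLE)
incar = root.find("incar")   # first direct child named <incar>
first = incar[0]             # Elements can be indexed like a list of children
print(first.tag)             # the element name
print(first.attrib)          # a dict of its attributes
print(repr(first.text))      # raw text, including the leading space
```

Note that `.text` keeps the padding from the file, so it usually needs `.strip()` before use.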
I'm trying to get the entries from incar in this form:
IStart=0
Prec=accurate
Can anyone help me with this?
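One way to print every entry under `<incar>` as `NAME=value` is to iterate over the `<i>` children with an ElementTree path. This is a sketch on a trimmed copy of the sample; `SAMPLE` and the helper `incar_lines` are hypothetical names. Names are printed exactly as they appear in the file (`ISTART`, not `IStart`).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, with the closing </modeling> tag added.
SAMPLE = """<modeling>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
<i type="int" name="ISPIN"> 2</i>
</incar>
</modeling>"""

def incar_lines(root):
    """Yield NAME=value strings for every <i> element directly inside <incar>."""
    for item in root.iterfind("./incar/i"):
        # The values carry padding spaces in the file, so strip them.
        yield f"{item.get('name')}={(item.text or '').strip()}"

root = ET.fromstring(SAMPLE)
for line in incar_lines(root):
    print(line)
```

For the real file, `root = ET.parse("vasprun.xml").getroot()` would replace the `fromstring` call.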
【Comments】:
{n.get("name"): n.text.strip() for node in root for n in node}
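The comment's comprehension flattens all sections into one dict. A variant that keeps one dict per section, sketched under the assumption that each top-level section holds `<i name="...">` items (true for the sample, but a full vasprun.xml has other structures and repeated tags, which would overwrite each other here). `SAMPLE` and `params` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, with the closing </modeling> tag added.
SAMPLE = """<modeling>
<generator>
<i name="platform" type="string">LinuxIFC </i>
</generator>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
</incar>
</modeling>"""

root = ET.fromstring(SAMPLE)

# Outer loop over sections (<generator>, <incar>), inner loop over their <i> items.
# findall("i") skips any child that is not an <i> element.
params = {
    section.tag: {i.get("name"): (i.text or "").strip() for i in section.findall("i")}
    for section in root
}
print(params["incar"])
```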
【Answer 1】:
The following works (XPath):
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<modeling>
<generator>
<i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex parallel</i>
<i name="platform" type="string">LinuxIFC</i>
<i name="date" type="string">2019 07 11</i>
<i name="time" type="string">11:56:12</i>
</generator>
<incar>
<i type="int" name="ISTART">0</i>
<i type="string" name="PREC">accurate</i>
<i type="int" name="ISPIN">2</i>
<i type="int" name="NELMDL">-8</i>
<i type="int" name="IBRION">2</i>
<i name="EDIFF">0.00001000</i>
<i name="EDIFFG">-0.01000000</i>
<i type="int" name="NSW">200</i>
<i type="int" name="ISIF">2</i>
<i type="int" name="ISYM">2</i>
<i name="ENCUT">750.00000000</i>
<i name="POTIM">0.30000000</i>
</incar>
</modeling>'''
root = ET.fromstring(xml)
names = ['ISTART','PREC']
for name in names:
    # Interpolate the loop variable into the XPath predicate
    i = root.find(f'.//i[@name="{name}"]')
    print(i.text)
Output:
0
accurate
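ElementTree's path syntax is a subset of XPath, and the same predicate form works on other attributes too, which avoids hard-coding a list of names. A sketch on a trimmed copy of the sample (`SAMPLE`, `int_names` and `typed` are illustrative names):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, with the closing </modeling> tag added.
SAMPLE = """<modeling>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
<i type="int" name="NSW"> 200</i>
<i name="ENCUT"> 750.00000000</i>
</incar>
</modeling>"""

root = ET.fromstring(SAMPLE)

# [@attr='value'] keeps only elements whose attribute has that value.
int_names = [i.get("name") for i in root.iterfind(".//incar/i[@type='int']")]
# [@attr] alone keeps elements that have the attribute at all.
typed = root.findall(".//incar/i[@type]")
print(int_names)
print(len(typed))
```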
【Comments】:
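Thanks, but that's not what I meant. I'm trying to get every name=value pair in incar. Upvoted anyway.
【Answer 2】: After appending the missing closing tag </modeling>, save the sample XML to a file (vasprun.xml).
Then: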
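import xml.etree.ElementTree as ET
# Read bytes so the parser honours the file's declared encoding (ISO-8859-1)
with open('vasprun.xml', 'rb') as f:
    root = ET.fromstring(f.read())
for name in ['ISTART', 'PREC']:
    # find() returns None when nothing matches, hence the check
    if (t := root.find(f'.//i[@name="{name}"]')) is not None:
        print(f'{name}:{t.text.strip()}')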
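The closing tag matters because, without it, the document is not well-formed and ElementTree refuses to parse it. A small sketch demonstrating this on a truncated snippet (`TRUNCATED` and `parse_failed` are illustrative names):

```python
import xml.etree.ElementTree as ET

# Like the question's sample, this ends without </modeling>, so it is not well-formed.
TRUNCATED = """<modeling>
<incar>
<i type="int" name="ISTART"> 0</i>
</incar>"""

try:
    ET.fromstring(TRUNCATED)
except ET.ParseError as err:
    # err.position is a (line, column) tuple pointing at where parsing failed.
    print("not well-formed at (line, column):", err.position)
    parse_failed = True
else:
    parse_failed = False

# Appending the missing closing tag makes the same text parse.
root = ET.fromstring(TRUNCATED + "\n</modeling>")
print(root.find(".//i[@name='ISTART']").text.strip())
```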
【Comments】:
【Answer 3】: If the closing </modeling> tag is present, you can do this with XPath.
The XPath for the ISTART value is:
//incar/*[@name='ISTART']
The XPath for the PREC value is:
//incar/*[@name='PREC']
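For comparison (this is not part of the original answer), the standard library's ElementTree can evaluate close equivalents of these two expressions without installing lxml. The one change is the leading `.`: ElementTree rejects paths that start with `/` when searching from an Element. A sketch on a trimmed copy of the sample (`SAMPLE` is an illustrative name):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, with the closing </modeling> tag added.
SAMPLE = """<modeling>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
</incar>
</modeling>"""

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants of the current element, like "//" in full XPath.
istart = root.find(".//incar/*[@name='ISTART']")
prec = root.find(".//incar/*[@name='PREC']")
print(f"ISTART={int(istart.text)}")
print(f"Prec={prec.text}")
```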
With lxml:
from lxml import etree
xml_doc = """<?xml version="1.0" encoding="ISO-8859-1"?>
<modeling>
<generator>
<i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex parallel </i>
<i name="platform" type="string">LinuxIFC </i>
<i name="date" type="string">2019 07 11 </i>
<i name="time" type="string">11:56:12 </i>
</generator>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
<i type="int" name="ISPIN"> 2</i>
<i type="int" name="NELMDL"> -8</i>
<i type="int" name="IBRION"> 2</i>
<i name="EDIFF"> 0.00001000</i>
<i name="EDIFFG"> -0.01000000</i>
<i type="int" name="NSW"> 200</i>
<i type="int" name="ISIF"> 2</i>
<i type="int" name="ISYM"> 2</i>
<i name="ENCUT"> 750.00000000</i>
<i name="POTIM"> 0.30000000</i>
</incar>
</modeling>
"""
parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True)
xml_tree = etree.fromstring(xml_doc.encode(), parser=parser)
istart = xml_tree.xpath('//incar/*[@name="ISTART"]')
prec = xml_tree.xpath('//incar/*[@name="PREC"]')
print(f'ISTART={int(istart[0].text)}')
print(f'Prec={prec[0].text}')
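The answer converts ISTART with `int(...)` by hand. A generalisation could pick the conversion from each item's `type` attribute. This sketch assumes that `type="int"` means integer, `type="string"` means text, and a missing `type` means a floating-point number; that mapping is inferred from the sample, not taken from VASP documentation. `SAMPLE`, `CONVERTERS` and `typed_value` are hypothetical names, and it uses the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, with the closing </modeling> tag added.
SAMPLE = """<modeling>
<incar>
<i type="int" name="ISTART"> 0</i>
<i type="string" name="PREC">accurate</i>
<i name="EDIFF"> 0.00001000</i>
</incar>
</modeling>"""

# Assumed mapping from the type attribute to a Python conversion.
CONVERTERS = {"int": int, "string": str}

def typed_value(item):
    """Convert an <i> element's text according to its type attribute."""
    text = (item.text or "").strip()
    kind = item.get("type")
    if kind is None:
        return float(text)                  # assumption: untyped entries are floats
    return CONVERTERS.get(kind, str)(text)  # unknown types are kept as text

root = ET.fromstring(SAMPLE)
incar = {i.get("name"): typed_value(i) for i in root.iterfind("./incar/i")}
print(incar)
```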
【Comments】: