How do you parse a simple XML document?
Posted richardo-m-q
Requirement:
XML is a very widely used markup language that provides a uniform way to describe an application's structured data.
How do you parse an XML file in Python?
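Approach:
Use xml.etree.ElementTree from the standard library. Its parse function parses an XML document and returns an ElementTree object, whose getroot() method gives the root element.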
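A minimal sketch of this approach, assuming a tiny made-up document (SAMPLE is hypothetical and only stands in for a real file such as kvm.xml). parse() accepts a filename or a file object and returns an ElementTree, while fromstring() takes XML text and returns the root Element directly:

```python
import io
from xml.etree.ElementTree import parse, fromstring

# Hypothetical miniature document, standing in for a real file on disk.
SAMPLE = "<domain type='kvm'><name>demo</name><vcpu placement='static'>1</vcpu></domain>"

# parse() reads from a filename or file-like object and returns an ElementTree.
tree = parse(io.StringIO(SAMPLE))
root = tree.getroot()

# fromstring() parses a string and returns the root Element itself.
same_root = fromstring(SAMPLE)

print(root.tag, root.attrib)
print(same_root.find("name").text)
```

Either entry point yields the same kind of Element, so everything shown for root in the session applies to both.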
Code:
kvm.xml:
<domain type='kvm'>
<name>centos_x86_6.4</name>
<!-- The name consists of letters and digits and must not contain spaces -->
<uuid>b9dcdd92-9b9b-14d6-3938-1982a9746a12</uuid>
<memory unit='KiB'>2097152</memory>
<currentMemory unit='KiB'>2097152</currentMemory>
<vcpu placement='static'>1</vcpu>
<os>
<type arch='x86_64' machine='pc-1.2'>hvm</type>
<!-- type: full virtualization or paravirtualization; hvm means full virtualization -->
<boot dev='hd'/>
<!-- boot: the boot device. The dev attribute takes one of the values "fd" (floppy disk), "hd" (hard disk), "cdrom" or "network". The element can be repeated with different values to form a list of boot devices. -->
</os>
<!-- Processor features -->
<features>
<acpi/>
<apic/>
<pae/>
</features>
<clock offset='localtime'>
<timer name='pit' tickpolicy='delay'/>
<timer name='rtc' tickpolicy='catchup'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<!-- Devices required by the guest -->
<emulator>/bin/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<!-- Path of the disk image. In this example the disk appears in the guest as an IDE device. -->
<source file='/home/template_make/centos_x86_6.4.img'>
<seclabel model='selinux' relabel='no'/>
</source>
<target dev='hda' bus='ide'/>
<alias name='ide0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/home/template_make/CentOS-6.4-x86_64-bin-DVD1.iso'/>
<target dev='hdc' bus='ide'/>
<readonly/>
<alias name='ide0-1-0'/>
<address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>
<controller type='usb' index='0'>
<alias name='usb0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<interface type='bridge'>
<!-- How the virtual machine connects to the network -->
<mac address='52:54:00:78:f9:5a'/>
<source bridge='br0'/>
<target dev='vnet27'/>
<!-- Using virtio: with the default drivers the disk runs in IDE mode and the NIC emulates an RTL8139 card at 100 Mbps full duplex. With the virtio driver the NIC runs in 1000 Mbps mode and, according to the original note, the disk runs in SCSI mode. -->
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<input type='mouse' bus='ps2'/>
<!-- VNC login. The port number is assigned automatically and can be queried with: virsh vncdisplay domainId -->
<graphics type='vnc' port='5915' autoport='yes' listen='0.0.0.0'>
<listen type='address' address='0.0.0.0'/>
</graphics>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='selinux' relabel='yes'>
<label>unconfined_u:system_r:svirt_t:s0:c362,c396</label>
<imagelabel>unconfined_u:object_r:svirt_image_t:s0:c362,c396</imagelabel>
</seclabel>
</domain>
=========================================================================================
In [1]: from xml.etree.ElementTree import parse
In [3]: f = open('kvm.xml')
In [4]: et = parse(f)
In [5]: root = et.getroot()
In [6]: root
Out[6]: <Element 'domain' at 0x7f88ac6448b8>
In [7]: root.tag
Out[7]: 'domain'
In [8]: root.attrib
Out[8]: {'type': 'kvm'}
In [9]: root.text
Out[9]: '\n'
In [10]: root.text.strip()
Out[10]: ''
In [11]: root.getchildren  # method that returns the child elements (deprecated, removed in Python 3.9)
Out[11]: <function Element.getchildren()>
In [12]: root.getchildren()
/usr/bin/ipython:1: DeprecationWarning: This method will be removed in future versions. Use 'list(elem)' or iteration over elem instead.
  #!/usr/local/python3/bin/python3.7
Out[12]:
[<Element 'name' at 0x7f88ac644b88>,
 <Element 'uuid' at 0x7f88ac6445e8>,
 <Element 'memory' at 0x7f88ac6449f8>,
 <Element 'currentMemory' at 0x7f88ac6446d8>,
 <Element 'vcpu' at 0x7f88ac644548>,
 <Element 'os' at 0x7f88ac644728>,
 <Element 'features' at 0x7f88ac644f48>,
 <Element 'clock' at 0x7f88ac644098>,
 <Element 'on_poweroff' at 0x7f88ac6444a8>,
 <Element 'on_reboot' at 0x7f88ac6440e8>,
 <Element 'on_crash' at 0x7f88ac644638>,
 <Element 'devices' at 0x7f88ac644f98>,
 <Element 'seclabel' at 0x7f88ac9f6ea8>]
In [13]: for child in root:
    ...:     print(child.get('name'))  # None for every child: none of them has a 'name' attribute
    ...:
None
None
None
None
None
None
None
None
None
None
None
None
None
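Iterating over an element directly, as in the loop above, is what the deprecation warning recommends in place of getchildren(). A small sketch, assuming a made-up three-child document, of the list-like operations an Element supports:

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document used only for illustration.
root = fromstring("<domain><name>demo</name><uuid>0000</uuid><os/></domain>")

children = list(root)                  # replacement for root.getchildren()
tags = [child.tag for child in root]   # plain iteration yields the direct children
first = root[0]                        # elements also support indexing

print(len(root), tags, first.tag)
```

len(), indexing, and iteration all operate on direct children only, so no separate accessor method is needed.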
In [14]: for child in root:
    ...:     print(child)
    ...:
<Element 'name' at 0x7f88ac644b88>
<Element 'uuid' at 0x7f88ac6445e8>
<Element 'memory' at 0x7f88ac6449f8>
<Element 'currentMemory' at 0x7f88ac6446d8>
<Element 'vcpu' at 0x7f88ac644548>
<Element 'os' at 0x7f88ac644728>
<Element 'features' at 0x7f88ac644f48>
<Element 'clock' at 0x7f88ac644098>
<Element 'on_poweroff' at 0x7f88ac6444a8>
<Element 'on_reboot' at 0x7f88ac6440e8>
<Element 'on_crash' at 0x7f88ac644638>
<Element 'devices' at 0x7f88ac644f98>
<Element 'seclabel' at 0x7f88ac9f6ea8>
In [15]: for child in root:
    ...:     print(child.name)
    ...:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-e8a3aa266c61> in <module>
      1 for child in root:
----> 2     print(child.name)
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'name'
In [16]: for child in root:
    ...:     print(child.get())
    ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-18e15cbfadbf> in <module>
      1 for child in root:
----> 2     print(child.get())
TypeError: get() missing required argument 'key' (pos 1)
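The two errors above come from reaching for the wrong accessors: an Element exposes its name as .tag and its content as .text, and get() needs the attribute name as an argument. A short sketch with a made-up document (the "n/a" default is an arbitrary choice for illustration):

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document used only for illustration.
root = fromstring(
    "<domain type='kvm'><name>centos</name><memory unit='KiB'>2097152</memory></domain>"
)

rows = []
for child in root:
    # .tag is the element name, .text its text content,
    # get(key, default) reads one attribute without raising if it is absent.
    rows.append((child.tag, child.text, child.get("unit", "n/a")))

print(rows)
```

Without a default, get() returns None for a missing attribute, which is why the earlier loop printed None for every child.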
In [17]: root.find('Element')  # no child has this tag, so find() returns None and nothing is displayed
In [18]: root.find('devices')  # find the first direct child with the tag 'devices'
Out[18]: <Element 'devices' at 0x7f88ac644f98>
In [19]: root.findall('country')
Out[19]: []
In [20]: root.findall('devices')  # find all direct children with the tag 'devices'
Out[20]: [<Element 'devices' at 0x7f88ac644f98>]
In [21]: root.iterfind('devices')  # like findall(), but returns a generator
Out[21]: <generator object prepare_child.<locals>.select at 0x7f88aca1e9a8>
In [22]: for e in root.iterfind('devices'): print(e)
<Element 'devices' at 0x7f88ac644f98>
In [23]: root.findall('disk')  # empty: disk is a grandchild, and a bare tag only matches direct children
Out[23]: []
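The empty result for 'disk' follows from how a bare tag is matched: find(), findall() and iterfind() look only at direct children unless the path says otherwise. A sketch with a made-up two-disk document, showing that a path such as 'devices/disk' reaches the grandchildren:

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document used only for illustration.
root = fromstring(
    "<domain><devices><disk device='disk'/><disk device='cdrom'/></devices></domain>"
)

direct = root.findall("disk")           # bare tag: direct children only
nested = root.findall("devices/disk")   # path: children of the devices child
missing = root.find("country")          # find() returns None when nothing matches

print(direct, [d.get("device") for d in nested], missing)
```

Because find() returns None rather than raising, it is worth checking the result before reading .text or calling .get() on it.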
In [24]: root.iter()  # iterate over every element in the tree in document order: root itself, children, grandchildren and deeper
Out[24]: <_elementtree._element_iterator at 0x7f88ac5864c0>
In [25]: list(root.iter())
Out[25]:
[<Element 'domain' at 0x7f88ac6448b8>,
 <Element 'name' at 0x7f88ac644b88>,
 <Element 'uuid' at 0x7f88ac6445e8>,
 <Element 'memory' at 0x7f88ac6449f8>,
 <Element 'currentMemory' at 0x7f88ac6446d8>,
 <Element 'vcpu' at 0x7f88ac644548>,
 <Element 'os' at 0x7f88ac644728>,
 <Element 'type' at 0x7f88ac644318>,
 <Element 'boot' at 0x7f88ac644a48>,
 <Element 'features' at 0x7f88ac644f48>,
 <Element 'acpi' at 0x7f88ac644278>,
 <Element 'apic' at 0x7f88ac644908>,
 <Element 'pae' at 0x7f88ac644db8>,
 <Element 'clock' at 0x7f88ac644098>,
 <Element 'timer' at 0x7f88ac644e08>,
 <Element 'timer' at 0x7f88ac6441d8>,
 <Element 'on_poweroff' at 0x7f88ac6444a8>,
 <Element 'on_reboot' at 0x7f88ac6440e8>,
 <Element 'on_crash' at 0x7f88ac644638>,
 <Element 'devices' at 0x7f88ac644f98>,
 <Element 'emulator' at 0x7f88ac644cc8>,
 <Element 'disk' at 0x7f88ac644e58>,
 <Element 'driver' at 0x7f88adac1ea8>,
 <Element 'source' at 0x7f88adac1318>,
 <Element 'seclabel' at 0x7f88adac1d68>,
 <Element 'target' at 0x7f88accd29a8>,
 <Element 'alias' at 0x7f88accd2cc8>,
 <Element 'address' at 0x7f88accd2458>,
 <Element 'disk' at 0x7f88accd2db8>,
 <Element 'driver' at 0x7f88acc91e08>,
 <Element 'source' at 0x7f88acc914a8>,
 <Element 'target' at 0x7f88acc91408>,
 <Element 'readonly' at 0x7f88acc91db8>,
 <Element 'alias' at 0x7f88acc91d68>,
 <Element 'address' at 0x7f88acc915e8>,
 <Element 'controller' at 0x7f88adaaaae8>,
 <Element 'alias' at 0x7f88adaaa728>,
 <Element 'address' at 0x7f88adaaa408>,
 <Element 'controller' at 0x7f88adaaac78>,
 <Element 'alias' at 0x7f88adaaa4f8>,
 <Element 'address' at 0x7f88aca04138>,
 <Element 'interface' at 0x7f88aca04188>,
 <Element 'mac' at 0x7f88aca04228>,
 <Element 'source' at 0x7f88adb59728>,
 <Element 'target' at 0x7f88adb594f8>,
 <Element 'model' at 0x7f88adb59ea8>,
 <Element 'alias' at 0x7f88adb59ae8>,
 <Element 'address' at 0x7f88adb59a98>,
 <Element 'input' at 0x7f88adb64458>,
 <Element 'graphics' at 0x7f88adb64b88>,
 <Element 'listen' at 0x7f88adb64408>,
 <Element 'video' at 0x7f88adb64098>,
 <Element 'model' at 0x7f88adb64db8>,
 <Element 'alias' at 0x7f88adb647c8>,
 <Element 'address' at 0x7f88adb64f48>,
 <Element 'memballoon' at 0x7f88adb64958>,
 <Element 'alias' at 0x7f88adb64048>,
 <Element 'address' at 0x7f88adb64138>,
 <Element 'seclabel' at 0x7f88ac9f6ea8>,
 <Element 'label' at 0x7f88ac9f6d68>,
 <Element 'imagelabel' at 0x7f88ac9f6f48>]
In [26]: root.iter('disk')  # with a tag argument, only elements with that tag are yielded, at any depth
Out[26]: <_elementtree._element_iterator at 0x7f88ac590ca8>
In [27]: list(root.iter('disk'))
Out[27]: [<Element 'disk' at 0x7f88ac644e58>, <Element 'disk' at 0x7f88accd2db8>]
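Once iter('disk') has located the disk elements, their children can be read with find() and get(). The following is only a sketch on a trimmed-down, made-up domain description (the file paths are illustrative, not the ones from kvm.xml):

```python
from xml.etree.ElementTree import fromstring

# Hypothetical, trimmed-down domain description; paths are illustrative only.
root = fromstring("""
<domain>
  <devices>
    <disk device='disk'><source file='/tmp/a.img'/><target dev='hda'/></disk>
    <disk device='cdrom'><source file='/tmp/b.iso'/><target dev='hdc'/></disk>
  </devices>
</domain>
""")

# iter(tag) walks the whole subtree in document order, at any depth.
disk_files = {
    disk.find("target").get("dev"): disk.find("source").get("file")
    for disk in root.iter("disk")
}

print(disk_files)
```

This assumes every disk has both a source and a target child; for real files it is safer to check each find() result for None first.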
In [28]: root.findall('emulator/*')  # empty: emulator is not a direct child of root
Out[28]: []
In [29]: root.findall('devices/*')  # all direct children of the child element devices
Out[29]:
[<Element 'emulator' at 0x7f88ac644cc8>,
 <Element 'disk' at 0x7f88ac644e58>,
 <Element 'disk' at 0x7f88accd2db8>,
 <Element 'controller' at 0x7f88adaaaae8>,
 <Element 'controller' at 0x7f88adaaac78>,
 <Element 'interface' at 0x7f88aca04188>,
 <Element 'input' at 0x7f88adb64458>,
 <Element 'graphics' at 0x7f88adb64b88>,
 <Element 'video' at 0x7f88adb64098>,
 <Element 'memballoon' at 0x7f88adb64958>]
In [30]: root.findall('.//video')  # './/' searches all descendants, so video is found even though it is not a direct child of root
Out[30]: [<Element 'video' at 0x7f88adb64098>]
In [31]: root.findall('.//video/..')  # '..' selects the parent of each matched element
Out[31]: [<Element 'devices' at 0x7f88ac644f98>]
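Elements do not store a reference to their parent, so outside of a path expression the parent has to be looked up some other way. One common idiom, sketched here on a made-up document, is to build a child-to-parent dictionary once and compare it with what the '..' step returns:

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document used only for illustration.
root = fromstring(
    "<domain><devices><video><model type='cirrus'/></video></devices></domain>"
)

videos = root.findall(".//video")        # descendants at any depth
parents = root.findall(".//video/..")    # parent of each matched element

# Map every child to its parent by walking the tree once.
parent_map = {child: parent for parent in root.iter() for child in parent}
video = videos[0]

print(parents[0].tag, parent_map[video].tag)
```

The root element never appears as a key in parent_map, since nothing in the tree contains it.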
In [32]: root.findall('vcps[@placement]')  # the tag name is misspelled ('vcps'), so nothing matches
Out[32]: []
In [33]: root.findall('vcpu[@placement]')  # vcpu children that have a placement attribute
Out[33]: [<Element 'vcpu' at 0x7f88ac644548>]
In [35]: root.findall('vcpu[@placement="static"]')  # vcpu children whose placement attribute equals "static"
Out[35]: [<Element 'vcpu' at 0x7f88ac644548>]
In [36]: root.findall('os[type]')  # os children that have a child element named type
Out[36]: [<Element 'os' at 0x7f88ac644728>]
In [37]: root.findall('os[type="hvm"]')  # os children whose type child has the text "hvm"
Out[37]: [<Element 'os' at 0x7f88ac644728>]
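The bracketed predicates combine with the other path steps. The sketch below, on a made-up document, pairs an attribute predicate with './/' to pick out one disk, and shows that a child-text predicate simply yields an empty list when the text differs ('xen' is an arbitrary non-matching value):

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document used only for illustration.
root = fromstring("""
<domain>
  <vcpu placement='static'>1</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'/>
    <disk type='file' device='cdrom'/>
  </devices>
</domain>
""")

with_attr = root.findall("vcpu[@placement]")          # attribute exists
cdroms = root.findall(".//disk[@device='cdrom']")     # attribute equals a value
hvm_os = root.findall("os[type='hvm']")               # child element text equals a value
no_match = root.findall("os[type='xen']")             # no os has a type child with this text

print(len(with_attr), [d.get("device") for d in cdroms], len(hvm_os), no_match)
```

ElementTree supports only a limited subset of XPath, so more elaborate expressions may need plain Python filtering over iter() instead.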
In [38]: root.findall('name')
Out[38]: [<Element 'name' at 0x7f88ac644b88>]
In [39]: root.findall('name[1]')  # the first name child (positions are 1-based)
Out[39]: [<Element 'name' at 0x7f88ac644b88>]
In [40]: root.findall('name[2]')  # empty: there is only one name element
Out[40]: []
In [41]: root.findall('name[last()]')  # the last name child
Out[41]: [<Element 'name' at 0x7f88ac644b88>]
In [42]: root.findall('name[last()-1]')  # the second-to-last name child; empty here because there is only one
Out[42]: []
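Position predicates are easier to see on a tag that occurs more than once. This sketch uses a made-up clock element with three timers (the third, 'hpet', is added purely for illustration and is not in kvm.xml); the names() helper is likewise hypothetical:

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document; the 'hpet' timer is added only for illustration.
root = fromstring("""
<domain>
  <clock offset='localtime'>
    <timer name='pit'/>
    <timer name='rtc'/>
    <timer name='hpet'/>
  </clock>
</domain>
""")

def names(path):
    """Return the name attribute of every element matched by path."""
    return [timer.get("name") for timer in root.findall(path)]

first = names("clock/timer[1]")              # positions start at 1
last = names("clock/timer[last()]")          # last sibling with this tag
second_last = names("clock/timer[last()-1]") # one before the last

print(first, last, second_last)  # ['pit'] ['hpet'] ['rtc']
```

The position is counted among siblings with the same tag under the same parent, not across the whole document.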