Python: traversing XML with the lxml library and querying it with XPath (tags, attribute names, attribute values, text between tags)

Posted 2020-10-06


XML example:

Version 1:

<?xml version="1.0" encoding="UTF-8"?><country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>

(This version contains no spaces or line breaks.)

Python example (Python 2 syntax):

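# -*- coding: utf-8 -*-
from lxml import etree


class r_xpath_xml(object):
    def __init__(self):
        # read the XML data (Version 1 above, saved as info.xml)
        self.xmetrpa = etree.parse('info.xml')

    def xpxm(self):
        xpxlm = self.xmetrpa
        print etree.tostring(xpxlm)  # print the XML document
        root = xpxlm.getroot()  # get the root element of the tree
        print root.tag, ' ',  # root tag name; the trailing comma keeps the next print on this line
        print root.items()  # root attributes as (name, value) pairs
        for a in root:  # walk the root's direct children
            # ' 被打印的类型为： ' is Chinese for "printed type:"
            print a.tag, a.items(), a.text, ' 被打印的类型为： ', type(a)  # tag name, attributes, text
            for b in a:
                print b.tag, b.items(), b.text
                for c in b:
                    print c.tag, c.items(), c.text
                    for d in c:  # a fourth level; empty in this document
                        print d.tag, d.items(), d.text
        print xpxlm.xpath('//node()')  # every node in the document, text nodes included
        print '====================================================================================================='
        xa = xpxlm.xpath('//heilongjiang/*')  # child elements of <heilongjiang>
        print xa
        for xb in xa:
            print xb.tag, xb.items(), xb.text
        xc = xpxlm.xpath('//xinjiang/*')  # child elements of <xinjiang>
        print xc, '类型为：', type(xc)  # '类型为：' means "type:"; xpath() returns a list
        for xd in xc:
            print xd.tag, xd.items(), xd.text


if __name__ == '__main__':
    xpx = r_xpath_xml()
    xpx.xpxm()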

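Nested for loops walk the tag hierarchy one level at a time. On each element, tag gives the tag name, items() returns the attributes as a list of (attribute name, attribute value) tuples, just like a dict's items(), and text gives the text between the opening and closing tags (None if there is none). tag, items() and text all belong to the element type <type 'lxml.etree._Element'>.
Output: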

<country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">&#26228;</wulumuqi></xinjiang></provinces></country>
country   [('name', 'chain')]
provinces [] None  被打印的类型为：  <type 'lxml.etree._Element'>
heilongjiang [('name', 'citys')] None
haerbin [] None
daqing [] None
guangdong [('name', 'citys')] None
guangzhou [] None
shenzhen [] None
huhai [] None
taiwan [('name', 'citys')] None
taibei [] None
gaoxiong [] None
xinjiang [('name', 'citys')] None
wulumuqi [('waith', 'tianqi')] 晴
[<Element country at 0x2d47b20>, <Element provinces at 0x2d47990>, <Element heilongjiang at 0x2d479b8>, <Element haerbin at 0x2d47558>, <Element daqing at 0x2d47328>, <Element guangdong at 0x2d47300>, <Element guangzhou at 0x2d476e8>, <Element shenzhen at 0x2d47530>, <Element huhai at 0x2d472d8>, <Element taiwan at 0x2d47260>, <Element taibei at 0x2d47238>, <Element gaoxiong at 0x2d47080>, <Element xinjiang at 0x2d47710>, <Element wulumuqi at 0x2d47968>, u'\u6674']
=====================================================================================================
[<Element haerbin at 0x2d479b8>, <Element daqing at 0x2d47148>]
haerbin [] None
daqing [] None
[<Element wulumuqi at 0x2d47968>] 类型为： <type 'list'>
wulumuqi [('waith', 'tianqi')] 晴
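Hand-written nested loops only reach as deep as the number of loops you write. As a hedged Python 3 sketch (assuming Version 1 is embedded as a byte string instead of being read from info.xml), iter() visits every element at any depth, and XPath's @ and text() steps return attribute values, attribute names and text directly:

```python
from lxml import etree

# Version 1 of the sample document, embedded so the sketch runs without info.xml.
# Encoded to bytes because lxml rejects str input that carries an encoding declaration.
XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<country name="chain"><provinces>'
    '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
    '<guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong>'
    '<taiwan name="citys"><taibei/><gaoxiong/></taiwan>'
    '<xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang>'
    '</provinces></country>'
).encode("utf-8")

root = etree.fromstring(XML)

# iter() yields the element itself plus all descendants in document order,
# so the depth of the tree does not need to be known in advance.
rows = [(el.tag, el.items(), el.text) for el in root.iter()]
for tag, attrs, text in rows:
    print(tag, attrs, text)

# XPath can also return attribute values, attribute names and text as strings.
heilongjiang_name = root.xpath("//heilongjiang/@name")             # attribute value(s)
first_attr_name = root.xpath("name(//wulumuqi/@*)")                # name of the first attribute
named_citys = [el.tag for el in root.xpath('//*[@name="citys"]')]  # filter by attribute value
weather = root.xpath("//wulumuqi/text()")                          # text between the tags
print(heilongjiang_name, first_attr_name, named_citys, weather)
```

The same expressions also work on the tree returned by etree.parse('info.xml'), since that object has its own xpath() method.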

XML example:

Version 2:

<?xml version="1.0" encoding="UTF-8"?>
<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
        <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>    
    </provinces>
</country>


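Example:

print xpxlm.xpath('//node()')


Output:
The result now also contains the whitespace and newline text nodes as strings, and the namespaced elements appear as {namespace URI}tag.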

[<Element country at 0x2e79b20>, '\n    ', <Element provinces at 0x2e79990>, '\n        ', <Element {http://www.w3school.com.cn/furniture}table at 0x2e79710>, '\n        ', <Element heilongjiang at 0x2e799b8>, <Element {http://www.w3school.com.cn/furniture}haerbin at 0x2e79328>, <Element {http://www.w3school.com.cn/furniture}daqing at 0x2e79968>, '\n        ', <Element guangdong at 0x2e79530>, <Element {http://www.w3school.com.cn/furniture}guangzhou at 0x2e79300>, <Element {http://www.w3school.com.cn/furniture}shenzhen at 0x2e792d8>, <Element {http://www.w3school.com.cn/furniture}zhuhai at 0x2e79260>, '\n        ', <Element taiwan at 0x2e79238>, <Element {http://www.w3school.com.cn/furniture}taibei at 0x2e79080>, <Element {http://www.w3school.com.cn/furniture}gaoxiong at 0x2e79058>, '\n        ', <Element xinjiang at 0x2e796e8>, <Element {http://www.w3school.com.cn/furniture}wulumuqi at 0x2e79558>, u'\u6674', '\n        ', '    \n    ', '\n']
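Because the city:* elements live in a namespace, their real names include the URI, so an XPath such as //haerbin no longer finds them. A hedged Python 3 sketch, assuming Version 2 is embedded as bytes: pass a prefix-to-URI mapping through the namespaces argument of xpath(), or match on local-name(). The prefix c is an arbitrary choice for this example.

```python
from lxml import etree

# Version 2 of the sample document; the city:* elements are in a namespace.
XML = """<?xml version="1.0" encoding="UTF-8"?>
<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
        <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>
    </provinces>
</country>""".encode("utf-8")

root = etree.fromstring(XML)

# The prefix in the mapping is our own choice; only the namespace URI must match.
NS = {"c": "http://www.w3school.com.cn/furniture"}

plain = root.xpath("//haerbin")  # no match: the element's name includes the URI
haerbin = [el.tag for el in root.xpath("//c:haerbin", namespaces=NS)]
weather = root.xpath("//c:wulumuqi/text()", namespaces=NS)
# heilongjiang and its siblings have no prefix, so they are in no namespace.
provinces = [el.tag for el in root.xpath("//c:table/*", namespaces=NS)]
# local-name() compares only the part of the name after the namespace.
daqing = [el.tag for el in root.xpath("//*[local-name()='daqing']")]
print(plain, haerbin, weather, provinces, daqing)
```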

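Filtering out the whitespace:

        xp = xpxlm.xpath('//node()')
        for i in xp:
            # text nodes (whitespace, newlines and the text of wulumuqi) come back as strings
            if not etree.iselement(i):
                continue
            print i.tag

Testing each result and skipping everything that is not an element removes the whitespace and newline entries, leaving only the tags.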

Output:

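country
provinces
{http://www.w3school.com.cn/furniture}table
heilongjiang
{http://www.w3school.com.cn/furniture}haerbin
{http://www.w3school.com.cn/furniture}daqing
guangdong
{http://www.w3school.com.cn/furniture}guangzhou
{http://www.w3school.com.cn/furniture}shenzhen
{http://www.w3school.com.cn/furniture}zhuhai
taiwan
{http://www.w3school.com.cn/furniture}taibei
{http://www.w3school.com.cn/furniture}gaoxiong
xinjiang
{http://www.w3school.com.cn/furniture}wulumuqi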
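The per-node check can also be avoided. A hedged Python 3 sketch, assuming Version 2 is embedded as bytes: select element nodes only with //*, or let the parser drop whitespace-only text with XMLParser(remove_blank_text=True). etree.QName then strips the {namespace} part when only the local tag name is wanted.

```python
from lxml import etree

# Version 2 of the sample document, embedded as bytes.
XML = """<?xml version="1.0" encoding="UTF-8"?>
<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
        <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>
    </provinces>
</country>""".encode("utf-8")

# Option 1: //* selects element nodes only, so no text nodes come back.
tags = [el.tag for el in etree.fromstring(XML).xpath("//*")]

# Option 2: discard ignorable whitespace at parse time; real text such as 晴 is kept.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)
text_nodes = [n for n in root.xpath("//node()") if not etree.iselement(n)]

# QName splits a {namespace}tag name into its namespace URI and local name.
local_names = [etree.QName(el).localname for el in root.iter()]

print(tags)
print(text_nodes)
print(local_names)
```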
