Using XPath: locating elements, getting text and attribute values

Posted 2021-01-11


from lxml import etree

myPage = '''<html>
<title>TITLE</title>
<body>
<h1></h1>
<div></div>
<div id="photos">
<img src="pic1.jpeg"/><span id="pic1">*</span>
<img src="pic2.jpeg"/><span id="pic2">****</span>
<p><a href="http://www.example.com/more_pic.html">;*</a></p>
<a href="http://www.baidu.com">****</a>;
<a href="http://www.163.com">*****</a>;
<a href="http://www.sohu.com">****</a>;
</div>
<p class="myclassname">Hello, world!<br/>-- by Adam</p>
<div class="foot">Some other notes placed at the bottom</div>
</body>
</html>'''

html = etree.fromstring(myPage)

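# 1. Locating elements
divs1 = html.xpath('//div')                 # all div elements
divs2 = html.xpath('//div[@id]')            # divs that have an id attribute
divs3 = html.xpath('//div[@class="foot"]')  # divs whose class is "foot"
divs4 = html.xpath('//div[@*]')             # divs that have at least one attribute
divs5 = html.xpath('//div[1]')              # divs that are the first div under their parent
divs6 = html.xpath('//div[last()-1]')       # divs that are second-to-last under their parent
divs7 = html.xpath('//div[position()<3]')   # the first two divs under each parent
divs8 = html.xpath('//div|//h1')            # union: all divs and all h1 elements
divs9 = html.xpath('//div[not(@*)]')        # divs that have no attributes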
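
One detail about positional predicates is easy to miss: `//div[1]` does not mean "the first div in the document". It selects every div that is the first div child of its own parent. The sketch below illustrates this on a small hypothetical document (the `root`/`section` structure and the ids `a`, `b`, `c` are made up for this example and are not part of the page above).

```python
from lxml import etree

# Hypothetical document: two parents, each containing div children.
doc = etree.fromstring(
    "<root>"
    "<section><div id='a'/><div id='b'/></section>"
    "<section><div id='c'/></section>"
    "</root>"
)

# [1] is evaluated per parent, so one div per <section> matches.
first_per_parent = [d.get("id") for d in doc.xpath("//div[1]")]

# Parenthesizing first builds the full node set, then takes its first item.
first_in_document = [d.get("id") for d in doc.xpath("(//div)[1]")]

print(first_per_parent)   # ['a', 'c']
print(first_in_document)  # ['a']
```

If you really want a single element, `(//div)[1]` is usually the expression you are after.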

# 2. Getting text with text() (note the difference from html.xpath('string()'))

text1 = html.xpath('//div/text()')
text2 = html.xpath('//div[@id]/text()')
text3 = html.xpath('//div[@class="foot"]/text()')
text4 = html.xpath('//div[@*]/text()')
text5 = html.xpath('//div[1]/text()')
text6 = html.xpath('//div[last()-1]/text()')
text7 = html.xpath('//div[position()<3]/text()')
text8 = html.xpath('//div/text()|//h1/text()')
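
The heading mentions that `text()` differs from `string()`, so here is a minimal sketch of that difference. `text()` returns a list of the text nodes that are direct children of the element, while `string(.)` returns one string that concatenates all descendant text. The fragment below is a hypothetical variation of the `<p class="myclassname">` element above, with a `<b>` tag added to make the difference visible.

```python
from lxml import etree

# Hypothetical fragment: mixed content with a nested <b> element.
p = etree.fromstring(
    '<p class="myclassname">Hello, <b>world</b>!<br/>-- by Adam</p>'
)

# text(): only text nodes that are direct children of <p>, as a list.
pieces = p.xpath("text()")

# string(.): the string value of <p>, including text inside <b>.
joined = p.xpath("string(.)")

print(pieces)  # ['Hello, ', '!', '-- by Adam']
print(joined)  # Hello, world!-- by Adam
```

Note that "world" is missing from the `text()` result because it belongs to `<b>`, not directly to `<p>`.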

# 3. Getting attribute values with @
value1 = html.xpath('//a/@href')
value2 = html.xpath('//img/@src')
value3 = html.xpath('//div[2]/span/@id')
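
Attribute queries return a list of string-like values rather than elements. With lxml's default settings these results also expose `getparent()`, which lets you get back to the element that owns the attribute. The sketch below uses a hypothetical fragment (the link texts `baidu` and `163` are placeholders invented for this example).

```python
from lxml import etree

# Hypothetical fragment modeled on the links in the sample page.
div = etree.fromstring(
    '<div>'
    '<a href="http://www.baidu.com">baidu</a>'
    '<a href="http://www.163.com">163</a>'
    '</div>'
)

hrefs = div.xpath("a/@href")  # list of attribute values (string-like objects)

pairs = []
for href in hrefs:
    a = href.getparent()  # the <a> element that carries this href
    pairs.append((a.text, str(href)))

print(pairs)
```

If you already hold the element itself, `a.get('href')` or `a.attrib.get('href')` reads the same value without a second XPath query.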

# 4. Locating elements (advanced)
# 1. The find and findall methods of a document (DOM) element (Element)
divs = html.xpath('//div[position()<3]')
for div in divs:
    ass = div.findall('a')  # finds only div->a; does not find div->p->a
    for a in ass:
        if a is not None:
            # print(dir(a))
            print(a.text, a.attrib.get('href'))  # useful Element members: text, attrib

# 2. Equivalent to 1

a_href = html.xpath('//div[position()<3]/a/@href')
print(a_href)

# 3. Note how this differs from 1 and 2: // also matches nested a elements (div->p->a)
a_href = html.xpath('//div[position()<3]//a/@href')
print(a_href)
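
To make the difference between `/a` and `//a` concrete, the following sketch runs both forms against a small hypothetical fragment (a cut-down version of the `photos` div, with made-up link texts `more` and `baidu`). It also shows that `findall('.//a')` is the ElementPath counterpart of the descendant search.

```python
from lxml import etree

# Hypothetical fragment: one <a> nested inside <p>, one directly under <div>.
div = etree.fromstring(
    '<div id="photos">'
    '<p><a href="http://www.example.com/more_pic.html">more</a></p>'
    '<a href="http://www.baidu.com">baidu</a>'
    '</div>'
)

# Child step: only <a> elements directly under the div.
direct = div.xpath("a/@href")

# Descendant step: <a> elements at any depth below the div.
nested = div.xpath(".//a/@href")

# findall with './/a' also searches all descendants.
via_findall = [a.get("href") for a in div.findall(".//a")]

print(direct)       # ['http://www.baidu.com']
print(nested)       # ['http://www.example.com/more_pic.html', 'http://www.baidu.com']
print(via_findall)  # same two URLs, in document order
```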

Reference: https://www.cnblogs.com/hhh5460/p/5079465.html
