xtml
Posted by 是璇子鸭
html
<meta charset="utf-8">
<table class="tablelist" cellpadding="0" cellspacing="0">
<tbody>
<tr class="h">
<td class="l" width="374">职位名称</td>
<td>职位类别</td>
<td>人数</td>
<td>地点</td>
<td>发布时间</td>
</tr>
<tr class="even">
<td class="l square"><a target="_blank" href="position_detail.php?id=33824&keywords=python&tid=87&lid=2218">22989-金融云区块链高级研发工程师(深圳)</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-25</td>
</tr>
<tr class="odd">
<td class="l square"><a target="_blank" href="position_detail.php?id=29938&keywords=python&tid=87&lid=2218">22989-金融云高级后台开发</a></td>
<td>技术类</td>
<td>2</td>
<td>深圳</td>
<td>2017-11-25</td>
</tr>
<tr class="even">
<td class="l square"><a target="_blank" href="position_detail.php?id=31236&keywords=python&tid=87&lid=2218">SNG16-腾讯音乐运营开发工程师(深圳)</a></td>
<td>技术类</td>
<td>2</td>
<td>深圳</td>
<td>2017-11-25</td>
</tr>
<tr class="odd">
<td class="l square"><a target="_blank" href="position_detail.php?id=31235&keywords=python&tid=87&lid=2218">SNG16-腾讯音乐业务运维工程师(深圳)</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-25</td>
</tr>
<tr class="even">
<td class="l square"><a target="_blank" href="position_detail.php?id=34531&keywords=python&tid=87&lid=2218">TEG03-高级研发工程师(深圳)</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-24</td>
</tr>
<tr class="odd">
<td class="l square"><a target="_blank" href="position_detail.php?id=34532&keywords=python&tid=87&lid=2218">TEG03-高级图像算法研发工程师(深圳)</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-24</td>
</tr>
<tr class="even">
<td class="l square"><a target="_blank" href="position_detail.php?id=31648&keywords=python&tid=87&lid=2218">TEG11-高级AI开发工程师(深圳)</a></td>
<td>技术类</td>
<td>4</td>
<td>深圳</td>
<td>2017-11-24</td>
</tr>
<tr class="odd">
<td class="l square"><a target="_blank" href="position_detail.php?id=32218&keywords=python&tid=87&lid=2218">15851-后台开发工程师</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-24</td>
</tr>
<tr class="even">
<td class="l square"><a target="_blank" href="position_detail.php?id=32217&keywords=python&tid=87&lid=2218">15851-后台开发工程师</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-24</td>
</tr>
<tr class="odd">
<td class="even hubei china"><a target="_blank" href="position_detail.php?id=34511&keywords=python&tid=87&lid=2218">SNG11-高级业务运维工程师(深圳)</a></td>
<td>技术类</td>
<td>1</td>
<td>深圳</td>
<td>2017-11-24</td>
</tr>
</tbody>
</table>
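The last row of the table above is worth a second look: its first cell carries three classes (`even hubei china`), so an exact comparison such as `@class='even'` and a substring test such as `contains(@class, 'even')` select different nodes. The sketch below illustrates the difference with lxml on a trimmed-down copy of the table. The `SAMPLE` string and the variable names are made up for this example and are not part of the original page.

```python
from lxml import etree

# Hypothetical, trimmed-down copy of the table above (ASCII placeholders).
SAMPLE = """
<table class="tablelist">
  <tr class="h"><td class="l">title</td></tr>
  <tr class="even"><td class="l square"><a href="position_detail.php?id=1">job A</a></td></tr>
  <tr class="odd"><td class="even hubei china"><a href="position_detail.php?id=2">job B</a></td></tr>
</table>
"""

root = etree.HTML(SAMPLE)

# Exact match: the whole attribute value must equal 'even'.
exact = root.xpath("//*[@class='even']")

# Substring match: any class attribute that contains the text 'even'.
loose = root.xpath("//*[contains(@class, 'even')]")

exact_tags = [el.tag for el in exact]
loose_tags = [el.tag for el in loose]
print(exact_tags)
print(loose_tags)
```

Because `contains()` is a plain substring test, it would also match a class such as `uneven`. When that matters, a common XPath 1.0 idiom is `contains(concat(' ', normalize-space(@class), ' '), ' even ')`, which only matches whole class names.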
xpath
from lxml import etree

# Parse the saved page with an HTML parser (the file is UTF-8 encoded)
parser = etree.HTMLParser(encoding='utf-8')
html = etree.parse('tencent.html', parser=parser)
# print(html)
# print(etree.tostring(html, encoding='utf-8').decode('utf-8'))
# 1. Get all tr tags
trs = html.xpath('//tr')  # xpath() always returns a list here; index into it to get a single element
for tr in trs:
    print(etree.tostring(tr, encoding='utf-8').decode('utf-8'))
# 2. Get the 2nd tr tag (XPath positions start at 1, not 0)
tr = html.xpath('//tr[2]')[0]
print(etree.tostring(tr, encoding='utf-8').decode('utf-8'))
# 3. Get all tr tags whose class equals even
# For a partial match use contains() instead, e.g. //td[contains(@class,'hubei')]
trs = html.xpath("//tr[@class='even']")
for tr in trs:
    print(etree.tostring(tr, encoding='utf-8').decode('utf-8'))
# 4. Get the href attribute of every a tag
alist = html.xpath('//a/@href')
for a in alist:
    print('http://hr.tencent.com/' + a)
# 5. Get all job postings as plain text
trs = html.xpath('//tr[position()>1]')  # skip the header row
positions = []
for tr in trs:
    href = tr.xpath('.//a/@href')[0]
    fullurl = 'http://hr.tencent.com/' + href
    # The title text sits inside the <a>, so //text() is needed to reach it
    title = tr.xpath('.//td[1]//text()')[0]
    category = tr.xpath('.//td[2]/text()')[0]
    number = tr.xpath('.//td[3]/text()')[0]
    city = tr.xpath('.//td[4]/text()')[0]
    pubtime = tr.xpath('.//td[5]/text()')[0]
    position = {
        'title': title,
        'url': fullurl,
        'category': category,
        'number': number,
        'city': city,
        'pubtime': pubtime,
    }
    positions.append(position)
print(positions)
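
Since `xpath()` returns a list for node-set queries, each field has to be reduced to a single string at some point. One possible way to do that is sketched below. It assumes lxml is installed; the `SAMPLE` string, the `BASE_URL` and `FIELDS` constants, and the `parse_positions` helper are names introduced for this illustration, and the base URL is simply the prefix used in the script above.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = 'http://hr.tencent.com/'  # prefix taken from the script above

# Hypothetical two-row sample in the same shape as tencent.html.
SAMPLE = """
<table class="tablelist">
  <tr class="h"><td>title</td><td>category</td><td>number</td><td>city</td><td>pubtime</td></tr>
  <tr class="even">
    <td class="l square"><a href="position_detail.php?id=33824">job A</a></td>
    <td>tech</td><td>1</td><td>Shenzhen</td><td>2017-11-25</td>
  </tr>
  <tr class="odd">
    <td class="l square"><a href="position_detail.php?id=29938">job B</a></td>
    <td>tech</td><td>2</td><td>Shenzhen</td><td>2017-11-25</td>
  </tr>
</table>
"""

FIELDS = ('title', 'category', 'number', 'city', 'pubtime')


def parse_positions(html_text):
    """Return one dict of plain strings per data row (header row skipped)."""
    root = etree.HTML(html_text)
    positions = []
    for tr in root.xpath('//tr[position()>1]'):
        # string(.) concatenates all descendant text of a cell,
        # so the title nested inside <a> is picked up as well.
        cells = [td.xpath('string(.)').strip() for td in tr.xpath('./td')]
        position = dict(zip(FIELDS, cells))
        hrefs = tr.xpath('.//a/@href')
        # urljoin resolves the relative href against the base URL.
        position['url'] = urljoin(BASE_URL, hrefs[0]) if hrefs else None
        positions.append(position)
    return positions


positions = parse_positions(SAMPLE)
for position in positions:
    print(position)
```

Using `string(.)` rather than `text()` avoids indexing into a possibly empty list when a cell has no direct text node, and `urljoin` handles the slash between base and path, which manual string concatenation can get wrong.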