xtml

Posted 是璇子鸭

html

<meta charset="utf-8">
<table class="tablelist" cellpadding="0" cellspacing="0">
    <tbody>
        <tr class="h">
            <td class="l" width="374">职位名称</td>
            <td>职位类别</td>
            <td>人数</td>
            <td>地点</td>
            <td>发布时间</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=33824&keywords=python&tid=87&lid=2218">22989-金融云区块链高级研发工程师(深圳)</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=29938&keywords=python&tid=87&lid=2218">22989-金融云高级后台开发</a></td>
            <td>技术类</td>
            <td>2</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31236&keywords=python&tid=87&lid=2218">SNG16-腾讯音乐运营开发工程师(深圳)</a></td>
            <td>技术类</td>
            <td>2</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31235&keywords=python&tid=87&lid=2218">SNG16-腾讯音乐业务运维工程师(深圳)</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34531&keywords=python&tid=87&lid=2218">TEG03-高级研发工程师(深圳)</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34532&keywords=python&tid=87&lid=2218">TEG03-高级图像算法研发工程师(深圳)</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31648&keywords=python&tid=87&lid=2218">TEG11-高级AI开发工程师(深圳)</a></td>
            <td>技术类</td>
            <td>4</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=32218&keywords=python&tid=87&lid=2218">15851-后台开发工程师</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=32217&keywords=python&tid=87&lid=2218">15851-后台开发工程师</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="even hubei china"><a target="_blank" href="position_detail.php?id=34511&keywords=python&tid=87&lid=2218">SNG11-高级业务运维工程师(深圳)</a></td>
            <td>技术类</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
    </tbody>
</table>

xpath

from lxml import etree

# Assumes the markup above is saved as tencent.html next to this script
parser = etree.HTMLParser(encoding='utf-8')
html = etree.parse('tencent.html', parser=parser)
# print(html)
# print(etree.tostring(html, encoding='utf-8').decode('utf-8'))
# 1. Get all tr tags
trs = html.xpath('//tr')  # xpath() always returns a list here, so index into it to get a single element
for tr in trs:
    print(etree.tostring(tr, encoding='utf-8').decode('utf-8'))

# 2. Get the second tr tag (XPath positions start at 1, not 0)
tr = html.xpath('//tr[2]')[0]
print(etree.tostring(tr, encoding='utf-8').decode('utf-8'))

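# 3. Get all tr tags whose class equals even
# (to match one name inside a multi-valued class attribute, use contains(),
#  e.g. "//td[contains(@class,'hubei')]")
trs = html.xpath("//tr[@class='even']")
for tr in trs:
    print(etree.tostring(tr, encoding='utf-8').decode('utf-8'))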

# 4. Get the href attribute of every a tag
alist = html.xpath('//a/@href')
for a in alist:
    print('http://hr.tencent.com/' + a)

# 5. Get all job postings as plain text
trs = html.xpath('//tr[position()>1]')  # skip the header row
positions = []
for tr in trs:
    href = tr.xpath('.//a/@href')[0]
    fullurl = 'http://hr.tencent.com/' + href
    # The title text sits inside the <a>, so use //text() rather than /text()
    title = tr.xpath('.//td[1]//text()')[0]
    category = tr.xpath('.//td[2]/text()')[0]
    number = tr.xpath('.//td[3]/text()')[0]
    city = tr.xpath('.//td[4]/text()')[0]
    pubtime = tr.xpath('.//td[5]/text()')[0]
    position = {
        'title': title,
        'url': fullurl,
        'category': category,
        'number': number,
        'city': city,
        'pubtime': pubtime,
    }
    positions.append(position)
print(positions)
    
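
Two XPath details in the script above are easy to trip over: matching on the class attribute, and reading text that is nested inside a child element. The sketch below illustrates both on a small inline fragment. The `SAMPLE` string and the "Job A" / "Job B" titles are made up for illustration and only mimic the shape of the table above; `etree.HTML` is used so the example runs without a file on disk.

```python
from lxml import etree

# Hypothetical two-row fragment modelled on the table above (not the real page).
SAMPLE = """
<table>
  <tr class="even"><td class="l square"><a href="position_detail.php?id=1">Job A</a></td><td>2</td></tr>
  <tr class="odd"><td class="even hubei china"><a href="position_detail.php?id=2">Job B</a></td><td>1</td></tr>
</table>
"""

root = etree.HTML(SAMPLE)

# @class='...' compares against the whole attribute string,
# so it does not match class="even hubei china".
exact = root.xpath("//td[@class='hubei']")

# contains() is a substring test, so it matches multi-valued class attributes.
partial = root.xpath("//td[contains(@class, 'hubei')]")

first_row = root.xpath("//tr")[0]

# /text() returns only text nodes that are direct children of td[1];
# here the title lives inside the <a>, so nothing comes back.
direct_text = first_row.xpath("./td[1]/text()")

# //text() also descends into child elements.
nested_text = first_row.xpath("./td[1]//text()")

print(len(exact), len(partial))
print(direct_text, nested_text)
```

Because `contains()` is a plain substring test, it would also match a class such as `hubei2`; when that matters, a stricter test on whitespace-separated tokens is needed.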

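The script builds absolute links by concatenating a base URL with each href, where a single wrong character in the base produces a broken link. As an alternative sketch, the standard library's `urllib.parse.urljoin` can resolve the relative href instead. The `BASE_URL` constant and the `absolute_url` helper are names introduced here for illustration, not part of the original script.

```python
from urllib.parse import urljoin

# Same base URL the script above hard-codes (constant name is hypothetical).
BASE_URL = 'http://hr.tencent.com/'


def absolute_url(href, base=BASE_URL):
    """Resolve a possibly relative href against the base URL."""
    return urljoin(base, href)


# A relative href, as found in the table's <a> tags.
print(absolute_url('position_detail.php?id=33824&keywords=python&tid=87&lid=2218'))

# An href that is already absolute is returned unchanged.
print(absolute_url('http://example.com/other.php'))
```

Unlike string concatenation, this also handles hrefs that start with a slash or that are already absolute.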