Python XPath: Scraping the Novel Romance of the Three Kingdoms (《三国演义》)
Posted by 星辰虎贲
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It shows how to scrape the novel Romance of the Three Kingdoms (《三国演义》) with Python and XPath.
```python
from urllib.parse import urljoin

import requests
from lxml import etree


def fetch_html(url):
    """Download a page and parse it; GBK is a superset of GB2312."""
    req = requests.get(url=url, timeout=10)
    req.raise_for_status()
    req.encoding = "gbk"
    return etree.HTML(req.text)


def getContents():
    """Get the chapter list and each chapter's URL."""
    target = "https://www.kanunu8.com/files/old/2011/2447.html"
    bookdata = fetch_html(target)
    # Positional XPath tied to the page's current layout; it breaks if the layout changes.
    links = bookdata.xpath('//table[9]//tr[1]//td[2]//table[4]//tr[1]//td[1]//table[1]//a[@href]')
    for a in links:
        # Read title and href from the same <a> so they stay paired.
        title = a.xpath('string(.)').strip()
        # Resolve relative hrefs against the index page URL.
        url = urljoin(target, a.get('href'))
        print(title, url)


def getContent(target="https://www.kanunu8.com/files/old/2011/2447/71775.html"):
    """Get the text of one chapter."""
    bookdata = fetch_html(target)
    table_list = bookdata.xpath('//table[5]//tr[1]//td[2]//text()')
    print(table_list)


if __name__ == '__main__':
    getContents()
    getContent()
```
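Because the real pages need a network connection, here is a minimal offline sketch of the same two steps that can be tried against hard-coded HTML: pairing each chapter title with its absolute link, and tidying the fragments that a text() query returns. It assumes the chapter hrefs are relative to the index page and that chapter text is padded with non-breaking spaces and blank lines; both are assumptions about the site, not verified. The helper names parse_chapter_links and clean_text and the sample markup are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

from lxml import etree


def parse_chapter_links(html, base_url):
    """Return (title, absolute_url) pairs for every <a href> on a contents page.

    Walking the <a> elements one by one keeps each title paired with its own
    href, rather than zipping two separately extracted lists.
    """
    tree = etree.HTML(html)
    chapters = []
    for a in tree.xpath("//a[@href]"):
        # string(.) concatenates all descendant text, so titles wrapped in
        # extra tags (e.g. <font>) still come out whole.
        title = a.xpath("string(.)").strip()
        chapters.append((title, urljoin(base_url, a.get("href"))))
    return chapters


def clean_text(fragments):
    """Join the strings returned by an XPath text() query into paragraphs.

    Replaces non-breaking spaces, trims each fragment and drops empty ones.
    """
    lines = (f.replace("\xa0", " ").strip() for f in fragments)
    return "\n".join(line for line in lines if line)


# Hypothetical sample markup standing in for the real pages.
TOC_HTML = """
<html><body><table><tr>
  <td><a href="2447/71775.html">Chapter 1</a></td>
  <td><a href="2447/71776.html"><font>Chapter 2</font></a></td>
</tr></table></body></html>
"""
CHAPTER_HTML = (
    "<html><body><p>&nbsp;&nbsp;First paragraph.<br/>\r\n"
    "<br/>Second paragraph.</p></body></html>"
)

if __name__ == "__main__":
    base = "https://www.kanunu8.com/files/old/2011/2447.html"
    for title, url in parse_chapter_links(TOC_HTML, base):
        print(title, url)
    fragments = etree.HTML(CHAPTER_HTML).xpath("//p//text()")
    print(clean_text(fragments))
```

With the base URL above, the relative link 2447/71775.html resolves to the chapter URL used in the original getContent. Iterating over elements instead of zipping two lists avoids misalignment when a link's text is split across several text nodes.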