Scraping Romance of the Three Kingdoms (三国演义) with bs4
Posted by J哥.
Scraping Romance of the Three Kingdoms:
import requests
from bs4 import BeautifulSoup

url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
response = requests.get(url)  # fetch the index (table of contents) page
response.encoding = 'utf-8'
index_text = response.text  # page source as a string

# Parse the data. Note: this soup only covers the index page,
# so each chapter page has to be requested and parsed separately.
soup = BeautifulSoup(index_text, 'lxml')
a_list = soup.select('.book-mulu > ul > li > a')

# The with-block closes the file even if a request fails midway
with open('./sanguo.txt', 'w', encoding='utf-8') as f:
    for a in a_list:
        title = a.get_text()
        url1 = 'https://www.shicimingju.com' + a['href']
        # Request the detail page to get the chapter content
        page = requests.get(url1)
        page.encoding = 'utf-8'
        detail_soup = BeautifulSoup(page.text, 'lxml')
        # a = soup.xpath('//*[@id="main_left"]/div[1]/div')  # not valid: BeautifulSoup has no xpath() method
        div = detail_soup.find('div', class_='chapter_content')
        if div is None:
            continue  # skip pages that lack the expected container
        f.write(title + ':' + div.text + '\n')
        print('Saved successfully!')
print('Done!')
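The script above builds each chapter URL by concatenating the site root with `a['href']`, which works only as long as every href is root-relative (starts with `/`). Below is a minimal sketch of a more tolerant alternative using the standard library's `urllib.parse.urljoin`. The helper name `chapter_url` and the sample hrefs are hypothetical, made up for illustration, and are not taken from the site.

```python
from urllib.parse import urljoin

INDEX_URL = 'https://www.shicimingju.com/book/sanguoyanyi.html'

def chapter_url(href, base=INDEX_URL):
    """Resolve an href found on the index page against the index page URL.

    Hypothetical helper: handles root-relative, page-relative,
    and already-absolute hrefs the same way a browser would.
    """
    return urljoin(base, href)

# Illustrative hrefs only; the real values come from a['href'] in the loop.
print(chapter_url('/book/sanguoyanyi/1.html'))    # root-relative
print(chapter_url('sanguoyanyi/1.html'))          # relative to the index page
print(chapter_url('https://www.shicimingju.com/book/sanguoyanyi/1.html'))  # absolute
```

All three calls resolve to the same absolute URL. In the loop, `url1 = chapter_url(a['href'])` could stand in for the string concatenation.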