Scraping an Entire Novel with Python
Posted by 工藤新一
Scraping a novel (a single page)
from urllib.request import urlopen  # fetches the page
from bs4 import BeautifulSoup  # extracts data from the HTML

# Download the page and decode the bytes as UTF-8 so the Chinese text is readable
html = urlopen(r"https://www.jueshitangmen.info/tian-meng-bing-can-11.html").read().decode('utf-8')
print(html)
print("===================================================================")  # separator

# First argument: the text to parse; second argument: the parser (lxml must be installed)
soup = BeautifulSoup(html, features="lxml")

# Extract the <title> tag
title = soup.find("title")
print(title)
print("===================================================================")

# Collect every <p> tag and print its text
all_p = soup.find_all("p")
for i in all_p:
    print("\n", i.get_text())
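The script above assumes every page is UTF-8. As a minimal sketch of a more forgiving decode step, the helper below tries a list of encodings in order. The function name `decode_page` and the GBK fallback are assumptions for illustration; the original only ever uses UTF-8.

```python
def decode_page(raw, encodings=("utf-8", "gbk")):
    """Decode raw bytes with the first encoding in the list that works."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: decode with the first encoding and replace the
    # undecodable bytes instead of raising
    return raw.decode(encodings[0], errors="replace")


# Both byte strings decode to the same text
print(decode_page("中文".encode("utf-8")))
print(decode_page("中文".encode("gbk")))
```

With this in place, `urlopen(url).read()` could be passed to `decode_page` instead of calling `.decode('utf-8')` directly.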
Scraping a whole novel
from urllib.request import urlopen
from bs4 import BeautifulSoup

j = 10
# Write the output as UTF-8; the with block closes the file even if an error occurs
with open("D://demofile3.txt", 'w', encoding='utf-8') as f:
    # Loop over the page numbers (11 to 500) to cover the whole book
    while j < 500:
        j = j + 1
        try:
            html = urlopen(r"https://www.jueshitangmen.info/tian-meng-bing-can-{page}.html".format(page=j)).read().decode('utf-8')
        except Exception:
            # Skip pages that fail to download or decode
            continue
        soup = BeautifulSoup(html, features="lxml")
        # Collect every <p> tag, print its text, and append it to the file
        all_p = soup.find_all("p")
        for i in all_p:
            text = i.get_text()
            print("\n", text)
            f.write(text + "\n")
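The loop above silently skips any page that fails. One possible way to make the skips visible and to pause between requests is sketched below. The names `fetch_html` and `fetch_pages`, the 10-second timeout, and the `delay` parameter are assumptions for illustration, not part of the original script.

```python
import time
from urllib.request import urlopen

URL_TEMPLATE = "https://www.jueshitangmen.info/tian-meng-bing-can-{page}.html"


def fetch_html(url):
    """Download one page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def fetch_pages(first, last, fetch=fetch_html, delay=0.0):
    """Yield (page, html) for each page that downloads; report and skip the rest."""
    for page in range(first, last + 1):
        url = URL_TEMPLATE.format(page=page)
        try:
            html = fetch(url)
        except (OSError, UnicodeDecodeError) as error:
            # URLError, HTTPError and timeouts are all subclasses of OSError
            print("skipped", url, error)
            continue
        yield page, html
        time.sleep(delay)  # pause before requesting the next page
```

Calling `fetch_pages(11, 500, delay=1.0)` would cover the same page range as the loop above, and the `fetch` argument lets the download step be swapped for a stub when trying the loop offline.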
Scraping all of 斗破苍穹
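from urllib.request import urlopen
from bs4 import BeautifulSoup

j = 0
# Write the output as UTF-8; the with block closes the file even if an error occurs
with open("D://demofile3.txt", 'w', encoding='utf-8') as f:
    # Walk the chapter IDs so that the book is fetched from the first chapter to the last
    while j <= 1627458 - 1625802:
        j = j + 1
        try:
            html = urlopen(r"https://www.xs98.com/xs922/{page}.html".format(page=1625802 + j)).read().decode('utf-8')
        except Exception:
            # Skip chapters that fail to download or decode
            continue
        soup = BeautifulSoup(html, features="lxml")
        # Print the page title and write it to the file as the chapter heading
        title = soup.find("title")
        print(title.get_text())
        f.write(title.get_text() + "\n")
        # The chapter text lives in <div id="content">, not in <p> tags
        all_p = soup.find_all("div", attrs={"id": "content"})
        for i in all_p:
            text = i.get_text()
            print("\n", text)
            f.write(text + "\n")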
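Because `j` is incremented before the URL is built, it is worth checking which chapter IDs the loop above actually requests. The sketch below replays only the counter logic, without any network access. The names `FIRST_ID`, `LAST_ID`, and `visited_ids` are introduced here for illustration, and the source does not say which ID is the real first chapter on the site.

```python
FIRST_ID = 1625802  # constants taken from the loop condition above
LAST_ID = 1627458


def visited_ids(first, last):
    """Replay the while loop's counter and collect the IDs it would request."""
    j = 0
    ids = []
    while j <= last - first:
        j = j + 1
        ids.append(first + j)
    return ids


visited = visited_ids(FIRST_ID, LAST_ID)
print(visited[0], visited[-1], len(visited))

# An inclusive range over the same two constants, for comparison
inclusive = list(range(FIRST_ID, LAST_ID + 1))
print(inclusive[0], inclusive[-1], len(inclusive))
```

The loop requests IDs 1625803 through 1627459, while the inclusive range covers 1625802 through 1627458. Both visit 1657 pages, so if 1625802 is itself a chapter page, the loop misses it and requests one ID past the end.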