Using the BeautifulSoup Library, Explained with a Simple Example

Posted 2022-09-18 yiyea


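BeautifulSoup is a third-party Python library, so it has to be installed before use (for example with pip install beautifulsoup4). Its main job is to parse HTML and dig out the data we want.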
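Below is an example that fetches the winning numbers of the last 30 Double Color Ball (双色球) draws from the 500.com (500万) lottery site.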

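import requests
from bs4 import BeautifulSoup  # import the parser class

url = "https://datachart.500.com/ssq/?expect=100"
resp = requests.get(url)
if resp.status_code == 200:  # the request succeeded, so it is safe to continue
    if resp.encoding == "ISO-8859-1":
        # ISO-8859-1 text has to be re-decoded as GBK, otherwise the Chinese comes out garbled
        html = resp.text.encode("ISO-8859-1").decode("GBK")
    else:
        html = resp.text  # UTF-8 text can be used directly
    # Everything below stays inside the if, so html is always defined when it is parsed
    soup = BeautifulSoup(html, "lxml")  # install lxml first, otherwise this raises an error
    # Get the draw numbers (期号)
    re_id = soup.tbody.find_all("td", attrs={"align": "center"})  # filter by attribute
    # [x.get_text() for x in re_id] also works; .string drops the tag and returns its content
    qihao = [x.string for x in re_id]
    print(qihao)  # ['19072 ', '19073 ', '19074 ', '19075 ', ...]
    # Get the red and blue balls
    re_red = soup.tbody.tr.find_all("td", attrs={"class": "chartBall01"})  # cells holding red balls
    re_blue = soup.tbody.tr.find_all("td", attrs={"class": "chartBall02"})  # cells holding blue balls
    # Strip the tags to keep only the text
    r = [x.string for x in re_red]  # a plain list comprehension
    b = [x.string for x in re_blue]
    newr = r[-7:-1]  # slice out the red balls of the latest draw
    newb = b[-1]  # blue ball of the latest draw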
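
The re-decoding line in the script above can look like magic, so here is a minimal sketch of why it works, using only the standard library. The sample string is made up for illustration. The idea is that ISO-8859-1 maps every byte to exactly one character, so text that was wrongly decoded with it can be turned back into the original bytes without loss and then decoded with the right codec.

```python
# Simulate a GBK page that was mis-decoded as ISO-8859-1.
original = "双色球开奖号码"          # sample text, chosen only for this sketch
raw_bytes = original.encode("gbk")    # the bytes a GBK-encoded server would send

# Wrong guess: this never raises, because ISO-8859-1 accepts every byte value,
# but the result is mojibake rather than Chinese.
garbled = raw_bytes.decode("iso-8859-1")

# Repair: encoding back to ISO-8859-1 recovers the exact original bytes,
# which can then be decoded with the correct codec.
recovered_bytes = garbled.encode("iso-8859-1")
repaired = recovered_bytes.decode("gbk")

print(garbled != original)           # True
print(recovered_bytes == raw_bytes)  # True
print(repaired == original)          # True
```

With requests, assigning resp.encoding = "GBK" before reading resp.text should give the same result without the round trip, assuming the page really is GBK.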

# With the data we wanted in hand, what remains is organizing and analyzing it...
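
A comment in the script above says that x.string and x.get_text() both work. That holds when a cell contains nothing but text. The sketch below shows where the two differ, using a small hand-written table instead of the live page. The markup is hypothetical and only loosely modelled on the class names used above, and it uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; this is not the real page.
html = (
    "<table><tbody><tr>"
    '<td align="center">19072 </td>'
    '<td class="chartBall01">03</td>'
    '<td class="chartBall01"><b>1</b><i>7</i></td>'
    '<td class="chartBall02">09</td>'
    "</tr></tbody></table>"
)

soup = BeautifulSoup(html, "html.parser")

cells = soup.tbody.tr.find_all("td", attrs={"class": "chartBall01"})

# .string is None when a tag has more than one child node
strings = [td.string for td in cells]
# get_text() concatenates all the text nested inside the tag
texts = [td.get_text() for td in cells]

print(strings)  # ['03', None]
print(texts)    # ['03', '17']
```

If the cells might contain nested tags, get_text() is probably the safer choice of the two.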

# A few more find_all usages (findAll is the older name for the same method)

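soup.find_all("p")                                # every <p> tag
soup.find_all("title")                            # every <title> tag
soup.find_all(id="link2")                         # tags whose id is "link2"
soup.find_all(id="link2", limit=2)                # same filter, but stop after 2 matches
soup.find_all(id=True)                            # tags that have an id attribute at all
soup.find_all("a", class_="classname")            # <a> tags with the CSS class "classname"
soup.find_all(text="text content")                # strings equal to "text content"; newer versions prefer string=
soup.find_all(text=["tanghao", "laowang"])        # strings equal to any item in the list
soup.find_all("a", attrs={"class": "classname"})  # the same class filter written as an attrs dict
soup.select("p.title")                            # CSS selector: <p> tags with class "title"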
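
To see what several of these calls return, here is a small runnable sketch against a hand-written document. The HTML, ids, and class names are made up for illustration. It uses the built-in html.parser, and it uses string= in place of text=, which is the spelling that newer Beautiful Soup versions prefer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; ids, classes and text are invented for this sketch.
html = """
<html><head><title>demo</title></head><body>
<p class="title">heading</p>
<p>tanghao</p>
<a id="link1" class="classname" href="/a">first</a>
<a id="link2" class="classname" href="/b">second</a>
<a id="link3" href="/c">laowang</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

p_count = len(soup.find_all("p"))                                   # match by tag name
with_id = [t["id"] for t in soup.find_all(id=True)]                 # any tag that has an id
first_two = [t["id"] for t in soup.find_all(id=True, limit=2)]      # stop after 2 matches
by_class = [t["id"] for t in soup.find_all("a", class_="classname")]
by_attrs = [t["id"] for t in soup.find_all("a", attrs={"class": "classname"})]
by_string = soup.find_all(string=["tanghao", "laowang"])            # returns strings, not tags
by_css = [t.get_text() for t in soup.select("p.title")]             # CSS selector

print(p_count)    # 2
print(with_id)    # ['link1', 'link2', 'link3']
print(first_two)  # ['link1', 'link2']
print(by_class)   # ['link1', 'link2']
print(by_attrs)   # ['link1', 'link2']
print(by_string)  # ['tanghao', 'laowang']
print(by_css)     # ['heading']
```

Note that filtering by string returns the matching text nodes themselves, while every other call here returns tags.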
