002 python3 + beautifulsoup4 + requests
Environment: win7, python3.5, bs4 0.0.1, requests 2.19
Date: 2018-08-07
Site: http://www.xhsd.cn/
This exercise uses bs4 to scrape http://www.xhsd.cn/.
As of 2018-08-07, the HTML of the page was inspected in Chrome before writing the code.
Python code:
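import requests
import bs4
import pandas as pd

url = "http://www.xhsd.cn/"
r = requests.get(url)
html = r.text
soup = bs4.BeautifulSoup(html, 'lxml')
# Select every <table> whose bgcolor attribute is "#ffffff".
tables = soup.find_all('table', bgcolor="#ffffff")


def etr(tb):
    """Extract the label and the book entries from one table."""
    content = {}
    # Drop very short children, such as the newline strings between rows.
    arr = list(filter(lambda x: len(str(x)) > 2, tb.children))
    tr1 = arr[0]  # first row: holds the label
    tr2 = arr[1]  # second row: holds the book links
    label = next(tr1.stripped_strings)
    content['label'] = label
    print(label)
    a_s = tr2.find_all('a', title=True)
    cs = []
    for a in a_s:
        try:
            cts = list(a.stripped_strings)
            # print(cts)
            # Exactly four strings are expected per link.
            book, auth, price_now, price_before = cts
            img = a.find('img')['src']
            tmp = {"book": book, "auth": auth, "price_now": price_now,
                   "price_before": price_before, "image": img}
            cs.append(tmp)
        except (ValueError, TypeError, KeyError):
            # Skip links that do not have four strings and an <img src=...>.
            continue
    content["contents"] = cs
    return content


dfs = []
for tb in tables:
    content = etr(tb)
    df_tmp = pd.DataFrame(data=content['contents'])
    df_tmp['label'] = content['label']
    dfs.append(df_tmp)
# One row per book, with the label of the table it came from.
df = pd.concat(dfs, ignore_index=True)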
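
The parsing logic can be tried without hitting the site. The following is a minimal sketch, not the page's real markup: SAMPLE_HTML is a made-up stand-in for one table, parse_table is a hypothetical name, and it uses the standard-library "html.parser" backend instead of lxml. It also swaps the length-based filter for an isinstance check on Tag, since an indentation string such as "\n  " is longer than two characters and would pass the length test.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup, invented for this sketch; the real page may differ.
SAMPLE_HTML = """
<table bgcolor="#ffffff">
  <tr><td>New arrivals</td></tr>
  <tr><td>
    <a href="/b/1" title="Book A"><img src="/upload/a.jpg"><span>Book A</span><span>Author A</span><span>10.00</span><span>20.00</span></a>
    <a href="/b/2" title="Broken"><span>only one string</span></a>
  </td></tr>
</table>
"""


def parse_table(tb):
    """Return {'label': ..., 'contents': [...]} for one table."""
    # Keep only element children; whitespace between rows is a text node.
    rows = [child for child in tb.children if isinstance(child, Tag)]
    label = next(rows[0].stripped_strings)
    books = []
    for a in rows[1].find_all("a", title=True):
        try:
            # Exactly four strings are expected per link.
            book, auth, price_now, price_before = list(a.stripped_strings)
            img = a.find("img")["src"]
        except (ValueError, TypeError, KeyError):
            # Wrong number of strings, no <img>, or no src attribute.
            continue
        books.append({"book": book, "auth": auth, "price_now": price_now,
                      "price_before": price_before, "image": img})
    return {"label": label, "contents": books}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
result = parse_table(soup.find("table", bgcolor="#ffffff"))
print(result["label"], len(result["contents"]))
```

The second link in the sample has only one string, so it is skipped in the same way the try/except in etr skips entries that do not match the expected layout.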
Sample values of df['image'], such as http://www.xhsd.cn//upload/2017/7/1500881045493.jpg:
['http://www.xhsd.cn//upload/2017/7/1500881045493.jpg', 'http://www.xhsd.cn//upload/20160701\9787201077642.JPG', 'http://www.xhsd.cn//upload/20160621\9787201088945.JPG', 'http://www.xhsd.cn//upload/2017/6/1498807359861.jpg']
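
The image URLs listed above contain a doubled slash after the host, and some use a backslash as a path separator. One possible cleanup is sketched below using only the standard library. normalize_image_url is a hypothetical helper name, and whether the server actually serves the normalized form has not been checked here, so treat that as an assumption.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_image_url(url):
    """Replace backslashes with slashes and collapse repeated slashes in the path."""
    parts = urlsplit(url)
    path = parts.path.replace("\\", "/")
    path = re.sub(r"/{2,}", "/", path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Two of the values shown above; the raw string keeps the backslash literal.
samples = [
    "http://www.xhsd.cn//upload/2017/7/1500881045493.jpg",
    r"http://www.xhsd.cn//upload/20160701\9787201077642.JPG",
]
cleaned = [normalize_image_url(u) for u in samples]
for u in cleaned:
    print(u)
```

Applied to the scraped frame, this would be df['image'] = df['image'].map(normalize_image_url). Only the path is rewritten, so the "//" that follows the scheme is left alone.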