002 python3 + beautifulsoup4 + requests


Environment: win7, python3.5, bs4 0.0.1, requests 2.19

Date: 2018-08-07

Target site: http://www.xhsd.cn/

This post uses requests and bs4 to scrape the book listings on http://www.xhsd.cn/.

2018-08-07: the HTML structure of the page was inspected in Chrome.

 


Python code:

 

 

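import requests
import bs4
import pandas as pd

url = "http://www.xhsd.cn/"
r = requests.get(url)
html = r.text

soup = bs4.BeautifulSoup(html, 'lxml')

# Select every <table> whose bgcolor attribute is "#ffffff".
tables = soup.find_all('table', bgcolor="#ffffff")

def etr(tb):
    """Extract the section label and the book entries from one table."""
    content = {}
    # Drop the short whitespace-only nodes between the rows.
    arr = list(filter(lambda x: len(str(x)) > 2, tb.children))
    tr1 = arr[0]  # first row: section label
    tr2 = arr[1]  # second row: book links
    label = next(tr1.stripped_strings)
    content['label'] = label
    print(label)

    a_s = tr2.find_all('a', title=True)
    cs = []
    for a in a_s:
        try:
            cts = list(a.stripped_strings)
            # Each link is expected to hold exactly four strings.
            book, auth, price_now, price_before = cts
            img = a.find('img')['src']
            tmp = {"book": book, "auth": auth, "price_now": price_now,
                   "price_before": price_before, "image": img}
            cs.append(tmp)
        except (ValueError, TypeError, KeyError):
            # Skip links that do not match the expected layout
            # (wrong number of strings, no <img>, or no src attribute).
            continue

    content["contents"] = cs
    return content

# Build one DataFrame per table, tag it with its label, then combine them.
dfs = []
for tb in tables:
    content = etr(tb)

    df_tmp = pd.DataFrame(data=content['contents'])
    df_tmp['label'] = content['label']
    dfs.append(df_tmp)

df = pd.concat(dfs, ignore_index=True)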
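The extraction step above can be tried offline, without hitting the site. The following is only a sketch: SAMPLE_HTML is a hand-written, hypothetical fragment that imitates the layout the code expects (a label row followed by a row of links), not markup copied from http://www.xhsd.cn/. The helper name extract_section is also hypothetical. It filters children with an isinstance check instead of the string-length filter used above, and uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical fragment imitating the assumed layout: first row holds the
# section label, second row holds <a title=...> links with four text
# pieces and an <img>. The second link is deliberately malformed.
SAMPLE_HTML = """
<table bgcolor="#ffffff">
  <tr><td>New books</td></tr>
  <tr><td>
    <a href="/b/1" title="Book A"><img src="/upload/a.jpg"><span>Book A</span><span>Author A</span><span>10.00</span><span>12.00</span></a>
    <a href="/b/2" title="Broken"><span>only one string</span></a>
  </td></tr>
</table>
"""

def extract_section(tb):
    """Return {'label': ..., 'contents': [...]} for one table tag."""
    # Keep only tag children; whitespace text nodes between rows are dropped.
    rows = [child for child in tb.children if isinstance(child, Tag)]
    label = next(rows[0].stripped_strings)
    books = []
    for a in rows[1].find_all("a", title=True):
        strings = list(a.stripped_strings)
        img = a.find("img")
        if len(strings) != 4 or img is None or not img.has_attr("src"):
            continue  # entry does not match the expected layout
        book, auth, price_now, price_before = strings
        books.append({"book": book, "auth": auth, "price_now": price_now,
                      "price_before": price_before, "image": img["src"]})
    return {"label": label, "contents": books}

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
sections = [extract_section(tb)
            for tb in soup.find_all("table", bgcolor="#ffffff")]
print(sections)
```

Checking the layout explicitly, rather than catching every exception, makes it easier to see why an entry was skipped when the real page changes.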

 


 

The image links are in df['image'], for example http://www.xhsd.cn//upload/2017/7/1500881045493.jpg:

['http://www.xhsd.cn//upload/2017/7/1500881045493.jpg', 'http://www.xhsd.cn//upload/20160701\9787201077642.JPG', 'http://www.xhsd.cn//upload/20160621\9787201088945.JPG', 'http://www.xhsd.cn//upload/2017/6/1498807359861.jpg']
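Some of the image URLs listed above contain a doubled slash after the host and a backslash inside the path. If they are going to be fetched later, it may help to normalise them first. Below is a minimal sketch using only the standard library. The helper name normalize_image_url is hypothetical, and whether the server actually needs this cleanup is an assumption that has not been checked against the site.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize_image_url(url):
    """Replace backslashes with slashes and collapse repeated slashes in the path."""
    parts = urlsplit(url)
    path = parts.path.replace("\\", "/")
    path = re.sub(r"/{2,}", "/", path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))

# Raw strings keep the backslash exactly as it appears in the scraped data.
raw_urls = [
    r"http://www.xhsd.cn//upload/2017/7/1500881045493.jpg",
    r"http://www.xhsd.cn//upload/20160701\9787201077642.JPG",
]
clean_urls = [normalize_image_url(u) for u in raw_urls]
print(clean_urls)
```

Only the path component is rewritten, so the scheme's own "//" separator is left untouched.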

 


 
