Crawler Example: Scraping Photos of Brigitte Lin (林青霞) from Baidu Tieba
Posted by 是璇子鸭
Remember to create the folder for saving the images before running the script!
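If creating the folder by hand is easy to forget, it can also be created from code. The following is a minimal sketch, not part of the original script; the helper name `ensure_dir` is hypothetical, and it relies only on the standard-library call `os.makedirs` with `exist_ok=True`.

```python
import os


def ensure_dir(path):
    """Create the folder (and any missing parent folders) if it is not there yet.

    `ensure_dir` is a hypothetical helper name used only for this sketch.
    """
    # exist_ok=True makes the call a no-op when the folder already exists
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_dir(...)` once with the save folder used in `get_pic`, before the download loop starts, would make the manual step unnecessary.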
import time
import requests
head = {
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
'Referer': 'https://tieba.baidu.com/p/2266120243',  # required: the server checks the Referer header
'Cookie': 'BIDUPSID=111B6D3849982122AD1E8D89F964B58B; PSTM=1625456528; BAIDUID=111B6D3849982122D6E6DA9B619C2142:FG=1; __yjs_duid=1_c1c926319de3dc5cfd73d9bfe2d131ae1625457389715; H_PS_PSSID=34269_34099_34224_31660_34004_26350_34247; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BDUSS=tPZ0lCYm5wd082ZTcyUUtHU1JRWmYteGFmUi1KYWg4RGo5QlNLfkwzan5GaEpoRVFBQUFBJCQAAAAAAAAAAAEAAAAyoCPPc2hlbGx5RDcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP-J6mD~iepgVm; BDUSS_BFESS=tPZ0lCYm5wd082ZTcyUUtHU1JRWmYteGFmUi1KYWg4RGo5QlNLfkwzan5GaEpoRVFBQUFBJCQAAAAAAAAAAAEAAAAyoCPPc2hlbGx5RDcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP-J6mD~iepgVm; delPer=0; PSINO=1; STOKEN=709ccef63b33ea4fab0b6fecdae0cda54d4b93b42685300afa4218a2a0c95fe5; st_key_id=17; BAIDU_WISE_UID=wapp_1625990541369_508; USER_JUMP=-1; 3475218482_FRSVideoUploadTip=1; video_bubble3475218482=1; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; userFrom=null; Hm_lvt_98b9d8c2fd6608d564bf2ac2ae642948=1625990540,1625990738; wise_device=0; st_data=0000ce93dba7d046132a9fcf1b1488566fbd3ec0ce7c099bb1f7e3a3805abefea3d7b8d6bd9189c7f498cf8c2f3e8bd2503ba0efa0567fded0b544451caeb3588bb0cab84c6bdcc6767a1bc765e35f62c8381069e93a475ec608560dd525bc13d66136b64a4ab1fd6a8a4e1c5ae41f4bc430a39b74adc9f349815f0c46a3b973; st_sign=f5a8ff30; ab_sr=1.0.1_ZDI3YzIwOGU3ZmQ4YmE2ZDM5ZmI0NDBlYmZiODQ1YzA4ZmQ5OGRlNjU5YThmNTQwYzc3NmU1YzUxN2I3NWVjZGVhMmFhNTQ0ZTQwMTJlNjBmNzhlYWNiNzc3MTUxMWQxNmM3NTg5MjA1MmMwYzhmNzcyZDhmODdkYmVlYzhkZmM2ZDgwM2ExMjZlY2Q2NWE4ZTNiODEwNDA4NjNkOGY2NA==; BA_HECTOR=a02l212hak2kal256v1gela340r; Hm_lpvt_98b9d8c2fd6608d564bf2ac2ae642948=1625991487',
}
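# Fetch the requested page with requests
def get_json(url):
    r = requests.get(url, headers=head)
    if r.status_code != 200:  # raise if the page was not fetched successfully
        # 200 means the request succeeded
        # 4xx means a client-side problem: wrong request method or wrong URL
        # 5xx means a server-side problem: a bug in the server code or the service is down
        raise Exception(f'request failed with status {r.status_code}: {url}')
    return r.json()  # parse the JSON body into ordinary Python types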
# Download one image
def get_pic(pic_url, name):
    r = requests.get(pic_url, headers=head)
    print(r)
    if r.status_code != 200:  # raise if the image was not fetched successfully
        raise Exception(f'image request failed with status {r.status_code}: {pic_url}')
    filename = pic_url.rsplit('/', 1)[-1]  # file name of the image (last URL segment)
    print(filename)
    with open('C:/Users/JSJSYS/PycharmProjects/untitled/beauty/' + name + filename, mode='wb') as sw:
        sw.write(r.content)  # save the image to disk
    print('Downloaded image: ' + filename)
# Parse the JSON response
def parse_json(json, name):
    pic_list = json.get('data').get('pic_list')
    for pic in pic_list:
        purl = pic.get('purl')  # extract the image URL
        # print(purl)
        get_pic(purl, name)  # download it with get_pic() above
        time.sleep(5)  # pause between downloads
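# Generate a 13-digit (millisecond) timestamp
def get_timestamp():
    # time.time() returns seconds as a float; multiplying by 1000 and
    # truncating gives milliseconds, which is a 13-digit number
    return str(int(time.time() * 1000))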
# Main routine
def maindown(url, name):
    # fetch the JSON listing
    json = get_json(url)
    # print(json)
    # extract the image links from it and download them
    parse_json(json, name)
if __name__ == '__main__':
    for page in range(1, 5):
        for i in range(page, page + 5):
            start = (i - page) * 40 + 1 + 200 * (page - 1)  # e.g. page=2: 201, 241, 281, 321, 361
            end = 200 * (page - 1) + (i - page + 1) * 40  # e.g. page=2: 240, 280, 320, 360, 400
            ts = get_timestamp()
            url = f'https://tieba.baidu.com/photo/g/bw/picture/list?kw=%E6%9E%97%E9%9D%92%E9%9C%9E&alt=jview&rn=200&tid=2266120243&pn={page}&ps={start}&pe={end}&info=1&_={ts}'
            maindown(url, '青霞仙子')
            time.sleep(5)
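The `ps` and `pe` values computed in the main loop are easier to check in isolation. The sketch below reproduces the same arithmetic as a standalone function; the name `batch_ranges` and its parameters `batches`, `size` and `per_page` are hypothetical, with defaults taken from the constants 5, 40 and 200 that appear in the loop.

```python
def batch_ranges(page, batches=5, size=40, per_page=200):
    """Return the (start, end) pairs that the main loop computes for one page.

    Hypothetical helper for illustration; the defaults mirror the constants
    used in the script (5 batches of 40 items, offset by 200 per page).
    """
    base = per_page * (page - 1)  # offset of the first item on this page
    return [(base + k * size + 1, base + (k + 1) * size) for k in range(batches)]


for page in range(1, 3):
    print(page, batch_ranges(page))
```

For page 1 this gives the ranges 1-40, 41-80, 81-120, 121-160 and 161-200; for page 2 they start at 201 and end at 400.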