Crawler Example: Scraping Photos of Lin Qingxia (林青霞) from Baidu Tieba

Posted by 是璇子鸭

Remember to create the folder for saving the images before running the script!
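
If you would rather not create the folder by hand, the script can create it itself. The following is a minimal sketch, not part of the original post: `SAVE_DIR` and `ensure_save_dir` are hypothetical names, and the folder location is an assumption that should be adapted to your own machine.

```python
import os

# Assumption: SAVE_DIR is a hypothetical name; point it at the folder
# where you want the downloaded images to be stored.
SAVE_DIR = os.path.join(os.getcwd(), 'beauty')


def ensure_save_dir(path=SAVE_DIR):
    """Create the image folder if it does not exist yet and return its path."""
    # exist_ok=True makes the call safe to repeat on every run
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_save_dir()` once at startup is enough; opening a file inside a missing folder would otherwise fail with `FileNotFoundError`.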

import time
import requests

head = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    'Referer': 'https://tieba.baidu.com/p/2266120243',  # required: the server checks the Referer header
    'Cookie': 'BIDUPSID=111B6D3849982122AD1E8D89F964B58B; PSTM=1625456528; BAIDUID=111B6D3849982122D6E6DA9B619C2142:FG=1; __yjs_duid=1_c1c926319de3dc5cfd73d9bfe2d131ae1625457389715; H_PS_PSSID=34269_34099_34224_31660_34004_26350_34247; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BDUSS=tPZ0lCYm5wd082ZTcyUUtHU1JRWmYteGFmUi1KYWg4RGo5QlNLfkwzan5GaEpoRVFBQUFBJCQAAAAAAAAAAAEAAAAyoCPPc2hlbGx5RDcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP-J6mD~iepgVm; BDUSS_BFESS=tPZ0lCYm5wd082ZTcyUUtHU1JRWmYteGFmUi1KYWg4RGo5QlNLfkwzan5GaEpoRVFBQUFBJCQAAAAAAAAAAAEAAAAyoCPPc2hlbGx5RDcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP-J6mD~iepgVm; delPer=0; PSINO=1; STOKEN=709ccef63b33ea4fab0b6fecdae0cda54d4b93b42685300afa4218a2a0c95fe5; st_key_id=17; BAIDU_WISE_UID=wapp_1625990541369_508; USER_JUMP=-1; 3475218482_FRSVideoUploadTip=1; video_bubble3475218482=1; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; userFrom=null; Hm_lvt_98b9d8c2fd6608d564bf2ac2ae642948=1625990540,1625990738; wise_device=0; st_data=0000ce93dba7d046132a9fcf1b1488566fbd3ec0ce7c099bb1f7e3a3805abefea3d7b8d6bd9189c7f498cf8c2f3e8bd2503ba0efa0567fded0b544451caeb3588bb0cab84c6bdcc6767a1bc765e35f62c8381069e93a475ec608560dd525bc13d66136b64a4ab1fd6a8a4e1c5ae41f4bc430a39b74adc9f349815f0c46a3b973; st_sign=f5a8ff30; ab_sr=1.0.1_ZDI3YzIwOGU3ZmQ4YmE2ZDM5ZmI0NDBlYmZiODQ1YzA4ZmQ5OGRlNjU5YThmNTQwYzc3NmU1YzUxN2I3NWVjZGVhMmFhNTQ0ZTQwMTJlNjBmNzhlYWNiNzc3MTUxMWQxNmM3NTg5MjA1MmMwYzhmNzcyZDhmODdkYmVlYzhkZmM2ZDgwM2ExMjZlY2Q2NWE4ZTNiODEwNDA4NjNkOGY2NA==; BA_HECTOR=a02l212hak2kal256v1gela340r; Hm_lpvt_98b9d8c2fd6608d564bf2ac2ae642948=1625991487',
}

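# Fetch the given URL with requests and return the decoded JSON
def get_json(url):
    r = requests.get(url, headers=head)
    if r.status_code != 200:  # anything other than 200 means the page was not fetched normally
        # 200: the request succeeded
        # 4xx: client-side problem, e.g. a wrong request method or a wrong URL
        # 5xx: server-side problem, e.g. a bug in the server code or the service is down
        raise Exception(f'request failed with status {r.status_code}: {url}')
    return r.json()  # decode the JSON body into ordinary Python objects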

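# Download a single image
def get_pic(pic_url, name):
    r = requests.get(pic_url, headers=head)
    print(r)
    if r.status_code != 200:  # the image was not fetched normally
        raise Exception(f'image request failed with status {r.status_code}: {pic_url}')
    filename = pic_url.rsplit('/', 1)[-1]  # file name = last path segment of the URL
    print(filename)
    # This folder must already exist; change the path to match your own machine
    with open('C:/Users/JSJSYS/PycharmProjects/untitled/beauty/' + name + filename, mode='wb') as sw:
        sw.write(r.content)  # write the image bytes to disk
    print('Image downloaded successfully: ' + filename)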

# Parse the JSON and download every image it lists
def parse_json(json, name):
    pic_list = json.get('data').get('pic_list')
    for pic in pic_list:
        purl = pic.get('purl')  # image URL
        # print(purl)
        get_pic(purl, name)  # download it with get_pic above
        time.sleep(5)  # pause between downloads

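# Build a 13-digit (millisecond) timestamp
def get_timestamp():
    # int(time.time() * 1000) always yields whole milliseconds; slicing str(time.time())
    # comes up short whenever the float prints with fewer than six decimals
    return str(int(time.time() * 1000))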

# Main routine for one request URL
def maindown(url, name):
    # fetch the JSON picture list
    json = get_json(url)
    # print(json)
    # extract the image links from it and download them
    parse_json(json, name)


if __name__ == '__main__':
    for page in range(1, 5):
        for i in range(page, page + 5):
            start = (i - page) * 40 + 1 + 200 * (page - 1)  # e.g. page 2: 201, 241, 281, 321, 361
            end = 200 * (page - 1) + (i - page + 1) * 40  # e.g. page 2: 240, 280, 320, 360, 400
            ts = get_timestamp()
            url = f'https://tieba.baidu.com/photo/g/bw/picture/list?kw=%E6%9E%97%E9%9D%92%E9%9C%9E&alt=jview&rn=200&tid=2266120243&pn={page}&ps={start}&pe={end}&info=1&_={ts}'
            maindown(url, '青霞仙子')
            time.sleep(5)
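
The two pieces of arithmetic in the listing above, the `ps`/`pe` window for each request and the 13-digit timestamp, can be checked in isolation. The sketch below is only an illustration: `page_windows` and `millis_timestamp` are hypothetical helper names, and reading `ps` and `pe` as the first and last picture index of a window is an assumption based on the comments in the listing, not on any documented API.

```python
import time


def page_windows(page, per_page=200, step=40):
    """Return the (ps, pe) pairs requested for one 1-based page.

    Assumption: each page covers per_page pictures, fetched in
    consecutive windows of step pictures, as in the loop above.
    """
    base = per_page * (page - 1)  # pictures covered by earlier pages
    return [(base + k * step + 1, base + (k + 1) * step)
            for k in range(per_page // step)]


def millis_timestamp():
    """Return the current Unix time in whole milliseconds as a string."""
    return str(int(time.time() * 1000))
```

For page 1 this gives the windows (1, 40) through (161, 200), and for page 2 it gives (201, 240) through (361, 400), which matches the values produced by the `start` and `end` expressions in the main loop.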
