Simulating Ajax Requests to Scrape Weibo
Posted by 宋讼颂
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/9/26 10:26
# @Author : Sa.Song
# @Desc : Scrape 崔庆才's Weibo posts by simulating Ajax requests
# @File : weiBo.py
# @Software: PyCharm
import requests
from urllib.parse import urlencode
from pyquery import PyQuery as pq

headers = {
    'Accept': 'application/json, text/plain, */*',
    'Referer': 'https://m.weibo.cn/u/2830678474',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
base_url = 'https://m.weibo.cn/api/container/getIndex?'


def get_page(page):
    # page is the page number of the paginated feed
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page,
    }
    url = base_url + urlencode(params)
    try:
        response = requests.get(url=url, headers=headers)
        if response.status_code == 200:
            return response.json()
    except Exception as e:
        print('Error:', e)


def parse_message(data):
    # data is the decoded JSON response (renamed from `json` to avoid
    # shadowing the standard-library module name)
    weibo_data = []
    if data:
        # Tolerate a missing 'data' or 'cards' key instead of raising
        items = (data.get('data') or {}).get('cards') or []
        for item in items:
            item = item.get('mblog')
            if item is None:
                # Not every card is a post; skip cards without 'mblog'
                continue
            weibo_message = {}
            weibo_message['id'] = item.get('id')
            # The post body is an HTML fragment; keep only its text
            weibo_message['text'] = pq(item.get('text')).text()
            weibo_data.append(weibo_message)
    print(weibo_data)


if __name__ == '__main__':
    for i in range(100):
        data = get_page(i)
        parse_message(data)
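The URL-building and parsing steps of the script above can be tried offline, without sending requests to Weibo. The following is a minimal sketch, not the author's code: `build_url`, `strip_tags`, `parse_cards`, and the `SAMPLE` payload are hypothetical names and data introduced for illustration, and the standard-library `html.parser` stands in for pyquery's `.text()` (it does not normalize whitespace the way pyquery does). The payload shape (`data` → `cards` → `mblog`) is assumed from the keys the script reads.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(page):
    """Build the paged Ajax URL with the same query parameters as the script."""
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page,
    }
    return BASE_URL + urlencode(params)


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Stdlib stand-in for pyquery's .text(): drop tags, keep text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return ''.join(extractor.parts)


def parse_cards(payload):
    """Return [{'id': ..., 'text': ...}] for every card that carries an 'mblog'."""
    cards = ((payload or {}).get('data') or {}).get('cards') or []
    posts = []
    for card in cards:
        mblog = card.get('mblog')
        if mblog is None:
            continue  # not a post card
        posts.append({
            'id': mblog.get('id'),
            'text': strip_tags(mblog.get('text') or ''),
        })
    return posts


# Hypothetical payload, hand-written for illustration only.
SAMPLE = {
    'data': {
        'cards': [
            {'mblog': {'id': '1', 'text': 'hello <a href="/x">world</a>'}},
            {},  # a card without 'mblog' is skipped
        ]
    }
}

if __name__ == '__main__':
    print(build_url(1))
    print(parse_cards(SAMPLE))
```

Returning the list from `parse_cards` instead of printing it makes the step easy to check in isolation and to reuse, for example to accumulate posts across pages before saving them.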