6. Simple extraction of Xiaohongshu app data, saved to txt (part 2)
Posted by 五杀摇滚小拉夫
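A simple scrape of the page data:
Points to note:
Both auth-sign and auth are only valid for a limited time. Also, the original URL uses https; here it is changed to an http request.
To deal with these parameters, use mitmdump to capture the exact parameters of the request and extract them. That way there is no need to intercept and analyze the HTTP requests and responses by hand: write the request and response handling logic once, and do the follow-up processing in Python.
Later on, appium will be used to simulate a person swiping to refresh the feed, and the resulting responses will then be processed.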
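The mitmdump step described in the notes above could look roughly like the following. This is only a minimal sketch, not the author's script: the file name `capture_headers.py`, the helper `extract_auth_headers`, and the output file `captured_headers.json` are hypothetical names chosen for illustration. The endpoint path and header names are taken from the request shown in this post. The `request(flow)` function is the mitmproxy event hook that is called for every client request.

```python
# capture_headers.py (hypothetical file name)
import json

# Path of the feed endpoint, taken from the URL used in this post.
TARGET_PATH = "/sapi/wx_mp_api/sns/v1/homefeed"
# Headers that the post says are needed and expire.
WANTED = ("authorization", "auth", "auth-sign")
# Hypothetical output file that the scraper can read later.
OUTPUT_FILE = "captured_headers.json"


def extract_auth_headers(url, headers):
    """Return the wanted headers for a feed request, or None for other URLs."""
    if TARGET_PATH not in url:
        return None
    return {name: headers.get(name, "") for name in WANTED}


def request(flow):
    """mitmproxy hook: save the auth headers whenever the feed is requested."""
    captured = extract_auth_headers(flow.request.pretty_url, flow.request.headers)
    if captured is not None:
        with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
            json.dump(captured, f)
```

Assuming the phone's traffic is already routed through the proxy, the script would be loaded with `mitmdump -s capture_headers.py`, and the scraper could then read the saved JSON instead of hard-coding the values.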
import requests


def main():
    # auth and auth-sign expire; replace them with freshly captured values.
    headers = {
        "charset": "utf-8",
        "Accept-Encoding": "gzip",
        "referer": "https://servicewechat.com/wxffc08ac7df482a27/117/page-frame.html",
        "authorization": "5bda7657a4ce660001f7eed8",
        "auth": "eyJoYXNoIjoibWQ0IiwiYWxnIjoiSFMyNTYiLCJ0eXAiOiJKV1QifQ.eyJzaWQiOiI0M2RkNGY2YS01NTk1LTRjNGEtYTkyMi05ODEzNjdiMTlmMTEiLCJleHBpcmUiOjE1NDExMzAyNjJ9.9AC8VBcXiBG48vHa-LLgVEWOnloTdQvNWzYAyvqGnMA",
        "content-type": "application/json",
        "auth-sign": "c475525b214bb5d9ae431ac029cb9b50",
        "User-Agent": "Mozilla/5.0 (Linux; android 7.1.2; MI 5X Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/64.0.3282.137 Mobile Safari/537.36 MicroMessenger/6.7.3.1360(0x26070336) NetType/WIFI Language/zh_CN Process/appbrand2",
        "Host": "www.xiaohongshu.com",
        "Connection": "Keep-Alive",
    }
    # Same endpoint with an empty cursor_score:
    # url = "http://www.xiaohongshu.com/sapi/wx_mp_api/sns/v1/homefeed?oid=homefeed.cosmetics_v2&cursor_score=&sid=session.1540996623416187718"
    url = "http://www.xiaohongshu.com/sapi/wx_mp_api/sns/v1/homefeed?oid=homefeed.cosmetics_v2&cursor_score=1541067389.9550&sid=session.1540996623416187718"
    datas = requests.get(url=url, headers=headers).json()
    data = datas['data']
    # Each element of data is one note in the feed.
    for i in data:
        print(i)
        title = '标题: ' + i['mini_program_info']['share_title']
        print(title)
        link_url = '链接: ' + i['share_link']
        print(link_url)
        b_picture = '封面图片: ' + i['mini_program_info']['thumb']
        print(b_picture)
        # Renamed from "type" so the built-in type() is not shadowed.
        note_type = '类型: ' + i['type']
        print(note_type)
        level = '级别: ' + str(i['level'])
        print(level)
        h_picture = '用户头像: ' + i['user']['images']
        print(h_picture)
        username = '用户名: ' + i['user']['nickname']
        print(username)
        user_id = 'userid: ' + i['user']['userid']
        print(user_id)
        zan = '喜欢点心: ' + str(i['likes'])
        print(zan)
        # Open the file in append mode so each record is added at the end.
        with open('text', 'a', encoding='utf-8') as f:
            f.write('\n'.join([title, link_url, b_picture, note_type, level,
                               h_picture, username, user_id, zan]))
            f.write('\n' + '=' * 100 + '\n')


if __name__ == "__main__":
    main()
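The auth value in the headers above has the three dot-separated segments of a JWT, and its middle segment appears to carry an `expire` field. One way to notice a stale token before sending the request is to decode that segment and compare the field with the current time. The sketch below assumes `expire` is a Unix timestamp in seconds, which the post does not confirm. The helper names `decode_jwt_payload` and `is_expired` are hypothetical, and the signature is not verified.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the JSON payload of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url encoding drops '=' padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token, now=None):
    """True if the payload's 'expire' value (assumed Unix seconds) has passed."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)["expire"] <= now
```

If `is_expired(headers["auth"])` returns True, a fresh auth and auth-sign pair would need to be captured again before the request can succeed.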
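Saved locally
Fields in each record (the script writes the labels in Chinese: 标题 = title, 链接 = link, 封面图片 = cover image, 类型 = type, 级别 = level, 用户头像 = user avatar, 用户名 = username, 喜欢点心 = number of likes):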
标题: 王者荣耀——貂蝉~仲夏夜之梦 游戏角色貂蝉皮肤印象妆容 主色
链接: https://www.xiaohongshu.com/discovery/item/5bc0b2bf910cf646cc1087aa
封面图片: http://ci.xiaohongshu.com/161f03cb-0cf6-355f-b178-712a928a7720?imageView2/2/w/540/format/jpg
类型: normal
级别: 4
用户头像: https://img.xiaohongshu.com/avatar/5bb1047b0fd0590001997f83.jpg@80w_80h_90q_1e_1c_1x.jpg
用户名: zanleo
userid: 582c5f8982ec393b5ec866ba
喜欢点心: 233
====================================================================================================