Python Crawler: Collecting Honor of Kings (王者荣耀) Hero Skins

Posted lanxiaofang


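I. Environment

   Python 3.8

   PyCharm

II. Modules

   requests ---> HTTP request module; third-party, install with pip install requests

   re  regular expressions; built-in module, no installation needed

   os  file operations; built-in module, no installation needed  --> used to create one folder per hero automatically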


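III. Installing modules

Press Win + R, type cmd, then run the install command pip install <module name> (if the download is slow, switch to a mirror in China).

Common installation problems:

   - How to install a third-party Python module:

       1. Press Win + R, type cmd and click OK, then type the install command pip install <module name> (for example pip install requests) and press Enter

       2. In PyCharm, open Terminal and type the install command there


   - Why installation fails:
       - Failure 1: pip is not recognized as an internal command

           Fix: set the environment variable (add the Python install directory to PATH)


       - Failure 2: a wall of red error text (read time out)

           Fix: the network connection timed out, so switch to a mirror

               Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple

               Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/

               University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/

               Huazhong University of Science and Technology: https://pypi.hustunique.com/

               Shandong University of Technology: https://pypi.sdutlinux.org/

               Douban: https://pypi.douban.com/simple/

               Example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>


       - Failure 3: cmd reports that the module is already installed, or that it installed successfully, but PyCharm still cannot import it

           Fix: you may have several Python versions installed (keep either Anaconda or Python and uninstall the other),

                   or the Python interpreter in PyCharm is not configured correctly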
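For the last failure case above (the module installs from cmd but cannot be imported in PyCharm), a quick check is to print which interpreter is actually running the script and whether it can see the module. The sketch below is only an illustration; the helper name interpreter_report is made up for this example.

```python
import importlib.util
import sys


def interpreter_report(module_name: str) -> str:
    """Describe the running interpreter and whether it can import module_name."""
    # find_spec returns None when a top-level module cannot be located
    found = importlib.util.find_spec(module_name) is not None
    status = "importable" if found else "NOT importable"
    return f"{sys.executable} -> {module_name} is {status}"


# Run this once from cmd and once from PyCharm and compare the two paths.
print(interpreter_report("requests"))
```

If the two paths differ, pip installed the module into a different Python than the one PyCharm uses.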

IV. Configuring the Python interpreter in PyCharm

   1. Choose File >>> Settings >>> Project >>> Python Interpreter

   2. Click the gear icon and choose Add

   3. Add the path of your Python installation

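V. Installing plugins in PyCharm

1. Choose File >>> Settings >>> Plugins

2. Click Marketplace and type the name of the plugin you want, e.g. type translation for a translation plugin, or Chinese for the Chinese language pack

3. Select the plugin and click Install

4. After installation PyCharm offers to restart; click OK, and the plugin takes effect after the restart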

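VI. Basic crawler workflow

(1). Analyzing the data source

   1. Define the requirement: decide what to collect

   2. Capture traffic with the browser developer tools and work out which URL the data comes from

       - Press F12, or right-click and choose Inspect, open the Network tab, and refresh the page

       - To find an image URL, select the Img filter

   In the image URL below, 505 is the hero ID

   and 2 is the skin's position in that hero's skin list ---> each skin name maps to its image link by position

    https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/505/505-bigskin-2.jpg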

To get the skin data for 瑶 (Yao):

   1. Send a request to https://pvp.qq.com/web201605/herodetail/505.shtml

   2. Get the response data

   3. Extract the skin names

   4. Build the skin image URLs

   5. Save the data
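The URL-building step can be sketched as a small function. It only fills the hero ID and the skin position into the image URL pattern shown earlier; the function name build_skin_url is made up for this illustration, and numbering from 1 follows the article's complete code.

```python
def build_skin_url(hero_id: int, skin_index: int) -> str:
    """Fill the hero ID and the 1-based skin position into the image URL pattern."""
    base = "https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info"
    return f"{base}/{hero_id}/{hero_id}-bigskin-{skin_index}.jpg"


# Hero 505, second skin: reproduces the example URL from the analysis step.
print(build_skin_url(505, 2))
```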

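(2). Implementation steps

   1. Send the request: imitate a browser and request the URL

   2. Get the data: read the response returned by the server

   3. Parse the data: extract what we need, the skin names

   4. Save the data to local disk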
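The parsing step can be tried offline before touching the network. The sketch below runs the same regular expression the article uses (data-imgname="(.*?)">) against a hand-written HTML fragment. The surrounding <ul> tag is an assumption made for the example; the attribute value is the sample string quoted in the comments of the complete code.

```python
import re
from typing import List

# Hand-written fragment; only the data-imgname attribute matters here.
SAMPLE_HTML = '<ul data-imgname="鹿灵守心&0|森&0|遇见神鹿&71|时之祈愿&94|时之愿境&42"></ul>'


def extract_skin_names(html: str) -> List[str]:
    """Return the skin names stored in the first data-imgname attribute."""
    match = re.search(r'data-imgname="(.*?)">', html)
    if match is None:
        return []
    # Each name carries a numeric suffix such as &71; remove it, then split on |
    cleaned = re.sub(r'&\d+', '', match.group(1))
    return cleaned.split('|')


print(extract_skin_names(SAMPLE_HTML))
```

Using re.search and returning an empty list avoids the IndexError that re.findall(...)[0] raises when a page has no such attribute.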

To collect the skins of every hero ---> first get all hero IDs (a directory or list page normally provides them)
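As a minimal sketch of that idea, the snippet below reads hero IDs and names from a two-record sample shaped like herolist.json (field names ename and cname as in the article's data) and prints the detail page URL for each. The helper name hero_index is made up for this example, and no request is sent.

```python
import json
from typing import Dict

# Two records copied from the herolist.json sample, reduced to the fields used here.
SAMPLE_JSON = """
[
  {"ename": 105, "cname": "廉颇", "skin_name": "正义爆轰|地狱岩魂"},
  {"ename": 106, "cname": "小乔", "skin_name": "恋之微风|万圣前夜|天鹅之梦|纯白花嫁|缤纷独角兽"}
]
"""


def hero_index(records) -> Dict[int, str]:
    """Map each hero ID (ename) to the hero name (cname)."""
    return {record["ename"]: record["cname"] for record in records}


heroes = hero_index(json.loads(SAMPLE_JSON))
for hero_id, hero_name in heroes.items():
    # The detail page URL follows the pattern used for hero 505 above.
    print(hero_id, hero_name, f"https://pvp.qq.com/web201605/herodetail/{hero_id}.shtml")
```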

VII. Complete code

# Import the HTTP request module ---> third-party, install it from cmd with: pip install requests
import requests
# Import the regular expression module ---> built in, no installation needed
import re
# Import the file operation module ---> built in, no installation needed
import os

# URL of the hero list
link = 'https://pvp.qq.com/web201605/js/herolist.json'
# Disguise the request as a browser ---> request headers
headers = {
    # user-agent: the basic identity string of the browser
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}
# Send the request
json_data = requests.get(url=link, headers=headers).json()
# Loop over every hero record
for index in json_data:
    # Dictionary lookup: use the key (left of the colon) to get the value (right of the colon)
    hero_id = index['ename']
    hero_name = index['cname']
    # Folder path (relative, Windows-style separators)
    file = f'img\\{hero_name}\\'
    if not os.path.exists(file):
        os.makedirs(file)
    """
    1. Send the request: imitate a browser and request the URL
        - headers is a dict, built from complete key-value pairs
        - the header values can be copied straight from the developer tools
        - use whichever request method the developer tools show
    """
    # URL of the hero detail page
    url = f'https://pvp.qq.com/web201605/herodetail/{hero_id}.shtml'
    # Disguise the request as a browser ---> request headers
    headers = {
        # user-agent: the basic identity string of the browser
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
    }
    # Send the request ---> <Response [200]> is a response object; status code 200 means success
    response = requests.get(url=url, headers=headers)
    # Garbled text? ---> set the encoding to match the page, e.g. response.encoding = 'gbk'
    # Detect the encoding automatically
    response.encoding = response.apparent_encoding
    # Get the data: the response body as text, print(response.text)
    """
    Parse the data with the re module
    re.findall() searches a source string for the data described by a pattern
    Here it searches response.text for data-imgname="(.*?)"> where (.*?) is the data we want
    """
    title_list = re.findall('data-imgname="(.*?)">', response.text)[0]
    # e.g. 鹿灵守心&0|森&0|遇见神鹿&71|时之祈愿&94|时之愿境&42
    title_list = re.sub(r'&\d+', '', title_list).split('|')
    print(title_list)
    # for loop, e.g. for num in range(1, 6); len() counts the list elements
    for num in range(1, len(title_list) + 1):
        # List indexing starts at 0, so skin number num is at index num - 1
        img_name = title_list[num - 1]
        # Build the image URL
        img_url = f'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/{hero_id}/{hero_id}-bigskin-{num}.jpg'
        print(img_name, img_url)
        # Save the data ---> send a request and get the binary content
        img_content = requests.get(url=img_url, headers=headers).content
        with open(file + img_name + '.jpg', mode='wb') as f:
            f.write(img_content)


VIII. herolist.json

[
  {
    "ename": 105,
    "cname": "廉颇",
    "title": "正义爆轰",
    "new_type": 0,
    "hero_type": 3,
    "skin_name": "正义爆轰|地狱岩魂",
    "moss_id": 3627
  },
  {
    "ename": 106,
    "cname": "小乔",
    "title": "恋之微风",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "恋之微风|万圣前夜|天鹅之梦|纯白花嫁|缤纷独角兽",
    "moss_id": 3644
  },
  {
    "ename": 107,
    "cname": "赵云",
    "title": "苍天翔龙",
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 4,
    "skin_name": "苍天翔龙|忍●炎影|未来纪元|皇家上将|嘻哈天王|白执事|引擎之心",
    "moss_id": 3661
  },
  {
    "ename": 108,
    "cname": "墨子",
    "title": "和平守望",
    "new_type": 0,
    "hero_type": 2,
    "hero_type2": 1,
    "skin_name": "和平守望|金属风暴|龙骑士|进击墨子号",
    "moss_id": 3547
  },
  {
    "ename": 109,
    "cname": "妲己",
    "title": "魅力之狐",
    "pay_type": 11,
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "魅惑之狐|女仆咖啡|魅力维加斯|仙境爱丽丝|少女阿狸|热情桑巴",
    "moss_id": 3663
  },
  {
    "ename": 110,
    "cname": "嬴政",
    "title": "王者独尊",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "王者独尊|摇滚巨星|暗夜贵公子|优雅恋人|白昼王子",
    "moss_id": 3680
  },
  {
    "ename": 111,
    "cname": "孙尚香",
    "title": "千金重弩",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "千金重弩|火炮千金|水果甜心|蔷薇恋人|杀手不太冷|末日机甲|沉稳之力",
    "moss_id": 3577
  },
  {
    "ename": 112,
    "cname": "鲁班七号",
    "title": "机关造物",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "机关造物|木偶奇遇记|福禄兄弟|电玩小子|星空梦想",
    "moss_id": 3697
  },
  {
    "ename": 113,
    "cname": "庄周",
    "title": "逍遥梦幻",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 3,
    "skin_name": "逍遥幻梦|鲤鱼之梦|蜃楼王|云端筑梦师",
    "moss_id": 3594
  },
  {
    "ename": 114,
    "cname": "刘禅",
    "title": "暴走机关",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 3,
    "skin_name": "暴走机关|英喵野望|绅士熊喵|天才门将",
    "moss_id": 3714
  },
  {
    "ename": 115,
    "cname": "高渐离",
    "title": "叛逆吟游",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "叛逆吟游|金属狂潮|死亡摇滚",
    "moss_id": 3611
  },
  {
    "ename": 116,
    "cname": "阿轲",
    "title": "信念之刃",
    "new_type": 0,
    "hero_type": 4,
    "skin_name": "信念之刃|爱心护理|暗夜猫娘|致命风华|节奏热浪",
    "moss_id": 3731
  },
  {
    "ename": 117,
    "cname": "钟无艳",
    "title": "野蛮之锤",
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 3,
    "skin_name": "野蛮之锤|生化警戒|王者之锤|海滩丽影",
    "moss_id": 3628
  },
  {
    "ename": 118,
    "cname": "孙膑",
    "title": "逆流之时",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 2,
    "skin_name": "逆流之时|未来旅行|天使之翼|妖精王",
    "moss_id": 3645
  },
  {
    "ename": 119,
    "cname": "扁鹊",
    "title": "善恶怪医",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "善恶怪医|救世之瞳|化身博士|炼金王",
    "moss_id": 3662
  },
  {
    "ename": 120,
    "cname": "白起",
    "title": "最终兵器",
    "new_type": 0,
    "hero_type": 3,
    "skin_name": "最终兵器|白色死神|狰|星夜王子",
    "moss_id": 3679
  },
  {
    "ename": 121,
    "cname": "芈月",
    "title": "永恒之月",
    "new_type": 0,
    "hero_type": 2,
    "hero_type2": 3,
    "skin_name": "永恒之月|红桃皇后|大秦宣太后|重明",
    "moss_id": 3696
  },
  {
    "ename": 123,
    "cname": "吕布",
    "title": "无双之魔",
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 3,
    "skin_name": "无双之魔|圣诞狂欢|天魔缭乱|末日机甲|猎兽之王",
    "moss_id": 3713
  },
  {
    "ename": 124,
    "cname": "周瑜",
    "title": "铁血都督",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "铁血都督|海军大将|真爱至上",
    "moss_id": 3784
  },
  {
    "ename": 126,
    "cname": "夏侯惇",
    "title": "不羁之风",
    "pay_type": 10,
    "new_type": 0,
    "hero_type": 3,
    "hero_type2": 1,
    "skin_name": "不羁之风|战争骑士|乘风破浪|无限飓风号",
    "moss_id": 3730
  },
  {
    "ename": 127,
    "cname": "甄姬",
    "title": "洛神降临",
    "pay_type": 10,
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "洛神降临|冰雪圆舞曲|花好人间|游园惊梦",
    "moss_id": 3747
  },
  {
    "ename": 128,
    "cname": "曹操",
    "title": "鲜血枭雄",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "鲜血枭雄|超能战警|幽灵船长|死神来了|烛龙",
    "moss_id": 3765
  },
  {
    "ename": 129,
    "cname": "典韦",
    "title": "狂战士",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "狂战士|黄金武士|穷奇",
    "moss_id": 3782
  },
  {
    "ename": 130,
    "cname": "宫本武藏",
    "title": "剑圣",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "剑圣|鬼剑武藏|未来纪元|万象初新|地狱之眼|霸王丸",
    "moss_id": 3799
  },
  {
    "ename": 131,
    "cname": "李白",
    "title": "青莲剑仙",
    "pay_type": 10,
    "new_type": 0,
    "hero_type": 4,
    "skin_name": "青莲剑仙|范海辛|千年之狐|凤求凰|敏锐之力",
    "moss_id": 3816
  },
  {
    "ename": 132,
    "cname": "马可波罗",
    "title": "远游之枪",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "远游之枪|激情绿茵|逐梦之星",
    "moss_id": 3764
  },
  {
    "ename": 133,
    "cname": "狄仁杰",
    "title": "断案大师",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "断案大师|锦衣卫|魔术师|超时空战士|阴阳师",
    "moss_id": 3781
  },
  {
    "ename": 134,
    "cname": "达摩",
    "title": "拳僧",
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 3,
    "skin_name": "拳僧|大发明家|拳王",
    "moss_id": 3798
  },
  {
    "ename": 135,
    "cname": "项羽",
    "title": "霸王",
    "new_type": 0,
    "hero_type": 3,
    "skin_name": "霸王|帝国元帅|苍穹之光|海滩派对|职棒王牌|霸王别姬|科学大爆炸",
    "moss_id": 3815
  },
  {
    "ename": 136,
    "cname": "武则天",
    "title": "女帝",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "女帝|东方不败|海洋之心",
    "moss_id": 3832
  },
  {
    "ename": 139,
    "cname": "老夫子",
    "title": "万古长明",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "万古长明|潮流仙人|圣诞老人|功夫老勺",
    "moss_id": 3849
  },
  {
    "ename": 140,
    "cname": "关羽",
    "title": "一骑当千",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "一骑当千|天启骑士|冰锋战神|龙腾万里",
    "moss_id": 3866
  },
  {
    "ename": 141,
    "cname": "貂蝉",
    "title": "绝世舞姬",
    "new_type": 0,
    "hero_type": 2,
    "hero_type2": 4,
    "skin_name": "绝世舞姬|异域舞娘|圣诞恋歌|逐梦之音|仲夏夜之梦",
    "moss_id": 3883
  },
  {
    "ename": 142,
    "cname": "安琪拉",
    "title": "暗夜萝莉",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "暗夜萝莉|玩偶对对碰|魔法小厨娘|心灵骇客|如懿",
    "moss_id": 3900
  },
  {
    "ename": 144,
    "cname": "程咬金",
    "title": "热烈之斧",
    "new_type": 0,
    "hero_type": 3,
    "hero_type2": 1,
    "skin_name": "热烈之斧|爱与正义|星际陆战队|华尔街大亨|功夫厨神",
    "moss_id": 3917
  },
  {
    "ename": 146,
    "cname": "露娜",
    "title": "月光之女",
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 2,
    "skin_name": "月光之女|哥特玫瑰|绯红之刃|紫霞仙子|一生所爱",
    "moss_id": 3934
  },
  {
    "ename": 148,
    "cname": "姜子牙",
    "title": "太古魔导",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "太古魔导|时尚教父",
    "moss_id": 3951
  },
  {
    "ename": 149,
    "cname": "刘邦",
    "title": "双面君主",
    "new_type": 0,
    "hero_type": 3,
    "skin_name": "双面君主|圣殿之光|德古拉伯爵",
    "moss_id": 3978
  },
  {
    "ename": 150,
    "cname": "韩信",
    "title": "国士无双",
    "new_type": 0,
    "hero_type": 4,
    "skin_name": "国士无双|街头霸王|教廷特使|白龙吟|逐梦之影",
    "moss_id": 3985
  },
  {
    "ename": 152,
    "cname": "王昭君",
    "title": "冰雪之华",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "冰雪之华|精灵公主|偶像歌手|凤凰于飞|幻想奇妙夜",
    "moss_id": 4002
  },
  {
    "ename": 153,
    "cname": "兰陵王",
    "title": "暗影刀锋",
    "new_type": 0,
    "hero_type": 4,
    "skin_name": "暗影刀锋|隐刃|暗隐猎兽者",
    "moss_id": 4019
  },
  {
    "ename": 154,
    "cname": "花木兰",
    "title": "传说之刃",
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 4,
    "skin_name": "传说之刃|剑舞者|兔女郎|水晶猎龙者|青春决赛季|冠军飞将|瑞麟志",
    "moss_id": 4036
  },
  {
    "ename": 156,
    "cname": "张良",
    "title": "言灵之书",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "言灵之书|天堂福音|一千零一夜|幽兰居士",
    "moss_id": 4053
  },
  {
    "ename": 157,
    "cname": "不知火舞",
    "title": "明媚烈焰",
    "new_type": 0,
    "hero_type": 2,
    "hero_type2": 4,
    "skin_name": "明媚烈焰",
    "moss_id": 4070
  },
  {
    "ename": 162,
    "cname": "娜可露露",
    "title": "鹰之守护",
    "new_type": 0,
    "hero_type": 4,
    "skin_name": "鹰之守护",
    "moss_id": 4087
  },
  {
    "ename": 163,
    "cname": "橘右京",
    "title": "神梦一刀",
    "new_type": 0,
    "hero_type": 4,
    "hero_type2": 1,
    "skin_name": "神梦一刀",
    "moss_id": 4104
  },
  {
    "ename": 166,
    "cname": "亚瑟",
    "title": "圣骑之力",
    "pay_type": 11,
    "new_type": 0,
    "hero_type": 1,
    "hero_type2": 3,
    "skin_name": "圣骑之力|死亡骑士|狮心王|心灵战警",
    "moss_id": 4121
  },
  {
    "ename": 167,
    "cname": "孙悟空",
    "title": "齐天大圣",
    "new_type": 0,
    "hero_type": 4,
    "hero_type2": 1,
    "skin_name": "齐天大圣|地狱火|西部大镖客|美猴王|至尊宝|全息碎影|大圣娶亲",
    "moss_id": 4138
  },
  {
    "ename": 168,
    "cname": "牛魔",
    "title": "精英酋长",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 3,
    "skin_name": "精英酋长|西部大镖客|制霸全明星",
    "moss_id": 4155
  },
  {
    "ename": 169,
    "cname": "后羿",
    "title": "半神之弓",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "半神之弓|精灵王|阿尔法小队|辉光之辰|黄金射手座",
    "moss_id": 4172
  },
  {
    "ename": 170,
    "cname": "刘备",
    "title": "仁德义枪",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "仁德义枪|万事如意|纽约教父|汉昭烈帝",
    "moss_id": 4189
  },
  {
    "ename": 171,
    "cname": "张飞",
    "title": "禁血狂兽",
    "new_type": 0,
    "hero_type": 3,
    "hero_type2": 6,
    "skin_name": "禁血狂兽|五福同心|乱世虎臣",
    "moss_id": 4206
  },
  {
    "ename": 173,
    "cname": "李元芳",
    "title": "王都密探",
    "pay_type": 10,
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "王都密探|特种部队|黑猫爱糖果|逐浪之夏",
    "moss_id": 4223
  },
  {
    "ename": 174,
    "cname": "虞姬",
    "title": "森之风灵",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "森之风灵|加勒比小姐|霸王别姬|凯尔特女王",
    "moss_id": 4240
  },
  {
    "ename": 175,
    "cname": "钟馗",
    "title": "虚灵城判",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 2,
    "skin_name": "虚灵城判|地府判官",
    "moss_id": 4257
  },
  {
    "ename": 177,
    "cname": "成吉思汗",
    "title": "苍狼末裔",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "苍狼末裔|维京掠夺者",
    "moss_id": 4274
  },
  {
    "ename": 178,
    "cname": "杨戬",
    "title": "根源之目",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "根源之目|埃及法老|永曜之星",
    "moss_id": 4291
  },
  {
    "ename": 183,
    "cname": "雅典娜",
    "title": "圣域余晖",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "圣域余晖|战争女神|冰冠公主|神奇女侠",
    "moss_id": 4308
  },
  {
    "ename": 184,
    "cname": "蔡文姬",
    "title": "天籁弦音",
    "new_type": 0,
    "hero_type": 6,
    "skin_name": "天籁弦音|蔷薇王座|舞动绿茵|奇迹圣诞",
    "moss_id": 4325
  },
  {
    "ename": 186,
    "cname": "太乙真人",
    "title": "炼金大师",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 3,
    "skin_name": "炼金大师|圆桌骑士|饕餮|华丽摇滚",
    "moss_id": 4342
  },
  {
    "ename": 180,
    "cname": "哪吒",
    "title": "桀骜炎枪",
    "new_type": 0,
    "hero_type": 1,
    "skin_name": "桀骜炎枪|三太子|逐梦之翼",
    "moss_id": 4359
  },
  {
    "ename": 190,
    "cname": "诸葛亮",
    "title": "绝代智谋",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "绝代智谋|星航指挥官|黄金分割率|武陵仙君|掌控之力",
    "moss_id": 4376
  },
  {
    "ename": 192,
    "cname": "黄忠",
    "title": "燃魂重炮",
    "new_type": 0,
    "hero_type": 5,
    "skin_name": "燃魂重炮|芝加哥教父",
    "moss_id": 4393
  },
  {
    "ename": 191,
    "cname": "大乔",
    "title": "沧海之曜",
    "new_type": 0,
    "hero_type": 6,
    "skin_name": "沧海之曜|伊势巫女|守护之力|猫狗日记",
    "moss_id": 4410
  },
  {
    "ename": 187,
    "cname": "东皇太一",
    "title": "噬灭日蚀",
    "new_type": 0,
    "hero_type": 6,
    "hero_type2": 3,
    "skin_name": "噬灭日蚀|东海龙王|逐梦之光",
    "moss_id": 4427
  },
  {
    "ename": 182,
    "cname": "干将莫邪",
    "title": "淬命双剑",
    "new_type": 0,
    "hero_type": 2,
    "skin_name": "淬命双剑|第七人偶|冰霜恋舞曲",
    ... (the listing is cut off here in the source)
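Each record in the listing above stores its skin names in a single skin_name string separated by |. As a rough sketch, that field can be split into numbered names without requesting the detail page. Whether this order matches the numbering of the bigskin images is an assumption not verified here; the article's own code takes the names from the detail page instead. The helper name numbered_skins is made up for this example.

```python
from typing import Dict, List, Tuple


def numbered_skins(record: Dict) -> List[Tuple[int, str]]:
    """Split the skin_name field on | and number the names from 1."""
    names = record.get("skin_name", "").split("|")
    # Skip empty strings so a missing or empty field yields an empty list
    return [(number, name) for number, name in enumerate(names, start=1) if name]


# Record copied from the listing, reduced to the fields used here.
record = {"ename": 105, "cname": "廉颇", "skin_name": "正义爆轰|地狱岩魂"}
print(numbered_skins(record))
```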