https://www.taptap.com/webapiv2/review/v2/by-app?app_id=168332&from=20&limit=10&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D41%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3Dandroid%26UID%3D59e7bddc-565f-4774-a945-0cbcc13c1ad3%26DT%3DPC

Let's break down the URL.

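from=20&limit=10: my first guess was that this means "start at the 20th review and return 10 reviews". Scrolling the page a few times and watching the requests confirmed it.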

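X-UA: the X- prefix conventionally marks a non-standard, extended field, and UA stands for User Agent. The value is a URL-encoded list of key=value pairs describing the client (V=1, PN=WebApp, LANG=zh_CN, and so on). I don't know exactly how the server uses it, so I just copy it as is.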
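To see what is packed inside X-UA, the URL can be decoded twice with the standard library: once for the outer query string and once for the X-UA value itself. This is only a sketch; the helper name decode_x_ua is mine, and I make no claim about what each field means to the server.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

# The request URL from the top of this section.
URL = (
    "https://www.taptap.com/webapiv2/review/v2/by-app?app_id=168332&from=20&limit=10"
    "&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D41%26VN%3D0.1.0%26LOC%3DCN"
    "%26PLT%3DPC%26DS%3Dandroid%26UID%3D59e7bddc-565f-4774-a945-0cbcc13c1ad3%26DT%3DPC"
)


def decode_x_ua(url):
    """Return the key/value pairs packed inside the X-UA query parameter."""
    # First level: app_id, from, limit and X-UA (parse_qs percent-decodes values).
    query = parse_qs(urlsplit(url).query)
    packed = query["X-UA"][0]
    # Second level: the decoded X-UA value is itself a key=value&key=value string.
    return dict(parse_qsl(packed))


if __name__ == "__main__":
    for key, value in decode_x_ua(URL).items():
        print(f"{key} = {value}")
```

Running it prints one line per field, which makes it easier to compare the X-UA values sent by different browsers or sessions.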

Let's change the from and limit parameters, request the URL directly, and look at the result:

Perfect. The data can be requested directly, so all we need to do is change the parameters and keep sending requests.
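"Change the parameters and keep requesting" boils down to generating one URL per page. Here is a minimal sketch using only the standard library; the function name page_urls and its total argument are my own, and the X-UA value is simply the decoded form of the one in the URL above.

```python
from urllib.parse import urlencode

BASE = "https://www.taptap.com/webapiv2/review/v2/by-app"
# Decoded form of the X-UA value copied from the browser request above.
X_UA = (
    "V=1&PN=WebApp&LANG=zh_CN&VN_CODE=41&VN=0.1.0&LOC=CN"
    "&PLT=PC&DS=android&UID=59e7bddc-565f-4774-a945-0cbcc13c1ad3&DT=PC"
)


def page_urls(app_id, total, limit=10):
    """Yield one request URL per page until `total` reviews are covered."""
    for start in range(0, total, limit):
        params = {"app_id": app_id, "from": start, "limit": limit, "X-UA": X_UA}
        # urlencode percent-encodes the '=' and '&' inside the X-UA value.
        yield f"{BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for url in page_urls(168332, total=30):
        print(url)
```

Building the query with urlencode avoids hand-editing the long string, and the third URL it produces is the same one shown at the top of this section.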

2.2 Creating the word cloud

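Creating the word cloud is not hard. The main work is processing the review data; after that we just call WordCloud. To get more people to read this article, I made it a bit harder for myself: I cut the figure out of a picture of a pretty woman and rendered the word cloud in her shape. Here is the original image: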

So, do you like it?
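The "processing" mentioned above is mostly counting: a word cloud draws frequent words larger, after stop words have been dropped. The sketch below shows that step with the standard library only. The function name, the min_length cutoff, and the English sample tokens are all made up for illustration; in the real script the tokens come from jieba and WordCloud does the counting internally.

```python
from collections import Counter


def word_frequencies(tokens, stopwords, min_length=2):
    """Count tokens, skipping stop words and very short tokens."""
    counts = Counter()
    for token in tokens:
        word = token.strip()
        if len(word) < min_length or word in stopwords:
            continue
        counts[word] += 1
    return counts


if __name__ == "__main__":
    # Stand-in for segmented review text; real input would be jieba output.
    tokens = "the game is fun the art is great the game is smooth".split()
    stopwords = {"the", "is"}
    for word, count in word_frequencies(tokens, stopwords).most_common():
        print(word, count)
```

A mapping like this can also be handed to WordCloud.generate_from_frequencies if you prefer to control the counting yourself.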

3. Show you the code

3.1 Installing the environment

My environment:

Python: Python 3.8

OS: Windows 7

IDE: PyCharm

Copy the commands below into the console to install the required packages. Note: run them one at a time.

pip install requests
pip install matplotlib
pip install jieba
pip install wordcloud
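After installing, a quick way to confirm that everything the two scripts import is actually available is to ask Python whether it can find each module. This is just a convenience sketch; the helper name missing_modules is mine. Note that Pillow is imported under the name PIL.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Modules imported by the scraping and word cloud scripts in this article.
    required = ["requests", "jieba", "wordcloud", "matplotlib", "numpy", "PIL"]
    missing = missing_modules(required)
    print("missing:", ", ".join(missing) if missing else "none")
```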

3.2 Code for scraping the reviews

#!/usr/bin/env python
# encoding: utf-8
"""
Author: 香菜
Time: 2021/9/2 9:07 PM
"""
import requests


def get_content(url):
    """Fetch url and return the parsed JSON, or None if the request fails."""
    try:
        user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'
        response = requests.get(url, headers={'User-Agent': user_agent}, timeout=10)
        response.raise_for_status()  # raise an exception on a 4xx/5xx status code
        response.encoding = response.apparent_encoding  # guess the encoding so response.text decodes correctly
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        print("Scrape failed:", e)
        return None
    print(response.url)
    print("Scrape succeeded!")
    return data


if __name__ == '__main__':
    # baseUrl = "https://www.taptap.com/app/168332/review"
    fileName = 'comments.txt'
    # Open the output file once and append every review to it.
    with open(fileName, 'a', encoding='utf-8') as f:
        # from = 0, 10, 20, 30, 40: five requests of 10 reviews each
        for fromWhere in range(0, 50, 10):
            url = 'https://www.taptap.com/webapiv2/review/v2/by-app?app_id=168332&from='+str(fromWhere)+'&limit=10&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D41%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3D929182cb-bba8-4a5d-aee8-3aaacb24dcc7%26DT%3DPC'
            jsonData = get_content(url)
            if jsonData is None:
                break  # stop at the first failed request
            for item in jsonData['data']['list']:
                comment = item['moment']['extended_entities']['reviews']
                for c in comment:
                    # One review per line keeps the reviews from running together.
                    f.write(c['contents']['text'] + '\n')

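Note: the downloaded reviews are written to the text file comments.txt.

I only scraped 50 reviews here and nothing more; you can change the parameters to scrape more.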
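If you scrape many more pages, some items may lack one of the nested fields, and direct indexing would then stop the whole run with a KeyError. A more defensive way to pull out the review text might look like the sketch below. The field names are the ones used in the script above; the function name and the sample payload are made up for illustration, and I have not checked which fields the API can actually omit.

```python
def extract_review_texts(payload):
    """Collect review texts from one response, skipping incomplete items."""
    texts = []
    data = (payload or {}).get("data") or {}
    for item in data.get("list") or []:
        entities = (item.get("moment") or {}).get("extended_entities") or {}
        for review in entities.get("reviews") or []:
            text = (review.get("contents") or {}).get("text")
            if text:
                texts.append(text)
    return texts


if __name__ == "__main__":
    # Hypothetical payload shaped like the response handled above.
    sample = {
        "data": {
            "list": [
                {"moment": {"extended_entities": {"reviews": [
                    {"contents": {"text": "good game"}}]}}},
                {"moment": None},  # item without review data
                {"moment": {"extended_entities": {"reviews": []}}},
            ]
        }
    }
    print(extract_review_texts(sample))
```

Items with missing pieces are skipped silently here; logging them instead would be a reasonable alternative if you want to know how much data is being dropped.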

3.3 Making the word cloud

#!/usr/bin/env python
# encoding: utf-8
"""
Author: 香菜
Time: 2021/9/2 8:36 PM
"""
from os import path

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud

# Install: pip install matplotlib
# Install: pip install jieba
# Install: pip install wordcloud
if __name__ == '__main__':
    # 1. Download the data
    # 2. Clean up the data
    # 3. Draw the word cloud
    # Directory of the current file
    d = path.dirname(path.abspath(__file__))

    # Read the text
    with open(path.join(d, 'comments.txt'), encoding='utf-8') as f:
        file = f.read()
    # Segment the text into words
    default_mode = jieba.cut(file)
    text = " ".join(default_mode)
    # Load the cut-out image as an array to use as the mask
    alice_mask = np.array(Image.open(path.join(d, "0.jpeg")))
    # Read the stop words, one per line, into a set
    with open(path.join(d, 'baidu_stop.txt'), encoding='utf-8') as f:
        stopwords = set(map(str.strip, f))
    wc = WordCloud(
        # Set a font that contains Chinese glyphs; without it the text is garbled.
        # Download the font file if your system does not have it.
        font_path=r'c:\windows\fonts\simsun.ttc',
        background_color="white",
        max_words=2000,
        mask=alice_mask,
        stopwords=stopwords)
    # Generate the word cloud
    wc.generate(text)

    # Save it to an image file
    wc.to_file(path.join(d, "香菜.jpg"))

    # Show the word cloud and the mask in two separate figures
    plt.imshow(wc, interpolation='bilinear')
    plt.axis("off")
    plt.figure()
    plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
    plt.axis("off")
    plt.show()