Python crawler: Juejin

Posted by brady-wang


import json
from time import sleep

import requests

# Query endpoint and the request headers sent with every call.
url = "https://web-api.juejin.im/query"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36",
    "Referer": "https://juejin.im/",
    "X-Agent": "Juejin/Web",
    "Content-Type": "application/json",
}


def get_content(after=''):
    """Fetch one page of the article feed and return (edges, pageInfo)."""
    # `after` is the cursor from the previous page; '' requests the first page.
    info = {"operationName": "", "query": "", "variables": {"first": 20, "after": after, "order": "POPULAR"},
            "extensions": {"query": {"id": "21207e9ddb1de777adeaca7a2fb38030"}}}
    resp = requests.post(url, headers=headers, data=json.dumps(info), timeout=10)
    resp.raise_for_status()  # stop early on an HTTP error status
    content = resp.json()

    items = content['data']['articleFeed']['items']
    return items['edges'], items['pageInfo']


def getList(edges):
    """Extract the title of every article in one page of edges."""
    tmp = []
    for item in edges:
        node = item['node']
        one = {'title': node['title']}
        # one['links'] = node['originalUrl']
        # one['content'] = node['content']
        tmp.append(one)

    return tmp


data = []

# Fetch the first page, then follow the cursor until no pages remain.
edges, pageInfo = get_content()

tmpList = getList(edges)
# data = data + tmpList
print(tmpList)
while pageInfo['hasNextPage']:
    edges, pageInfo = get_content(pageInfo['endCursor'])
    tmpList = getList(edges)
    # data = data + tmpList
    print(tmpList)
    sleep(2)  # pause between requests to avoid hammering the server
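
The loop above prints each page as it arrives and leaves `data` empty. Below is a minimal sketch of accumulating the titles instead, assuming the fetch function returns the same `(edges, pageInfo)` pair as `get_content`. The names `collect_titles`, `fetch_page`, `max_pages`, `delay`, and `fake_fetch` are hypothetical and not part of the original script. The fetch function is passed in as a parameter so the loop can be tried without network access, and `max_pages` caps the crawl in case `hasNextPage` never becomes false.

```python
from time import sleep


def collect_titles(fetch_page, max_pages=5, delay=2.0):
    """Follow cursor pagination and return every title seen.

    fetch_page(after) must return (edges, page_info), like get_content.
    All names here are illustrative assumptions, not the original script's API.
    """
    titles = []
    after = ''  # empty cursor requests the first page
    for _ in range(max_pages):
        edges, page_info = fetch_page(after)
        titles.extend(edge['node']['title'] for edge in edges)
        if not page_info.get('hasNextPage'):
            break
        after = page_info['endCursor']
        sleep(delay)  # pause between requests
    return titles


def fake_fetch(after):
    """Stand-in for get_content that serves two canned pages (made-up data)."""
    pages = {
        '': ([{'node': {'title': 'first'}}],
             {'hasNextPage': True, 'endCursor': 'c1'}),
        'c1': ([{'node': {'title': 'second'}}],
               {'hasNextPage': False, 'endCursor': 'c2'}),
    }
    return pages[after]


print(collect_titles(fake_fetch, delay=0))
```

With the script above, `collect_titles(get_content)` would walk the real feed the same way, provided the endpoint still responds in the shape the script expects.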

 
