Downloading Images from Zhihu

Posted by pau1fang


Let's scrape a batch of meme images from Zhihu. The images come from the answers to a Zhihu question at https://www.zhihu.com/question/311745535.

import requests
import re
import os

class CrawlImg:
    def __init__(self):
        self.question_number = 311745535
        # Zhihu v4 answers API for the question; the include parameter requests answer content plus author info.
        self.url = f"https://www.zhihu.com/api/v4/questions/{self.question_number}/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset=0&platform=desktop&sort_by=default"
        self.headers = self.get_headers()

    def get_headers(self):
        return {
            "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
            "referer": f"https://www.zhihu.com/question/{self.question_number}",
            }

    def get_img_info(self, url):
        # Walk the paginated answers API and yield one dict per image found.
        response = requests.get(url=url, headers=self.headers)
        if response.status_code == 200:
            content = response.json()
            data = content.get("data")
            for answer in data:
                # Strip characters that are illegal in file names (the author name becomes a directory).
                author_name = re.sub(r'[\/:*?"<>|\n。,.? ]+', '', answer.get("author").get("name"))
                # Image URLs sit in the answer HTML as data-actualsrc attributes.
                imgs = re.findall(r'data-actualsrc="(.*?)"/>', answer.get("content"))
                for img in imgs:
                    yield {"author": author_name, "img_url": img}
            is_end = content.get("paging").get("is_end")
            # Follow the "next" link until the API reports the last page.
            while not is_end:
                next_url = content.get("paging").get("next")
                response_ = requests.get(url=next_url, headers=self.headers)
                if response_.status_code == 200:
                    content = response_.json()
                    for answer in content.get("data"):
                        author_name = re.sub(r'[\/:*?"<>|\n。,.? ]+', '', answer.get("author").get("name"))
                        imgs = re.findall(r'data-actualsrc="(.*?)"/>', answer.get("content"))
                        for img in imgs:
                            yield {"author": author_name, "img_url": img}
                    is_end = content.get("paging").get("is_end")

    def download_img(self, img):
        # Save one image into ./imgs/<author name>/.
        url = img.get("img_url")
        author_name = img.get("author")
        dirname = f"./imgs/{author_name}"
        if not os.path.exists(dirname):
            os.mkdir(dirname)
        try:
            response = requests.get(url)
            if response.status_code == 200:
                # Zhihu image file names start with "v2-"; keep the part after that prefix as the local name.
                img_name = re.search(r'v2-(.*)', url).group(1)
                img_path = f"{dirname}/{img_name}"
                with open(img_path, "wb") as f:
                    f.write(response.content)
            else:
                print("download failed")
        except Exception:
            return None

    def main(self):
        if not os.path.exists("./imgs"):
            os.mkdir("./imgs")
        for img_info in self.get_img_info(self.url):
            self.download_img(img_info)


if __name__ == "__main__":
    c = CrawlImg()
    c.main()
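
For reference, the pagination logic in get_img_info relies on the data and paging fields of the answers API response. Below is a minimal sketch of the response shape the code expects; the field names are taken from the code above, while the author name, image URL, and next-page URL are purely illustrative assumptions.

# Illustrative (assumed) shape of one page returned by the answers API,
# as consumed by CrawlImg.get_img_info(). Values are made up for demonstration.
sample_page = {
    "data": [
        {
            "author": {"name": "示例用户"},  # hypothetical author name
            "content": '<img data-actualsrc="https://picx.zhimg.com/v2-example.jpg"/>',  # hypothetical answer HTML
        },
    ],
    "paging": {
        "is_end": False,  # False means there are more pages to fetch
        "next": "https://www.zhihu.com/api/v4/questions/311745535/answers?...&offset=5",  # next page URL (truncated)
    },
}

Each answer's HTML is searched for data-actualsrc attributes, and the loop keeps requesting the paging.next URL until paging.is_end becomes true.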
