How do I build a Twitter crawler with Scrapy? [closed]

Posted 2021-04-05

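I have used Scrapy to scrape data from sites like Pinterest, which do not require a login session. How can I use Scrapy to crawl and scrape Twitter, where accessing followers and other data requires logging in first?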

Answer

Here is an example of logging in to Twitter and fetching someone's followers page with the Python requests library:

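import requests

# Placeholders: replace with your own credentials.
account = 'your_username_or_email'
password = 'your_password'

url = "https://twitter.com/login"
# Form field names as given in this answer.
payload = {'session[username_or_email]': account,
           'session[password]': password}
r = requests.post(url, data=payload)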

It is best to add your browser's headers to the request so that Twitter's servers treat the spider as a browser user.

import requests

# Fill in the empty values below after checking the request headers in your browser.
headers = {
    'Host': 'twitter.com',
    'User-Agent': '',
    'Accept': '',
    'Accept-Language': '',
    'Accept-Encoding': '',
    'X-Requested-With': '',
    'Cookie': '',
    'Connection': '',
}
someone = 'screen_name'  # placeholder: the account whose followers page you want
url = 'https://twitter.com/%s/followers' % someone
p = requests.get(url, headers=headers)
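The two snippets above send independent requests, so cookies set by the login response are not reused unless they are copied into the Cookie header by hand. A minimal sketch of one way to carry them over is to route both calls through a single session object. The helper names `build_login_payload`, `login`, and `fetch_followers_page` are hypothetical, and the sketch assumes the login form accepts exactly the fields shown in the answer, which this thread does not confirm.

```python
from typing import Any, Dict, Optional

LOGIN_URL = "https://twitter.com/login"  # URL taken from the answer above


def build_login_payload(account: str, password: str) -> Dict[str, str]:
    """Return the form fields used in the answer's login request."""
    return {
        "session[username_or_email]": account,
        "session[password]": password,
    }


def login(session: Any, account: str, password: str) -> Any:
    """POST the login form through `session` so that it keeps the cookies.

    `session` is assumed to behave like requests.Session, i.e. it offers
    post(url, data=...) and get(url, headers=...).
    """
    return session.post(LOGIN_URL, data=build_login_payload(account, password))


def fetch_followers_page(session: Any, screen_name: str,
                         headers: Optional[Dict[str, str]] = None) -> Any:
    """GET the followers page through the same session used for login."""
    url = "https://twitter.com/%s/followers" % screen_name
    return session.get(url, headers=headers)
```

With requests installed, this could be used as `session = requests.Session()`, then `login(session, account, password)`, then `p = fetch_followers_page(session, someone, headers=headers)`. Passing the session in as an argument keeps the helpers easy to exercise with a stand-in object.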

Once you have the page, you can parse it with other tools, such as BS4 (Beautiful Soup), Scrapy, or anything else.
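As a rough illustration of the parsing step, the sketch below collects link targets from an HTML string. It uses the standard library's `html.parser` in place of BS4 so that it runs without extra installs. The `LinkCollector` and `extract_links` names and the sample markup are made up for illustration, and the real followers page will be structured differently.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return all non-empty href values found in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup for illustration only; the real page will differ.
sample_html = """
<ul>
  <li><a href="/alice">Alice</a></li>
  <li><a href="/bob">Bob</a></li>
  <li><a>no link</a></li>
</ul>
"""
print(extract_links(sample_html))  # ['/alice', '/bob']
```

With the response from the answer's code, the call would be `extract_links(p.text)`.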

Another answer

So far I have seen two Scrapy libraries for Twitter:

scrapy-twitter - uses the Twitter API and gets more data from each tweet
TweetScraper - does not use the Twitter API, but has a powerful query language
