Python Scrapy Framework Series (17): Hands-On Login to a Code Hosting Platform with the FormRequest Class

Posted by 孤寒者


1. Analysis:



Submit the login form twice and compare the parameters in the captured Form Data:

First attempt:

commit: Sign in
authenticity_token: 6P_cHe4nysQ8-i6fBzdJZFONOynZOAur6ps0884kMSPkXxgNsBIBE2bnP1IKKQABJM1vED4o0SGPBC5m3nSJZw
login: 1535744876@qq.com
password: 555
webauthn-support: supported
webauthn-iuvpaa-support: supported
return_to: https://github.com/login
allow_signup: 
client_id: 
integration: 
required_field_6d6f: 
timestamp: 1672997390577
timestamp_secret: 0a0c71b52147f5150cd1a63705691ca033e69ac2a58b872e42225f7146893dea

Second attempt:

commit: Sign in
authenticity_token: 8jLR-7RV6lnl6GeG-pIYqYpLGXkc22imvcSW2jFl5ogm3t_MJQC4UIxWNlrawKGatNocfrvZWAIPTIIESCS5lw
login: 1535744876@qq.com
password: 222
webauthn-support: supported
webauthn-iuvpaa-support: supported
return_to: https://github.com/login
allow_signup: 
client_id: 
integration: 
required_field_5097: 
timestamp: 1672998098316
timestamp_secret: 8c45a1eb0ff77f65fb78375e683d9a27dd5e2202fb906f5693595286d578df6c

Comparing the two, the following parameters change between requests:

  • authenticity_token
  • required_field_xxxx (note that even the field name changes: required_field_6d6f in the first request, required_field_5097 in the second)
  • timestamp
  • timestamp_secret
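If you'd rather not eyeball the diff, here is a minimal plain-Python sketch of comparing the two captures; the dicts below are abbreviated stand-ins for the real payloads, with the long tokens truncated:

payload1 = {
    "commit": "Sign in",
    "authenticity_token": "6P_cHe4n...",   # truncated
    "required_field_6d6f": "",
    "timestamp": "1672997390577",
    "timestamp_secret": "0a0c71b5...",     # truncated
}
payload2 = {
    "commit": "Sign in",
    "authenticity_token": "8jLR-7RV...",   # truncated
    "required_field_5097": "",
    "timestamp": "1672998098316",
    "timestamp_secret": "8c45a1eb...",     # truncated
}

# Keys present in only one capture (the dynamically named required_field_xxxx):
print(payload1.keys() ^ payload2.keys())
# Keys present in both captures but with different values:
print({k for k in payload1.keys() & payload2.keys() if payload1[k] != payload2[k]})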

2. How to obtain the changing parameters:

There are generally two ways to obtain such parameters:

  1. Build them dynamically, which requires reverse-engineering the site's JavaScript (not needed for this small exercise);
  2. Extract them from an earlier page in the flow, i.e., request the preceding page first.

Inspection shows that all four changing parameters are already present one page up, on the login page itself, as values of hidden form fields:

3. Writing the Scrapy project:

3.1 Create the project and the spider file:

# 1. Create the Scrapy project:
scrapy startproject LoginGithub

# 2. Create the spider file:
scrapy genspider ghlogin github.com/login

  • Use scrapy shell to work out the XPath that matches each parameter's value:
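A sketch of what that shell session might look like; the printed values below are illustrative placeholders, not real tokens:

scrapy shell https://github.com/login

>>> response.xpath('//input[@name="authenticity_token"]/@value').extract()[0]
'6P_cHe4n...'
>>> response.xpath('//input[@name="timestamp"]/@value').extract()[0]
'1672997390577'
>>> response.xpath('//input[@name="timestamp_secret"]/@value').extract()[0]
'0a0c71b5...'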

① Write the spider file:

import scrapy

# P is the login password, kept out of the spider in a separate module.
from .new_dir.pw import P


class GhloginSpider(scrapy.Spider):
    name = 'ghlogin'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Pull the dynamic values out of the login page's hidden <input> fields:
        authenticity_token = response.xpath('//input[@name="authenticity_token"]/@value').extract()[0]
        timestamp = response.xpath('//input[@name="timestamp"]/@value').extract()[0]
        timestamp_secret = response.xpath('//input[@name="timestamp_secret"]/@value').extract()[0]
        # Parameter extraction is done; build the form payload:
        form_data = {
            "commit": "Sign in",
            "authenticity_token": authenticity_token,
            "login": "1915344876@qq.com",
            "password": P,
            "webauthn-support": "supported",
            "webauthn-iuvpaa-support": "supported",
            "return_to": "https://github.com/login",
            "allow_signup": "",
            "client_id": "",
            "integration": "",
            "required_field_5097": "",  # this field's name changes per page load; a hardcoded empty entry still works here
            "timestamp": timestamp,
            "timestamp_secret": timestamp_secret,
        }
        yield scrapy.FormRequest(url="https://github.com/session", callback=self.verify_login, formdata=form_data)

    def verify_login(self, response):
        if "Join GitHub Global Campus!" in response.text:
            print("GitHub login successful~")
  • The password P is here, imported from the separate new_dir/pw.py module so it stays out of the spider itself:
  • Don't forget to turn off the robots.txt check and set a request User-Agent in settings.py~
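For reference, a minimal sketch of the relevant settings.py entries; the User-Agent string below is just an example, any current browser UA will do:

# settings.py (excerpt)

# Stop Scrapy from obeying robots.txt, which would block the login flow:
ROBOTSTXT_OBEY = False

# Send a browser-like User-Agent instead of Scrapy's default:
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")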

② Result: successful login:
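Side note: Scrapy also ships FormRequest.from_response, which reads the <form> on the page and pre-fills every field, including hidden ones such as authenticity_token and the dynamically named required_field_xxxx, so you only override login and password. A sketch of what parse could look like with it (an alternative, not what the code above uses):

    def parse(self, response):
        # from_response copies all fields from the page's login form
        # (authenticity_token, timestamp, timestamp_secret, required_field_xxxx, ...)
        # and then applies the overrides given in formdata:
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"login": "1915344876@qq.com", "password": P},
            callback=self.verify_login,
        )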

Bonus PyCharm trick:

How can you quickly wrap both the keys and the values of multi-line key: value data like the following in double quotes?

commit: Sign in
authenticity_token: 8jLR-7RV6lnl6GeG-pIYqYpLGXkc22imvcSW2jFl5ogm3t_MJQC4UIxWNlrawKGatNocfrvZWAIPTIIESCS5lw
login: 1915344876@qq.com
password: 222
webauthn-support: supported
webauthn-iuvpaa-support: supported
return_to: https://github.com/login
allow_signup: 
client_id: 
integration: 
required_field_5097: 
timestamp: 1672998098316
timestamp_secret: 8c45a1eb0ff77f65fb78375e683d9a27dd5e2202fb906f5693595286d578df6c
  • Copy-paste the lines into an empty .py file and press Ctrl+R to open Find and Replace with the regex option enabled. A pattern along these lines does the job: search for (.*?): ?(.*) and replace with "$1": "$2",
    - Click Replace All. The result:
"commit": "Sign in",
"authenticity_token": "8jLR-7RV6lnl6GeG-pIYqYpLGXkc22imvcSW2jFl5ogm3t_MJQC4UIxWNlrawKGatNocfrvZWAIPTIIESCS5lw",
"login": "1915344876@qq.com",
"password": "222",
"webauthn-support": "supported",
"webauthn-iuvpaa-support": "supported",
"return_to": "https://github.com/login",
"allow_signup": "",
"client_id": "",
"integration": "",
"required_field_5097": "",
"timestamp": "1672998098316",
"timestamp_secret": "8c45a1eb0ff77f65fb78375e683d9a27dd5e2202fb906f5693595286d578df6c",

That covers the main content of Python Scrapy Framework Series (17): Hands-On Login to a Code Hosting Platform with the FormRequest Class. Related posts in the series:

Python Scrapy Framework Series (10): Scrapy Selectors

Python Scrapy Framework Series (16): A Deep Dive into the Request and Response Classes

Python Scrapy Framework Series: Create and Run Your First Scrapy Demo Project