Sending an ASP.net POST with Python's Requests

Posted: 2014-09-18 12:02:58

Question:

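I'm scraping an old ASP.net website using Python's requests module.

I've spent more than 5 hours trying to figure out how to simulate this POST request, to no avail. Doing it the way I do below, I essentially get a message saying "No item matches this item reference."

Any help would be greatly appreciated. Here are the request and my code, with a few things modified for brevity and/or privacy:

My own code: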

import requests

# Scraping the item number from the website, I have confirmed this is working.

#Then use the newly acquired item number to request the data.
item_url = http://www.example.com/EN/items/Pages/yourrates.aspx?vr= + item_number[0]
viewstate = r'/wEPD...' # Truncated for brevity.

# Create the appropriate request and payload.
payload = {"vr": int(item_number[0])}

item_request_body = {
    "__SPSCEditMenu": "true",
    "MSOWebPartPage_PostbackSource": "",
    "MSOTlPn_SelectedWpId": "",
    "MSOTlPn_View": 0,
    "MSOTlPn_ShowSettings": "False",
    "MSOGallery_SelectedLibrary": "",
    "MSOGallery_FilterString": "",
    "MSOTlPn_Button": "none",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "MSOAuthoringConsole_FormContext": "",
    "MSOAC_EditDuringWorkflow": "",
    "MSOSPWebPartManager_DisplayModeName": "Browse",
    "MSOWebPartPage_Shared": "",
    "MSOLayout_LayoutChanges": "",
    "MSOLayout_InDesignMode": "",
    "MSOSPWebPartManager_OldDisplayModeName": "Browse",
    "MSOSPWebPartManager_StartWebPartEditingName": "false",
    "__VIEWSTATE": viewstate,
    "keywords": "Search our site",
    "__CALLBACKID": "ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22",
    "__CALLBACKPARAM": "startvr"
}

# Write the appropriate headers for the property information.
item_request_headers = {
    "Host": home_site,
    "Connection": "keep-alive",
    "Content-Length": len(encoded_valuation_request),
    "Cache-Control": "max-age=0",
    "Origin": home_site,
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button",
    "Accept": "*/*",
    "Referer": valuation_url,
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}

response = requests.post(url=item_url, params=payload, data=item_request_body, headers=item_request_headers)
print(response.text)

What Chrome tells me the request looks like:

Remote Address:202.55.96.131:80
Request URL:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
Request Method:POST
Status Code:200 OK

Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:21501
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button
Host:www.site.com
Origin:www.site.com
Referer:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36

Query String Parameters
vr:123456789

Form Data
__SPSCEditMenu:true
MSOWebPartPage_PostbackSource:
MSOTlPn_SelectedWpId:
MSOTlPn_View:0
MSOTlPn_ShowSettings:False
MSOGallery_SelectedLibrary:
MSOGallery_FilterString:
MSOTlPn_Button:none
__EVENTTARGET:
__EVENTARGUMENT:
MSOAuthoringConsole_FormContext:
MSOAC_EditDuringWorkflow:
MSOSPWebPartManager_DisplayModeName:Browse
MSOWebPartPage_Shared:
MSOLayout_LayoutChanges:
MSOLayout_InDesignMode:
MSOSPWebPartManager_OldDisplayModeName:Browse
MSOSPWebPartManager_StartWebPartEditingName:false
__VIEWSTATE:/wEPD...(Omitted for length)
keywords:Search our site
__CALLBACKID:ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22
__CALLBACKPARAM:startvr

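Comments:

Not sure if it helps, but I think your item_url is currently constructed incorrectly; it is not a string.

Oh yeah, of course. I didn't notice, but that's not my problem; it's because I was reformatting things to exclude the actual URL :) Thanks for the spot!

EventViewState validation, besides the possible session issue mentioned (in the answer below), are all possibilities...

Answer 1: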

Your request sets too many parameters. You should not set the Content-Type, Content-Length, Host, Origin, or Connection headers; leave those for requests to set.
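As a rough illustration of the point about headers, the sketch below prepares a POST without sending it and prints the body-related headers that requests fills in on its own. The URL and the two form fields are placeholders taken from the question, not a working endpoint.

```python
import requests

# Placeholder URL and a shortened form body, for illustration only.
url = "http://www.example.com/EN/items/Pages/yourrates.aspx"
form = {"__CALLBACKPARAM": "startvr", "keywords": "Search our site"}

# Prepare the request without sending it, so no network access is needed.
prepared = requests.Request("POST", url, data=form).prepare()

# requests derives both of these from the form-encoded body.
print(prepared.headers["Content-Type"])
print(prepared.headers["Content-Length"], len(prepared.body))
```

A hand-written Content-Length is easy to get wrong whenever the body changes, which is one reason to let the library compute it.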

You are also doubling up the URL parameter; either add the vr parameter to the URL manually or pass it via params, not both.
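A small sketch of what the doubling looks like in practice, again preparing requests without sending them. The vr value is the illustrative one from the Chrome capture in the question.

```python
import requests

base = "http://www.example.com/EN/items/Pages/yourrates.aspx"
payload = {"vr": 123456789}  # illustrative value from the question

# Passing the parameter only via params yields a single vr in the query string.
once = requests.Request("POST", base, params=payload).prepare()

# Putting it in the URL *and* in params appends it a second time.
twice = requests.Request("POST", base + "?vr=123456789", params=payload).prepare()

print(once.url)
print(twice.url)
```

Whether the server tolerates a repeated vr is application-specific, so sending it exactly once is the safer choice.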

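Chances are that some of the parameters in the POST body are generated by the ASP application and tied to a session. I'd use a Session object to make a GET request to valuation_url, then parse the form in that page to extract the __CALLBACKID parameter. The requests session will then store any cookies the server sets and reuse them: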

import requests
from bs4 import BeautifulSoup

item_request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Accept": "*/*",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}

payload = {"vr": int(item_number[0])}

# Session() takes no arguments; set the default headers on the instance.
session = requests.Session()
session.headers.update(item_request_headers)

# Get form page
form_response = session.get(valuation_url, params=payload)

# parse form page; BeautifulSoup could do this for example
soup = BeautifulSoup(form_response.content, 'html.parser')
callbackid = soup.select('input[name="__CALLBACKID"]')[0]['value']

item_request_body = {
    "__SPSCEditMenu": "true",
    "MSOWebPartPage_PostbackSource": "",
    "MSOTlPn_SelectedWpId": "",
    "MSOTlPn_View": 0,
    "MSOTlPn_ShowSettings": "False",
    "MSOGallery_SelectedLibrary": "",
    "MSOGallery_FilterString": "",
    "MSOTlPn_Button": "none",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "MSOAuthoringConsole_FormContext": "",
    "MSOAC_EditDuringWorkflow": "",
    "MSOSPWebPartManager_DisplayModeName": "Browse",
    "MSOWebPartPage_Shared": "",
    "MSOLayout_LayoutChanges": "",
    "MSOLayout_InDesignMode": "",
    "MSOSPWebPartManager_OldDisplayModeName": "Browse",
    "MSOSPWebPartManager_StartWebPartEditingName": "false",
    "__VIEWSTATE": viewstate,
    "keywords": "Search our site",
    "__CALLBACKID": callbackid,
    "__CALLBACKPARAM": "startvr"
}

item_url = 'http://www.example.com/EN/items/Pages/yourrates.aspx'

response = session.post(url=item_url, params=payload, data=item_request_body,
                        headers={'Referer': form_response.url})

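The session handles the headers (setting the User-Agent and Accept parameters); only on the POST made with the session do we add a Referer header as well.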
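If BeautifulSoup is not available, the form-parsing step could be approximated with the standard library. The following is only a sketch that swaps in `html.parser` for BeautifulSoup: `HiddenFieldParser` is a hypothetical helper, and the sample HTML (including the placeholder view state value) is invented for illustration rather than taken from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Invented sample markup; a real page would come from form_response.text.
sample_html = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="SAMPLE_VIEWSTATE" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="text" name="keywords" value="Search our site" />
</form>
"""

parser = HiddenFieldParser()
parser.feed(sample_html)
print(parser.fields)
```

Collecting every hidden field this way would also pick up a fresh __VIEWSTATE for the current session instead of a hard-coded one, assuming the page exposes it as a hidden input.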

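Comments:

Super helpful, Martijn, thank you! I'm still working through the problem, but I'll be sure to confirm once I've finished implementing and testing the solution :)

Also, do you know how I would encode this type of thing? __CALLBACKID=ctl00%24SPWebPartManager1%24g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22 gives me an error in the callback, presumably due to the unusual percent signs.

Include the decoded value; leave the encoding to requests. %24 is an encoded $, for example.

You're too helpful, Martijn! Thanks again! It works great; now I just need to automate retrieving some of the information via BeautifulSoup.

@DavidK.: You can combine requests and BeautifulSoup with robobrowser; it will help you fill in forms too.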
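The encoding point from the comments can be checked with the standard library. This sketch uses `urllib.parse.urlencode` as a stand-in for the form encoding that requests applies to a dict passed as data; it shows why a value that is already percent-encoded ends up encoded twice.

```python
from urllib.parse import urlencode

# The decoded value, as it appears in the page's form.
decoded = "ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22"

# The same value as copied from an already-encoded request body.
already_encoded = decoded.replace("$", "%24")

# Encoding the decoded value turns each $ into %24 exactly once.
print(urlencode({"__CALLBACKID": decoded}))

# Encoding the pre-encoded value escapes the % itself, giving %2524.
print(urlencode({"__CALLBACKID": already_encoded}))
```

The server would decode %2524 back to the literal text %24 rather than $, which is consistent with the callback error described above.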
