Logging into SAML/Shibboleth authenticated server using python

Posted 2023-02-15


[Title] Logging into SAML/Shibboleth authenticated server using python [Posted] 2013-05-06 22:48:49 [Question]:

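I'm trying to log in to my university's server via python, but I'm entirely unsure how to go about generating the appropriate HTTP POSTs, creating the keys and certificates, and handling the other parts of the process that I may be unfamiliar with and that are required to comply with the SAML spec. I can log in with my browser just fine, but I'd like to be able to log in and access other content on the server using python.

For reference, here is the site

I've tried logging in with mechanize (selecting the form, populating the fields, clicking the submit button control via mechanize.Browser.submit(), etc.) to no avail; the login page gets spat back each time.

At this point, I'm willing to implement a solution in whichever language is most suitable for the task. Basically, I want to log in programmatically to a SAML-authenticated server.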

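[Comments]:

[Answer 1]:

If all else fails, I'd suggest using Selenium's webdriver in "headful" mode (i.e. a browser window opens, letting you type in the username, password and any other necessary login information). This gives you easy access to the target site even if your form is more complicated than the standard "username" and "password" combination and you are unsure how to fill in the br.form fields mentioned in the other answers.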

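from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

DRIVER_PATH = r'C:/INSERT_YOUR_PATH_HERE/chromedriver.exe'
# Selenium 4 style; on Selenium 3 use webdriver.Chrome(executable_path=DRIVER_PATH)
driver = webdriver.Chrome(service=Service(DRIVER_PATH))
driver.get('https://moodle.tau.ac.il/login/index.php') # This is the login screen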

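Once you have done that, you can create a loop that checks whether you have reached your target URL. If so, you're in! This code worked for me; my goal was to access my university's course website, Moodle, and download all the PDFs automatically.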

targetUrl = False
timeElapsed = 0

def downloadAllPDFs():         # Or any other function you'd like, the point is that 
    print("Access Granted!")   # you now have access to the html. 

while not targetUrl and timeElapsed < 60:
    time.sleep(1)
    timeElapsed += 1
    if driver.current_url == r"https://moodle.tau.ac.il/my/": # The site you're trying to login to.
        downloadAllPDFs()
        targetUrl = True
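
The polling loop above can be factored into a small reusable helper. This is only a minimal sketch: the function name `wait_for_url` and the caller-supplied `get_url` callable are hypothetical and do not come from the original answer (with Selenium you would pass `lambda: driver.current_url`). Selenium also ships its own `WebDriverWait` for this purpose; the version below uses only the standard library so it can be tried without a browser, and the example URLs are made up.

```python
import time

def wait_for_url(get_url, target_url, timeout=60.0, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_url() until it returns target_url or timeout seconds pass.

    Returns True if the target was reached, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_url() == target_url:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Simulated browser: the login redirects twice before landing on the target.
pages = iter(["https://idp.example/login",
              "https://idp.example/consent",
              "https://sp.example/my/"])
state = {"url": None}

def fake_current_url():
    # Advance to the next page if there is one, otherwise stay where we are.
    state["url"] = next(pages, state["url"])
    return state["url"]

reached = wait_for_url(fake_current_url, "https://sp.example/my/",
                       timeout=5, interval=0)
print(reached)
```

Returning a boolean instead of calling the download function inside the loop keeps the waiting logic separate from whatever you do once you are logged in.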

[Comments]:

[Answer 2]:

I wrote this code following the accepted answer. It has worked for me in two separate projects.

import http.cookiejar   # named cookielib on Python 2
import mechanize

cj = http.cookiejar.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refresh 0 but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open("The URL goes here")

# First form: the login form
br.select_form(nr=0)
br.form['username'] = 'Login Username'
br.form['password'] = 'Login Password'
br.submit()

# Second form: presumably the auto-submitting form that carries the SAML
# response back to the service provider (a browser submits it with JavaScript)
br.select_form(nr=0)
br.submit()

response = br.response().read()
print(response)

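[Comments]:

[Answer 3]:

Although this has already been answered, hopefully this helps someone. My task was to download files from a SAML website, and I got help from Stéphane Bruckert's answer.

If you run headless, you need to specify wait times for the redirect intervals that the login requires. Once the browser had logged in, I took its cookies and used them with the requests module to download the files (I got help from this).

This is what my code looks like: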

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

things_to_download = ['a', 'b', 'c', 'd', 'e', 'f']  # placeholders for the values that change in the URL
options = Options()  # headful by default; a headless run needs explicit waits for the login redirects
driver = webdriver.Chrome(service=Service('D:/chromedriver.exe'), options=options)  # Selenium 4 style
driver.get('https://website.to.downloadfrom.com/')
driver.find_element(By.ID, 'username').send_keys("Your_username")  # the IDs differ between websites/forms
driver.find_element(By.ID, 'password').send_keys("Your_password")
driver.find_element(By.ID, 'logOnForm').submit()

# Copy the browser's cookies into a requests session once, after login
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

for thing in things_to_download:
    response = session.get('https://website.to.downloadfrom.com/bla/blabla/' + str(thing))
    with open('Downloaded_stuff/' + str(thing) + '.pdf', 'wb') as f:
        f.write(response.content)  # save the file
driver.quit()

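[Comments]:

[Answer 4]:

Basically, what you have to understand is the workflow behind a SAML authentication process. Unfortunately, there does not seem to be any PDF out there that really helps you find out what the browser does when it accesses a SAML-protected website.

Maybe you should have a look at something like this: http://www.docstoc.com/docs/33849977/Workflow-to-Use-Shibboleth-Authentication-to-Sign and obviously at this: http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language. Pay particular attention to the flow diagram there.

What I did when I was trying to understand how SAML works, since the documentation was so poor, was write down (yes, on paper!) every step the browser performed, from the first to the last. I used Opera, set so that it would not follow redirects automatically (300, 301, 302 response codes and so on), and with Javascript disabled. Then I wrote down all the cookies the server sent me, what did what, and for what reason.

Maybe it was too much effort, but this way I was able to write a library, in Java, that is suited to the job and is incredibly fast too. Maybe someday I will release it publicly...

What you should understand is that, in a SAML login, there are two actors in play: the IDP (identity provider) and the SP (service provider).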

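A. First step: the user agent requests the resource from the SP

I'm quite sure that you reached the link you reference in your question from another page, by clicking something like "Access the protected website". If you pay closer attention, you'll notice that the link you clicked is not the one that displays the authentication form. That's because following the link that takes you from the SP to the IDP is a step of SAML. The first step, actually. It lets the IDP establish who you are and why you are trying to access the resource. So, basically, what you need to do is make a request to the link you followed to reach the web form, and collect the cookies it sets. What you won't see is a SAMLRequest string, encoded into the 302 redirect that sits behind the link and sent to the IDP to establish the connection.

I think that's the reason why you can't mechanize the whole process. You simply connected to the form, with no identification done!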
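
To see what that hidden SAMLRequest contains, you can decode it from the `Location` header of the 302 redirect. The sketch below assumes the SP uses the SAML HTTP-Redirect binding, where the request is raw-DEFLATE compressed, base64-encoded and then URL-encoded; an SP that uses the HTTP-POST binding puts the value in a form instead and does not compress it. The function names and the `idp.example` URL are made up for illustration.

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlsplit

def decode_saml_request(location):
    """Return the AuthnRequest XML carried in a redirect URL's SAMLRequest parameter."""
    query = parse_qs(urlsplit(location).query)
    encoded = query["SAMLRequest"][0]        # parse_qs already URL-decodes the value
    compressed = base64.b64decode(encoded)
    # wbits=-15 selects raw DEFLATE (no zlib header), as the redirect binding uses
    return zlib.decompress(compressed, -15).decode("utf-8")

def encode_saml_request(xml):
    """Inverse operation, used here only to build a self-contained demo value."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    raw = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")

# Demo: a made-up redirect target such as an SP might send in a Location header.
xml = '<samlp:AuthnRequest ID="_example" Version="2.0"/>'
location = "https://idp.example/sso?" + urlencode(
    {"SAMLRequest": encode_saml_request(xml), "RelayState": "abc"})
decoded = decode_saml_request(location)
print(decoded)
```

In a real session you would take `location` from the redirect response you captured with automatic redirects turned off.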

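B. Second step: fill in the form and submit it

This one is easy. But be careful! The cookies that are set now are not the same as the cookies above. You are now connected to an entirely different website. That's the reason SAML is used: different websites, same credentials. So you may want to store these authentication cookies, provided by a successful login, in a different variable. The IDP will now send you back a response (following the SAMLRequest): the SAMLResponse. You have to detect it by fetching the source code of the web page on which the login ends. In fact, this page is one big form containing the response, with some JS code that submits it automatically when the page loads. You have to get the source code of the page, parse it to get rid of all the useless HTML, and extract the SAMLResponse (it is base64-encoded, and may additionally be encrypted).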
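
Extracting the SAMLResponse from that auto-submitting page can be sketched with the standard library's `html.parser`. The class name `AutoSubmitFormParser`, the sample page and the `sp.example` URL are assumptions made for illustration; real IdP pages differ in markup, so compare against the source you actually receive.

```python
import base64
from html.parser import HTMLParser

class AutoSubmitFormParser(HTMLParser):
    """Collect the action URL and the named input fields of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Unnamed inputs (e.g. the fallback submit button) are skipped.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True

# Made-up page in the shape described above: one form, hidden inputs, JS submit.
encoded = base64.b64encode(b"<samlp:Response/>").decode("ascii")
page = f"""
<html><body onload="document.forms[0].submit()">
<form action="https://sp.example/Shibboleth.sso/SAML2/POST" method="post">
  <input type="hidden" name="RelayState" value="cookie:1234"/>
  <input type="hidden" name="SAMLResponse" value="{encoded}"/>
  <noscript><input type="submit" value="Continue"/></noscript>
</form></body></html>
"""

parser = AutoSubmitFormParser()
parser.feed(page)
saml_xml = base64.b64decode(parser.fields["SAMLResponse"]).decode("utf-8")
print(parser.action)
print(sorted(parser.fields))
```

The parser keeps every named input, not just SAMLResponse, because the SP usually expects the accompanying fields (such as RelayState) to be posted back as well.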

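C. Third step: send the response back to the SP

Now you are ready to finish the procedure. You have to send the SAMLResponse obtained in the previous step to the SP (via POST, since you are emulating a form). In return, the SP will provide the cookies needed to access the protected content you are after.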
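
The POST in this step can be sketched with `urllib.request`. The helper name `build_acs_post`, the URL and the field values below are placeholders, not values from the original answer; the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

def build_acs_post(action_url, fields):
    """Build (but do not send) the form POST that delivers SAMLResponse to the SP."""
    # urlencode percent-encodes '+', '/' and '=' so the base64 value survives intact.
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

fields = {"SAMLResponse": "PLACEHOLDER+BASE64/VALUE==", "RelayState": "cookie:1234"}
request = build_acs_post("https://sp.example/Shibboleth.sso/SAML2/POST", fields)
print(request.get_method(), request.full_url)
print(request.data)
```

To actually send it, pass the request to an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())`, so that the session cookies the SP sets in its reply are kept for later requests.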

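Aaaand, you're done!

Again, I think the most valuable thing you can do is use Opera and analyze ALL the redirects SAML performs. Then replicate them in your code. It's not that difficult; just keep in mind that the IDP is a completely different site from the SP.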

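[Comments]:

Hi Gianluca, I have a similar problem and would very much like to avoid implementing the same library. Any chance you have released it publicly?

Here is the code: ***.com/a/58598520/7831858. Thanks to @Gianluca for the help. This post helped me figure out the SAML login.

[Answer 5]: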

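I ran into a similar problem with the SAML authentication on my university's page.

The basic idea is to use a requests.session object to handle most of the HTTP redirects and the cookie storage automatically. However, there were also several redirects done with JavaScript, which caused multiple problems with the plain requests solution.

I ended up using fiddler to trace every request my browser made to the university server, so that I could fill in the redirects I had missed. It really made the process easier.

My solution is far from ideal, but it seems to work.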

[Comments]:

[Answer 6]:

Extending Stéphane Bruckert's answer: once you have used Selenium to get the auth cookies, you can still switch to requests if you want:

import requests
cook = {i['name']: i['value'] for i in driver.get_cookies()}
driver.quit()
r = requests.get("https://protected.ac.uk", cookies=cook)
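
If requests is not available, the same hand-off can be sketched with the standard library by sending the browser's cookies in a `Cookie` header. This is a hedged variant, not the answer's method: the helper name `request_with_browser_cookies` and the sample cookie values are made up, and the input is assumed to be a list of dicts with `name` and `value` keys, which is the shape `driver.get_cookies()` returns.

```python
import urllib.request

def request_with_browser_cookies(url, selenium_cookies):
    """Build a GET request that carries the browser's cookies in a Cookie header."""
    header = "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)
    return urllib.request.Request(url, headers={"Cookie": header})

# Made-up cookies standing in for the result of driver.get_cookies()
sample = [
    {"name": "_shibsession_abc", "value": "xyz", "domain": "protected.ac.uk"},
    {"name": "lang", "value": "en", "domain": "protected.ac.uk"},
]
req = request_with_browser_cookies("https://protected.ac.uk", sample)
cookie_header = req.get_header("Cookie")
print(cookie_header)
```

The request can then be opened with `urllib.request.urlopen(req)`; note that this simple header ignores cookie domain and path, so only use it for the site the cookies came from.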

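[Comments]:

[Answer 7]:

Mechanize can do the job as well, except that it does not handle Javascript. Authentication worked, but once on the home page I could not load a link like this one: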

<a href="#" id="formMenu:linknotes1"
   onclick="return oamSubmitForm('formMenu','formMenu:linknotes1');">

If you need Javascript, you are better off using Selenium with PhantomJS. Otherwise, I hope you will find inspiration in this script:

#!/usr/bin/env python
# coding: utf8
import http.cookiejar   # named cookielib on Python 2
import mechanize

br = mechanize.Browser() # Browser
cj = http.cookiejar.LWPCookieJar() # Cookie Jar
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but does not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36')]

br.open('https://ent.unr-runn.fr/uPortal/')
br.select_form(nr=0)
br.submit()

br.select_form(nr=0)
br.form['username'] = 'myusername'
br.form['password'] = 'mypassword'
br.submit()

br.select_form(nr=0)
br.submit()

rs = br.open('https://ent.unr-runn.fr/uPortal/f/u1240l1s214/p/esup-mondossierweb.u1240l1n228/max/render.uP?pP_org.apache.myfaces.portlet.MyFacesGenericPortlet.VIEW_ID=%2Fstylesheets%2Fetu%2Fdetailnotes.xhtml')

# Optionally compare the cookies with those shown by Live HTTP Headers:
print("Cookies:")
for cookie in cj:
    print(cookie)

# Displaying page information
print(rs.read())
print(rs.geturl())
print(rs.info())

# And that last line didn't work
rs = br.follow_link(id="formMenu:linknotes1", nr=0)

[Comments]:

[Answer 8]:

Selenium with the headless PhantomJS webkit will be your best bet for logging into Shibboleth, because it handles cookies and even Javascript for you.

Installation:

$ pip install selenium
$ brew install phantomjs

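# Written for Selenium 3 and earlier: webdriver.PhantomJS and the
# find_element_by_* helpers were removed in Selenium 4.
from selenium import webdriver
from selenium.webdriver.support.ui import Select # for <SELECT> HTML form

driver = webdriver.PhantomJS()
# On Windows, use: webdriver.PhantomJS(r'C:\phantomjs-1.9.7-windows\phantomjs.exe')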

# Service selection
# Here I had to select my school among others 
driver.get("http://ent.unr-runn.fr/uPortal/")
select = Select(driver.find_element_by_name('user_idp'))
select.select_by_visible_text('ENSICAEN')
driver.find_element_by_id('IdPList').submit()

# Login page (https://cas.ensicaen.fr/cas/login?service=https%3A%2F%2Fshibboleth.ensicaen.fr%2Fidp%2FAuthn%2FRemoteUser)
# Fill the login form and submit it
driver.find_element_by_id('username').send_keys("myusername")
driver.find_element_by_id('password').send_keys("mypassword")
driver.find_element_by_id('fm1').submit()

# Now connected to the home page
# Click on 3 links in order to reach the page I want to scrape
driver.find_element_by_id('tabLink_u1240l1s214').click()
driver.find_element_by_id('formMenu:linknotes1').click()
driver.find_element_by_id('_id137Pluto_108_u1240l1n228_50520_:tabledip:0:_id158Pluto_108_u1240l1n228_50520_').click()

# Select and print an interesting element by its ID
page = driver.find_element_by_id('_id111Pluto_108_u1240l1n228_50520_:tableel:tbody_element')
print(page.text)

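Notes:

During development, use Firefox to preview what you are doing: driver = webdriver.Firefox(). This script is provided as is, with the corresponding links, so that you can compare each line of code with the actual source code of the pages (at least up to the login).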

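[Comments]:

Hi Stéphane, I've implemented this snippet in Java and it works fine if I use the Firefox web driver. However, when I use the HTML driver, it returns the page text of one of the authentication redirects. Is there a way to tell the driver not to grab the page source until it has reached a certain URL, or to add some kind of delay? Thanks for any suggestions.

I would say that's why I prefer browser drivers such as Firefox or Chrome. Why not settle for that? I don't know the Java API or how it is used with the HTML driver, so if you really want to achieve this, I suggest you create a new question, since this is a bit off topic here. Good luck!

I'm doing some proof-of-concept work for what will become a mobile app and may not end up using Selenium, but this makes testing easier. Thanks anyway!

Once you have the auth cookie, you can also switch to requests (faster and easier to use, IME)... I've added a snippet in a separate answer...

[Answer 9]: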

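I wrote a simple Python script that is able to log in to a Shibbolized page.

First, I used Live HTTP Headers in Firefox to watch the redirects for the particular Shibbolized page I was targeting.

Then I wrote a simple script using urllib.request (in Python 3.4, but urllib2 in Python 2.x seems to have the same functionality). I found that the default redirect following of urllib.request worked for my purposes; however, I found it nice to subclass urllib.request.HTTPRedirectHandler and, in this subclass (class ShibRedirectHandler), add a handler for all http_error_302 events.

In this subclass I just print out the values of the parameters (for debugging purposes). Note that, in order to use the default redirect following, you need to end the handler with return HTTPRedirectHandler.http_error_302(self, args...) (i.e. a call to the base class http_error_302 handler).

The most important component for making urllib work with Shibbolized authentication is an OpenerDirector that has cookie handling added. You build the OpenerDirector as follows: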

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)
response = opener.open("https://shib.page.org")

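Here is a full script that may get you started (you will need to change the mock URLs I provided and enter a valid username and password). It uses the Python 3 modules; to make it work in Python 2, replace urllib.request with urllib2 and urllib.parse.urlencode with urllib.urlencode: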

import urllib.request
import urllib.parse

#Subclass of HTTPRedirectHandler. Does not do much, but is very
#verbose: it prints out all the redirects. Compare with what you see
#when looking at your browser's redirects (using Live HTTP Headers or similar)
class ShibRedirectHandler (urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print (req)
        print (fp.geturl())
        print (code)
        print (msg)
        print (headers)
        #without this return (passing parameters onto baseclass) 
        #redirect following will not happen automatically for you.
        return urllib.request.HTTPRedirectHandler.http_error_302(self,
                                                          req,
                                                          fp,
                                                          code,
                                                          msg,
                                                          headers)

cookieprocessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(ShibRedirectHandler, cookieprocessor)

#Edit: should be the URL of the site/page you want to load that is protected with Shibboleth
(opener.open("https://shibbolized.site.example").read())

#Inspect the page source of the Shibboleth login form; find the input names for the username
#and password, and edit according to the dictionary keys here to match your input names
loginData = urllib.parse.urlencode({'username': '<your-username>', 'password': '<your-password>'})
bLoginData = loginData.encode('ascii')

#By looking at the source of your Shib login form, find the URL the form action posts back to
#hard code this URL in the mock URL presented below.
#Make sure you include the URL, port number and path
response = opener.open("https://test-idp.server.example", bLoginData)
#See what you got.
print (response.read())

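[Comments]:

Thank you for this information. Could you share your whole script? It would save me a lot of time!

I added the script to my post. It might get you started. Shibb logins can vary, though. Compare with what you see when using a browser. You may want to turn off Javascript to simplify what your browser does in the background.

Thanks a lot. I'll have to dig into it. I'll keep you posted.

Let me know how it goes. There may be an intermediate step between supplying the username and the password.

I tried my best but could not manage to get the right cookies (compared with those shown in Live HTTP Headers). I finally managed to log in and get the data I needed using Selenium and PhantomJS. It was much easier! Anyway, thank you very much for your help.

[Answer 10]: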

You can find here a more detailed description of the Shibboleth authentication process.

[Comments]:
