requests 源码分析

Posted STEM服务

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了requests 源码分析相关的知识,希望对你有一定的参考价值。

下面是使用requests的两种常见方式:

import requests
req = requests.get('https://httpbin.org/get') # 方式一

with requests.Session() as s:
s.get('https://httpbin.org/get') # 方式二
  • 方式一 调用顺序:调用requests.get(url)时,跳转到源码api文件的 request('get', url, params=params, **kwargs) 接着 是session文件的

with sessions.Session() as session:
return session.request(method=method, url=url, **kwargs)

也就是说第一种方式是第二种方式的进一步包装。每次请求都会建立一次session会话。

  • 方式二在一个session内的请求,有些数据是共享的,比如cookies,常见用法是爬虫脚本中,登录api后,需要保持登录cookie。

requests模板源码

Session类

进行http请求时,首先会通过Session类。因此下面我们先对这个类进行分析:
首先贴出源码,我们对这个类讲解,这个类东西还是挺多的,我们只对关键部分进行分析:

class Session(SessionRedirectMixin):
"""A Requests session.

Provides cookie persistence, connection-pooling, and configuration.

Basic Usage::

>>> import requests
>>> s = requests.Session()
>>> s.get('https://httpbin.org/get')
<Response [200]>

Or as a context manager::

>>> with requests.Session() as s:
... s.get('https://httpbin.org/get')
<Response [200]>
"""

__attrs__ = [
'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
'cert', 'prefetch', 'adapters', 'stream', 'trust_env',
'max_redirects',
]

def __init__(self):

#: A case-insensitive dictionary of headers to be sent on each
#: :class:`Request <Request>` sent from this
#: :class:`Session <Session>`.
self.headers = default_headers()

#: Default Authentication tuple or object to attach to
#: :class:`Request <Request>`.
self.auth = None

#: Dictionary mapping protocol or protocol and host to the URL of the proxy
#: (e.g. {'http': 'foo.bar:3128', 'http://host.name': 'foo.bar:4012'}) to
#: be used on each :class:`Request <Request>`.
self.proxies = {}

#: Event-handling hooks.
self.hooks = default_hooks()

#: Dictionary of querystring data to attach to each
#: :class:`Request <Request>`. The dictionary values may be lists for
#: representing multivalued query parameters.
self.params = {}

#: Stream response content default.
self.stream = False

#: SSL Verification default.
self.verify = True

#: SSL client certificate default, if String, path to ssl client
#: cert file (.pem). If Tuple, ('cert', 'key') pair.
self.cert = None

#: Maximum number of redirects allowed. If the request exceeds this
#: limit, a :class:`TooManyRedirects` exception is raised.
#: This defaults to requests.models.DEFAULT_REDIRECT_LIMIT, which is
#: 30.
self.max_redirects = DEFAULT_REDIRECT_LIMIT

#: Trust environment settings for proxy configuration, default
#: authentication and similar.
self.trust_env = True

#: A CookieJar containing all currently outstanding cookies set on this
#: session. By default it is a
#: :class:`RequestsCookieJar <requests.cookies.RequestsCookieJar>`, but
#: may be any other ``cookielib.CookieJar`` compatible object.
self.cookies = cookiejar_from_dict({})

# Default connection adapters.
self.adapters = OrderedDict()
self.mount('https://', HTTPAdapter())
self.mount('http://', HTTPAdapter())

super().__init__()

def __enter__(self):
return self

def __exit__(self, *args):
self.close()

def prepare_request(self, request):
"""Constructs a :class:`PreparedRequest <PreparedRequest>` for
transmission and returns it. The :class:`PreparedRequest` has settings
merged from the :class:`Request <Request>` instance and those of the
:class:`Session`.

:param request: :class:`Request` instance to prepare with this
session's settings.
:rtype: requests.PreparedRequest
"""
cookies = request.cookies or {}

# Bootstrap CookieJar.
if not isinstance(cookies, cookielib.CookieJar):
cookies = cookiejar_from_dict(cookies)

# Merge with session cookies
merged_cookies = merge_cookies(
merge_cookies(RequestsCookieJar(), self.cookies), cookies)

# Set environment's basic authentication if not explicitly set.
auth = request.auth
if self.trust_env and not auth and not self.auth:
auth = get_netrc_auth(request.url)

p = PreparedRequest()
p.prepare(
method=request.method.upper(),
url=request.url,
files=request.files,
data=request.data,
json=request.json,
headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
params=merge_setting(request.params, self.params),
auth=merge_setting(auth, self.auth),
cookies=merged_cookies,
hooks=merge_hooks(request.hooks, self.hooks),
)
return p

def request(self, method, url,
params=None, data=None, headers=None, cookies=None, files=None,
auth=None, timeout=None, allow_redirects=True, proxies=None,
hooks=None, stream=None, verify=None, cert=None, json=None):
"""Constructs a :class:`Request <Request>`, prepares it and sends it.
Returns :class:`Response <Response>` object.

:param method: method for the new :class:`Request` object.
:param url: URL for the new :class:`Request` object.
:param params: (optional) Dictionary or bytes to be sent in the query
string for the :class:`Request`.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param json: (optional) json to send in the body of the
:class:`Request`.
:param headers: (optional) Dictionary of HTTP Headers to send with the
:class:`Request`.
:param cookies: (optional) Dict or CookieJar object to send with the
:class:`Request`.
:param files: (optional) Dictionary of ``'filename': file-like-objects``
for multipart encoding upload.
:param auth: (optional) Auth tuple or callable to enable
Basic/Digest/Custom HTTP Auth.
:param timeout: (optional) How long to wait for the server to send
data before giving up, as a float, or a :ref:`(connect timeout,
read timeout) <timeouts>` tuple.
:type timeout: float or tuple
:param allow_redirects: (optional) Set to True by default.
:type allow_redirects: bool
:param proxies: (optional) Dictionary mapping protocol or protocol and
hostname to the URL of the proxy.
:param stream: (optional) whether to immediately download the response
content. Defaults to ``False``.
:param verify: (optional) Either a boolean, in which case it controls whether we verify
the server's TLS certificate, or a string, in which case it must be a path
to a CA bundle to use. Defaults to ``True``.
:param cert: (optional) if String, path to ssl client cert file (.pem).
If Tuple, ('cert', 'key') pair.
:rtype: requests.Response
"""
# Create the Request.
req = Request(
method=method.upper(),
url=url,
headers=headers,
files=files,
data=data or {},
json=json,
params=params or {},
auth=auth,
cookies=cookies,
hooks=hooks,
)
prep = self.prepare_request(req)

proxies = proxies or {}

settings = self.merge_environment_settings(
prep.url, proxies, stream, verify, cert
)

# Send the request.
send_kwargs = {
'timeout': timeout,
'allow_redirects': allow_redirects,
}
send_kwargs.update(settings)
resp = self.send(prep, **send_kwargs)

return resp

def get(self, url, **kwargs):
r"""Sends a GET request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

kwargs.setdefault('allow_redirects', True)
return self.request('GET', url, **kwargs)

def options(self, url, **kwargs):
r"""Sends a OPTIONS request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

kwargs.setdefault('allow_redirects', True)
return self.request('OPTIONS', url, **kwargs)

def head(self, url, **kwargs):
r"""Sends a HEAD request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

kwargs.setdefault('allow_redirects', False)
return self.request('HEAD', url, **kwargs)

def post(self, url, data=None, json=None, **kwargs):
r"""Sends a POST request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param json: (optional) json to send in the body of the :class:`Request`.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

return self.request('POST', url, data=data, json=json, **kwargs)

def put(self, url, data=None, **kwargs):
r"""Sends a PUT request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

return self.request('PUT', url, data=data, **kwargs)

def patch(self, url, data=None, **kwargs):
r"""Sends a PATCH request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

return self.request('PATCH', url, data=data, **kwargs)

def delete(self, url, **kwargs):
r"""Sends a DELETE request. Returns :class:`Response` object.

:param url: URL for the new :class:`Request` object.
:param \*\*kwargs: Optional arguments that ``request`` takes.
:rtype: requests.Response
"""

return self.request('DELETE', url, **kwargs)

def send(self, request, **kwargs):
"""Send a given PreparedRequest.

:rtype: requests.Response
"""
# Set defaults that the hooks can utilize to ensure they always have
# the correct parameters to reproduce the previous request.
kwargs.setdefault('stream', self.stream)
kwargs.setdefault('verify', self.verify)
kwargs.setdefault('cert', self.cert)
kwargs.setdefault('proxies', self.proxies)

# It's possible that users might accidentally send a Request object.
# Guard against that specific failure case.
if isinstance(request, Request):
raise ValueError('You can only send PreparedRequests.')

# Set up variables needed for resolve_redirects and dispatching of hooks
allow_redirects = kwargs.pop('allow_redirects', True)
stream = kwargs.get('stream')
hooks = request.hooks

# Get the appropriate adapter to use
adapter = self.get_adapter(url=request.url)

# Start time (approximately) of the request
start = preferred_clock()

# Send the request
r = adapter.send(request, **kwargs)

# Total elapsed time of the request (approximately)
elapsed = preferred_clock() - start
r.elapsed = timedelta(seconds=elapsed)

# Response manipulation hooks
r = dispatch_hook('response', hooks, r, **kwargs)

# Persist cookies
if r.history:

# If the hooks create history then we want those cookies too
for resp in r.history:
extract_cookies_to_jar(self.cookies, resp.request, resp.raw)

extract_cookies_to_jar(self.cookies, request, r.raw)

# Redirect resolving generator.
gen = self.resolve_redirects(r, request, **kwargs)

# Resolve redirects if allowed.
history = [resp for resp in gen] if allow_redirects else []

# Shuffle things around if there's history.
if history:
# Insert the first (original) request at the start
history.insert(0, r)
# Get the last request made
r = history.pop()
r.history = history

# If redirects aren't being followed, store the response on the Request for Response.next().
if not allow_redirects:
try:
r._next = next(self.resolve_redirects(r, request, yield_requests=True, **kwargs))
except StopIteration:
pass

if not stream:
r.content

return r

def merge_environment_settings(self, url, proxies, stream, verify, cert):
"""
Check the environment and merge it with some settings.

:rtype: dict
"""
# Gather clues from the surrounding environment.
if self.trust_env:
# Set environment's proxies.
no_proxy = proxies.get('no_proxy') if proxies is not None else None
env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
for (k, v) in env_proxies.items():
proxies.setdefault(k, v)

# Look for requests environment configuration and be compatible
# with cURL.
if verify is True or verify is None:
verify = (os.environ.get('REQUESTS_CA_BUNDLE') or
os.environ.get('CURL_CA_BUNDLE'))

# Merge all the kwargs.
proxies = merge_setting(proxies, self.proxies)
stream = merge_setting(stream, self.stream)
verify = merge_setting(verify, self.verify)
cert = merge_setting(cert, self.cert)

return {'verify': verify, 'proxies': proxies, 'stream': stream,
'cert': cert}

def get_adapter(self, url):
"""
Returns the appropriate connection adapter for the given URL.

:rtype: requests.adapters.BaseAdapter
"""
for (prefix, adapter) in self.adapters.items():

if url.lower().startswith(prefix.lower()):
return adapter

# Nothing matches :-/
raise InvalidSchema("No connection adapters were found for {!r}".format(url))

def close(self):
"""Closes all adapters and as such the session"""
for v in self.adapters.values():
v.close()

def mount(self, prefix, adapter):
"""Registers a connection adapter to a prefix.

Adapters are sorted in descending order by prefix length.
"""
self.adapters[prefix] = adapter
keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]

for key in keys_to_move:
self.adapters[key] = self.adapters.pop(key)

def __getstate__(self):
state = {attr: getattr(self, attr, None) for attr in self.__attrs__}
return state

def __setstate__(self, state):
for attr, value in state.items():
setattr(self, attr, value)
  1. 首先,这个类实现了上下文管理,因为实现了方法enterexit特殊方法,使得可以用with语法,关于上下文管理,请参考python上下文管理。

  2. 发起http请求时,会实例化这个类,首先初始化init方法,在这个方法中self.mount('https://', HTTPAdapter()) self.mount('http://', HTTPAdapter())调用mount挂载适配器,后续根据url的前缀'http://或者'https://,选择对应的Adapter,后续会讲这个类。

  3. 接着发起请求时,调用这个实例的request方法,把请求信息用Request类封装一下,然后prep = self.prepare_request(req),也就是封装之后的请求信息传到prepare_request这个方法里,这个方法主要对请求信息进行预处理,通过类PreparedRequest对请求信息进行再次封装,返回PreparedRequest对象。

  4. 接着调用方法send,resp = self.send(prep, **send_kwargs),返回response结果。下面看send方法做了哪些操作:

  • 首先通过adapter = self.get_adapter(url=request.url)选择对应的http适配器,调用adapter的send方法:r = adapter.send(request, **kwargs),返回请求的响应结果,因此,adapter是处理请求进而返回响应的核心类。

  1. 再调用build_response,传入resp,使用类Response对结果进行封装,也就是返回标准的Response结果实例。

下面看下adapter的send方法,首先分析下HTTPAdapter类:

HTTPAdapter类:

HTTPAdapter是处理请求返回响应的类,如:

s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)
s.mount('http://', a)

可自定义HTTPAdapter,通过mount挂载。

  • init方法中,self.init_poolmanager(pool_connections, pool_maxsize, block=pool_block)调用init_poolmanager方法,初始化一个连接池管理器,关于连接池管理器,可参考http连接池
    ,传入的参数pool_connections代表num_pools,代表pool的数量,pool_maxsize传给类HTTPConnectionPool,作为其maxsize参数,代表缓存的TCP连接数量。

  • 发送请求返回响应主要是send方法:
    首先调用conn = self.get_connection(request.url, proxies)获取一个urllib3.connectionpool.ConnectionPool的连接实例。然后调用resp = conn.urlopen(...),可返回响应结果。

关于__getstate__(self)方法:
  • __getstate__(self):
    在调用pickle.dump时,默认是对象的__dict__属性被存储,如果你要修改这种行为,可以在__getstate__方法中返回一个statestate将在调用pickle.load时传值给__setstate__

  • __setstate__(self, state):
    一般来说,定义了__getstate__,就需要相应地定义__setstate__来对__getstate__返回的state进行处理。


以上是关于requests 源码分析的主要内容,如果未能解决你的问题,请参考以下文章

Android 逆向整体加固脱壳 ( DEX 优化流程分析 | DexPrepare.cpp 中 dvmOptimizeDexFile() 方法分析 | /bin/dexopt 源码分析 )(代码片段

Android 事件分发事件分发源码分析 ( Activity 中各层级的事件传递 | Activity -> PhoneWindow -> DecorView -> ViewGroup )(代码片段

Request类源码分析

OkHttpOkHttp 源码分析 ( 同步 / 异步 Request 请求执行原理分析 )

axios源码分析

requests 源码分析