How to use cache filter?
Question (posted 2021-11-30 17:43:33): I have a problem with my cache filter.
The idea is to not cache responses that contain "incomplete_results":true
This is my filter function:
import requests
import requests_cache

def phrase_filter(response: requests.models.Response) -> bool:
    if '"incomplete_results":true' in response.text:
        return False
    return True
But when I test it with this code:
requests_cache.install_cache('demo_cache',expired_after=600,filter_fn=phrase_filter)
requests_cache.clear()
url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'
with requests_cache.enabled():
    r = requests.get(url1) # First request
    r = requests.get(url1) # Second request
    print(f'Text from url1:\n{r.text}')
    assert r.from_cache==True
    #
    r1 = requests.get(url2) # First request
    r1 = requests.get(url2) # Second request
    print('---')
    print(f'Text from url2:\n{r1.text}')
    assert r1.from_cache==False
requests_cache.disabled()
Here is the result:
Text from url1:
abc
xyz
"incomplete_results":false
---
Text from url2:
abc
xyz
"incomplete_results":true
Traceback (most recent call last):
  File "C:\Users\ADMIN\source\repos\LearningPython\py_2\py_2.py", line 25, in <module>
    assert r1.from_cache==False
AssertionError
I don't understand why r1 was cached.
What is wrong, and how can I fix it?
Thank you for taking the time to answer.
Answer 1:
Patching
Looks like you're almost there! requests_cache.enabled() and disabled() are context manager alternatives to install_cache() and uninstall_cache(). Just pass your settings to enabled() instead of install_cache():
with requests_cache.enabled('demo_cache', expire_after=600, filter_fn=phrase_filter):
    # ... make requests
This is essentially the same as:
requests_cache.install_cache('demo_cache', expire_after=600, filter_fn=phrase_filter)
# ... make requests
requests_cache.uninstall_cache()
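Sessions
Personally, I'd recommend using requests_cache.CachedSession instead of the patching methods, because it makes it more explicit what is being cached, and if you want to make non-cached requests you can just use the regular requests functions. There is more info in the docs here: https://requests-cache.readthedocs.io/en/stable/user_guide/general.html
Example: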
from requests import Response
from requests_cache import CachedSession
def phrase_filter(response: Response) -> bool:
    return '"incomplete_results":true' not in response.text
url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'
session = CachedSession('demo_cache', expire_after=600, filter_fn=phrase_filter)
session.cache.clear()
nonfiltered_response = session.get(url1)
nonfiltered_response = session.get(url1)
assert nonfiltered_response.from_cache is True
filtered_response = session.get(url2)
filtered_response = session.get(url2)
assert filtered_response.from_cache is False
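The filter itself can be checked offline, without any network requests. The sketch below is one possible variant, not part of the original answer: it swaps the exact substring test for a regular expression so that optional whitespace around the colon is tolerated (an assumption about how some servers format the body). The `SimpleNamespace` objects are hypothetical stand-ins for a response that only provide a `.text` attribute.

```python
import re
from types import SimpleNamespace

# Matches "incomplete_results":true with optional whitespace around the colon
INCOMPLETE = re.compile(r'"incomplete_results"\s*:\s*true')

def phrase_filter(response) -> bool:
    """Return True if the response may be cached."""
    return INCOMPLETE.search(response.text) is None

# Stand-ins for real responses; only the .text attribute is used
complete = SimpleNamespace(text='abc\nxyz\n"incomplete_results":false')
incomplete = SimpleNamespace(text='abc\nxyz\n"incomplete_results": true')

print(phrase_filter(complete))    # True
print(phrase_filter(incomplete))  # False
```

A function with this shape can be passed as filter_fn in the same way as the one in the example above.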
Debugging
If you run into a similar problem in the future and aren't sure why a response is or isn't being cached, you can enable debug logging:
import logging
logging.basicConfig(level='DEBUG')
You will get cache information for each response, like this:
DEBUG:requests_cache.session: Pre-cache checks for response from https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt:
'disabled cache': False,
'disabled method': False,
'disabled status': False,
'disabled by filter': True,
'disabled by headers or expiration params': False,
More info in the docs here: https://requests-cache.readthedocs.io/en/stable/user_guide/troubleshooting.html
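If root-level DEBUG output is too noisy, one option is to raise the level only for the library's loggers. This is a minimal sketch using the standard logging module. It assumes the library logs under the 'requests_cache' namespace, as the 'requests_cache.session' logger name in the output above suggests.

```python
import logging

# Keep everything else at WARNING
logging.basicConfig(level=logging.WARNING)

# Child loggers such as 'requests_cache.session' inherit this level
logging.getLogger('requests_cache').setLevel(logging.DEBUG)
```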
Answer 2: I tried this as well, but could not get it to work:
# Added by Eurico Covas
# see https://requests-cache.readthedocs.io/en/stable/user_guide/filtering.html
@staticmethod
def filter_by_error(response: requests.models.Response) -> bool:
    """Don't cache responses with ErrMsg"""
    if response is None:
        return True
    if response.ok == False:
        return True
    if len(response.json()['GDSSDKResponse']) == 1:
        if len(response.json()['GDSSDKResponse'][0]) >= 1:
            if "ErrMsg" in response.json()['GDSSDKResponse'][0].keys():
                if response.json()['GDSSDKResponse'][0]['ErrMsg'] is not None and response.json()['GDSSDKResponse'][0]['ErrMsg'] != '':
                    return True
    return False

def __init__(self, username, password, verify=True, debug=False, request_caching_enabled=False):
    assert username is not None
    assert password is not None
    assert verify is not None
    assert debug is not None
    assert request_caching_enabled is not None
    self._username = username
    self._password = password
    self._verify = verify
    self._debug = debug
    self._request_caching_enabled = request_caching_enabled
    if self._request_caching_enabled:
        self.request_count = self.get_cached_request_count()
    if not self._verify:
        requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    if self._debug:
        self.enable_request_debugging()
    else:
        self.enable_error_logging()
    # cache requests for 30*24 hours = 1 month!
    if self._request_caching_enabled:
        requests_cache.install_cache('capiq_cache', backend='sqlite', expire_after=30*86400, allowable_methods=('POST',), filter_fn=self.filter_by_error)
However,
response = requests.post(self._endpoint, headers=self._headers, data=json.dumps(req),
auth=HTTPBasicAuth(self._username, self._password), verify=self._verify)
never calls filter_by_error()...
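Separately from the question of why the function is never called, note that in the first answer's filter a return value of False means "do not cache", whereas the function above returns True for error responses, which does not match its docstring. Below is a hedged sketch of a filter with that convention that can be tested offline. The `should_cache` and `fake_response` names are hypothetical, the payload shape (a 'GDSSDKResponse' list of dicts) is taken from the code above, and checking every item instead of only the first is an assumption.

```python
from types import SimpleNamespace

def should_cache(response) -> bool:
    """Return True only for successful responses without an ErrMsg."""
    if response is None or not response.ok:
        return False
    # Parse the body once instead of calling .json() repeatedly
    items = response.json().get('GDSSDKResponse', [])
    # Any non-empty ErrMsg marks the response as an error (assumption:
    # every item is a dict, and all items are checked, not just the first)
    return not any(item.get('ErrMsg') for item in items)

def fake_response(ok, payload):
    """Hypothetical stand-in exposing only .ok and .json()."""
    return SimpleNamespace(ok=ok, json=lambda: payload)

error = fake_response(True, {'GDSSDKResponse': [{'ErrMsg': 'bad request'}]})
success = fake_response(True, {'GDSSDKResponse': [{'Rows': []}]})

print(should_cache(error))    # False
print(should_cache(success))  # True
```

This sketch does not explain why filter_by_error() is not invoked; it only makes the filter's intended return values easy to verify in isolation.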
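Comments:
This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review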