Selenium throws an InvalidArgumentException when rotating proxy
Posted: 2019-01-25 21:02:31

So I found this code on GitHub that collects IPs from https://free-proxy-list.net/ and rotates them. But when I try to run it, I get an error message.
I tried to debug it but couldn't find a solution. Could the problem be the newer version of my ChromeDriver?
Here is the code:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType
import time

co = webdriver.ChromeOptions()
co.add_argument("log-level=3")
co.add_argument("--headless")

def get_proxies(co=co):
    driver = webdriver.Chrome(chrome_options=co)
    driver.get("https://free-proxy-list.net/")

    PROXIES = []
    proxies = driver.find_elements_by_css_selector("tr[role='row']")
    for p in proxies:
        result = p.text.split(" ")
        if result[-1] == "yes":
            PROXIES.append(result[0]+":"+result[1])

    driver.close()
    return PROXIES

ALL_PROXIES = get_proxies()

def proxy_driver(PROXIES, co=co):
    prox = Proxy()

    if PROXIES:
        pxy = PROXIES[-1]
    else:
        print("--- Proxies used up (%s)" % len(PROXIES))
        PROXIES = get_proxies()

    prox.proxy_type = ProxyType.MANUAL
    prox.http_proxy = pxy
    prox.socks_proxy = pxy
    prox.ssl_proxy = pxy

    capabilities = webdriver.DesiredCapabilities.CHROME
    prox.add_to_capabilities(capabilities)

    driver = webdriver.Chrome(chrome_options=co, desired_capabilities=capabilities)
    return driver

# --- YOU ONLY NEED TO CARE FROM THIS LINE ---
# creating new driver to use proxy
pd = proxy_driver(ALL_PROXIES)

# code must be in a while loop with a try to keep trying with different proxies
running = True
while running:
    try:
        mycodehere()

        # if statement to terminate loop if code working properly
        something()

        # you
    except:
        new = ALL_PROXIES.pop()

        # reassign driver if fail to switch proxy
        pd = proxy_driver(ALL_PROXIES)
        print("--- Switched proxy to: %s" % new)
        time.sleep(1)
```
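Two details of the rotation logic are worth noting independently of the exception. `proxy_driver` reads `PROXIES[-1]` without removing it, while the `except` branch pops an entry and then builds the new driver from the *next* one, so the proxy printed as "Switched proxy to" is not the one actually in use. And when the list is empty, `pxy` is never assigned (an `UnboundLocalError`), while the refreshed list is bound only to the local name `PROXIES`. The following is a minimal sketch of a pool that avoids both issues. The class `ProxyPool` and its `next_proxy` method are hypothetical names, and it assumes a fetch function that returns `"ip:port"` strings, as `get_proxies()` does.

```python
from typing import Callable, List, Optional


class ProxyPool:
    """Hands out proxies one at a time and refills itself when empty.

    `fetch` is any zero-argument callable returning a list of "ip:port"
    strings (for example, the question's get_proxies()).
    """

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch
        self._proxies: List[str] = []

    def next_proxy(self) -> Optional[str]:
        # Refill only after every proxy has been handed out.
        if not self._proxies:
            print("--- Proxies used up, fetching a new list")
            self._proxies = list(self._fetch())
        # pop() both removes and returns the proxy, so the value that is
        # reported is the value that is actually used. None means the
        # fetch came back empty.
        return self._proxies.pop() if self._proxies else None
```

In the loop, `pxy = pool.next_proxy()` would then be called once per attempt, and that same `pxy` would be both passed to the driver and printed.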
Here is the error I get:
```
Traceback (most recent call last):
  File "test_v1.py", line 53, in <module>
    pd = proxy_driver(ALL_PROXIES)
  File "test_v1.py", line 47, in proxy_driver
    driver = webdriver.Chrome('/home/djurovic/Desktop/Linux ChromeDriver/chromedriver', chrome_options=co, desired_capabilities=capabilities)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: cannot parse capability: proxy
from invalid argument: Specifying 'socksProxy' requires an integer for 'socksVersion'
  (Driver info: chromedriver=2.45.615279 (12b89733300bd268cff3b78fc76cb8f3a7cc44e5),platform=Linux 4.15.0-43-generic x86_64)
```
Comments:
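From the error, it sounds like you have to pass in an integer for the SOCKS proxy version, which your proxy list is missing. You may need to compare what's in your pxy list with an example of a valid SOCKS proxy.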
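To make the comment concrete: the traceback says ChromeDriver could not parse the `proxy` capability because it contains a `socksProxy` entry but no integer `socksVersion`. In the W3C WebDriver proxy object, `socksVersion` is an integer from 0 to 255 that must accompany `socksProxy`. The following is a minimal sketch that builds that dictionary by hand. `manual_proxy_capability` is a hypothetical helper, not a Selenium API, and leaving out the SOCKS entry by default assumes the scraped addresses are plain HTTP proxies.

```python
from typing import Dict, Optional, Union


def manual_proxy_capability(
    host_port: str, socks_version: Optional[int] = None
) -> Dict[str, Union[str, int]]:
    """Build a W3C-style 'proxy' capability for a manual proxy.

    HTTP and SSL traffic always go through host_port. The SOCKS entry is
    added only when socks_version is given, because a socksProxy without
    an integer socksVersion is what ChromeDriver rejects in the traceback.
    """
    proxy: Dict[str, Union[str, int]] = {
        "proxyType": "manual",
        "httpProxy": host_port,
        "sslProxy": host_port,
    }
    if socks_version is not None:
        if not 0 <= socks_version <= 255:
            raise ValueError("socksVersion must be an integer in 0-255")
        proxy["socksProxy"] = host_port
        proxy["socksVersion"] = socks_version
    return proxy
```

The result could then be attached with `capabilities = webdriver.DesiredCapabilities.CHROME.copy()` followed by `capabilities["proxy"] = manual_proxy_capability(pxy)`. Copying first also avoids mutating the shared `DesiredCapabilities.CHROME` dict, which the posted code modifies in place.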
See also this question about using Selenium with SOCKS.
@G.Anderson I found that question earlier, but I couldn't get it to work...
Could you spare a moment to write me a short example? It would help more than you think...
Found with a simple Google search: github.com/rootVIII/proxy_web_crawler. It does exactly what you want, but uses Firefox.
Answer 1:
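I was able to reproduce this error. The problem is in the ChromeDriver version you are using, 2.45. That version appears to validate the proxy capability more strictly than earlier releases, which is why it rejects a socksProxy that has no socksVersion.
So all you have to do is download an earlier ChromeDriver release. I'm currently using 2.41, which can be downloaded from here.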
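Downgrading is the fix this answer proposes. A commonly used alternative, swapped in here and not part of the answer, is to give the proxy to Chrome itself through its `--proxy-server` command-line switch. That skips the `proxy` capability entirely, so ChromeDriver has nothing to validate. The following is a minimal sketch that only builds the argument list. `chrome_proxy_arguments` is a hypothetical helper, and the `http://` scheme assumes the scraped addresses are HTTP proxies.

```python
from typing import List


def chrome_proxy_arguments(host_port: str, headless: bool = True) -> List[str]:
    """Return Chrome command-line switches that route traffic via host_port.

    Assumes host_port is an HTTP proxy in "ip:port" form.
    """
    args = [f"--proxy-server=http://{host_port}"]
    if headless:
        args.append("--headless")
    return args
```

Each string can then be passed to `add_argument()` on a fresh `webdriver.ChromeOptions()` before creating `webdriver.Chrome`, with no `desired_capabilities` argument. Using a fresh options object per proxy keeps an old `--proxy-server` switch from lingering on the shared `co`.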
Comments:
Yes, I suspected ChromeDriver.