How to properly do requests to a https with a proxy server such as luminati.io?

Posted 2023-02-23

【Published】: 2019-06-06 23:20:11
【Question】:

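This is the API provided by the premium proxy provider luminati.io. However, it returns the response as bytes rather than as a dictionary, so I convert it to a dictionary to be able to extract the ip and port:

Every request ends up with a new peer proxy, since the IP rotates for every request.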

#!/usr/bin/env python
import csv
import json
import sys
import time

import requests

print('If you get error "ImportError: No module named \'six\'" '
      'install six:\n$ sudo pip install six')

if sys.version_info[0] == 2:
    import six
    from six.moves.urllib import request
    # ProxyHandler expects a dict mapping scheme -> proxy URL
    opener = request.build_opener(
        request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()
if sys.version_info[0] == 3:
    import urllib.request
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()
# read() returns bytes; json.loads parses them into a dictionary
proxy_dictionary = json.loads(proxy_details)

print(proxy_dictionary)
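
As a minimal sketch of the conversion step on its own: `read()` returns bytes, and `json.loads` parses them into a dictionary. The payload here is hard-coded from the sample output in this question rather than fetched live, and its nesting is an assumption.

```python
import json

# Bytes shaped like the body of http://lumtest.com/myip.json
# (values from this question's sample output; nesting assumed).
raw = (b'{"ip": "84.22.151.191", "country": "RU", '
       b'"asn": {"asnum": 57129, "org_name": "Optibit LLC"}}')

# Decode explicitly, then parse; json.loads also accepts bytes on Python 3.6+.
proxy_dictionary = json.loads(raw.decode("utf-8"))

print(type(raw).__name__, "->", type(proxy_dictionary).__name__)
print(proxy_dictionary["ip"])
```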

Then I plan to use the ip and port with the requests module to connect to the website of interest:

headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":

    search_keyword = input("Enter the search keyword: ")
    page_number =  int(input("Enter total number of pages: "))

    for i in range(1,page_number+1):
        time.sleep(10)

        link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link,headers=headers,proxies={"https":proxy})

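But my problem is that it errors out at the requests part. When I changed proxies={"https":proxy} to proxies={"http":proxy}, it went through once, but other than that the proxy fails to connect.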

Sample output:

proxy_dictionary = {'ip': '84.22.151.191', 'country': 'RU', 'asn': {'asnum': 57129, 'org_name': 'Optibit LLC'}, 'geo': {'city': 'Krasnoyarsk', 'region': 'KYA', 'postal_code': '660000', 'latitude': 56.0097, 'longitude': 92.7917, 'tz': 'Asia/Krasnoyarsk'}}

[Image: details of the peer proxy]

print(proxy) yields 84.22.151.191:57129, which is what gets fed into the requests.get method.

The error I get:

(Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000282DDD592B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))

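I tested removing the proxies={"https":proxy} argument from the requests call, and the scraping works without errors. So something is wrong with the proxy or with the way I am accessing it.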
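
One way to narrow this down, sketched with the sample output and the standard library only, is to list what the response actually contains. `asn.asnum` is an autonomous system number, which identifies the network operator, and the sample has no field named port. The helper name `find_port` is hypothetical, and the nesting of the sample is assumed.

```python
# Sample response from this question (nesting assumed).
proxy_dictionary = {
    "ip": "84.22.151.191",
    "country": "RU",
    "asn": {"asnum": 57129, "org_name": "Optibit LLC"},
    "geo": {"city": "Krasnoyarsk", "region": "KYA", "postal_code": "660000",
            "latitude": 56.0097, "longitude": 92.7917, "tz": "Asia/Krasnoyarsk"},
}

def find_port(details):
    """Return a 'port' value if the response carries one, else None (hypothetical helper)."""
    return details.get("port")

print(sorted(proxy_dictionary))          # top-level keys of the response
print(find_port(proxy_dictionary))       # None: no port is reported
print(proxy_dictionary["asn"]["asnum"])  # AS number, not a TCP port
```

If that reading is right, 84.22.151.191:57129 pairs the exit IP with an AS number, which would be consistent with the connection being refused.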

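【Comments】:

Hmm, I can't test this; I get "urllib.error.URLError: ". What is the "@127.0.3.1:20005" part at the end of the proxy? Are you trying to set one up locally?

Are you using your ISP's ASN number as the proxy port? You mention port, but only in the comments. Also, shouldn't the proxy value include the protocol? For example: http://84.22.151.191:57129?

@CristiFati @127.0.3.1:20005 is what my application uses to connect to the Luminati Proxy Manager, which then returns a peer proxy, 84.22.151.191:57129, which I then use to connect and scrape the website of interest.

Since I defined proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"]), proxies={"https":proxy} is then proxies={"https":"84.22.151.191:57129"}. Do you mean it has to be proxies={"https":"https://84.22.151.191:57129"}?

Note that you won't be able to connect with 'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005', because I changed the details, since they are my service username and password.

I also tried the format proxies={"https":"84.22.151.191:57129"} and got the same error.

【Answer 1】: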

When changing proxies={"https":proxy} to proxies={"http":proxy}, you also have to make sure that your link is http rather than https, so also try replacing:

link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'

with

link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'

Your overall code should look like this:

headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":

    search_keyword = input("Enter the search keyword: ")
    page_number =  int(input("Enter total number of pages: "))

    for i in range(1,page_number+1):
        time.sleep(10)

        link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link,headers=headers,proxies={"http":proxy})

Hope this helps!
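
To illustrate the matching rule this answer relies on: requests chooses an entry from `proxies` by the scheme of the target URL. The function below is a simplified stand-in written for this note, not the library's own code (requests also honors host-specific and `all` keys); the proxy address is the one from the question.

```python
from urllib.parse import urlsplit

def pick_proxy(url, proxies):
    """Simplified stand-in: look up the proxy by the URL's scheme."""
    return proxies.get(urlsplit(url).scheme)

proxies = {"http": "http://84.22.151.191:57129"}

print(pick_proxy("http://www.experiment.com.ph/catalog/", proxies))
print(pick_proxy("https://www.experiment.com.ph/catalog/", proxies))  # None: no "https" entry
```

With only an "http" entry, an https link finds no matching proxy in the mapping, which is why the key and the link scheme need to agree.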

【Comments】:

I actually tried that before posting, thinking they had to match, but it didn't work either.

【Answer 2】:

A bit late to the party, but this worked for me.

proxies = {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005', 'https': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}

req = requests.get(link,headers=headers,proxies=proxies)

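After defining the proxies like this, I was able to hit the link and get a response. I believe luminati needs the credentials in order to rotate and hit the link through their proxies.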
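
A minimal sketch of the same mapping built from its parts, assuming the proxy manager accepts plain HTTP at the address given in the question. `build_proxies` and its parameters are hypothetical names, and the credentials are the redacted placeholders from this thread.

```python
from urllib.parse import quote, urlsplit

def build_proxies(username, password, host, port):
    """Build a requests-style proxies mapping (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}"
    # Both target schemes are routed through the same http:// proxy endpoint.
    return {"http": proxy_url, "https": proxy_url}

proxies = build_proxies("lum-customer-hl_1247574f-zone-static", "lnheclanmc",
                        "127.0.3.1", 20005)
parts = urlsplit(proxies["https"])
print(parts.scheme, parts.hostname, parts.port)
```

The resulting dictionary can be passed as `requests.get(link, headers=headers, proxies=proxies)`, as in this answer.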
