C#爬虫-Selenium ChromeDriver 设置代理
Posted dotNET跨平台
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了C#爬虫-Selenium ChromeDriver 设置代理相关的知识,希望对你有一定的参考价值。
背景
开发爬虫程序,如果不做代理设置,本机的外网IP很容易被网站封掉,导致不能持续进行数据抓取。而Selenium作为动态网页抓取的利器,我们有必要了解一下,如何对它进行代理设置,并正常访问网页。
解决办法
1、首先申请代理ip,正常付费的才比较靠谱。这其中包括账号、密码。
private string proxy_Host = "域名地址";
private int proxy_Post = 端口;
private string proxy_UserName = "账号";
private string proxy_PassWord = "密码";
private string proxy_CheckURL = "检查是否正常的地址";
private string Ex_Proxy_Name = "proxy.zip";
2、设置chrome background.js、manifest.json
private bool Rebuild_Extension_Proxy(string proxy_UserName, string proxy_PassWord)
bool result = false;
FileStream zipToOpen = null;
ZipArchive archive = null;
ZipArchiveEntry readmeEntry = null;
StreamWriter writer = null;
string background = "";
string manifest = "";
try
background = @"
var Global =
currentProxyAouth:
username: '',
password: ''
Global.currentProxyAouth =
username: '" + proxy_UserName + @"',
password: '" + proxy_PassWord + @"'
chrome.webRequest.onAuthRequired.addListener(
function(details, callbackFn)
console.log('onAuthRequired >>>: ', details, callbackFn);
callbackFn(
authCredentials: Global.currentProxyAouth
);
,
urls: [""<all_urls>""]
, [""asyncBlocking""]);
chrome.runtime.onMessage.addListener(
function(request, sender, sendResponse)
console.log('Background recieved a message: ', request);
POPUP_PARAMS = ;
if (request.command && requestHandler[request.command])
requestHandler[request.command] (request);
);";
manifest = @"
""version"": ""1.0.0"",
""manifest_version"": 2,
""name"": ""Chrome Proxy"",
""permissions"": [
""proxy"",
""tabs"",
""unlimitedStorage"",
""storage"",
""<all_urls>"",
""webRequest"",
""webRequestBlocking""
],
""background"":
""scripts"": [""background.js""]
,
""minimum_chrome_version"":""22.0.0""
";
zipToOpen = new FileStream(System.Environment.CurrentDirectory + "\\\\" + Ex_Proxy_Name, FileMode.Create);
archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update);
readmeEntry = archive.CreateEntry("background.js");
writer = new StreamWriter(readmeEntry.Open());
writer.WriteLine(background);
writer.Close();
readmeEntry = archive.CreateEntry("manifest.json");
writer = new StreamWriter(readmeEntry.Open());
writer.WriteLine(manifest);
writer.Close();
result = true;
catch (Exception ex)
result = false;
finally
if (writer != null) writer.Close(); writer.Dispose(); writer = null;
if (readmeEntry != null) readmeEntry = null;
if (archive != null) archive.Dispose(); archive = null;
if (zipToOpen != null) zipToOpen.Close(); zipToOpen.Dispose(); zipToOpen = null;
return result;
3、Chrome Driver使用代理Proxy
// 設置 Chrome Driver Exyension Proxy 設定
bool isproxysetting = true;
if (_isuseproxy)
isproxysetting = Rebuild_Extension_Proxy(proxy_UserName, proxy_PassWord);
if (isproxysetting)
// Driver 設定
options = new ChromeOptions();
if (_isuseproxy)
options.Proxy = null;
options.AddArguments("--proxy-server=" + proxy_Host + ":" + proxy_Post.ToString());
options.AddExtension(Ex_Proxy_Name);
4、测试一下我们的设置
private Proxy_Unit.ProxyIPInfo Get_ProxyIPInfo(string html_Content)
Proxy_Unit.ProxyIPInfo result = null;
try
result = new Proxy_Unit.ProxyIPInfo();
Html_Content = Html_Content.Replace("<html><head></head><body><pre style=\\"word-wrap: break-word; white-space: pre-wrap;\\">", "");
Html_Content = Html_Content.Replace("</pre></body></html>", "");
if (!Html_Content.Contains("proxy error"))
result = JsonConvert.DeserializeObject<Proxy_Unit.ProxyIPInfo>(Html_Content);
else
result = null;
catch (Exception ex)
result = null;
return result;
测试效果
成功,达到预期效果
"ip":"213.182.205.185",
"country":"IS",
"asn":
"asnum":9009,
"org_name":"M247 Ltd"
,
"geo":
"city":"Reykjavik",
"region":"1",
"region_name":"Capital Region",
"postal_code":"105",
"latitude":64.1369,
"longitude":-21.9139,
"tz":"Atlantic/Reykjavik",
"lum_city":"reykjavik",
"lum_region":"1"
总结
我们之前测试要为ChromeDriver设定Proxy时有遇到许多困难,需要使用Chrome Extension的管道设定Proxy才成功,以上希望能让您比较好了解。
以上是关于C#爬虫-Selenium ChromeDriver 设置代理的主要内容,如果未能解决你的问题,请参考以下文章
爬虫--python3.6+selenium+BeautifulSoup实现动态网页的数据抓取,适用于对抓取频率不高的情况
c#爬虫-从内存中释放Selenium chromedriver.exe终极杀