How to time out a request using Html Agility Pack
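【Title】: How to Timeout a request using Html Agility Pack 【Posted】: 2011-09-28 06:46:04 【Question】: I am making a request to a remote web server that is currently offline (on purpose).
I would like to find the best way to time out the request. Basically, if the request runs longer than "X" milliseconds, abandon it and return a null response.
Right now the web request just sits there waiting for a response.
How would I best approach this problem?
Here is the current code snippet: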
public JsonpResult About(string HomePageUrl)
{
    Models.Pocos.About about = null;
    if (HomePageUrl.RemoteFileExists())
    {
        // Using the Html Agility Pack, we want to extract only the
        // appropriate data from the remote page.
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load(HomePageUrl);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");
        if (node != null)
        {
            about = new Models.Pocos.About { html = node.InnerHtml };
        }
        //todo: look into whether this else statement is necessary
        else
        {
            about = null;
        }
    }
    return this.Jsonp(about);
}
【Comments】:
【Answer 1】: Retrieve the web page at your URL with this method:
private static string retrieveData(string url)
{
    // used to build entire input
    StringBuilder sb = new StringBuilder();
    // used on each read operation
    byte[] buf = new byte[8192];
    // prepare the web page we will be asking for
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    // Timeout is in milliseconds; 10 ms is very short, so raise it for real use.
    // GetResponse() throws a WebException when the timeout elapses.
    request.Timeout = 10;
    // execute the request
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    // we will read data via the response stream
    Stream resStream = response.GetResponseStream();
    string tempString = null;
    int count = 0;
    do
    {
        // fill the buffer with data
        count = resStream.Read(buf, 0, buf.Length);
        // make sure we read some data
        if (count != 0)
        {
            // translate from bytes to ASCII text
            tempString = Encoding.ASCII.GetString(buf, 0, count);
            // continue building the string
            sb.Append(tempString);
        }
    }
    while (count > 0); // any more data to read?
    return sb.ToString();
}
Then use the Html Agility Pack to retrieve the HTML markup like this:
public static string htmlRetrieveInfo()
{
    string htmlSource = retrieveData("http://example.com/test.html");
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(htmlSource);
    // SelectSingleNode returns null when nothing matches, so check before use.
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
    return node != null ? node.InnerHtml : null;
}
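A production version of a helper like retrieveData would usually release the response and stream with using blocks and turn a timeout into a null result, which is what the question asks for. The following is a minimal sketch under those assumptions, not the answerer's code: PageFetcher and RetrieveData are hypothetical names, the timeout is passed in rather than hard-coded, and UTF-8 decoding is assumed.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: returns the page body, or null when the request
    // fails or gets no response within timeoutMs milliseconds.
    public static string RetrieveData(string url, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs;          // limit for GetResponse()
            request.ReadWriteTimeout = timeoutMs; // limit for each read on the response stream

            // using blocks dispose the response, stream and reader even if reading throws
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, Encoding.UTF8)) // assumption: UTF-8 content
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Covers timeouts, connection failures and HTTP error statuses.
            return null;
        }
        catch (IOException)
        {
            // A read that stalls or breaks mid-body may surface here.
            return null;
        }
    }
}
```

The caller then only has to check for null before handing the string to HtmlDocument.LoadHtml. A malformed URL still throws from WebRequest.Create; the sketch leaves that case to the caller.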
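【Comments】:
+1 Thanks for the reply, it put me on the right track. Instead of reading the HTML through HttpWebRequest, I simply added a timeout to RemoteFileExists - see my answer
@reggie: Note that a production version of this code should use using blocks to dispose anything that implements IDisposable.
【Answer 2】: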
Html Agility Pack is open source, so you can modify the source code yourself. First add this code to the HtmlWeb class:
private int _timeout = 20000;
public int Timeout
{
    get { return _timeout; }
    set
    {
        // Validate the incoming value, not the current field.
        if (value < 1)
            throw new ArgumentException("Timeout must be greater than zero.");
        _timeout = value;
    }
}
Then find this method:
private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)
and modify it:
req = WebRequest.Create(uri) as HttpWebRequest;
req.Method = method;
req.UserAgent = UserAgent;
req.Timeout = Timeout; //add this
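Or, without touching the library source, something like this:
htmlWeb.PreRequest = request =>
{
    request.Timeout = 15000; // milliseconds
    return true;             // true lets the request proceed
};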
【Comments】:
【Answer 3】: I had to make a small adjustment to the code I originally posted:
public JsonpResult About(string HomePageUrl)
{
    Models.Pocos.About about = null;
    // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
    if (HomePageUrl.RemoteFileExists(1000))
    {
        // Using the Html Agility Pack, we want to extract only the
        // appropriate data from the remote page.
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load(HomePageUrl);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");
        if (node != null)
        {
            about = new Models.Pocos.About { html = node.InnerHtml };
        }
        //todo: look into whether this else statement is necessary
        else
        {
            about = null;
        }
    }
    return this.Jsonp(about);
}
I then modified my RemoteFileExists extension method to set a timeout:
public static bool RemoteFileExists(this string url, int timeout)
{
    try
    {
        // Create the HttpWebRequest.
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        // ************ ADDED HERE
        // time out the request after x milliseconds
        request.Timeout = timeout;
        // ************
        // Use the HEAD method; GET would also work.
        request.Method = "HEAD";
        // Get the web response and dispose it when done.
        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            // Returns true if the status code is 200.
            return (response.StatusCode == HttpStatusCode.OK);
        }
    }
    catch
    {
        // Any exception, including a timeout, returns false.
        return false;
    }
}
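With this approach, if the timeout fires before RemoteFileExists can determine the header response, the method returns false. Note that the timeout only guards the HEAD check; the later hw.Load call is not covered by it.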
【Comments】:
【Answer 4】: You can use a standard HttpWebRequest to fetch the remote resource and set its Timeout property. If the request succeeds, feed the resulting HTML to the Html Agility Pack for parsing.
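A minimal sketch of that approach, assuming the HtmlAgilityPack package is referenced: the response stream is passed straight to HtmlDocument.Load(Stream), and any failure, including a timeout, is reported as null. TimedHtmlLoader and its Load method are hypothetical names introduced for illustration.

```csharp
using System.IO;
using System.Net;
using HtmlAgilityPack;

public static class TimedHtmlLoader
{
    // Hypothetical helper: returns the parsed document, or null when the
    // request fails or does not respond within timeoutMs milliseconds.
    public static HtmlDocument Load(string url, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs; // milliseconds allowed for GetResponse()

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                var doc = new HtmlDocument();
                doc.Load(stream); // parse directly from the response stream
                return doc;
            }
        }
        catch (WebException)
        {
            // Timeout, connection failure, or HTTP error status.
            return null;
        }
        catch (IOException)
        {
            // The connection broke while the body was being read.
            return null;
        }
    }
}
```

Usage would look like `var doc = TimedHtmlLoader.Load(HomePageUrl, 1000);` followed by a null check before calling `doc.DocumentNode.SelectSingleNode(...)`.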
【Comments】:
What is the right way to convert a System.Net.WebRequest into an HtmlAgilityPack.HtmlDocument?