Using XPath and WebBrowser Control to select multiple nodes
Posted: 2014-10-10 21:34:36
Question: In a C# WinForms sample application, I use the WebBrowser control and javascript-xpath to select a single node and change that node's .innerHTML with the following code:
private void MainForm_Load(object sender, EventArgs e)
{
    webBrowser1.DocumentText = @"
<html>
<head>
<script src=""http://svn.coderepos.org/share/lang/javascript/javascript-xpath/trunk/release/javascript-xpath-latest-cmp.js""></script>
</head>
<body>
<img alt=""0764547763 Product Details""
     src=""http://ecx.images-amazon.com/images/I/51AK1MRIi7L._AA160_.jpg"">
<hr/>
<h2>Product Details</h2>
<ul>
<li><b>Paperback:</b> 648 pages</li>
<li><b>Publisher:</b> Wiley; Unlimited Edition edition (October 15, 2001)</li>
<li><b>Language:</b> English</li>
<li><b>ISBN-10:</b> 0764547763</li>
</ul>
</body>
</html>
";
}

private void cmdTest_Click(object sender, EventArgs e)
{
    // Select the first <li> with XPath and restyle its content
    string xPath = "//li";
    string code = string.Format("document.evaluate('{0}', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;", xPath);
    var li = webBrowser1.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement;
    li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
}
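This code works: the first <li> is rendered in green uppercase text.
Now I want to use the same technique to select multiple <li> nodes under the <ul> node, so I wrote: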
xPath = "//ul//*";
code = string.Format("document.evaluate('{0}', document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);", xPath);
var allLI = webBrowser1.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElementCollection;
But the value returned in the allLI variable is null.
If I write this instead:
xPath = "//ul//*";
code = string.Format("document.evaluate('{0}', document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);", xPath);
var allLI = webBrowser1.Document.InvokeScript("eval", new object[] { code });
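then the returned allLI variable is not null and its type is a COM object, but it is not clear to me which more specific type this COM object can be cast to.
Is there a way to select multiple nodes using the technique shown here?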
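A note on why the cast yields null: with ORDERED_NODE_ITERATOR_TYPE, document.evaluate returns an XPathResult object, not an element collection, so `as mshtml.IHTMLElementCollection` cannot succeed. One possible way to get at the nodes on the script side is the snapshot form of the same API, sketched below. It assumes the injected javascript-xpath library supports XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the same way the standard DOM XPath API does; `fakeSnapshot` is a hypothetical stand-in so the sketch runs without a browser.

```javascript
// Copy an XPath snapshot result into a plain array.
// `result` is anything exposing snapshotLength and snapshotItem(i), as an
// XPathResult of type ORDERED_NODE_SNAPSHOT_TYPE does in the DOM XPath API.
function snapshotToArray(result) {
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Hypothetical stand-in for a real XPathResult (not part of any library).
function fakeSnapshot(items) {
  return {
    snapshotLength: items.length,
    snapshotItem: function (i) { return items[i]; }
  };
}

var demo = snapshotToArray(fakeSnapshot(["li-1", "li-2", "li-3"]));
console.log(demo.length); // 3
```

In the page itself the argument would be something like `document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)`. The array still crosses into C# as a COM object, so the caller has to read its members by name rather than cast it.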
[Edited]
Changed
xPath = "ul//*";
to
xPath = "//ul//*";
[Addition]
I added two JavaScript functions to the sample HTML:
<script type=""text/javascript"">
function GetElementsText (XPath)
{
    // Concatenate the inner text of every node matched by the XPath query
    var xPathRes = document.evaluate ( XPath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
    var nextElement = xPathRes.iterateNext ();
    var text = """";
    while (nextElement)
    {
        text += nextElement.innerText;
        nextElement = xPathRes.iterateNext ();
    }
    return text;
}

function GetElements (XPath)
{
    // Collect every matched node into an object keyed by a 1-based index
    var xPathRes = document.evaluate ( XPath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
    var nextElement = xPathRes.iterateNext ();
    var elements = new Object();
    var elementIndex = 1;
    while (nextElement)
    {
        elements[elementIndex++] = nextElement;
        nextElement = xPathRes.iterateNext ();
    }
    return elements;
}
</script>
Now, when I run the following line of C# code in the cmdTest_Click method:
var text = webBrowser1.Document.InvokeScript("eval", new object[] { "GetElementsText('//ul')" });
I get the text of all the li elements:
"Paperback: 648 pages \r\nPublisher: Wiley; Unlimited Edition edition (October 15, 2001) \r\nLanguage: English \r\nISBN-10: 0764547763 "
When I run the following line of C# code in the cmdTest_Click method:
var elements = webBrowser1.Document.InvokeScript("eval", new object[] { "GetElements('//ul')" });
I get back a COM object, which I cannot cast to IEnumerable<mshtml.IHTMLElement>.
Is there a way to handle, in C# code, the JavaScript collection of HTML nodes returned by
var elements = webBrowser1.Document.InvokeScript("eval", new object[] { "GetElements('//ul')" });
?
Comments:
Does this help? ***.com/a/20783420/1768303
@Noseratio: I want to avoid using HTML Agility Pack. I want to manipulate the DOM content of the WebBrowser control directly through mshtml.IHTMLElement and/or mshtml.IHTMLElementCollection.
Answer 1:
I found a solution. Here is the code:
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Windows.Forms;

namespace myTest.WinFormsApp
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
        }

        private void MainForm_Load(object sender, EventArgs e)
        {
            webBrowser1.DocumentText = @"
<html>
<body>
<img alt=""0764547763 Product Details""
     src=""http://ecx.images-amazon.com/images/I/51AK1MRIi7L._AA160_.jpg"">
<hr/>
<h2>Product Details</h2>
<ul>
<li><b>Paperback:</b> 648 pages</li>
<li><b>Publisher:</b> Wiley; Unlimited Edition edition (October 15, 2001)</li>
<li><b>Language:</b> English</li>
<li><b>ISBN-10:</b> 0764547763</li>
</ul>
</body>
</html>
";
        }

        private void cmdTest_Click(object sender, EventArgs e)
        {
            var processor = new WebBrowserControlXPathQueriesProcessor(webBrowser1);

            // change attributes of the first element of the list
            var li = processor.GetHtmlElement("//li");
            li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);

            // change attributes of the second and subsequent elements of the list
            var list = processor.GetHtmlElements("//ul//li");
            int index = 1;
            // the loop variable must not reuse the name "li" declared above
            foreach (var item in list)
            {
                if (index++ == 1) continue;
                item.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", item.innerText);
            }
        }
    }

    /// <summary>
    /// Enables IE WebBrowser control to evaluate XPath queries
    /// by injecting http://svn.coderepos.org/share/lang/javascript/javascript-xpath/trunk/release/javascript-xpath-latest-cmp.js
    /// and to return XPath queries results to the calling C# code as strongly typed
    /// mshtml.IHTMLElement and IEnumerable<mshtml.IHTMLElement>
    /// </summary>
    public class WebBrowserControlXPathQueriesProcessor
    {
        private System.Windows.Forms.WebBrowser _webBrowser;

        public WebBrowserControlXPathQueriesProcessor(System.Windows.Forms.WebBrowser webBrowser)
        {
            _webBrowser = webBrowser;
            injectScripts();
        }

        private void injectScripts()
        {
            // Thanks to: http://***.com/questions/7998996/how-to-inject-javascript-in-webbrowser-control
            HtmlElement head = _webBrowser.Document.GetElementsByTagName("head")[0];
            HtmlElement scriptEl = _webBrowser.Document.CreateElement("script");
            mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
            element.src = "http://svn.coderepos.org/share/lang/javascript/javascript-xpath/trunk/release/javascript-xpath-latest-cmp.js";
            head.AppendChild(scriptEl);

            // GetElements returns an object with 1-based keys and an explicit length,
            // so that the C# side can enumerate it by member name
            string javaScriptText = @"
function GetElements (XPath)
{
    var xPathRes = document.evaluate ( XPath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
    var nextElement = xPathRes.iterateNext ();
    var elements = new Object();
    var elementIndex = 1;
    while (nextElement)
    {
        elements[elementIndex++] = nextElement;
        nextElement = xPathRes.iterateNext ();
    }
    elements.length = elementIndex - 1;
    return elements;
}
";
            scriptEl = _webBrowser.Document.CreateElement("script");
            element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
            element.text = javaScriptText;
            head.AppendChild(scriptEl);
        }

        /// <summary>
        /// Gets Html element's mshtml.IHTMLElement object instance using XPath query
        /// </summary>
        public mshtml.IHTMLElement GetHtmlElement(string xPathQuery)
        {
            string code = string.Format("document.evaluate('{0}', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;", xPathQuery);
            return _webBrowser.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement;
        }

        /// <summary>
        /// Gets Html elements' IEnumerable<mshtml.IHTMLElement> object instance using XPath query
        /// </summary>
        public IEnumerable<mshtml.IHTMLElement> GetHtmlElements(string xPathQuery)
        {
            // Thanks to: http://***.com/questions/5278275/accessing-properties-of-javascript-objects-using-type-dynamic-in-c-sharp-4
            var comObject = _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElements('{0}')", xPathQuery) });
            Type type = comObject.GetType();
            // read the "length" member, then each 1-based member, from the script object by name
            int length = (int)type.InvokeMember("length", BindingFlags.GetProperty, null, comObject, null);
            for (int i = 1; i <= length; i++)
            {
                yield return type.InvokeMember(i.ToString(), BindingFlags.GetProperty, null, comObject, null) as mshtml.IHTMLElement;
            }
        }
    }
}
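When this code runs, the first <li> is rendered in green uppercase text and the remaining <li> elements in blue uppercase text.
I have embedded the credits as references in my code. If you find that I have missed some, please point them out in the comments and I will add them.
If you know a better solution (shorter or more efficient code), please comment and/or post your answer.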
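The contract between the script and the C# side in this answer is an array-like object: 1-based string keys plus an explicit `length`. The sketch below mimics both halves in plain JavaScript so the shape is easy to see. `toArrayLike` follows the GetElements function from the answer; `readArrayLike` imitates the C# loop that reads members by name; `fakeIterator` is a hypothetical stand-in for an XPathResult so the sketch runs without a browser.

```javascript
// Drain an iterator-style result into a 1-based, array-like object.
function toArrayLike(result) {
  var elements = {};
  var count = 0;
  var next = result.iterateNext();
  while (next) {
    elements[++count] = next;   // keys are "1", "2", ...
    next = result.iterateNext();
  }
  elements.length = count;      // lets the caller know how many keys to read
  return elements;
}

// Imitates the C# side: read "length", then each member by its string name.
function readArrayLike(obj) {
  var items = [];
  for (var i = 1; i <= obj.length; i++) {
    items.push(obj[String(i)]);
  }
  return items;
}

// Hypothetical stand-in for an XPathResult iterator (not part of any library).
function fakeIterator(items) {
  var pos = 0;
  return {
    iterateNext: function () { return pos < items.length ? items[pos++] : null; }
  };
}

var wrapped = toArrayLike(fakeIterator(["Paperback", "Publisher"]));
console.log(wrapped.length);                    // 2
console.log(readArrayLike(wrapped).join(", ")); // Paperback, Publisher
```

Without the `length` member the caller would have no way to know when to stop reading keys, which is presumably why the answer added it to the version of GetElements shown in the question.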
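Comments:
This JS that fills an array with elements does not work for the google.com/… site: for the XPath //div[@class='_pl _ki']/descendant-or-self::text()[1] it returns a truncated business name, only Broadway instead of Broadway Chiropractic & Wellness.
For one specific example, this solution returns null, while the same JavaScript executed in Chrome returns the correct element.
Also, you may want to wrap the XPath in double quotes, because the XPath itself may contain single quotes: string code = string.Format("document.evaluate(\"{0}\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;", xPathQuery);
instead of document.evaluate('{0}' ...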
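Switching the quote style only moves the problem: an XPath that contains double quotes would then break the generated script. A more general approach is to escape the XPath into a proper string literal before splicing it into the code. The sketch below shows the idea in JavaScript using JSON.stringify, which escapes embedded double quotes and backslashes; `buildEvaluateCall` and the sample XPath are hypothetical names and values chosen for illustration, and the C# side would need equivalent escaping.

```javascript
// Build the script text for a single-node XPath lookup.
// JSON.stringify emits a double-quoted literal with embedded double quotes
// and backslashes escaped, so either quote style inside the XPath is safe.
function buildEvaluateCall(xpath) {
  return "document.evaluate(" + JSON.stringify(xpath) +
    ", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;";
}

// Sample XPath mixing both quote styles (made up for this sketch).
var xpath = "//div[@class='_pl _ki' and @title=\"a\"]";
console.log(buildEvaluateCall(xpath));

// Round-trip check: the generated literal evaluates back to the original string.
var literal = JSON.stringify(xpath);
console.log(new Function("return " + literal + ";")() === xpath); // true
```

Another option that avoids building script text at all may be to call the injected function by name and pass the XPath as an argument, for example InvokeScript("GetElements", new object[] { xPathQuery }), since HtmlDocument.InvokeScript accepts an argument array.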