Get Xpath from the org.w3c.dom.Node

Posted 2023-02-21

【Title】: Get Xpath from the org.w3c.dom.Node 【Posted】: 2011-06-30 02:11:35 【Question】:

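Can I get the full XPath from an org.w3c.dom.Node?

Say the current node points somewhere in the middle of the XML document. I want to extract the XPath of that element.

The output XPath I'm looking for is //parent/child1/child2/child3/node, that is, the chain of parent nodes leading down to the node. Just ignore XPaths that contain expressions and point to the same node.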

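【Comments on the question】:

Unless you want an XPath 2.0 solution (this is not possible in XPath 1.0) and you define a specific set of XPath expressions, this question cannot be answered in general: there are infinitely many XPath expressions that select the same node of a given XML tree.

@Alejandro: OK. My XPath won't contain any expressions. I'm looking for //parent/child1/child2/node

In XPath 2.0 this is: string-join(ancestor-or-self::node()/name(),'/')

The following Stack Overflow question may be relevant to you: ***.com/questions/4746299/…

【Answer 1】: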

For me, this worked best (using org.w3c.dom elements):

String getXPath(Node node)
{
    Node parent = node.getParentNode();
    if (parent == null)
    {
        // reached the Document node (or a detached node): nothing to prepend
        return "";
    }
    return getXPath(parent) + "/" + node.getNodeName();
}
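
The recursive approach above yields paths without positional indexes, so siblings that share a name cannot be told apart. One possible way to add 1-based counters such as html[1]/div[3] is sketched below. The class name IndexedXPath and the parse helper are hypothetical, introduced only for illustration, and the sketch assumes element nodes only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IndexedXPath {

    /** Builds a path such as /html[1]/div[3] for an element node. */
    static String getIndexedXPath(Node node) {
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return ""; // stop at the Document node (or at null)
        }
        int index = 1; // XPath positions are 1-based
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            // count only preceding element siblings with the same name
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(node.getNodeName())) {
                index++;
            }
        }
        return getIndexedXPath(node.getParentNode())
                + "/" + node.getNodeName() + "[" + index + "]";
    }

    /** Hypothetical helper: parses an XML string, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><div/><div/><div><p/></div></html>");
        Node p = doc.getElementsByTagName("p").item(0);
        System.out.println(getIndexedXPath(p)); // prints /html[1]/div[3]/p[1]
    }
}
```

Every step gets an index, even when it is the only sibling of that name. This keeps the code short at the cost of slightly noisier paths.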

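【Discussion】:

This works better for me if the first return statement returns an empty string. Otherwise the returned XPath starts with two forward slashes.

@Adam Wise: Thanks, you're right, that slash isn't necessary... I just fixed the code.

What about the counters (i.e. "html[1]/div[3]")?

【Answer 2】: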

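I work for the company behind jOOX, a library that provides many useful extensions to the Java standard DOM API, mimicking the jQuery API. With jOOX, you can get the XPath of any element like this: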

String path = $(element).xpath();

The path above will be something like this:

/document[1]/library[2]/books[3]/book[1]

【Discussion】:

【Answer 3】:

Something like this will give you a simple XPath:

public String getXPath(Node node) {
    return getXPath(node, "");
}

public String getXPath(Node node, String xpath) {
    if (node == null) {
        return "";
    }
    String elementName = "";
    if (node instanceof Element) {
        // note: getLocalName() returns null for nodes created with DOM Level 1 methods
        elementName = ((Element) node).getLocalName();
    }
    Node parent = node.getParentNode();
    if (parent == null) {
        // reached the top of the tree: the accumulated path is complete
        return xpath;
    }
    return getXPath(parent, "/" + elementName + xpath);
}
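
One caveat with getLocalName(): the DOM API documents it as null for nodes created with DOM Level 1 methods such as Document.createElement(), and this typically also applies to documents parsed without namespace awareness, which is the DocumentBuilderFactory default. The sketch below is one possible variant that falls back to getNodeName() in that case. The class name LocalNameXPath and the parse helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LocalNameXPath {

    /** Joins element names from the root element down to the given node. */
    static String getXPath(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node;
                n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            String name = n.getLocalName();
            if (name == null) {
                name = n.getNodeName(); // fallback when no local name is available
            }
            path.insert(0, "/" + name);
        }
        return path.toString();
    }

    /** Hypothetical helper: parses an XML string with the requested namespace mode. */
    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a:root xmlns:a='urn:example'><a:child/></a:root>";
        Node child = parse(xml, true).getDocumentElement().getFirstChild();
        System.out.println(getXPath(child)); // prints /root/child
    }
}
```

Keep in mind that a path made of local names only drops the namespace information, so it may not select the same elements when evaluated against a namespaced document.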

【Discussion】:

【Answer 4】:

I took the code from Mikkel Flindt's post and modified it so that it also works for attribute nodes.

import java.util.Stack;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public static String getFullXPath(Node n) {
// abort early
if (null == n) {
  return null;
}

// declarations
Node parent;
Stack<Node> hierarchy = new Stack<Node>();
StringBuilder buffer = new StringBuilder();

// push the node itself on the stack
hierarchy.push(n);

switch (n.getNodeType()) {
case Node.ATTRIBUTE_NODE:
  // attributes have no parent node; start from the owning element
  parent = ((Attr) n).getOwnerElement();
  break;
case Node.ELEMENT_NODE:
case Node.DOCUMENT_NODE:
  parent = n.getParentNode();
  break;
default:
  throw new IllegalStateException("Unexpected Node type " + n.getNodeType());
}

while (null != parent && parent.getNodeType() != Node.DOCUMENT_NODE) {
  // push on stack
  hierarchy.push(parent);

  // get parent of parent
  parent = parent.getParentNode();
}

// construct xpath, outermost ancestor first
while (!hierarchy.isEmpty()) {
  Node node = hierarchy.pop();
  boolean handled = false;

  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element e = (Element) node;

    // is this the root element?
    if (buffer.length() == 0) {
      // root element - simply append element name (no leading slash)
      buffer.append(node.getNodeName());
    } else {
      // child element - append slash and element name
      buffer.append("/");
      buffer.append(node.getNodeName());

      if (node.hasAttributes()) {
        // see if the element has a name or id attribute
        if (e.hasAttribute("id")) {
          // id attribute found - use that
          buffer.append("[@id='" + e.getAttribute("id") + "']");
          handled = true;
        } else if (e.hasAttribute("name")) {
          // name attribute found - use that
          buffer.append("[@name='" + e.getAttribute("name") + "']");
          handled = true;
        }
      }

      if (!handled) {
        // no known attribute we could use - get sibling index
        int prev_siblings = 1;
        Node prev_sibling = node.getPreviousSibling();
        while (null != prev_sibling) {
          // XML names are case-sensitive, so compare with equals()
          if (prev_sibling.getNodeType() == node.getNodeType()
              && prev_sibling.getNodeName().equals(node.getNodeName())) {
            prev_siblings++;
          }
          prev_sibling = prev_sibling.getPreviousSibling();
        }
        buffer.append("[" + prev_siblings + "]");
      }
    }
  } else if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
    buffer.append("/@");
    buffer.append(node.getNodeName());
  }
}

// return buffer
return buffer.toString();
}

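【Discussion】:

【Answer 5】:

There is no generic way to get an XPath, mainly because there is no single generic XPath that identifies a particular node in a document. In some schemas, nodes are uniquely identified by an attribute (id and name are probably the most common ones). In other schemas, the name of each element (i.e. the tag) is enough to uniquely identify a node. In a few (unlikely, but possible) cases, no unique name or attribute will take you to a specific node, so you need to use cardinality (get the n-th child of the m-th child...).

EDIT: In most cases it is not hard to write a schema-dependent function that assembles an XPath for a given node. For example, suppose you have a document in which every node is uniquely identified by an id attribute, and you are not using namespaces. Then (I think) the following pseudo-Java will return an XPath based on those attributes. (Warning: I have not tested this.)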

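String getXPath(Node node)
{
    // assumes every element carries a unique id attribute and no namespaces are used
    if (!(node instanceof Element))
    {
        return ""; // stop at the Document node (or at null)
    }
    Element element = (Element) node;
    return getXPath(element.getParentNode())
            + "/" + element.getTagName() + "[@id='" + element.getAttribute("id") + "']";
}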

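【Discussion】:

【Answer 6】:

Some IDEs that specialise in XML will do this for you.

Here are the best known:

oXygen, Stylus Studio, xmlSpy

For example, in oXygen you can right-click an element in an XML document, and the context menu will have an option "Copy XPath".

There are also some Firefox add-ons (such as XPather) that will happily do the job for you. With XPather, you just click a part of the web page, select "Show in XPather" in the context menu, and you're done.

However, as Dan points out in his answer, the resulting XPath expression will be of limited use. For example, it will not include predicates. Instead it will look like this: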

/root/nodeB[2]/subnodeX[2]

for a document like this:

<root>
   <nodeA>stuff</nodeA>
   <nodeB>more stuff</nodeB>
   <nodeB cond="thisOne">
       <subnodeX>useless stuff</subnodeX>
       <subnodeX id="MyCondition">THE STUFF YOU WANT</subnodeX>
       <subnodeX>more useless stuff</subnodeX>
   </nodeB>
</root>

The tools I listed will not generate:

/root/nodeB[@cond='thisOne']/subnodeX[@id='MyCondition']
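
To illustrate that both expressions really point at the same node, the following sketch evaluates them against the sample document with the JDK's javax.xml.xpath API and compares the results. The class name XPathCheck and the helper methods are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathCheck {

    // the sample document from the answer above
    static final String XML =
            "<root>\n"
            + "   <nodeA>stuff</nodeA>\n"
            + "   <nodeB>more stuff</nodeB>\n"
            + "   <nodeB cond=\"thisOne\">\n"
            + "       <subnodeX>useless stuff</subnodeX>\n"
            + "       <subnodeX id=\"MyCondition\">THE STUFF YOU WANT</subnodeX>\n"
            + "       <subnodeX>more useless stuff</subnodeX>\n"
            + "   </nodeB>\n"
            + "</root>";

    /** Hypothetical helper: parses an XML string, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the first node selected by the expression, or null if none matches. */
    static Node select(Document doc, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node byPosition = select(doc, "/root/nodeB[2]/subnodeX[2]");
        Node byPredicate = select(doc,
                "/root/nodeB[@cond='thisOne']/subnodeX[@id='MyCondition']");
        System.out.println(byPosition.isSameNode(byPredicate)); // prints true
        System.out.println(byPosition.getTextContent());        // prints THE STUFF YOU WANT
    }
}
```

The positional form breaks as soon as a sibling is inserted before the target, while the predicate form keeps working as long as the attribute values stay stable.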

For an HTML page, for example, you will end up with a pretty useless expression:

/html/body/div[6]/p[3]

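This is to be expected. If they had to generate predicates, how would they know which condition is relevant? There are countless possibilities.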

【Discussion】:
