XPath: get all text within a p between two a tags

Posted 2023-02-16

Asked: 2021-10-09 16:34:03

Question:

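I am trying to get all of the text in the <p> paragraph tag below, which sits between two <a> link tags. I would like either the whole paragraph element or all of the text inside it; either is fine.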

<div>
     <h3 class="mt30">
         <a href="/the-world-factbook/field/area">Area</a>
     </h3>
         <p>
              <strong>total: </strong>
              1,138,910 sq km
              <br>
              <br>
              <strong>land: </strong>
              1,038,700 sq km
              <br>
              <br>
              <strong>water: </strong>
              100,210 sq km
              <br>
              <br>
              <strong>note:</strong> 
              includes Isla de Malpelo, Roncador Cay, and Serrana Bank
          </p>
         <a href="/the-world-factbook/field/area/country-comparison/">country comparison to the world: <!-- -->27</a>
</div>

I am trying something like this:

//a[contains(@href, "area")]/@href/following::text()[1]

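and then planned to work out how to join all the text nodes together. The problem is that I am scraping multiple pages in which the <p> paragraph, surrounded by the <a> link tags, contains a varying number of text nodes. I am hoping for a more flexible approach. Thanks.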

Edit: I tried @michael.hor257k's suggestion, //a[contains(@href, 'area')]/following::p[1], and the result includes more than just the first paragraph:

<div>
     <h3 class="mt30">
         <a href="/the-world-factbook/field/area">Area</a>
     </h3>
         <p>
              <strong>total: </strong>
              1,138,910 sq km
              <br>
              <br>
              <strong>land: </strong>
              1,038,700 sq km
              <br>
              <br>
              <strong>water: </strong>
              100,210 sq km
              <br>
              <br>
              <strong>note:</strong> 
              includes Isla de Malpelo, Roncador Cay, and Serrana Bank
          </p>
         <a href="/the-world-factbook/field/area/country-comparison/">country comparison to the world: <!-- -->27</a>
</div>
<div>
    <h3 class="mt30">
        <a href="/the-world-factbook/field/area-comparative">Area - comparative</a>
    </h3>
        <p>slightly less than twice the size of Texas</p>
</div>
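
One plausible reason for the extra paragraph, sketched under assumptions: contains(@href, 'area') is a substring test, so it matches every a element whose href contains "area", and each match contributes its own following::p[1]. The Python below uses only the standard library's xml.etree.ElementTree, whose XPath subset has no contains() function, so the substring test is written in Python. The markup is a condensed, well-formed copy of the sample above, wrapped in a hypothetical <root> element. The exact-match predicate is only one possible way to narrow the selection and is not something proposed in the thread.

```python
import xml.etree.ElementTree as ET

# Condensed, well-formed copy of the markup above; <root> is a hypothetical wrapper.
XML = """<root>
  <div>
    <h3 class="mt30"><a href="/the-world-factbook/field/area">Area</a></h3>
    <p><strong>total: </strong>1,138,910 sq km</p>
    <a href="/the-world-factbook/field/area/country-comparison/">country comparison to the world: 27</a>
  </div>
  <div>
    <h3 class="mt30"><a href="/the-world-factbook/field/area-comparative">Area - comparative</a></h3>
    <p>slightly less than twice the size of Texas</p>
  </div>
</root>"""

root = ET.fromstring(XML)

# Substring test, equivalent in spirit to //a[contains(@href, 'area')].
loose_matches = [a.get("href") for a in root.iter("a") if "area" in a.get("href", "")]

# Exact attribute test, i.e. //a[@href='/the-world-factbook/field/area'].
exact_matches = root.findall(".//a[@href='/the-world-factbook/field/area']")

print(len(loose_matches))  # 3: every href in this sample contains "area"
print(len(exact_matches))  # 1: only the "Area" heading link
```

In XPath 1.0 terms, the narrower selection would read //a[@href='/the-world-factbook/field/area']/following::p[1], assuming the href is the same on every page being scraped.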

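Comments:

Which version of XSLT or XPath are you using? In any case, if the text inside the p element is what you want, I would always select the p element and take its string value rather than trying to select down to the text node children of the p element.

The input is not well-formed XML: <br> needs to be <br/>.

Martin: I am using XPath 1.0. Michael: the input was copied directly from Chrome DevTools and then indented properly; my apologies.

Re your edit: please post a minimal reproducible example showing well-formed XML input, a complete, executable XSLT, and the expected output. Note that you have two a elements that satisfy the condition, so naturally the p elements following both of them are selected for output.

Answer 1: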

The question is not entirely clear. To copy the first p element after the a element of interest, you can do:

<xsl:copy-of select="//a[contains(@href, 'area')]/following::p[1]" />

To get only the text in the same p, use:

<xsl:value-of select="//a[contains(@href, 'area')]/following::p[1]" />
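
For readers working outside XSLT, here is a minimal Python sketch of the same two steps: locate the first p that follows the a element, then take its string value. It makes several assumptions. It uses only the standard library's xml.etree.ElementTree, which does not support the following axis, so the helper first_following_p (a hypothetical name) emulates following::p[1] by walking the elements in document order. The markup is a trimmed copy of the question's sample with <br> written as <br/> so that it parses as XML. The final whitespace collapsing is an extra step, comparable to normalize-space(), and is not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's markup (<br> written as <br/>).
XML = """<div>
  <h3 class="mt30"><a href="/the-world-factbook/field/area">Area</a></h3>
  <p>
    <strong>total: </strong>
    1,138,910 sq km
    <br/>
    <br/>
    <strong>land: </strong>
    1,038,700 sq km
  </p>
  <a href="/the-world-factbook/field/area/country-comparison/">country comparison to the world: 27</a>
</div>"""


def first_following_p(root, anchor):
    """Emulate anchor/following::p[1]: the first <p> after anchor in
    document order, skipping anchor's own descendants."""
    nodes = list(root.iter())          # all elements in document order
    own_subtree = set(anchor.iter())   # anchor plus its descendants
    for node in nodes[nodes.index(anchor) + 1:]:
        if node.tag == "p" and node not in own_subtree:
            return node
    return None


def string_value(element):
    """Concatenate all descendant text, then collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


root = ET.fromstring(XML)
anchor = root.find(".//a[@href='/the-world-factbook/field/area']")
paragraph = first_following_p(root, anchor)
text = string_value(paragraph)
print(text)  # total: 1,138,910 sq km land: 1,038,700 sq km
```

Because the whole p element is selected first and its text is gathered afterwards, the number of text nodes inside the paragraph does not matter.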

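Comments:

Hi @michael.hor257k, thanks for the comment. Do you know how to do this in XPath 1.0?

Both of the above are XSLT 1.0 instructions using XPath 1.0 expressions.

@dstow Please do not post code in comments. Edit your question or post a new one.

Comment deleted; I edited the original post instead.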
