How to handle multiple xpath at once (based on feed structure) or create my own feeds with the same structure

Posted 2023-02-24


Asked 2011-09-11 22:49:37

The code below has been tested and works: it prints the contents of feeds with this structure:

<rss>
    <channel>
        <item>
            <pubDate/>
            <title/>
            <description/>
            <link/>
            <author/>
        </item>
    </channel>
</rss>

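Even after changing the XPath to /feed//entry, I have not managed to print feeds that follow the structure below (the differences are <feed>, <entry> and <published>). You can see the structure in the page source.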

<feed>
    <entry>
        <published/>
        <title/>
        <description/>
        <link/>
        <author/>
    </entry>
</feed>

I should mention that the code sorts all items by their pubDate. For feeds with the second structure, I guess it should sort all entries by their published element.

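I may have made a mistake in the XPath that I can't find. But if I eventually manage to print that feed correctly, how can I modify the code to handle both structures at the same time?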

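Is there any service that lets me create and host my own feeds based on these feeds, so that all of them have the same structure? I hope I've made myself clear... Thanks.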

<?php

$feeds = array();

// Get all feed entries
$entries = array();
foreach ($feeds as $feed) {
    $xml = simplexml_load_file($feed);
    $entries = array_merge($entries, $xml->xpath(''));
}

?>

Comments on the question:

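"I may have made a mistake in the XPath that I can't find." Which XPath are you talking about? Good question, +1. See my answer for a general solution in which you supply the alternative element names as parameters and it... just works. :)

This may not seem important if you are not familiar with XML and namespaces, but if you are working with RSS and Atom feeds, the Atom elements are in the Atom namespace: http://www.w3.org/2005/Atom. Your Atom XML sample does not reflect the namespace that is probably present in the documents you are working with.

Answer 1: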

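The main contribution of this answer is a solution (at the end) that works with an unlimited number of formats: you only specify all alternative names of the "entry" element in the external (global) parameter $postElements, and all alternative names of the "publication date" element in the external (global) parameter $pub-dateElements.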

Apart from that, here is how to write an XPath expression that selects all /rss//item and all /feed//entry elements.

In the simple case of only two possible document formats, this XPath expression (proposed by @Josh Davis) works correctly:

/rss//item  |   /feed//entry

A more general XPath expression selects the wanted elements from any number of document formats:

/*[contains($topElements, concat('|',name(),'|'))]
    //*[contains($postElements, concat('|',name(),'|'))]

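where $topElements should be replaced by a pipe-delimited string of all possible names of the top element, and $postElements by a pipe-delimited string of all possible names of the "entry" element. The expression also allows the "entry" elements to sit at different depths in different document formats.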

In particular, for this concrete case the XPath expression would be:

/*[contains('|feed|rss|', concat('|',name(),'|'))]
    //*[contains('|item|entry|', concat('|',name(),'|'))]
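PHP's DOMXPath evaluates XPath 1.0 but cannot bind variables such as $topElements, so one way to use the generic expression from PHP is to splice the pipe-delimited lists in as string literals. This is a minimal sketch under assumptions: buildEntryXPath() is a hypothetical helper, it uses local-name() instead of name() so that prefixed names such as atom:entry also match (a small deviation from the expression above), and it assumes the element names contain no quotes or pipes.

```php
<?php
// Hypothetical helper: builds the generic "any of these names" expression.
// DOMXPath has no variable binding, so the lists become string literals.
function buildEntryXPath(array $topNames, array $postNames): string
{
    $top  = '|' . implode('|', $topNames) . '|';
    $post = '|' . implode('|', $postNames) . '|';
    // local-name() instead of name(): prefixed elements (atom:entry) match too.
    return sprintf(
        "/*[contains('%s', concat('|', local-name(), '|'))]"
        . "//*[contains('%s', concat('|', local-name(), '|'))]",
        $top,
        $post
    );
}

$rss  = '<rss><channel><item><title>R1</title></item><item><title>R2</title></item></channel></rss>';
$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>A1</title></entry></feed>';

$expr = buildEntryXPath(['feed', 'rss'], ['item', 'entry']);
$titles = [];
foreach ([$rss, $atom] as $source) {
    $doc = new DOMDocument();
    $doc->loadXML($source);
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($expr) as $post) {
        // Child names are matched by local name as well, so the Atom title is found.
        $title = $xpath->query("*[local-name()='title']", $post)->item(0);
        $titles[] = $title ? $title->textContent : '';
    }
}
print_r($titles);
```

The surrounding pipes are what keep the test exact: concat('|', local-name(), '|') turns item into |item|, which is not a substring of, say, |items|.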

The rest of this post shows how the complete required processing can be done entirely in XSLT, easily and elegantly.

I. A gentle introduction

This kind of processing is very simple to do with XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <myFeed>
   <xsl:apply-templates/>
  </myFeed>
 </xsl:template>

 <xsl:template match="channel|feed">
  <xsl:apply-templates select="*">
   <xsl:sort select="pubDate|published" order="descending"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="item|entry">
  <post>
    <xsl:apply-templates mode="identity"/>
  </post>
 </xsl:template>

 <xsl:template match="pubDate|published" mode="identity">
  <publicationDate>
   <xsl:apply-templates/>
  </publicationDate>
 </xsl:template>

 <xsl:template match="node()|@*" mode="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="identity"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to this XML document (format 1):

<rss>
    <channel>
        <item>
            <pubDate>2011-06-05</pubDate>
            <title>Title1</title>
            <description>Description1</description>
            <link>Link1</link>
            <author>Author1</author>
        </item>
        <item>
            <pubDate>2011-06-06</pubDate>
            <title>Title2</title>
            <description>Description2</description>
            <link>Link2</link>
            <author>Author2</author>
        </item>
        <item>
            <pubDate>2011-06-07</pubDate>
            <title>Title3</title>
            <description>Description3</description>
            <link>Link3</link>
            <author>Author3</author>
        </item>
    </channel>
</rss>

and when it is applied to this equivalent document (format 2):

<feed>
        <entry>
            <published>2011-06-05</published>
            <title>Title1</title>
            <description>Description1</description>
            <link>Link1</link>
            <author>Author1</author>
        </entry>
        <entry>
            <published>2011-06-06</published>
            <title>Title2</title>
            <description>Description2</description>
            <link>Link2</link>
            <author>Author2</author>
        </entry>
        <entry>
            <published>2011-06-07</published>
            <title>Title3</title>
            <description>Description3</description>
            <link>Link3</link>
            <author>Author3</author>
        </entry>
</feed>

the same wanted, correct result is produced in both cases:

<myFeed>
   <post>
      <publicationDate>2011-06-07</publicationDate>
      <title>Title3</title>
      <description>Description3</description>
      <link>Link3</link>
      <author>Author3</author>
   </post>
   <post>
      <publicationDate>2011-06-06</publicationDate>
      <title>Title2</title>
      <description>Description2</description>
      <link>Link2</link>
      <author>Author2</author>
   </post>
   <post>
      <publicationDate>2011-06-05</publicationDate>
      <title>Title1</title>
      <description>Description1</description>
      <link>Link1</link>
      <author>Author1</author>
   </post>
</myFeed>

II. The complete solution

This can be generalized into a parameterized solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="postElements" select=
 "'|entry|item|'"/>
 <xsl:param name="pub-dateElements" select=
  "'|published|pubDate|'"/>

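 <!-- Identity copy for non-element nodes. Elements are handled by the
      match="*" template below, which calls this template by name to copy
      an element shallowly and then recurse in the default mode. -->
 <xsl:template match="@*|text()|comment()|processing-instruction()" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>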

 <xsl:template match="/">
  <myFeed>
   <xsl:apply-templates select=
   "//*[contains($postElements, concat('|',name(),'|'))]">
    <xsl:sort order="descending" select=
     "*[contains($pub-dateElements, concat('|',name(),'|'))]"/>
   </xsl:apply-templates>
  </myFeed>
 </xsl:template>

 <xsl:template match="*">
  <xsl:choose>
   <xsl:when test=
    "contains($postElements, concat('|',name(),'|'))">
    <post>
      <xsl:apply-templates/>
    </post>
   </xsl:when>
   <xsl:when test=
   "contains($pub-dateElements, concat('|',name(),'|'))">
    <publicationDate>
     <xsl:apply-templates/>
    </publicationDate>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="identity"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

</xsl:stylesheet>

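This transformation can be used with any number of formats: just specify all alternative names of the "entry" element in the external (global) parameter $postElements, and all alternative names of the "publication date" element in the external (global) parameter $pub-dateElements.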
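For readers who would rather stay in PHP, here is a hedged sketch of the same idea using the DOM extension instead of XSLT. $postNames and $dateNames play the role of $postElements and $pub-dateElements; each matching post is flattened into an array of text values with the date renamed to publicationDate, and posts from several documents are sorted together. normalizeFeeds() is a hypothetical helper, and like the stylesheet's xsl:sort it compares dates as strings, which orders correctly only for ISO-style dates such as the samples above.

```php
<?php
// Hypothetical helper: a PHP/DOM counterpart of the parameterized stylesheet.
function normalizeFeeds(array $xmlStrings, array $postNames, array $dateNames): array
{
    // Pipe-delimited lists, as in $postElements and $pub-dateElements.
    $postList = '|' . implode('|', $postNames) . '|';
    $dateList = '|' . implode('|', $dateNames) . '|';
    $posts = [];
    foreach ($xmlStrings as $xml) {
        $doc = new DOMDocument();
        $doc->loadXML($xml);
        $xpath = new DOMXPath($doc);
        // Same selection as //*[contains($postElements, concat('|',name(),'|'))]
        foreach ($xpath->query("//*[contains('$postList', concat('|', name(), '|'))]") as $node) {
            $post = [];
            foreach ($node->childNodes as $child) {
                if ($child->nodeType !== XML_ELEMENT_NODE) {
                    continue; // skip text, comments, etc.
                }
                // Any "publication date" alternative becomes publicationDate.
                $isDate = strpos($dateList, '|' . $child->nodeName . '|') !== false;
                $post[$isDate ? 'publicationDate' : $child->nodeName] = $child->textContent;
            }
            $posts[] = $post;
        }
    }
    // Descending string comparison, like <xsl:sort order="descending"/>.
    usort($posts, fn($a, $b) => strcmp($b['publicationDate'] ?? '', $a['publicationDate'] ?? ''));
    return $posts;
}

$rss = '<rss><channel>'
    . '<item><pubDate>2011-06-05</pubDate><title>Title1</title></item>'
    . '<item><pubDate>2011-06-07</pubDate><title>Title3</title></item>'
    . '</channel></rss>';
$atom = '<feed><entry><published>2011-06-06</published><title>Title2</title></entry></feed>';

$posts = normalizeFeeds([$rss, $atom], ['entry', 'item'], ['published', 'pubDate']);
echo implode(', ', array_column($posts, 'title')), "\n"; // Title3, Title2, Title1
```

Unlike the stylesheet, this copies only text content, so any markup inside a child element is lost; the XSLT identity copy keeps it.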

Anyone can try this transformation and verify that, when applied to the two XML documents above, it again produces the same wanted, correct result.

Comments:

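This is a great answer, thank you. So, I have PHP code that loads stylesheet.xslt and eshop1.xml. How can I load multiple XML files, for example eshop1.xml and eshop2.xml?

@Punkis: You're welcome. As for your next question, XSLT has a standard facility for processing multiple XML documents: read about the standard XSLT document() function. Also, XSLT 2.0 makes it easy to produce multiple result documents: read about the <xsl:result-document> instruction. If you are using XSLT 1.0, you can produce one aggregated result and then split it and save the parts to the desired files, either with DOM (ugly) or by applying, once per result, another XSLT transformation that produces just that one result.

Answer 2: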

This question is really two questions: "how to handle multiple xpath at once" and "[how to] create my own feeds with the same structure".

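Dimitre Novatchev has answered the second one brilliantly. If you want to "merge" or transform one or more XML documents, that is definitely what I would recommend.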
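Because the parameterized stylesheet selects posts with //*, one hedged way to "merge" several documents (for example eshop1.xml and eshop2.xml) before a single transformation is to wrap their root elements under one new element with PHP's DOM extension. This is a DOM-based alternative to the document() function, not something taken from either answer; the wrapper name feeds and the helper aggregateFeeds() are hypothetical.

```php
<?php
// Hypothetical helper: wraps the root element of each feed in one <feeds> element.
function aggregateFeeds(array $xmlStrings): DOMDocument
{
    $aggregate = new DOMDocument();
    $root = $aggregate->appendChild($aggregate->createElement('feeds'));
    foreach ($xmlStrings as $xml) {
        $doc = new DOMDocument();
        $doc->loadXML($xml); // for files on disk: $doc->load('eshop1.xml')
        // importNode(..., true) deep-copies the root into the aggregate document.
        $root->appendChild($aggregate->importNode($doc->documentElement, true));
    }
    return $aggregate;
}

$aggregate = aggregateFeeds([
    '<rss><channel><item><title>Title1</title></item></channel></rss>',
    '<feed><entry><title>Title2</title></entry></feed>',
]);
echo $aggregate->saveXML($aggregate->documentElement), "\n";
```

With the xsl extension enabled, the returned DOMDocument could then be passed to XSLTProcessor::transformToXml() (after importStylesheet()) in place of a single loaded file.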

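Meanwhile, I'll take the simple route for the first question, "how to handle multiple xpath at once". It's easy; there is an operator for it: |. If you want to query all nodes that match /feed//entry or /rss//item, you can use /feed//entry | /rss//item.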
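Applied to the question's loop, the union operator lets a single xpath() call collect both RSS item and Atom entry nodes. This is an illustration under stated assumptions: inline strings stand in for the feed URLs (simplexml_load_string() instead of simplexml_load_file()), the Atom sample has no default namespace, just like the one in the question, and sorting uses strtotime() on whichever of pubDate or published is present. entryTime() is a hypothetical helper.

```php
<?php
// Inline strings stand in for the feed URLs of the question.
$feeds = [
    '<rss><channel><item><pubDate>Sun, 05 Jun 2011 10:00:00 GMT</pubDate>'
        . '<title>Title1</title></item></channel></rss>',
    '<feed><entry><published>2011-06-06T10:00:00Z</published>'
        . '<title>Title2</title></entry></feed>',
];

$entries = [];
foreach ($feeds as $feed) {
    $xml = simplexml_load_string($feed); // simplexml_load_file($url) for real feeds
    if ($xml === false) {
        continue; // skip documents that are not well-formed
    }
    // One union expression covers both structures.
    $entries = array_merge($entries, $xml->xpath('/rss//item | /feed//entry'));
}

// Hypothetical helper: RSS uses <pubDate> (RFC 822 style), Atom uses <published>
// (RFC 3339 style); strtotime() parses both, so timestamps can be compared.
function entryTime(SimpleXMLElement $entry): int
{
    $date = isset($entry->pubDate) ? (string) $entry->pubDate : (string) $entry->published;
    return (int) strtotime($date);
}

// Newest first.
usort($entries, fn($a, $b) => entryTime($b) <=> entryTime($a));

foreach ($entries as $entry) {
    echo $entry->title, "\n";
}
```

Comparing timestamps rather than raw strings matters here, because an RSS pubDate such as "Sun, 05 Jun 2011" does not sort chronologically as text.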

Answer 3:

Here is one solution.

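The problem is that many RSS or Atom feeds declare a default namespace, and then unprefixed names in SimpleXML's XPath queries no longer match. In the example below, I use str_replace to replace xmlns= with ns=. Then I use the name of the root element to determine the type of the feed (RSS or Atom).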

The array_push calls take care of adding all entries to the $entries array for later use.

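$entries = array();

foreach ( $feeds as $feed )
{
  // Assuming $feeds holds feed URLs as in the question: fetch the raw XML,
  // then rename the default namespace declaration so plain XPath names match.
  $xml = simplexml_load_string(str_replace('xmlns=', 'ns=', file_get_contents($feed)));

  switch ( strtolower($xml->getName()) )
  {
    // Atom
    case 'feed':
      // The ... spread pushes each entry, not one nested array.
      array_push($entries, ...$xml->xpath('/feed//entry'));

      break;

    // RSS
    case 'rss':
      array_push($entries, ...$xml->xpath('/rss//item'));

      break;
  }
}

var_dump($entries);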
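A hedged alternative to the str_replace trick, assuming real Atom feeds declare the default namespace http://www.w3.org/2005/Atom (as pointed out in the comments on the question): register a prefix for that namespace with SimpleXMLElement::registerXPathNamespace() and use it in the XPath. The prefix a is an arbitrary choice, and inline strings stand in for downloaded feeds.

```php
<?php
// Inline strings stand in for downloaded feeds; RSS 2.0 elements have no namespace.
$feeds = [
    '<rss version="2.0"><channel><item><title>R1</title></item></channel></rss>',
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>A1</title></entry></feed>',
];

$entries = [];
foreach ($feeds as $feed) {
    $xml = simplexml_load_string($feed);
    if ($xml === false) {
        continue; // not well-formed XML
    }
    switch (strtolower($xml->getName())) {
        case 'feed': // Atom: bind an arbitrary prefix to the Atom namespace URI
            $xml->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
            array_push($entries, ...$xml->xpath('/a:feed//a:entry'));
            break;
        case 'rss': // RSS
            array_push($entries, ...$xml->xpath('/rss//item'));
            break;
    }
}

// Unprefixed property access still finds children in the default namespace,
// so $entry->title works for both formats without a prefix.
foreach ($entries as $entry) {
    echo $entry->title, "\n";
}
```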

Another solution could be to use Google Reader to aggregate all the feeds and then use that single feed instead of all the individual ones.
