Muenchian Grouping - group within a node, not within the entire document

Posted 2023-02-16

【Posted】: 2010-12-17 17:33:01
【Question】:

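I'm trying to use Muenchian grouping in my XSLT to group matching nodes, but I only want to group within the parent node, not across the entire source XML document.

Given the following XSLT and XML (apologies for the length of my sample code):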

XSLT

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"> 
 <xsl:output method="html" indent="yes"/>

 <xsl:key name="contacts-by-surname" match="contact" use="surname" />
 <xsl:template match="records">
  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[1]) = 1]">
   <xsl:sort select="surname" />
   <xsl:value-of select="surname" />,<br />
   <xsl:for-each select="key('contacts-by-surname', surname)">
    <xsl:sort select="forename" />
    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
   </xsl:for-each>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

XML

<root>
 <records>
  <contact id="0001">
   <title>Mr</title>
   <forename>John</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0002">
   <title>Dr</title>
   <forename>Amy</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0003">
   <title>Mrs</title>
   <forename>Mary</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0004">
   <title>Ms</title>
   <forename>Anne</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0005">
   <title>Mr</title>
   <forename>Peter</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0006">
   <title>Dr</title>
   <forename>Indy</forename>
   <surname>Jones</surname>
  </contact>
 </records>
 <records>
  <contact id="0001">
   <title>Mr</title>
   <forename>James</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0002">
   <title>Dr</title>
   <forename>Mandy</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0003">
   <title>Mrs</title>
   <forename>Elizabeth</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0004">
   <title>Ms</title>
   <forename>Sally</forename>
   <surname>Jones</surname>
  </contact>
  <contact id="0005">
   <title>Mr</title>
   <forename>George</forename>
   <surname>Smith</surname>
  </contact>
  <contact id="0006">
   <title>Dr</title>
   <forename>Harry</forename>
   <surname>Jones</surname>
  </contact>
 </records>
</root>

Result

Jones,
Amy (Dr)
Anne (Ms)
Harry (Dr)
Indy (Dr)
Mandy (Dr)
Sally (Ms)

Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)
John (Mr)
Mary (Mrs)
Peter (Mr)

How can I group within each <records> element and get this result instead:

Jones,
Amy (Dr)
Anne (Ms)
Indy (Dr)

Smith,
John (Mr)
Mary (Mrs)
Peter (Mr)

Jones,
Harry (Dr)
Mandy (Dr)
Sally (Ms)

Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)

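【Comments on the question】:

Christian, in your desired result the forenames within each surname are not sorted. I assume they should be, since you explicitly sort by forename in your XSLT.

Good point about the sorting - I have updated the question so that the forenames in the result are sorted.

【Answer 1】: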

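This took me a while... I was about to give up but kept going :)

The drawback of the key function is that the keys it builds always cover the entire XML document. You therefore need to concatenate extra information into the key value to make it more specific. In the example below, I concatenate the generate-id() of the parent records node with the surname, so that each records element gets its own set of distinct-surname keys.

Here is the XSLT: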

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="html" indent="yes"/>
  <xsl:key name="distinct-surname" match="contact" use="concat(generate-id(..), '|', surname)"/>
  <xsl:template match="records">
    <xsl:for-each select="contact[generate-id() = generate-id(key('distinct-surname', concat(generate-id(..), '|', surname))[1])]">
      <xsl:sort select="surname" />
      <xsl:value-of select="surname" />,<br />
      <xsl:for-each select="key('distinct-surname', concat(generate-id(..), '|', surname))">
        <xsl:sort select="forename" />
        <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>  
</xsl:stylesheet>

Here is the result:

Jones,
Amy (Dr)
Anne (Ms)
Indy (Dr)
Smith,
John (Mr)
Mary (Mrs)
Peter (Mr)
Jones,
Harry (Dr)
Mandy (Dr)
Sally (Ms)
Smith,
Elizabeth (Mrs)
George (Mr)
James (Mr)

Note that the result is also sorted by forename. If you don't want the forenames sorted, remove the <xsl:sort select="forename" /> line.
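To see why the composite key keeps the two records elements apart, it can help to print the key value that each contact receives. The stylesheet below is only a debugging sketch and not part of the answer: it assumes the sample XML from the question and writes one line per contact. The exact identifiers returned by generate-id() differ between processors, but contacts in the same records element share one prefix, and contacts in different records elements never do.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit every contact in document order -->
    <xsl:for-each select="root/records/contact">
      <!-- Same expression as the use attribute of the distinct-surname key -->
      <xsl:value-of select="concat(generate-id(..), '|', surname)"/>
      <!-- Line break after each key value -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the sample data this prints twelve lines holding four distinct values: one per surname in each records element, which matches the four groups in the result.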

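【Comments】:

Great answer and explanation. Thank you!

That's how I would do it, +1. I'd suggest one tiny change: use concat(generate-id(..), '|', surname) instead of concat(count(parent::*/preceding-sibling::*), surname). It is shorter, more efficient and, thanks to the extra separator character, safer.

Tomalak, I have edited the XSLT as you suggested. Thanks :)

【Answer 2】:

There is a simpler way: add a predicate which ensures that the contacts taking part in the Muenchian test are children of the current records element.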

<xsl:key name="contacts-by-surname" match="contact" use="surname" />
<xsl:template match="records">
  <xsl:for-each select="contact[count(. | key('contacts-by-surname', surname)[generate-id(parent::records) = generate-id(current())][1]) = 1]">
   <xsl:sort select="surname" />
   <xsl:value-of select="surname" />,<br />
   <xsl:for-each select="key('contacts-by-surname', surname)[generate-id(parent::records) = generate-id(current()/parent::records)]">
    <xsl:sort select="forename" />
    <xsl:value-of select="forename" /> (<xsl:value-of select="title" />)<br />
   </xsl:for-each>
  </xsl:for-each>
</xsl:template>
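The two predicates above identify the enclosing records element in slightly different ways (current() in the outer loop, current()/parent::records in the inner one), because current() refers to a different node in each place. A possible variant, shown here only as a sketch, stores the identifier of the records element in a variable once and reuses it. The variable name $scope is invented for this illustration, and the first-in-group test is written with generate-id() rather than count(. | ...), which should select the same nodes.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>
  <xsl:key name="contacts-by-surname" match="contact" use="surname"/>

  <xsl:template match="records">
    <!-- Identifier of the records element being processed (hypothetical variable name) -->
    <xsl:variable name="scope" select="generate-id()"/>
    <!-- Keep a contact only if it is the first contact with its surname inside this records element -->
    <xsl:for-each select="contact[generate-id() = generate-id(key('contacts-by-surname', surname)[generate-id(..) = $scope][1])]">
      <xsl:sort select="surname"/>
      <xsl:value-of select="surname"/>,<br/>
      <!-- List the group members, again restricted to this records element -->
      <xsl:for-each select="key('contacts-by-surname', surname)[generate-id(..) = $scope]">
        <xsl:sort select="forename"/>
        <xsl:value-of select="forename"/> (<xsl:value-of select="title"/>)<br/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The key itself is unchanged and still spans the whole document; only the filtering happens per records element, so the efficiency concern about this approach applies to the variant as well.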

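【Comments】:

It may be simpler, but it is also less efficient. I'd say that contact[generate-id() = generate-id(…[…])] is O(n²) in the worst case, whereas @Rashmi Pandit's contact[generate-id() = generate-id(…)] is O(n).

I think it may be less efficient, but it is more robust. Concatenating strings into a compound key only works if the separator string never occurs in any of the strings used. I prefer deterministic behaviour over the fastest run. :)

Hm... I can think of a way for id-value to be ambiguous (id "key-30", value "0" vs. id "key-300", value ""), but for id-separator-value (that would be "id-30|0" vs. "id-300|")? IMHO it does not matter whether the separator occurs in the value. Am I missing something?

id "0|1" & value "2" and id "0" & value "1|2" both produce the same key "0|1|2" with "|" as the separator. I agree that in this particular case the id cannot contain a "|" (according to the W3C XSLT specification, generate-id() returns ASCII alphanumeric characters), but the problem is the same whenever you combine two or more arbitrary values. Compound keys built from concatenated strings are unsafe in many cases, so I would rather not see them recommended as "good practice" at all.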
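The collision described in the last comment can be reproduced with a very small stylesheet. This is a sketch under assumptions: the items/item input shown in the comment, its id and value attributes, and the key name by-id-and-value are all invented for illustration and do not come from the question's data.

```xml
<!--
  Hypothetical input document:

  <items>
    <item id="0|1" value="2"/>
    <item id="0" value="1|2"/>
  </items>
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Compound key built by concatenation with "|" as the separator -->
  <xsl:key name="by-id-and-value" match="item" use="concat(@id, '|', @value)"/>

  <xsl:template match="/">
    <!-- Both items produce the key value "0|1|2", so this outputs 2 -->
    <xsl:value-of select="count(key('by-id-and-value', '0|1|2'))"/>
  </xsl:template>
</xsl:stylesheet>
```

In Answer 1 the first component comes from generate-id(), which cannot contain "|", so this ambiguity does not arise there. It becomes a real risk when both components are free-form text.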
