Produce DISTINCT values in STRING_AGG

Posted: 2019-01-09 19:09:31

Question:

I'm using the STRING_AGG function in SQL Server 2017. I want the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,','), but that isn't legal syntax.

Is there a T-SQL workaround? Here is my example:

WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings 
GROUP BY State, City

The query above produces this result:

+---------+-----------+--------------+-------------------------+
|  State  |   City    | # Of Types   |         Animals         |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff |            1 | dog                     |
| Florida | Orlando   |            2 | dog,bird                |
| Arizona | Phoenix   |            2 | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+

The output is exactly what I want, except that I'd like the concatenated "Animals" for Phoenix, Arizona to be DISTINCT, like this:

+---------+-----------+--------------+--------------------+
|  State  |   City    | # Of Types   |      Animals       |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff |            1 | dog                |
| Florida | Orlando   |            2 | dog,bird           |
| Arizona | Phoenix   |            2 | bird,dog           |
+---------+-----------+--------------+--------------------+

Any ideas?

With my larger, real data set, I also get an error saying the "Animals" column exceeds 8000 characters.

I think my question is the same as this one, except that my example is much simpler.

Answer 1:

Here is one way.

Since you also need the distinct count, just group the rows twice. The first GROUP BY removes the duplicates, and the second GROUP BY produces the final result.

WITH
Sitings
AS
(
    SELECT * FROM (VALUES 
    (1, 'Florida', 'Orlando', 'bird'),
    (2, 'Florida', 'Orlando', 'dog'),
    (3, 'Arizona', 'Phoenix', 'bird'),
    (4, 'Arizona', 'Phoenix', 'dog'),
    (5, 'Arizona', 'Phoenix', 'bird'),
    (6, 'Arizona', 'Phoenix', 'bird'),
    (7, 'Arizona', 'Phoenix', 'bird'),
    (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
)
,CTE_Animals
AS
(
    SELECT
        State, City, Siting
    FROM Sitings
    GROUP BY State, City, Siting
)
SELECT
    State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
    State
    ,City
;

Result:

+---------+-----------+--------------+----------+
|  State  |   City    | # Of Sitings | Animals  |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff |            1 | dog      |
| Arizona | Phoenix   |            2 | bird,dog |
| Florida | Orlando   |            2 | bird,dog |
+---------+-----------+--------------+----------+

If you still get the error about exceeding 8000 characters, cast the values to varchar(max) before passing them to STRING_AGG.

Something like:

STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals
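A small illustration of why the cast has to go inside STRING_AGG: when the input is a non-max varchar, the result type is varchar(8000), so a longer result raises an error; when the input is varchar(max), the result is varchar(max). This sketch does not use the question's data. It builds a hypothetical set of 3000 rows of the 5-character value 'abcde' (cross-joining sys.all_objects only to get enough rows), so the joined string is 3000 * 5 + 2999 = 17999 characters long. With the cast it returns that length; without the cast it would fail.

```sql
-- Hypothetical data: 3000 rows of the varchar(5) value 'abcde'.
-- sys.all_objects is cross-joined with itself only to produce enough rows.
WITH Rows3000 AS
(
    SELECT TOP (3000) CAST('abcde' AS varchar(5)) AS Siting
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
SELECT
    -- 3000 * 5 characters + 2999 separators = 17999 characters, past 8000.
    -- Casting the INPUT to varchar(max) makes the result varchar(max);
    -- without the cast the result type is varchar(8000) and the query errors.
    LEN(STRING_AGG(CAST(Siting AS varchar(max)), ',')) AS AnimalsLength
FROM Rows3000;
```

In Answer 1's query, the same change applies to the final SELECT over CTE_Animals.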

Comments:

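BTW: the explicit double GROUP BY has no performance drawback, because DISTINCT inside an aggregate function also performs an implicit grouping (or a distinct sort). If you compare the execution plan of the original query with the plan of this solution, you'll see that the double grouping is much faster. (I tested it on a table with millions of rows and found a factor of about 3 in CPU time and about 12 in total execution time.)

Answer 2: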

Just use a sub-query:

WITH Sitings 
      AS
      (
        SELECT * FROM (VALUES 
          (1, 'Florida', 'Orlando', 'bird'),
          (2, 'Florida', 'Orlando', 'dog'),
          (3, 'Arizona', 'Phoenix', 'bird'),
          (4, 'Arizona', 'Phoenix', 'dog'),
          (5, 'Arizona', 'Phoenix', 'bird'),
          (6, 'Arizona', 'Phoenix', 'bird'),
          (7, 'Arizona', 'Phoenix', 'bird'),
          (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
      ) 

    SELECT State, City, COUNT(*) AS [# Of Types], STRING_AGG(Siting, ',') AS Animals
    FROM
    (
        SELECT State, City, Siting
        FROM Sitings
        GROUP BY State, City, Siting
    ) AS T
    GROUP BY State, City

http://sqlfiddle.com/#!18/ba4b8/11

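+---------+-----------+------------+----------+
|  State  |   City    | # Of Types | Animals  |
+---------+-----------+------------+----------+
| Arizona | Flagstaff |          1 | dog      |
| Florida | Orlando   |          2 | bird,dog |
| Arizona | Phoenix   |          2 | bird,dog |
+---------+-----------+------------+----------+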

Answer 3:

Here is another approach (sql fiddle):

  WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 

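SELECT
    State,
    City,
    -- COUNT(*) would count every sighting (5 for Phoenix); count distinct types instead
    COUNT(DISTINCT Siting) AS [# Of Types],
    (SELECT STRING_AGG(value, ', ')
     FROM (SELECT DISTINCT value FROM STRING_SPLIT(STRING_AGG(Siting, ','), ',')) AS t) AS Animals
FROM Sitings
GROUP BY State, City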

You can easily turn the split-and-merge part into a reusable scalar-valued function.
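A minimal sketch of such a function, assuming SQL Server 2017 or later (STRING_SPLIT also requires database compatibility level 130 or higher). The name dbo.DistinctList and its parameters are hypothetical. Unlike the inline version, it adds WITHIN GROUP (ORDER BY ...) so the output order is deterministic, and it uses the same single-character separator for splitting and re-joining. GO is the batch separator understood by SSMS and sqlcmd; it is needed because CREATE FUNCTION must be the only statement in its batch. Items that themselves contain the separator would be split incorrectly.

```sql
-- Hypothetical helper: removes duplicate items from a delimited list.
CREATE OR ALTER FUNCTION dbo.DistinctList
(
    @list      nvarchar(max), -- delimited input, e.g. the result of STRING_AGG
    @separator nvarchar(1)    -- STRING_SPLIT accepts only a single-character separator
)
RETURNS nvarchar(max)
AS
BEGIN
    RETURN
    (
        -- Split the list, keep each item once, then re-join the items,
        -- sorted so that the output order is deterministic
        SELECT STRING_AGG(t.value, @separator) WITHIN GROUP (ORDER BY t.value)
        FROM (SELECT DISTINCT value FROM STRING_SPLIT(@list, @separator)) AS t
    );
END;
GO

-- Usage: duplicates are removed and the items come back sorted
SELECT dbo.DistinctList(N'bird,dog,bird,bird,bird', N',') AS Animals; -- bird,dog
```

In the grouped query, the inline subquery could then be replaced by `dbo.DistinctList(STRING_AGG(CAST(Siting AS nvarchar(max)), N','), N',')`, where the cast also avoids the 4000/8000 length limit on the intermediate string.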

Comments from experts on performance are welcome.

Answer 4:

Admittedly a very late reply.

Here is another way: the Siting in STRING_AGG(Siting,',') can be replaced with a subquery that returns the DISTINCT list of Siting values for the rows whose State and City match the current group.

WITH Sitings 
      AS
      (
        SELECT * FROM (VALUES 
          (1, 'Florida', 'Orlando', 'bird'),
          (2, 'Florida', 'Orlando', 'dog'),
          (3, 'Arizona', 'Phoenix', 'bird'),
          (4, 'Arizona', 'Phoenix', 'dog'),
          (5, 'Arizona', 'Phoenix', 'bird'),
          (6, 'Arizona', 'Phoenix', 'bird'),
          (7, 'Arizona', 'Phoenix', 'bird'),
          (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
      ) 

SELECT 
    S.State, 
    S.City, 
    COUNT(DISTINCT S.Siting) AS [# Of Types],

    --STRING_AGG(S.Siting,',') AS Animals
    (
        SELECT STRING_AGG(U.Siting, ',')
        FROM 
        (
            SELECT DISTINCT T.Siting
            FROM Sitings AS T
            WHERE 
                T.State = S.State AND
                T.City = S.City
        ) AS U
    ) AS Animals

FROM 
    Sitings AS S
GROUP BY 
    S.State, 
    S.City
ORDER BY
    S.State, 
    S.City

Comments:

This isn't a good answer, because you run a subquery for every row. Instead, you should use a correlated query joined to the main query.
