Produce DISTINCT values in STRING_AGG
[Posted]: 2019-01-09 19:09:31
[Question]: I'm using the STRING_AGG function in SQL Server 2017. I would like to get the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,','), but that is not legal syntax.
I'd like to know whether there is a T-SQL workaround. Here is my sample:
WITH Sitings
AS
(
SELECT * FROM (VALUES
(1, 'Florida', 'Orlando', 'bird'),
(2, 'Florida', 'Orlando', 'dog'),
(3, 'Arizona', 'Phoenix', 'bird'),
(4, 'Arizona', 'Phoenix', 'dog'),
(5, 'Arizona', 'Phoenix', 'bird'),
(6, 'Arizona', 'Phoenix', 'bird'),
(7, 'Arizona', 'Phoenix', 'bird'),
(8, 'Arizona', 'Flagstaff', 'dog')
) F (ID, State, City, Siting)
)
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings
GROUP BY State, City
The query above produces the following result:
+---------+-----------+--------------+-------------------------+
| State   | City      | # Of Types   | Animals                 |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff | 1            | dog                     |
| Florida | Orlando   | 2            | dog,bird                |
| Arizona | Phoenix   | 2            | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+
The output is exactly what I want, except that I'd like the concatenated "Animals" listed for Phoenix, Arizona to be DISTINCT, like this:
+---------+-----------+--------------+--------------------+
| State   | City      | # Of Types   | Animals            |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff | 1            | dog                |
| Florida | Orlando   | 2            | dog,bird           |
| Arizona | Phoenix   | 2            | bird,dog           |
+---------+-----------+--------------+--------------------+
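Any ideas?
When I run this against my larger, real data set, I get an error saying the "Animals" column exceeds 8000 characters.
I think my question is the same as this one, except that my example is much simpler.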
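[Answer 1]: Here is one way to do it.
Since you also need the distinct count, simply group the rows twice. The first GROUP BY removes the duplicates, and the second GROUP BY produces the final result.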
WITH
Sitings
AS
(
SELECT * FROM (VALUES
(1, 'Florida', 'Orlando', 'bird'),
(2, 'Florida', 'Orlando', 'dog'),
(3, 'Arizona', 'Phoenix', 'bird'),
(4, 'Arizona', 'Phoenix', 'dog'),
(5, 'Arizona', 'Phoenix', 'bird'),
(6, 'Arizona', 'Phoenix', 'bird'),
(7, 'Arizona', 'Phoenix', 'bird'),
(8, 'Arizona', 'Flagstaff', 'dog')
) F (ID, State, City, Siting)
)
,CTE_Animals
AS
(
SELECT
State, City, Siting
FROM Sitings
GROUP BY State, City, Siting
)
SELECT
State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
State
,City
;
Result
+---------+-----------+--------------+----------+
| State   | City      | # Of Sitings | Animals  |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff | 1            | dog      |
| Arizona | Phoenix   | 2            | bird,dog |
| Florida | Orlando   | 2            | bird,dog |
+---------+-----------+--------------+----------+
If you still get the error about exceeding 8000 characters, cast the values to varchar(max) before passing them to STRING_AGG.
Something like:
STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals
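Putting the cast into the full query from this answer might look like the sketch below. It assumes the same sample Sitings data. The cast matters because STRING_AGG returns varchar(8000) when its input is a non-max varchar, and returns varchar(max) only when the input expression is itself varchar(max). The WITHIN GROUP (ORDER BY ...) clause is an addition that is not in the original answer; it is there only to make the order of the concatenated values deterministic.

```sql
WITH Sitings AS
(
    SELECT * FROM (VALUES
        (1, 'Florida', 'Orlando',   'bird'),
        (2, 'Florida', 'Orlando',   'dog'),
        (3, 'Arizona', 'Phoenix',   'bird'),
        (4, 'Arizona', 'Phoenix',   'dog'),
        (5, 'Arizona', 'Phoenix',   'bird'),
        (6, 'Arizona', 'Phoenix',   'bird'),
        (7, 'Arizona', 'Phoenix',   'bird'),
        (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
),
CTE_Animals AS
(
    -- First GROUP BY: one row per distinct (State, City, Siting).
    SELECT State, City, Siting
    FROM Sitings
    GROUP BY State, City, Siting
)
SELECT
    State,
    City,
    COUNT(*) AS [# Of Types],   -- rows are already distinct, so this counts types
    -- Cast to varchar(max) so the aggregate is not limited to 8000 bytes.
    -- WITHIN GROUP is an added assumption to fix the output order.
    STRING_AGG(CAST(Siting AS varchar(max)), ',')
        WITHIN GROUP (ORDER BY Siting) AS Animals
FROM CTE_Animals
GROUP BY State, City          -- second GROUP BY: one row per city
ORDER BY State, City;
```

If the source column is nvarchar, the cast would be to nvarchar(max) instead.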
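[Comments]:
By the way: the explicit double GROUP BY has no performance drawback, because DISTINCT inside an aggregate function also performs an implicit grouping (or a distinct sort). When you compare the execution plan of the original query with the plan for this solution, you will find that the double grouping is much faster (I tested it on a table with several million rows and saw a factor of about 3 in CPU time and about 12 in total execution time).
[Answer 2]: Just use a sub-query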
WITH Sitings
AS
(
SELECT * FROM (VALUES
(1, 'Florida', 'Orlando', 'bird'),
(2, 'Florida', 'Orlando', 'dog'),
(3, 'Arizona', 'Phoenix', 'bird'),
(4, 'Arizona', 'Phoenix', 'dog'),
(5, 'Arizona', 'Phoenix', 'bird'),
(6, 'Arizona', 'Phoenix', 'bird'),
(7, 'Arizona', 'Phoenix', 'bird'),
(8, 'Arizona', 'Flagstaff', 'dog')
) F (ID, State, City, Siting)
)
select State,City,count(*) as [# Of Types],STRING_AGG(Siting,',') AS Animals from
(
SELECT State, City, Siting
FROM Sitings
GROUP BY State, City,Siting
) as T group by State,City
http://sqlfiddle.com/#!18/ba4b8/11
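+---------+-----------+------------+----------+
| State   | City      | # Of Types | Animals  |
+---------+-----------+------------+----------+
| Arizona | Flagstaff | 1          | dog      |
| Florida | Orlando   | 2          | bird,dog |
| Arizona | Phoenix   | 2          | bird,dog |
+---------+-----------+------------+----------+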
[Answer 3]: Here is yet another way (sql fiddle):
WITH Sitings
AS
(
SELECT * FROM (VALUES
(1, 'Florida', 'Orlando', 'bird'),
(2, 'Florida', 'Orlando', 'dog'),
(3, 'Arizona', 'Phoenix', 'bird'),
(4, 'Arizona', 'Phoenix', 'dog'),
(5, 'Arizona', 'Phoenix', 'bird'),
(6, 'Arizona', 'Phoenix', 'bird'),
(7, 'Arizona', 'Phoenix', 'bird'),
(8, 'Arizona', 'Flagstaff', 'dog')
) F (ID, State, City, Siting)
)
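-- COUNT(DISTINCT ...) rather than count(*), so [# Of Types] matches the de-duplicated list
select State, City, count(distinct Siting) as [# Of Types],
(
select string_agg(value, ', ')
from (select distinct value from string_split(string_agg(Siting, ','), ',')) t
) AS Animals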
FROM Sitings
GROUP BY State, City
You can easily turn the split-and-merge part into a reusable scalar-valued function.
Experts are welcome to comment on the performance.
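The split-and-merge step mentioned above could be wrapped roughly as follows. This is only a sketch: the function name dbo.DistinctList and its parameters are hypothetical and do not come from the answer. It relies on STRING_SPLIT, which requires database compatibility level 130 or higher, and it assumes the separator never occurs inside the values themselves.

```sql
-- Hypothetical helper; the name and parameters are illustrative only.
CREATE FUNCTION dbo.DistinctList
(
    @list      nvarchar(max),   -- delimited list, possibly with duplicates
    @separator nchar(1)         -- STRING_SPLIT accepts a single character only
)
RETURNS nvarchar(max)
AS
BEGIN
    RETURN
    (
        -- Split the list, drop duplicates, then join it back together.
        SELECT STRING_AGG(CAST(value AS nvarchar(max)), @separator)
        FROM (SELECT DISTINCT value
              FROM STRING_SPLIT(@list, @separator)) AS t
    );
END;

-- Example call, run as a separate batch once the function exists:
-- SELECT State, City,
--        dbo.DistinctList(STRING_AGG(CAST(Siting AS nvarchar(max)), ','), ',') AS Animals
-- FROM Sitings
-- GROUP BY State, City;
```

A scalar function like this is evaluated once per output row, so on large result sets it may well be slower than the double GROUP BY; measuring both on real data is the only reliable guide.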
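[Answer 4]: Admittedly a very late reply.
Here is another way. The Siting in STRING_AGG(Siting,',') can be replaced by a subquery that returns the DISTINCT list of Siting values whose State and City match the grouping keys.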
WITH Sitings
AS
(
SELECT * FROM (VALUES
(1, 'Florida', 'Orlando', 'bird'),
(2, 'Florida', 'Orlando', 'dog'),
(3, 'Arizona', 'Phoenix', 'bird'),
(4, 'Arizona', 'Phoenix', 'dog'),
(5, 'Arizona', 'Phoenix', 'bird'),
(6, 'Arizona', 'Phoenix', 'bird'),
(7, 'Arizona', 'Phoenix', 'bird'),
(8, 'Arizona', 'Flagstaff', 'dog')
) F (ID, State, City, Siting)
)
SELECT
S.State,
S.City,
COUNT(DISTINCT S.Siting) AS [# Of Types],
--STRING_AGG(S.Siting,',') AS Animals
(
SELECT STRING_AGG(U.Siting, ',')
FROM
(
SELECT DISTINCT T.Siting
FROM Sitings AS T
WHERE
T.State = S.State AND
T.City = S.City
) AS U
) AS ANIMAL
FROM
Sitings AS S
GROUP BY
S.State,
S.City
ORDER BY
S.State,
S.City
[Comments]:
This is not a good answer, because you are running a subquery for every row. Instead, you should use a correlated query joined to the main query.