SQL Server Concatenate three different columns into a Comma-Separated without repeated values
【Posted】: 2020-10-27 23:23:36
【Question】: The table below is a simplified version of my problem in SQL Server:
ID  COLUMN_A  COLUMN_B  COLUMN_C
-------------------------------------
1   A         B         C
1   A         B         D
1   B         C         D
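I want to group by ID and get each column concatenated into a comma-separated list without repeated values. I tried using STRING_AGG(), but it returns: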
ID  COLUMN_A  COLUMN_B  COLUMN_C
-------------------------------------
1   A, A, B   B, B, C   C, D, D
This is the query I ran:
SELECT ID, STRING_AGG(COLUMN_A, ', ') AS COL_A, STRING_AGG(COLUMN_B, ', ') AS COL_B,
STRING_AGG(COLUMN_C, ', ') AS COL_C
FROM MYTABLE
GROUP BY ID;
I would like to get the following result:
ID  COLUMN_A  COLUMN_B  COLUMN_C
-------------------------------------
1   A, B      B, C      C, D
Thanks!
【Answer 1】: Unfortunately, string_agg(distinct) does not work (yet). But you can do something slightly more complicated:
SELECT ID,
       -- Only the first row of each (ID, value) pair passes its value through;
       -- every other row yields NULL, which STRING_AGG() skips.
       STRING_AGG(CASE WHEN seqnum_a = 1 THEN COLUMN_A END, ', ') AS COLUMN_A,
       STRING_AGG(CASE WHEN seqnum_b = 1 THEN COLUMN_B END, ', ') AS COLUMN_B,
       STRING_AGG(CASE WHEN seqnum_c = 1 THEN COLUMN_C END, ', ') AS COLUMN_C
FROM (SELECT t.*,
             -- Number the repeats of each value within an ID, one counter per column
             ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_A ORDER BY ID) AS seqnum_a,
             ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_B ORDER BY ID) AS seqnum_b,
             ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_C ORDER BY ID) AS seqnum_c
      FROM MYTABLE t
     ) t
GROUP BY ID;
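This works because, although STRING_AGG() does not remove duplicates, it does ignore NULL values: the CASE expression returns NULL for every row except the first occurrence of each value within an ID.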
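As a rough illustration of the two steps this answer relies on, the sketch below first shows which rows keep their value after the CASE filter, then shows STRING_AGG() skipping a NULL. The table variable @demo, the alias kept, and the inline VALUES list are hypothetical and exist only for this example; WITHIN GROUP is added here just to make the order of the result predictable.

```sql
-- Hypothetical sample data: one ID with a repeated value in COLUMN_A.
DECLARE @demo TABLE (ID INT, COLUMN_A CHAR(1));
INSERT INTO @demo (ID, COLUMN_A) VALUES (1, 'A'), (1, 'A'), (1, 'B');

-- Step 1: only the first row of each (ID, COLUMN_A) pair keeps its value;
-- the repeated 'A' becomes NULL.
SELECT ID,
       COLUMN_A,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY ID, COLUMN_A ORDER BY ID) = 1
            THEN COLUMN_A
       END AS kept
FROM @demo;

-- Step 2: STRING_AGG() ignores NULL inputs, so only 'A' and 'B' are joined.
SELECT STRING_AGG(v, ', ') WITHIN GROUP (ORDER BY v) AS joined
FROM (VALUES ('A'), (NULL), ('B')) AS x(v);
-- joined: A, B
```

Without WITHIN GROUP, the order of the concatenated values is not guaranteed, which may matter if the lists need to be compared or displayed consistently.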
【Comments】:
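Thanks for the quick answer. It throws the following error: The function 'ROW_NUMBER' must have an OVER clause with ORDER BY.
【Answer 2】: This version does not use window functions. The union may slow things down, but try it and see whether you can live with the performance.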
with
-- Stack the three columns into one; union (not union all) drops duplicate rows
cte1 (id, col, indicator) as
  (select id, column_a, 'col1' from t union
   select id, column_b, 'col2' from t union
   select id, column_c, 'col3' from t),
-- Aggregate the deduplicated values per id and source column
cte2 (id, indicator, agg) as
  (select id, indicator, string_agg(col, ',')
   from cte1
   group by id, indicator)
-- Spread the three aggregated strings back into three columns
select id,
       max(case when indicator = 'col1' then agg end) as column_a,
       max(case when indicator = 'col2' then agg end) as column_b,
       max(case when indicator = 'col3' then agg end) as column_c
from cte2
group by id;
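To see why the query above removes duplicates, it can help to run the first CTE on its own. The sketch below assumes the question's sample rows loaded into a hypothetical table variable @t (the answer itself reads from a table named t); the ORDER BY is added only to make the output easy to read.

```sql
-- Hypothetical stand-in for the answer's table t, filled with the question's sample rows.
DECLARE @t TABLE (id INT, column_a CHAR(1), column_b CHAR(1), column_c CHAR(1));
INSERT INTO @t (id, column_a, column_b, column_c)
VALUES (1, 'A', 'B', 'C'),
       (1, 'A', 'B', 'D'),
       (1, 'B', 'C', 'D');

-- The body of cte1: each column is stacked with a tag naming its source.
-- UNION removes exact duplicate rows, so the nine stacked values collapse
-- to six distinct (id, col, indicator) rows.
SELECT id, column_a AS col, 'col1' AS indicator FROM @t
UNION
SELECT id, column_b, 'col2' FROM @t
UNION
SELECT id, column_c, 'col3' FROM @t
ORDER BY indicator, col;
```

Because the indicator is part of each row, a value such as B that appears in both COLUMN_A and COLUMN_B is kept once per source column rather than being merged across columns.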
【Comments】:
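I don't understand how it works, but it works very well. If you could explain how it works, that would be very helpful. Thanks a lot!
@alejoaldana We first stack the data set in a way that lets us deduplicate it while recording which column each stacked value came from. Then you just aggregate the values. The case expressions are there so you can pivot on the indicator into three columns. I've broken the code into parts in this demo so you can see how it works: dbfiddle.uk/…
Thank you very much! Really helpful!
【Answer 3】: Here is a solution based on XML and XQuery.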
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT, COLUMN_A CHAR(1), COLUMN_B CHAR(1), COLUMN_C CHAR(1));
INSERT INTO @tbl (ID, COLUMN_A, COLUMN_B, COLUMN_C)
VALUES
(1,'A','B','C'),
(1,'A','B','D'),
(1,'B','C','D');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ',';
;WITH rs AS
(
SELECT ID
, CAST('<root><r><![CDATA[' +
REPLACE(STRING_AGG(COLUMN_A, ','), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML) AS COL_A
, CAST('<root><r><![CDATA[' +
REPLACE(STRING_AGG(COLUMN_B, ','), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML) AS COL_B
, CAST('<root><r><![CDATA[' +
REPLACE(STRING_AGG(COLUMN_C, ','), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML) AS COL_C
FROM @tbl
GROUP BY ID
)
SELECT rs.ID
, COL_A.query('for $i in distinct-values(/root/r/text())
return if ($i eq (distinct-values(/root/r/text())[last()])[1]) then $i
else concat($i, sql:variable("@separator"))
').value('.', 'NVARCHAR(MAX)') AS COL_A
, COL_B.query('for $i in distinct-values(/root/r/text())
return if ($i eq (distinct-values(/root/r/text())[last()])[1]) then $i
else concat($i, sql:variable("@separator"))
').value('.', 'NVARCHAR(MAX)') AS COL_B
, COL_C.query('for $i in distinct-values(/root/r/text())
return if ($i eq (distinct-values(/root/r/text())[last()])[1]) then $i
else concat($i, sql:variable("@separator"))
').value('.', 'NVARCHAR(MAX)') AS COL_C
FROM rs;
Output
+----+-------+-------+-------+
| ID | COL_A | COL_B | COL_C |
+----+-------+-------+-------+
| 1 | A, B | B, C | C, D |
+----+-------+-------+-------+
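The deduplication in this answer comes from the XQuery distinct-values() function. A minimal sketch that isolates that one step is shown below; the variable @x and its contents are hypothetical and stand in for the XML that the rs CTE builds from STRING_AGG().

```sql
-- Hypothetical XML fragment shaped like the one the rs CTE produces for COLUMN_A.
DECLARE @x XML = N'<root><r>A</r><r>A</r><r>B</r></root>';

-- distinct-values() keeps one copy of each text value; .value() then
-- flattens the resulting sequence into a single string.
SELECT @x.query('distinct-values(/root/r/text())')
         .value('.', 'NVARCHAR(MAX)') AS deduped;
```

SQL Server serializes a sequence of atomic values with a space between items, which is why the answer's output shows a space after each comma even though @separator is a bare comma.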