Using Count Distinct with Pivot in Snowflake
Posted: 2021-04-27 00:33:42

Question:
I am trying to pivot on the column Join_Mon and get an aggregated count for each ID, as in the following query:
select *
from CTE3
pivot(COUNT(DISTINCT platform_payer_name) for Join_Mon in (
'2021-03-01',
'2021-02-01',
'2021-01-01',
'2020-12-01'
))
as p
order by ID
)
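As you can see, I am trying to take a distinct count of the platform_payer_name column here, but it gives the following error:
SQL compilation error: syntax error line 48 at position 16 unexpected 'DISTINCT'
I am positive that DISTINCT works with COUNT in Snowflake. Can I get some help understanding why it fails here? Thanks for your help.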
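Comments:
Can you show us the input data and the expected result? PIVOT does not accept this syntax, but we can find another way to get the job done.

Answer 1:
Here is some fake data that maps onto your pivot (I dropped the extra parentheses):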
with cte3(id, platform_payer_name, Join_Mon) as (
select * from values
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-02-01'),
(2,'bb', '2012-03-01'),
(2,'cc', '2020-12-01')
)
select *
from CTE3 AS c
pivot(COUNT(c.platform_payer_name) for c.Join_Mon in (
'2021-03-01',
'2021-02-01',
'2021-01-01',
'2020-12-01' )
) as p
order by id;
which gives:
ID '2021-03-01' '2021-02-01' '2021-01-01' '2020-12-01'
1 3 1 0 0
2 0 0 0 1
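So it makes sense that you want DISTINCT here: ID 1 has three identical rows for 2021-03-01, and a plain COUNT reports 3. DISTINCT does not appear to be supported inside PIVOT, though.
The following is somewhat prone to copy-and-paste errors, but it does work: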
with cte3(id, platform_payer_name, Join_Mon) as (
select * from values
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-02-01'),
(2,'bb', '2012-03-01'),
(2,'cc', '2020-12-01')
)
select id
,count(distinct(iff(Join_Mon='2021-03-01',platform_payer_name,null))) as "2021-03-01"
,count(distinct(iff(Join_Mon='2021-02-01',platform_payer_name,null))) as "2021-02-01"
,count(distinct(iff(Join_Mon='2021-01-01',platform_payer_name,null))) as "2021-01-01"
,count(distinct(iff(Join_Mon='2020-12-01',platform_payer_name,null))) as "2020-12-01"
from CTE3 AS c
group by 1 order by 1;
which gives:
ID 2021-03-01 2021-02-01 2021-01-01 2020-12-01
1 1 1 0 0
2 0 0 0 1
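This works because PIVOT performs two tasks. The first is to move the values that match each listed input into their own columns, which is the same as the following: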
with cte3(id, platform_payer_name, Join_Mon) as (
select * from values
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-02-01'),
(2,'bb', '2012-03-01'),
(2,'cc', '2020-12-01')
)
select id
,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
from CTE3 AS c
order by 1;
which gives:
ID, 2021-03-01, 2021-02-01, 2021-01-01, 2020-12-01
1, aa, NULL, NULL, NULL
1, aa, NULL, NULL, NULL
1, aa, NULL, NULL, NULL
1, NULL, aa, NULL, NULL
2, NULL, NULL, NULL, NULL
2, NULL, NULL, NULL, cc
The second task is the aggregation, so count(distinct x) can then be run over each of those columns, grouped by id:
with cte3(id, platform_payer_name, Join_Mon) as (
select * from values
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-03-01'),
(1,'aa', '2021-02-01'),
(2,'bb', '2012-03-01'),
(2,'cc', '2020-12-01')
)
select id
,count(distinct("2021-03-01")) as "2021-03-01"
,count(distinct("2021-02-01")) as "2021-02-01"
,count(distinct("2021-01-01")) as "2021-01-01"
,count(distinct("2020-12-01")) as "2020-12-01"
from (
select id
,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
from CTE3 AS c
)
group by id
order by id;
Or it can be done inline in a single query, as in the count(distinct(iff(...))) version shown earlier in this answer.
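Since the hand-written column list is the part that invites copy-and-paste mistakes, one option is to generate it from the list of months. The following is only a sketch under stated assumptions: the helper names build_distinct_pivot_sql and run_demo are hypothetical, it uses Python's built-in sqlite3 module purely so it can run without a Snowflake account, and it writes the condition as CASE WHEN rather than IFF because CASE is valid in both engines. The table and column names are the ones from the question.

```python
import sqlite3

MONTHS = ["2021-03-01", "2021-02-01", "2021-01-01", "2020-12-01"]

SAMPLE_ROWS = [
    (1, "aa", "2021-03-01"),
    (1, "aa", "2021-03-01"),
    (1, "aa", "2021-03-01"),
    (1, "aa", "2021-02-01"),
    (2, "bb", "2012-03-01"),
    (2, "cc", "2020-12-01"),
]


def build_distinct_pivot_sql(months, table="cte3"):
    """Build one COUNT(DISTINCT CASE ...) column per month (hypothetical helper)."""
    for m in months:
        # The values are inlined as SQL literals, so accept only plain
        # YYYY-MM-DD shaped strings.
        if not (len(m) == 10 and m.replace("-", "").isdigit()):
            raise ValueError(f"unexpected month literal: {m!r}")
    columns = ",\n".join(
        f"       count(distinct case when Join_Mon = '{m}' "
        f"then platform_payer_name end) as \"{m}\""
        for m in months
    )
    return f"select id,\n{columns}\nfrom {table}\ngroup by id\norder by id"


def run_demo():
    """Run the generated query against the thread's sample data in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cte3 (id integer, platform_payer_name text, Join_Mon text)"
    )
    conn.executemany("insert into cte3 values (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(build_distinct_pivot_sql(MONTHS)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_distinct_pivot_sql(MONTHS))
    print(run_demo())
```

On the sample data this returns the same counts as the hand-written query: (1, 1, 1, 0, 0) for ID 1 and (2, 0, 0, 0, 1) for ID 2. The generated text can be pasted into a Snowflake worksheet in place of the manual column list.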
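Answer 2:
Snowflake supports COUNT_IF, so you can de-duplicate the rows first and then count the rows that match each month: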
SELECT id,
COUNT_IF(join_mon='2021-03-01') AS "2021-03-01",
COUNT_IF(join_mon='2021-02-01') AS "2021-02-01",
COUNT_IF(join_mon='2021-01-01') AS "2021-01-01",
COUNT_IF(join_mon='2020-12-01') AS "2020-12-01"
FROM (SELECT DISTINCT id, platform_payer_name, join_mon FROM cte3) s
GROUP BY id
ORDER BY id;
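To check that de-duplicating first and then counting agrees with the COUNT(DISTINCT ...) approach from the first answer, here is a small sketch. It is an approximation, not Snowflake itself: it runs on Python's built-in sqlite3 module, and because SQLite has no COUNT_IF, SUM(CASE ...) stands in for it. The function names are hypothetical. It also shows one case where the two approaches differ: a NULL platform_payer_name survives SELECT DISTINCT as a row and is counted, while COUNT(DISTINCT ...) ignores NULLs.

```python
import sqlite3

MONTHS = ["2021-03-01", "2021-02-01", "2021-01-01", "2020-12-01"]

ROWS = [
    (1, "aa", "2021-03-01"),
    (1, "aa", "2021-03-01"),
    (1, "aa", "2021-03-01"),
    (1, "aa", "2021-02-01"),
    (2, "bb", "2012-03-01"),
    (2, "cc", "2020-12-01"),
]


def _query(rows, sql):
    """Load rows into an in-memory cte3 table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cte3 (id integer, platform_payer_name text, Join_Mon text)"
    )
    conn.executemany("insert into cte3 values (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


def dedupe_then_count(rows):
    """Answer 2's shape: SELECT DISTINCT first, then count rows per month."""
    # SUM(CASE ...) is a stand-in for Snowflake's COUNT_IF.
    cols = ", ".join(
        f"sum(case when Join_Mon = '{m}' then 1 else 0 end)" for m in MONTHS
    )
    sql = (
        f"select id, {cols} "
        "from (select distinct id, platform_payer_name, Join_Mon from cte3) "
        "group by id order by id"
    )
    return _query(rows, sql)


def count_distinct(rows):
    """Answer 1's shape: COUNT(DISTINCT ...) over a conditional value."""
    cols = ", ".join(
        f"count(distinct case when Join_Mon = '{m}' "
        "then platform_payer_name end)"
        for m in MONTHS
    )
    return _query(rows, f"select id, {cols} from cte3 group by id order by id")


if __name__ == "__main__":
    print(dedupe_then_count(ROWS))
    print(count_distinct(ROWS))
    # A NULL payer name is where the two approaches disagree.
    with_null = ROWS + [(1, None, "2021-01-01")]
    print(dedupe_then_count(with_null))
    print(count_distinct(with_null))
```

On the thread's sample data both functions return the same rows. If platform_payer_name can be NULL, filtering those rows out in the inner SELECT DISTINCT should bring the two approaches back in line.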