Using Count Distinct with Pivot in Snowflake

[Posted]: 2021-04-27 00:33:42 [Question]:

I am trying to pivot on the column Join_Mon and get an aggregated count per ID, as shown in the following query:

select *
from CTE3
pivot(COUNT(DISTINCT platform_payer_name) for Join_Mon in (
 '2021-03-01',
 '2021-02-01',
  '2021-01-01',
 '2020-12-01'

        ))
  as p
order by ID
)

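As you can see, I am trying to get a distinct count of the platform_payer_name column here, but it fails with the following error:

SQL compilation error: syntax error line 48 at position 16 unexpected 'DISTINCT'

I am positive that DISTINCT works with COUNT in Snowflake. Can I get some help understanding why it fails here? Thanks for your help.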

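[Comments]:

Can you show us the input data and the expected result? PIVOT is getting confused by this syntax, but we can find another way to get the job done.

[Answer 1]:

So, making some fake data that maps onto your pivot (although I dropped the extra parenthesis):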

with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-02-01'),
        (2,'bb', '2012-03-01'),
        (2,'cc', '2020-12-01')
)
select *
from CTE3 AS c
pivot(COUNT(c.platform_payer_name) for c.Join_Mon in (
         '2021-03-01',
         '2021-02-01',
         '2021-01-01',
         '2020-12-01' )
) as p
order by id;

gives:

ID  '2021-03-01'    '2021-02-01'    '2021-01-01'    '2020-12-01'
1   3               1               0               0
2   0               0               0               1

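So it makes sense that you want DISTINCT here, since a plain COUNT reports 3 for ID 1 in 2021-03-01 even though there is only one distinct name.

But DISTINCT does not seem to be supported inside PIVOT.

So, while it is somewhat prone to copy-paste errors, this does "work":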

with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-02-01'),
        (2,'bb', '2012-03-01'),
        (2,'cc', '2020-12-01')
)
select id
    ,count(distinct(iff(Join_Mon='2021-03-01',platform_payer_name,null))) as "2021-03-01"
    ,count(distinct(iff(Join_Mon='2021-02-01',platform_payer_name,null))) as "2021-02-01"
    ,count(distinct(iff(Join_Mon='2021-01-01',platform_payer_name,null))) as "2021-01-01"
    ,count(distinct(iff(Join_Mon='2020-12-01',platform_payer_name,null))) as "2020-12-01"
from CTE3 AS c
group by 1 order by 1;

gives:

ID  2021-03-01  2021-02-01  2021-01-01  2020-12-01
1   1           1           0           0
2   0           0           0           1

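This works because PIVOT is doing two jobs. The first is moving the values that match each entry of the IN list into their own columns, which is the same as: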

with cte3(id, platform_payer_name, Join_Mon) as (
select * from values
    (1,'aa', '2021-03-01'),
    (1,'aa', '2021-03-01'),
    (1,'aa', '2021-03-01'),
    (1,'aa', '2021-02-01'),
    (2,'bb', '2012-03-01'),
    (2,'cc', '2020-12-01')
)
select id
    ,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
    ,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
    ,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
    ,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
from CTE3 AS c
order by 1;

gives:

ID, 2021-03-01, 2021-02-01, 2021-01-01, 2020-12-01
1,  aa,         NULL,       NULL,       NULL
1,  aa,         NULL,       NULL,       NULL
1,  aa,         NULL,       NULL,       NULL
1,  NULL,       aa,         NULL,       NULL
2,  NULL,       NULL,       NULL,       NULL
2,  NULL,       NULL,       NULL,       cc

The second job is the aggregation, so count(distinct x) can then be run on each of those columns:

select id
    ,count(distinct("2021-03-01")) as "2021-03-01"
    ,count(distinct("2021-02-01")) as "2021-02-01"
    ,count(distinct("2021-01-01")) as "2021-01-01"
    ,count(distinct("2020-12-01")) as "2020-12-01"
from (
    select id
        ,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
        ,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
        ,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
        ,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
    from CTE3 AS c
)
group by id
order by id;

Or it can be done inline, as in the count(distinct(iff(...))) query earlier in this answer.
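If you would rather keep the PIVOT syntax, one option is to remove the duplicates before pivoting, so that a plain COUNT inside PIVOT sees each distinct name only once per ID and month. This is a sketch that is not part of the original answer, and the CTE name deduped is made up for illustration. It assumes the de-duplicating step selects only the grouping column, the value column and the pivot column, because PIVOT groups by every column that is left over.

```sql
-- Sketch: de-duplicate first, then PIVOT with a plain COUNT.
-- "deduped" is an illustrative name, not something from the original post.
with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-02-01'),
        (2,'bb', '2012-03-01'),
        (2,'cc', '2020-12-01')
),
deduped as (
    -- one row per (id, platform_payer_name, Join_Mon) combination
    select distinct id, platform_payer_name, Join_Mon
    from cte3
)
select *
from deduped
pivot(count(platform_payer_name) for Join_Mon in (
         '2021-03-01',
         '2021-02-01',
         '2021-01-01',
         '2020-12-01' )
) as p
order by id;
```

On the sample data this should produce the same counts as the conditional-aggregation query, with the column headers quoted the way PIVOT names them. COUNT skips NULL values, so rows with a missing platform_payer_name are ignored just as they would be by COUNT(DISTINCT ...).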

[Answer 2]:

Snowflake supports COUNT_IF:

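-- De-duplicate first so that each (id, payer name, month) combination is counted once.
SELECT id,
       COUNT_IF(join_mon='2021-03-01') AS "2021-03-01",
       COUNT_IF(join_mon='2021-02-01') AS "2021-02-01",
       COUNT_IF(join_mon='2021-01-01') AS "2021-01-01",
       COUNT_IF(join_mon='2020-12-01') AS "2020-12-01"
FROM (SELECT DISTINCT id, platform_payer_name, join_mon FROM cte3) s
GROUP BY id
ORDER BY id;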
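One difference from COUNT(DISTINCT ...) is worth checking if platform_payer_name can be NULL: SELECT DISTINCT keeps a NULL name as a row of its own, and COUNT_IF counts rows rather than non-NULL values. The sketch below uses a made-up NULL row and made-up column aliases to compare the COUNT_IF condition with and without an explicit NULL filter. It is an illustration under those assumptions, not part of the original answer.

```sql
-- Sketch: how a NULL payer name affects the COUNT_IF approach.
-- The NULL row and the aliases rows_incl_null / distinct_names are hypothetical.
with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1, 'aa', '2021-03-01'),
        (1, 'aa', '2021-03-01'),
        (1, null, '2021-03-01')   -- hypothetical row with a missing payer name
)
select id,
       -- counts every de-duplicated row for the month, including the NULL one
       count_if(Join_Mon = '2021-03-01') as rows_incl_null,
       -- counts only rows with a name, matching COUNT(DISTINCT ...) semantics
       count_if(Join_Mon = '2021-03-01'
                and platform_payer_name is not null) as distinct_names
from (select distinct id, platform_payer_name, Join_Mon from cte3) s
group by id
order by id;
```

If the column is never NULL in your data, the two expressions agree and the shorter condition is enough.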
