SQL - Incremental Case Statement - Cohort Analysis
Posted: 2020-08-03 14:46:28
Question: My main goal is to get a quick view of the average revenue per user within 30, 90, 180, and 180+ days. I have an email, the date the user joined a cohort, and the date of each purchase:
create temporary table cohorts (
email varchar(64)
, start_date timestamp
, purchase_date timestamp
, amount decimal(10,2)
)
;
insert into cohorts
values
('johnsmith@domain.com', '2020-01-01 00:00:00', '2020-01-01 12:00:00', '200.00')
, ('happyday@domain.com', '2020-01-01 00:00:00', '2020-02-28 00:00:00','100.00')
, ('happyday@domain.com', '2020-01-01 00:00:00', '2020-01-28 00:00:00','100.00')
, ('susieq@domain.com', '2020-01-01 00:00:00', '2020-05-01 00:00:00', '50.00')
, ('janedoe@domain.com', '2020-01-01 00:00:00', '2020-03-30 00:00:00', '75.00')
, ('janedoe@domain.com', '2020-01-01 00:00:00', '2020-07-30 00:00:00', '75.00')
;
To see the average revenue per user for each period, I would write something like this:
select
case
when datediff(day,start_date, purchase_date) < 30 then 'Within 30'
when datediff(day,start_date, purchase_date) < 90 then 'Within 90'
when datediff(day,start_date, purchase_date) < 180 then 'Within 180'
else 'Older than 180'
end as cohort_flag
, count(distinct email) num_of_emails
, sum(amount) summed_amt
, sum(amount)/count(distinct email) as avg_value
from cohorts
group by 1
cohort_flag     num_of_emails  summed_amt  avg_value
Within 30       2              300.0       150.0
Within 90       2              175.0       87.5
Within 180      1              50.0        50.0
Older than 180  1              75.0        75.0
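However, because a case expression resolves to the first branch that is true, each bucket excludes the revenue from the earlier cohorts. The result I want is below, where users from the earlier cohorts also count toward the later ones: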
cohort_flag     num_of_emails  summed_amt  avg_value
Within 30       2              300.0       150.0
Within 90       3              475.0       158.33
Within 180      4              525.0       131.25
Older than 180  4              600.0       150.0
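One way to see why the first query undercounts is to list the single label that the case expression assigns to each purchase row. This is only a sketch against the cohorts table defined above, not part of the original question; the days_since_start alias is introduced here for illustration.

```sql
-- Each purchase row receives exactly one label: the first branch that matches.
-- A purchase made 27 days after start_date is labelled 'Within 30' only,
-- so it never contributes to the 'Within 90' or 'Within 180' totals.
select
    email
  , datediff(day, start_date, purchase_date) as days_since_start  -- illustrative alias
  , amount
  , case
      when datediff(day, start_date, purchase_date) < 30 then 'Within 30'
      when datediff(day, start_date, purchase_date) < 90 then 'Within 90'
      when datediff(day, start_date, purchase_date) < 180 then 'Within 180'
      else 'Older than 180'
    end as cohort_flag
from cohorts
order by days_since_start
;
```

Because every row lands in exactly one bucket, grouping by cohort_flag can only produce disjoint totals; cumulative totals need each row to be counted under several labels.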
Answer 1: The same table row has to count toward more than one group, so you need a query like this:
select 30 days_dif, 'Within 30' cohort_flag union all
select 90, 'Within 90' union all
select 180, 'Within 180' union all
select 2147483647, 'Older than 180'
(Hopefully Redshift supports this syntax.)
It defines the groups. Then LEFT join it to the table:
select t.cohort_flag,
count(distinct c.email) num_of_emails,
coalesce(sum(c.amount), 0) summed_amt,
coalesce(sum(c.amount), 0) / nullif(count(distinct c.email), 0) as avg_value
from (
select 30 days_dif, 'Within 30' cohort_flag union all
select 90, 'Within 90' union all
select 180, 'Within 180' union all
select 2147483647, 'Older than 180'
) t left join cohorts c
on datediff(day, c.start_date, c.purchase_date) < t.days_dif
group by t.days_dif, t.cohort_flag
order by t.days_dif
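To make the effect of the non-equi join visible, the same join can be run without the aggregation. The following is a sketch, assuming the cohorts table from the question; the CTE name thresholds and the alias days_since_start are illustrative and do not appear in the original answer.

```sql
-- Un-aggregated view of the join: a purchase appears once for every
-- threshold it falls under, which is what makes the totals cumulative.
with thresholds as (  -- illustrative name for the derived table t in the answer
    select 30 as days_dif, 'Within 30' as cohort_flag union all
    select 90, 'Within 90' union all
    select 180, 'Within 180' union all
    select 2147483647, 'Older than 180'
)
select
    t.cohort_flag
  , c.email
  , datediff(day, c.start_date, c.purchase_date) as days_since_start
  , c.amount
from thresholds t
left join cohorts c
  on datediff(day, c.start_date, c.purchase_date) < t.days_dif
order by t.days_dif, c.email, days_since_start
;
```

A purchase made within the first 30 days shows up under all four labels, while one made after 180 days shows up only under the last. The 2147483647 threshold acts as a catch-all, so the 'Older than 180' row effectively covers every purchase. The left join keeps a threshold row in the output even when no purchase matches it, which is why the aggregated query guards with coalesce and nullif.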
Comments:
Joining on datediff(day, c.start_date, c.purchase_date) < t.days_dif is brilliant.