Count distinct customers who bought in the previous period and not in the next period in BigQuery
Posted: 2021-09-01 09:13:24
Question: I have a dataset in BigQuery that contains order_date (DATE) and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
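I am trying to count distinct customer_id values between a month of the previous year and the same month of the current year: for example from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01. Then I want to count how many of those customers did not buy in the same period of the following year, from 2020-01-01 to 2021-01-01, then from 2020-02-01 to 2021-02-01. The output I expect: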
order_date| count distinct CustomerID|who did not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
The next period should not include the previous one.
I tried the code below, but it works differently from what I want:
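A minimal plain-Python model of the metric being asked for, run on the six sample rows above, may make the definition concrete. It assumes half-open windows (for 2020-01-01 the previous period is [2019-01-01, 2020-01-01) and the next one is [2020-01-01, 2021-01-01)), which is one reading of "the next period should not include the previous one". The helper names are illustrative, and this is not BigQuery code.

```python
from datetime import date

# Sample rows from the question: (order_date, CustomerID).
ORDERS = [
    (date(2019, 1, 1), 111),
    (date(2019, 2, 1), 112),
    (date(2020, 1, 1), 111),
    (date(2020, 2, 1), 113),
    (date(2021, 1, 1), 115),
    (date(2021, 2, 1), 119),
]


def shift_year(d: date, years: int) -> date:
    """Same day and month, `years` later (safe here: all dates are the 1st)."""
    return d.replace(year=d.year + years)


def customers_between(orders, start: date, end: date) -> set:
    """Distinct customers with start <= order_date < end (assumed half-open)."""
    return {cid for d, cid in orders if start <= d < end}


def period_stats(orders, period: date):
    """(distinct customers in the previous year, how many skipped the next year)."""
    prev = customers_between(orders, shift_year(period, -1), period)
    nxt = customers_between(orders, period, shift_year(period, 1))
    return len(prev), len(prev - nxt)


for p in (date(2020, 1, 1), date(2020, 2, 1)):
    print(p, *period_stats(ORDERS, p))
```

On these six rows, 2020-01-01 gets 2 previous customers and 1 non-returning customer (112), and 2020-02-01 gets 2 and 2; the numbers in the expected-output table come from the full dataset, not from this sample.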
with customers as (
  select distinct
    date_trunc(date(order_date), month) as dates,
    CUSTOMER_WID
  from t
  where date(order_date) between '2018-01-01' and date_sub(current_date(), interval 1 day)
)
select
  dates,
  customers_previous,
  customers_next_period
from (
  select
    dates,
    count(CUSTOMER_WID) as customers_previous,
    count(case when customer_wid_next is null then 1 end) as customers_next_period
  from (
    select
      prev.dates,
      prev.CUSTOMER_WID,
      next.dates as next_dates,
      next.CUSTOMER_WID as customer_wid_next
    from customers as prev
    left join customers as next
      on next.dates = date_add(prev.dates, interval 1 year)
      and prev.CUSTOMER_WID = next.CUSTOMER_WID
  ) as t2
  group by dates
)
order by 1, 2
Thanks in advance.
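The attempt differs from the requirement in two ways: `customers_previous` is grouped by a single purchase month rather than a 12-month window, and the join condition `next.dates = date_add(prev.dates, interval 1 year)` only finds a customer who buys again in exactly the same calendar month one year later. The hypothetical Python sketch below isolates the second point; `exact_month_lost` and `window_lost` are illustrative names, and the dates are assumed to be already truncated to the first of the month, as `date_trunc(..., month)` does in the query.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a first-of-month date by whole years."""
    return d.replace(year=d.year + years)


def buyers_in(orders, month: date) -> set:
    """Distinct customers whose month-truncated order date equals `month`."""
    return {cid for d, cid in orders if d == month}


def exact_month_lost(orders, month: date) -> int:
    """Mirrors the attempt's join: a buyer of `month` counts as lost unless
    they also bought in the month exactly one year later."""
    return len(buyers_in(orders, month) - buyers_in(orders, add_years(month, 1)))


def window_lost(orders, month: date) -> int:
    """Counts a buyer of `month` as lost only if they made no purchase at all
    in the following 12 months (month + 1 through month + 12)."""
    later = {cid for d, cid in orders if month < d <= add_years(month, 1)}
    return len(buyers_in(orders, month) - later)


# Hypothetical customer 201 buys again in June 2019 but not in January 2020.
EXAMPLE = [(date(2019, 1, 1), 201), (date(2019, 6, 1), 201)]
print(exact_month_lost(EXAMPLE, date(2019, 1, 1)))  # 1: treated as lost
print(window_lost(EXAMPLE, date(2019, 1, 1)))       # 0: still a returning customer
```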
Comments:
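Is there any code you can show us that represents what you have tried, or are you asking for help without having spent any time on the problem?
Answer 1: If I understand correctly, you are trying to compute values over a time window. For that I would suggest window functions; the docs are here, and here is a good article explaining how they work.
That said, my suggestion is: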
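SELECT DISTINCT
  periods,
  (SELECT COUNT(DISTINCT c) FROM UNNEST(customers_last_12_mos) AS c) AS count_customers_last_12_mos
FROM (
  SELECT
    FORMAT_DATE('%Y%m', order_date) AS periods,
    # COUNT(DISTINCT ...) cannot be combined with a window ORDER BY in BigQuery,
    # so collect the IDs in the frame and de-duplicate them in the outer query.
    ARRAY_AGG(customer_id) OVER last_12_mos AS customers_last_12_mos
  FROM dataset
  WINDOW last_12_mos AS (  # last 12 months, excluding the current month
    # order by a month number so the frame is measured in months, not rows
    ORDER BY DATE_DIFF(order_date, DATE '2000-01-01', MONTH)
    RANGE BETWEEN 12 PRECEDING AND 1 PRECEDING
  )
)
ORDER BY periods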
I'm sure you can build on this to get the aggregation you want.
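One thing to watch with window frames here: `ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING` looks at the 12 previous rows, not the 12 previous months, so it only matches a 12-month window when there is exactly one row per month. The Python sketch below (illustrative helpers, not BigQuery) contrasts a row-based frame with a month-based one on a hypothetical dataset where 13 customers all ordered in the same month.

```python
from datetime import date


def month_index(d: date) -> int:
    """Consecutive calendar months map to consecutive integers."""
    return d.year * 12 + d.month - 1


def rows_frame_count(rows, i: int) -> int:
    """Like ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING over rows sorted by date:
    distinct customers among the 12 rows before row i, whatever months they cover."""
    return len({cid for _, cid in rows[max(0, i - 12):i]})


def month_frame_count(rows, i: int) -> int:
    """Distinct customers in the 12 calendar months before row i's month,
    i.e. a frame measured on month_index instead of on row positions."""
    cur = month_index(rows[i][0])
    return len({cid for d, cid in rows if cur - 12 <= month_index(d) <= cur - 1})


# Hypothetical data: customers 1..13 all order in June 2019, customer 99 in January 2020.
ROWS = sorted([(date(2019, 6, 1), c) for c in range(1, 14)] + [(date(2020, 1, 1), 99)])
last = len(ROWS) - 1
print(rows_frame_count(ROWS, last))   # 12: customer 1 falls outside the 12-row frame
print(month_frame_count(ROWS, last))  # 13: all of June 2019 is within the last 12 months
```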
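Answer 2: You can use unnest(generate_date_array()) to generate the periods. Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers: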
select period,
       count(distinct c_prev.customer_wid) as customers_prev_12_mos,
       count(distinct c_next.customer_wid) as customers_returning,
       count(distinct c_prev.customer_wid) - count(distinct c_next.customer_wid) as customers_not_returning
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval 1 month)) as period
join customers c_prev
  on c_prev.order_date <= period and
     c_prev.order_date > date_add(period, interval -12 month)
left join customers c_next
  on c_next.customer_wid = c_prev.customer_wid and
     c_next.order_date > period and
     c_next.order_date <= date_add(period, interval 12 month)
group by period
order by period;
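A plain-Python model of this query may help check its boundaries. It is a sketch under assumptions: dates are first-of-month values, `month_series` imitates `generate_date_array(..., interval 1 month)` with both endpoints included, and the windows copy the join conditions, (period - 12 months, period] for `c_prev` and (period, period + 12 months] for `c_next`. Because the left join only matches customers already in `c_prev`, the non-returning count is simply the difference of the two distinct counts.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift a first-of-month date by n months (n may be negative)."""
    y, m = divmod(d.month - 1 + n, 12)
    return date(d.year + y, m + 1, 1)


def month_series(start: date, end: date) -> list:
    """First-of-month dates from start to end, both included, in the spirit of
    generate_date_array(start, end, interval 1 month)."""
    out, d = [], start
    while d <= end:
        out.append(d)
        d = add_months(d, 1)
    return out


def report(orders, start: date, end: date) -> list:
    """One (period, prev, returning, not_returning) tuple per month."""
    rows = []
    for period in month_series(start, end):
        # c_prev: order_date in (period - 12 months, period]
        prev = {c for d, c in orders if add_months(period, -12) < d <= period}
        # c_next: a c_prev customer with order_date in (period, period + 12 months]
        returning = {c for d, c in orders
                     if c in prev and period < d <= add_months(period, 12)}
        rows.append((period, len(prev), len(returning), len(prev) - len(returning)))
    return rows
```

For example, with orders from customer 1 in 2019-06 and 2020-05 and from customer 2 in 2019-07, `report(orders, date(2020, 1, 1), date(2020, 1, 1))` yields a single row with 2 previous customers, 1 returning and 1 not returning.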