Monthly retention in Amazon Redshift
Posted: 2016-04-04 23:07:03
【Question】
I'm trying to calculate monthly retention in Amazon Redshift and came up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
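For some reason, this query returns zero retained_users and zero retention. I'd appreciate any ideas about why this happens, or a completely different monthly retention query that might work.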
I modified the query based on another SO post and ended up with the following:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I expect Query 2 to perform better than Query 1, so I'd prefer to fix this error.
Any help would be greatly appreciated.
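The error comes from the `- interval '1 month'` arithmetic inside Query 2. Redshift documents `DATEADD(month, -1, …)` and `ADD_MONTHS(…, -1)` as date functions that may serve as substitutes, though whether they avoid this exact error here is untested. Another option avoids intervals altogether: number each month as year * 12 + month, so "the previous month" is just that number minus 1. The sketch below applies that idea to Query 2's logic in plain Python: per month, it counts transactions, distinct members, and members who were also active the month before (the `lag()` comparison). The input format and sample events are hypothetical.

```python
from collections import defaultdict
from datetime import date

def month_index(d: date) -> int:
    # Integer month number; the previous month is index - 1, so no interval type is needed.
    return d.year * 12 + d.month - 1

def monthly_repeat_users(events):
    """events: iterable of (member_id, date).
    Returns {(year, month): (num_trans, num_buyers, repeat_buyers)}, mirroring Query 2."""
    # Like the CTE `t`: one bucket per (member, month) holding its transaction count.
    per_member_month = defaultdict(int)
    for member_id, d in events:
        per_member_month[(member_id, month_index(d))] += 1

    # Sorting the (member_id, month) keys gives each member's active months in order,
    # the equivalent of PARTITION BY member_id ORDER BY month.
    months_by_member = defaultdict(list)
    for member_id, m in sorted(per_member_month):
        months_by_member[member_id].append(m)

    stats = defaultdict(lambda: [0, 0, 0])  # month -> [num_trans, num_buyers, repeat_buyers]
    for member_id, months in months_by_member.items():
        prev = None  # plays the role of lag(month)
        for m in months:
            s = stats[m]
            s[0] += per_member_month[(member_id, m)]
            s[1] += 1
            if prev is not None and prev == m - 1:  # active in the immediately preceding month
                s[2] += 1
            prev = m
    return {(m // 12, m % 12 + 1): tuple(v) for m, v in sorted(stats.items())}

# Hypothetical sample events: (member_id, activity date).
EVENTS = [
    (1, date(2016, 1, 5)), (1, date(2016, 2, 10)), (1, date(2016, 2, 20)),
    (2, date(2016, 1, 20)), (2, date(2016, 3, 1)),
    (3, date(2016, 2, 14)), (3, date(2016, 3, 15)),
]

print(monthly_repeat_users(EVENTS))
# {(2016, 1): (2, 2, 0), (2016, 2): (3, 2, 1), (2016, 3): (2, 2, 1)}
```

In SQL, the same month number can be built as `EXTRACT(year FROM created_at) * 12 + EXTRACT(month FROM created_at)`. The integer comparison then replaces the interval subtraction in the `lag()` expression.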
【Comments】
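I think the problem with Query 1 is that you put the interval condition datediff ('month',activity.created_at,future_activity.created_at) = 1
into the JOIN. I don't think that will work. The join then fails, so you get NULLs on the right side of the join and the counts come out as zero. What happens when you move the condition into the WHERE clause?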
【Answer 1】
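Query 1 looks fine. I tried something similar; see below. You are self-joining the table (ads.fbs_page_view_staging) and comparing the same column (created_at). Assuming mongo_id is unique, each row can only join to itself, so datediff('month', ....)
will always return 0, and datediff ('month',activity.created_at,future_activity.created_at) = 1
will always be false.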
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
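To make the answer's point concrete, here is a small local reproduction in SQLite through Python's `sqlite3` module. It is a sketch, not Redshift. The table `page_views`, its columns, and the sample rows are hypothetical stand-ins for ads.fbs_page_view_staging. SQLite has no DATEDIFF, so the month gap is computed as a difference of year * 12 + month values, which counts month boundaries the same way `datediff('month', …)` does.

```python
import sqlite3

# Hypothetical sample rows: (mongo_id, member_id, created_at).
# mongo_id is unique per row (like a document id); member_id repeats across rows.
ROWS = [
    ("a1", 1, "2016-01-05"), ("a2", 1, "2016-02-10"),
    ("a3", 2, "2016-01-20"), ("a4", 2, "2016-03-01"),
    ("a5", 3, "2016-02-14"), ("a6", 3, "2016-03-15"),
]

# Month number (year * 12 + month) for each side of the self-join.
MONTH_A = "(CAST(strftime('%Y', a.created_at) AS INTEGER) * 12 + CAST(strftime('%m', a.created_at) AS INTEGER))"
MONTH_F = "(CAST(strftime('%Y', f.created_at) AS INTEGER) * 12 + CAST(strftime('%m', f.created_at) AS INTEGER))"

def retained_users(conn: sqlite3.Connection, join_col: str) -> int:
    """Count distinct members seen again exactly one month later, self-joining on join_col."""
    if join_col not in ("mongo_id", "member_id"):  # whitelist, because the name is interpolated into SQL
        raise ValueError(join_col)
    sql = f"""
        SELECT COUNT(DISTINCT f.member_id)
        FROM page_views AS a
        LEFT JOIN page_views AS f
          ON a.{join_col} = f.{join_col}
         AND {MONTH_F} - {MONTH_A} = 1
    """
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (mongo_id TEXT PRIMARY KEY, member_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", ROWS)

print(retained_users(conn, "mongo_id"))   # 0: every row matches only itself, so the month gap is 0
print(retained_users(conn, "member_id"))  # 2: members 1 and 3 come back the following month
```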
【Comments】
Exactly right: I joined on the wrong column. It should be member_id
, not mongo_id
. My silly mistake. Thanks for pointing that out.
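For reference, here is a sketch of Query 1 with the join key changed to member_id, run against SQLite via Python's `sqlite3` on hypothetical sample rows. The table name `page_views` and the data are assumptions. The month-number difference stands in for Redshift's `datediff('month', …) = 1`, and `* 1.0` plays the role of the `::float` cast so that the division is not an integer division.

```python
import sqlite3

# Hypothetical stand-in for ads.fbs_page_view_staging: (mongo_id, member_id, created_at).
ROWS = [
    ("a1", 1, "2016-01-05"), ("a2", 1, "2016-02-10"),
    ("a3", 2, "2016-01-20"), ("a4", 2, "2016-03-01"),
    ("a5", 3, "2016-02-14"), ("a6", 3, "2016-03-15"),
]

# Query 1 joined on member_id. "Exactly one month later" compares year * 12 + month values.
RETENTION_SQL = """
SELECT CAST(strftime('%Y', a.created_at) AS INTEGER) AS year,
       CAST(strftime('%m', a.created_at) AS INTEGER) AS month,
       COUNT(DISTINCT a.member_id) AS active_users,
       COUNT(DISTINCT f.member_id) AS retained_users,
       COUNT(DISTINCT f.member_id) * 1.0 / COUNT(DISTINCT a.member_id) AS retention
FROM page_views AS a
LEFT JOIN page_views AS f
  ON a.member_id = f.member_id
 AND (CAST(strftime('%Y', f.created_at) AS INTEGER) * 12 + CAST(strftime('%m', f.created_at) AS INTEGER))
   - (CAST(strftime('%Y', a.created_at) AS INTEGER) * 12 + CAST(strftime('%m', a.created_at) AS INTEGER)) = 1
GROUP BY 1, 2
ORDER BY 1, 2
"""

def monthly_retention(conn: sqlite3.Connection):
    """Return (year, month, active_users, retained_users, retention) rows."""
    return conn.execute(RETENTION_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (mongo_id TEXT PRIMARY KEY, member_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", ROWS)

for row in monthly_retention(conn):
    print(row)
# (2016, 1, 2, 1, 0.5)
# (2016, 2, 2, 1, 0.5)
# (2016, 3, 2, 0, 0.0)
```

The last month in the data always shows zero retention, because there is no following month to observe yet.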