Month number retention cohort calculation issue with Redshift
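【Title】: Month number retention cohort calculation issue with Redshift 【Posted】: 2019-04-04 21:36:43 【Question】: I am trying to calculate monthly user retention cohorts for the past 9 months in Redshift. However, I have run into an issue where the month cohorts in the query below are not being rolled up into the correct month.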
The data types of the columns I am querying are:
userid - varchar, activated - varchar
This is the query I am trying to run:
with by_month as
(SELECT
userid
DATE_TRUNC('month', cast ("activated" as date)) AS joined_month
FROM customers
GROUP BY 1, 2),
first_month as
(select userid,
joined_month,
FIRST_VALUE(order_month) OVER (PARTITION BY userid ORDER BY
joined_month asc rows unbounded preceding) AS first
FROM by_month),
months as (select userid,
joined_month,
first,
extract(month from (joined_month - first_month)) as month_number
from first_month)
SELECT
first as "cohort",
SUM(CASE WHEN month_number = '0' THEN 1 ELSE 0 END) AS " Month 0",
SUM(CASE WHEN month_number = '1' THEN 1 ELSE 0 END) AS " Month 1",
SUM(CASE WHEN month_number = '2' THEN 1 ELSE 0 END) AS " Month 2",
SUM(CASE WHEN month_number = '3' THEN 1 ELSE 0 END) AS " Month 3",
SUM(CASE WHEN month_number = '4' THEN 1 ELSE 0 END) AS " Month 4",
SUM(CASE WHEN month_number = '5' THEN 1 ELSE 0 END) AS " Month 5",
SUM(CASE WHEN month_number = '6' THEN 1 ELSE 0 END) AS " Month 6",
SUM(CASE WHEN month_number = '7' THEN 1 ELSE 0 END) AS " Month 7",
SUM(CASE WHEN month_number = '8' THEN 1 ELSE 0 END) AS " Month 8",
SUM(CASE WHEN month_number = '9' THEN 1 ELSE 0 END) AS " Month 9"
from months
where first_month >= '2018-08-01'
GROUP BY 1
ORDER BY 1 desc
When I get the results, several cohorts show an impossible number (more users in a later month than in Month 0).
For example:
Cohort Month 0 Month 1
'2019-01-01' 95 120
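I did some digging and found that the month number is not being calculated correctly. For example, for the '2019-01-01' cohort, month_number correctly captures 0, 1 and 3, but 2 is misattributed to month 1. Any help with a fix is greatly appreciated!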
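【Question comments】:
What do you mean by window function? I don't see any window functions in your query.
You will have to provide the table column data types; this matters.
Your by_month subquery does not contain the order_month column that is used in the next subquery, which reuses by_month, so the query appears to be invalid.
Thanks for the comments, everyone. I edited my question to make it more relevant.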
After >> SELECT first as "cohort",
【Answer 1】:
Now try:
SELECT userid, joined_month, first_month, month_number FROM months
WHERE first = '2019-01-01'
(Feel free to add other columns to dig into the problem.) Add activated, order_month, and so on until you have pinned down what is causing the problem.
【Comments】:
Thanks! It looks like the problem is with month_number. It correctly captures 0, 1 and 3, but 2 is misattributed to month 1. I'm not quite sure how to correct it.
In which SELECT does the misattribution happen? by_month, first_month, or months? Take a look at >> extract(month from (joined_month - first_month)) as month_number
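The symptom described in the thread (0, 1 and 3 correct, 2 counted as month 1) is consistent with a month number derived from a day-based interval rather than from calendar months. The sketch below reproduces that arithmetic in Python. It assumes, without confirmation from the thread, that subtracting the two truncated timestamps yields an interval in days and that the month part is effectively days divided by 30; the function names are hypothetical and exist only for this illustration.

```python
from datetime import date


def months_from_day_interval(first: date, joined: date) -> int:
    """Month count derived from a day-based interval.

    Assumption: one month is treated as 30 days, which is what the
    symptom in the question suggests. This is not a documented fact
    taken from the thread.
    """
    return (joined - first).days // 30


def calendar_months_between(first: date, joined: date) -> int:
    """Number of calendar-month boundaries crossed between two dates."""
    return (joined.year - first.year) * 12 + (joined.month - first.month)


cohort = date(2019, 1, 1)
for month in (1, 2, 3, 4):
    joined = date(2019, month, 1)
    print(
        joined,
        months_from_day_interval(cohort, joined),
        calendar_months_between(cohort, joined),
    )
```

Under that assumption, 2019-03-01 is 59 days after 2019-01-01, so the day-based count gives 1 while the calendar count gives 2, and 2019-04-01 (90 days) gives 3 either way. If this is indeed the cause, counting calendar months directly, for example with Redshift's `DATEDIFF(month, first, joined_month)`, may avoid the misattribution; treat that as a suggestion to verify against your own data rather than a confirmed fix.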