Specifying contingencies for a max(date) function in Redshift query
Posted: 2019-04-13 00:06:29
Question:
I am trying to determine, for each calendar week in which a customer did not log in, when they last logged in. The starting dataset looks like this:
User_Name Login_Date Week_Ending
Bobisaur 1/1/2019 1/5/2019
Bobisaur 1/3/2019 1/5/2019
Bobisaur 3/5/2019 3/9/2019
Bobisaur 3/24/2019 3/30/2019
Bobisaur 4/1/2019 4/6/2019
For the week ending 1/12/2019 (and so on), the customer Bobisaur has no login. What I want is to end up with a dataset that looks like this:
User_Name Week_Ending (for weeks with no login) Days Since Last Login (as of week ending date)
Bobisaur 1/12/2019 7
Bobisaur 1/19/2019 14
Bobisaur 1/26/2019 21
Bobisaur 2/2/2019 28
Bobisaur 2/9/2019 35
Bobisaur 2/16/2019 42
Bobisaur 2/23/2019 49
Bobisaur 3/2/2019 56
Bobisaur 3/16/2019 11
Bobisaur 3/23/2019 18
Bobisaur 4/13/2019 12
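This filters down to the list of weeks in which the user did not log in, with a column showing the number of days since their last login as of that particular week ending date.
I am able to get the Week_Ending dates with no login activity, but I am stuck on calculating "Days Since Last Login (as of week ending date)". What I tried was to use (Week_Ending - max(Login_Date)) and then specify a HAVING clause with max(Login_Date) <= Week_Ending.
However, this basically removes all rows where the Week_Ending value is earlier than the maximum Login_Date.
Any help would be greatly appreciated. Thanks!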
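Comments:
Could you share the query you used to get the intermediate results? It might be easier to just adapt it.
Here is the last part of the query -- select calendar_week_ending, (calendar_week_ending - max(login_dater)) as days_since_last_login, user_name from db group by 1 having max(login_date)
Answer 1:
The resulting query looks complicated, but it is not that bad: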
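input_raw - whatever you pasted in the question
input - replaces each login_date with its respective week_ending
cal - a calendar table with consecutive Saturdays (your definition of week ending). In Redshift, the only way to generate rows is with a SELECT. Here I generate 25 rows by cross joining the input with itself, which gives 25 (5x5) consecutive Saturdays. The CROSS JOIN can be changed to SELECT row_number() over () from arbitrary table limit 25.
cal_usrs - a table of all users combined with all Saturdays
output_raw - the core of the calculation: first a JOIN of cal_usrs and input, then a window function to get the days since the last login. Since it is not possible to filter on a column holding a window function result in the same SELECT (and your desired output has no rows with 0), there is a final SELECT.
final SELECT - selects only what we are interested in.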
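The note on cal mentions that the rows can come from any table instead of a self cross join. A minimal sketch of that variant follows, assuming a hypothetical table some_big_table with at least 25 rows (the name is not from the original post). It filters on the generated row number in an outer SELECT rather than using LIMIT, so the numbers kept are always 1 through 25:

```sql
-- Sketch only: some_big_table is a hypothetical row source with >= 25 rows.
with seq as (
    -- number the rows 1, 2, 3, ...; the row contents are never used
    select row_number() over () as n
    from some_big_table
)
select
    -- 12/1/2018 is a Saturday; step forward one week per generated row
    dateadd(day, 7 * n::int, '2018-12-01'::date)::date as week_ending
from seq
where n <= 25            -- keep 25 consecutive Saturdays
order by week_ending;
```

Under these assumptions the sketch yields the Saturdays from 12/8/2018 through 5/25/2019, the same range the cross join covers.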
Query:
with input_raw as (
    select 'Bobisaur' as username, '1/1/2019'::date as login_date
    union all
    select 'Bobisaur', '1/3/2019'::date
    union all
    select 'Bobisaur', '3/5/2019'::date
    union all
    select 'Bobisaur', '3/24/2019'::date
    union all
    select 'Bobisaur', '4/1/2019'::date
), input as (
    select
        username,
        -- in your example weeks end on Saturday, hence Monday + 5 days
        date_trunc('week', login_date) + interval '5 days' as week_ending
    from input_raw
), cal as (
    -- this creates a table with consecutive Saturdays
    select
        date_trunc('week', '12/1/2018'::date) + interval '5 days' + 7 * row_number() over () as week_ending
    -- can be changed to 'from arbitrary table limit 25' or whatever time window you wish
    from input_raw a cross join input_raw b -- this produces 25 rows
), cal_usrs as (
    -- this is very important: you want to have all weeks with all users
    select * from cal cross join (select distinct username from input) as u
), output_raw as (
    select
        cal_usrs.username,
        cal_usrs.week_ending,
        -- latest login week seen so far for this user
        max(input.week_ending)
            over (partition by cal_usrs.username order by cal_usrs.week_ending rows between unbounded preceding and current row) as last_login_week,
        extract('days' from cal_usrs.week_ending - last_login_week) as days_since_last_login
    from input
    right join cal_usrs using (username, week_ending)
)
select
    username,
    to_char(week_ending, 'MM/DD/YYYY') as week_ending,
    days_since_last_login
from output_raw
where days_since_last_login <> 0 -- your example did not contain 0 rows
-- qualified name so the sort uses the underlying date, not the formatted string
order by output_raw.week_ending
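One thing to check in the input step above: date_trunc('week', ...) truncates to Monday, so Monday + 5 days assigns a Sunday login to the Saturday before it. In the question's data the Sunday login on 3/24/2019 belongs to the week ending 3/30/2019, but the expression above maps it to 3/23/2019. A possible adjustment, offered as a sketch and not as part of the original answer, is to shift the date forward one day before truncating so that weeks run Sunday through Saturday:

```sql
-- Sketch: compare the original Monday-based expression with a Sunday-to-Saturday variant.
select
    login_date,
    -- original: Monday + 5 days; a Sunday login lands on the previous Saturday
    (date_trunc('week', login_date) + interval '5 days')::date as week_ending_monday_based,
    -- variant: shift forward one day first, so Sunday opens the week
    (date_trunc('week', login_date + 1) + interval '5 days')::date as week_ending_sunday_based
from (
    select '3/24/2019'::date as login_date   -- a Sunday
    union all
    select '4/1/2019'::date                  -- a Monday
) as sample_logins
order by login_date;
```

For 3/24/2019 the two columns differ (3/23/2019 versus 3/30/2019), while for 4/1/2019 both give 4/6/2019. Using the variant in the full query would change which weeks count as no-login weeks around that date.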
The result looks like this (I think you miscalculated in your example after 3/16):
username week_ending days_since_last_login
Bobisaur 01/12/2019 7
Bobisaur 01/19/2019 14
Bobisaur 01/26/2019 21
Bobisaur 02/02/2019 28
Bobisaur 02/09/2019 35
Bobisaur 02/16/2019 42
Bobisaur 02/23/2019 49
Bobisaur 03/02/2019 56
Bobisaur 03/16/2019 7
Bobisaur 03/30/2019 7
Bobisaur 04/13/2019 7
Bobisaur 04/20/2019 14
Bobisaur 04/27/2019 21
Bobisaur 05/04/2019 28
Bobisaur 05/11/2019 35
Bobisaur 05/18/2019 42
Bobisaur 05/25/2019 49
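The difference from the last three rows of the question's example (11, 18, 12) appears to come from what is being measured: the query above counts days from the end of the last login week, while those numbers count from the last login date itself (for example, 3/16/2019 minus 3/5/2019 is 11). If the latter is what is wanted, one option is to join each no-login week to the logins on or before it and take the latest. A minimal sketch, assuming hypothetical tables no_login_weeks(user_name, week_ending) and logins(user_name, login_date) with date columns (these names are not from the original post):

```sql
-- Sketch only: no_login_weeks and logins are hypothetical tables with date columns.
select
    w.user_name,
    w.week_ending,
    -- date minus date yields a whole number of days
    w.week_ending - max(l.login_date) as days_since_last_login
from no_login_weeks w
join logins l
  on l.user_name = w.user_name
 and l.login_date <= w.week_ending   -- only logins on or before this week ending
group by w.user_name, w.week_ending
order by w.user_name, w.week_ending;
```

Because the date restriction sits in the join condition and not in a HAVING clause on the overall maximum, weeks that precede later logins are kept. Weeks before a user's first login drop out, since the inner join finds no earlier login for them.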