How to find the next sequence of a particular event based on timestamp in SQL?
[Posted]: 2021-01-09 19:14:42
[Question]: I have two tables: views_tbl (page views) and play_tbl (which records when a customer played a particular video). A row in views_tbl whose page_type is nowPlaying has the same timestamp as its row in play_tbl, so the timestamp is the only column that can be used to join the two tables.
play_ts = event_ts when page_type = 'nowPlaying';
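I am trying to count the browseFind events that lead to a successful play event (a nowPlaying page_type in views_tbl), subject to the following criteria.
If a browseFind is followed by a browseSearch, that case is ignored (there must be no browseSearch event after the browseFind, although other events are allowed).
For example, the first browseFind, at 2021-01-07 00:36:57.321, is immediately followed by a browseSearch, so it is ignored.
For the next browseFind, at 2021-01-07 11:17:27.286, there is no browseSearch between it and the following nowPlaying page_type, so it is counted. In addition, duration_seconds from play_tbl must be greater than 30 for the case to count; if the duration is less than 30 seconds, the case is ignored.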
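The rule described above can be sketched procedurally before it is turned into SQL. The following is only a rough Python sketch under one literal reading of the criteria: each browseFind is matched to the first nowPlaying that follows it, it is dropped if a browseSearch appears first, and the play must last more than 30 seconds. The function name count_find_to_play and the min_seconds parameter are made up for illustration, and the data is a small excerpt of the rows listed in the DDL, not the full tables.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Excerpt of views_tbl rows from the question: (event_ts, page_type).
views = [
    ("2021-01-07 00:36:57.321", "browseFind"),
    ("2021-01-07 00:37:03.871", "browseSearch"),
    ("2021-01-07 03:17:51.474", "nowPlaying"),
    ("2021-01-07 11:17:27.286", "browseFind"),
    ("2021-01-07 11:17:29.048", "browseGema"),
    ("2021-01-07 11:17:37.228", "nowPlaying"),
    ("2021-01-07 11:52:45.771", "browseFind"),
    ("2021-01-07 11:52:53.719", "detail-album"),
    ("2021-01-07 11:52:56.433", "nowPlaying"),
]

# Matching play_tbl rows: play_ts -> duration_seconds.
plays = {
    "2021-01-07 03:17:51.474": 212,
    "2021-01-07 11:17:37.228": 175,
    "2021-01-07 11:52:56.433": 12,
}


def count_find_to_play(views, plays, min_seconds=30):
    """Count browseFind events whose next nowPlaying lasts > min_seconds,
    with no browseSearch between the two (hypothetical helper)."""
    events = sorted(views, key=lambda v: datetime.strptime(v[0], FMT))
    count = 0
    for i, (_, page_type) in enumerate(events):
        if page_type != "browseFind":
            continue
        # Walk forward from this browseFind in time order.
        for later_ts, later_type in events[i + 1:]:
            if later_type == "browseSearch":
                break  # a search intervened: ignore this browseFind
            if later_type == "nowPlaying":
                duration = plays.get(later_ts)  # join on the timestamp
                if duration is not None and duration > min_seconds:
                    count += 1
                break  # only the first nowPlaying is considered
    return count


print(count_find_to_play(views, plays))  # prints 1 for this excerpt
```

In this excerpt only the browseFind at 11:17:27.286 qualifies: the first one is followed by a browseSearch, and the one at 11:52:45.771 leads to a play of only 12 seconds. Other readings of the rule (for example, skipping short plays and looking further ahead) would give different counts.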
I tried to do this by combining the two tables with a UNION, but LEAD and LAG did not give me the results I need.
select
    tbl_name,
    play_ts,
    ctrl_nm,
    genre,
    album_id,
    song_id,
    duration_seconds
from play_tbl
union
select tbl_name, event_ts, page_type, NULL, NULL, NULL, NULL from views_tbl;
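I then tried using LEAD and LAG on the combined result. I am using Redshift as my database, and I cannot get the right combination of rows.
Expected output: the count of browseFind events that lead to a nowPlaying with a duration greater than 30 seconds and with no browseSearch event in between.
Any help would be greatly appreciated. I don't know how to write SQL that checks these conditions step by step or row by row.
Table DDL: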
create table play_tbl (tbl_name varchar(10), play_ts timestamp, ctrl_nm varchar(10), genre varchar(100), album_id varchar(50), song_id varchar(50), duration_seconds int);
insert into play_tbl values ('play', '2021-01-07 03:17:51.474', 'umse', 'browseGema', 'B089999999', 'B99FPD1MCJ', 212);
insert into play_tbl values ('play', '2021-01-07 11:17:37.228', 'umse', 'browseGema', 'B089999999', 'B99K2XSNY3', 175);
insert into play_tbl values ('play', '2021-01-07 11:48:19.136', 'umse', 'browseGema', 'B089999999', 'B99328YJBW', 155);
insert into play_tbl values ('play', '2021-01-07 11:49:51.419', 'umse', 'browseGema', 'B089999999', 'B999PR4XRS', 48);
insert into play_tbl values ('play', '2021-01-07 11:50:26.264', 'umse', 'browseGema', 'B089999999', 'B99C98DB5T', 89);
insert into play_tbl values ('play', '2021-01-07 11:52:56.433', 'umse', 'browseGema', 'B089999999', 'B99L88RQZS', 12);
insert into play_tbl values ('play', '2021-01-07 19:14:53.865', 'umse', 'browseGema', 'B089999999', 'B99GWTZZ3H', 23);
insert into play_tbl values ('play', '2021-01-07 19:40:46.806', 'umse', 'browseGema', 'B089999999', 'B99NCVV16G', 185);
insert into play_tbl values ('play', '2021-01-07 19:48:47.708', 'umse', 'browseGema', 'B089999999', 'B99BVYG1S6', 48);
insert into play_tbl values ('play', '2021-01-07 21:30:03.102', 'umse', 'browseGema', 'B089999999', 'B99C9KDQW6', 69);
insert into play_tbl values ('play', '2021-01-07 11:17:33.655', 'umse', 'browseGema', 'B089999999', 'B99FPH4GD1', 232);
insert into play_tbl values ('play', '2021-01-07 21:30:05.931', 'umse', 'browseGema', 'B089999999', 'B99GFBC4V5', 2);
create table views_tbl (tbl_name varchar(10), event_ts timestamp, page_type varchar(150));
insert into views_tbl values ('view', '2021-01-07 00:36:55.33','detail-userPlaylist');
insert into views_tbl values ('view', '2021-01-07 00:36:52.328','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 00:36:57.321','browseFind');
insert into views_tbl values ('view', '2021-01-07 00:37:03.871','browseSearch');
insert into views_tbl values ('view', '2021-01-07 03:17:42.541','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 03:17:45.78','detail-userPlaylist');
insert into views_tbl values ('view', '2021-01-07 03:17:51.474','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 11:17:25.38','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 11:17:27.286','browseFind');
insert into views_tbl values ('view', '2021-01-07 11:17:29.048','browseGema');
insert into views_tbl values ('view', '2021-01-07 11:17:32.342','browseGema');
insert into views_tbl values ('view', '2021-01-07 11:17:31.363','detail-playlist');
insert into views_tbl values ('view', '2021-01-07 11:17:34.221','detail-playlist');
insert into views_tbl values ('view', '2021-01-07 11:17:37.228','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 11:48:04.055','browseGema');
insert into views_tbl values ('view', '2021-01-07 11:48:04.796','browseFind');
insert into views_tbl values ('view', '2021-01-07 11:48:05.359','browseSearch');
insert into views_tbl values ('view', '2021-01-07 11:48:08.778','browseSearch');
insert into views_tbl values ('view', '2021-01-07 11:48:19.136','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 11:48:12.066','detail-playlist');
insert into views_tbl values ('view', '2021-01-07 11:49:51.419','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 11:50:12.778','browseSearch');
insert into views_tbl values ('view', '2021-01-07 11:50:17.936','browseSearch');
insert into views_tbl values ('view', '2021-01-07 11:50:26.264','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 11:52:21.502','browseFind');
insert into views_tbl values ('view', '2021-01-07 11:52:26.201','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 11:52:27.375','browseHome');
insert into views_tbl values ('view', '2021-01-07 11:52:36.111','detail-playlist');
insert into views_tbl values ('view', '2021-01-07 11:52:42.604','browseHome');
insert into views_tbl values ('view', '2021-01-07 11:52:41.909','detail-playlist');
insert into views_tbl values ('view', '2021-01-07 11:52:45.771','browseFind');
insert into views_tbl values ('view', '2021-01-07 11:52:53.719','detail-album');
insert into views_tbl values ('view', '2021-01-07 11:52:56.433','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 11:59:38.747','browseSearch');
insert into views_tbl values ('view', '2021-01-07 11:59:39.718','browseFind');
insert into views_tbl values ('view', '2021-01-07 11:59:41.481','browseHome');
insert into views_tbl values ('view', '2021-01-07 11:59:43.998','detail-playlist');
insert into views_tbl values ('view', '2021-01-07 11:59:47.427','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 12:00:58.284','browseHome');
insert into views_tbl values ('view', '2021-01-07 12:24:15.929','browseHome');
insert into views_tbl values ('view', '2021-01-07 18:34:15.191','browseHome');
insert into views_tbl values ('view', '2021-01-07 19:14:47.426','browseHome');
insert into views_tbl values ('view', '2021-01-07 19:14:50.187','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 19:14:52.098','detail-userPlaylist');
insert into views_tbl values ('view', '2021-01-07 19:14:53.865','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 19:40:45.267','detail-userPlaylist');
insert into views_tbl values ('view', '2021-01-07 19:40:43.486','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 19:40:46.806','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 19:48:42.942','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 19:48:46.362','detail-userPlaylists');
insert into views_tbl values ('view', '2021-01-07 19:48:47.708','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 21:29:54.488','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 21:29:56.973','activity-feed');
insert into views_tbl values ('view', '2021-01-07 21:30:01.206','detail-userPlaylist');
insert into views_tbl values ('view', '2021-01-07 21:29:59.795','cloudLibrary-playlists');
insert into views_tbl values ('view', '2021-01-07 21:30:03.102','nowPlaying');
insert into views_tbl values ('view', '2021-01-07 21:30:05.931','nowPlaying');
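[Comments]:
Redshift or MySQL? These are two very different databases.
What is the desired result?
@a_horse_with_no_name The database is Redshift, but I can copy the data to MySQL if a feature that is only available in MySQL is needed. Thanks.
@Strawberry: the expected output is 2.
[Answer 1]: If I have understood your question correctly (the answer for your data is 4), then I think the issue is how to treat browseSearch and browseFind as special cases among the view events. In the SQL below I do this with DECODE (use CASE if you prefer) and add IGNORE NULLS to LAG(). I have left the extra columns in the subselects in case you want to run each level on its own to get comfortable with what is happening. If it is not exactly right, it should give you the hints you need to adapt it to your exact needs.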
SELECT COUNT(DECODE(prev_browse, 'browseFind', duration_seconds, NULL)) AS cnt
FROM (SELECT event_ts,
             -- most recent browseFind/browseSearch before this row
             LAG(browse_type IGNORE NULLS) OVER (ORDER BY event_ts) AS prev_browse,
             page_type,
             duration_seconds
      FROM (SELECT v.event_ts,
                   v.page_type,
                   -- keep only the two browse types; every other page_type becomes NULL
                   DECODE(v.page_type IN ('browseFind', 'browseSearch'),
                          TRUE, v.page_type,
                          NULL
                         ) AS browse_type,
                   p.duration_seconds
            FROM views_tbl v
            LEFT JOIN play_tbl p
              ON v.event_ts = p.play_ts
             AND v.page_type = 'nowPlaying'
            WHERE v.page_type IN ('nowPlaying', 'browseFind', 'browseSearch')
              AND (p.duration_seconds >= 30 OR p.duration_seconds IS NULL)
           ) AS events   -- derived tables need an alias
     ) AS lagged;
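To see why this query returns 4 on the sample data while the asker expects 2, it may help to replay its steps outside the database. The following is a rough Python emulation, not Redshift code: the rows list holds the browseFind, browseSearch, and nowPlaying rows from the DDL with duration_seconds already attached as the LEFT JOIN would do, and count_plays_after_find is a made-up name. The second return value, the number of distinct browseFind events that precede a counted play, is one possible adjustment and an assumption about what the asker means, not something confirmed in the thread.

```python
# (event_ts, page_type, duration_seconds) after the LEFT JOIN to play_tbl.
# None means no play row matched. All timestamps here share one fixed-width
# format, so plain string ordering matches time ordering.
rows = [
    ("2021-01-07 00:36:57.321", "browseFind", None),
    ("2021-01-07 00:37:03.871", "browseSearch", None),
    ("2021-01-07 03:17:51.474", "nowPlaying", 212),
    ("2021-01-07 11:17:27.286", "browseFind", None),
    ("2021-01-07 11:17:37.228", "nowPlaying", 175),
    ("2021-01-07 11:48:04.796", "browseFind", None),
    ("2021-01-07 11:48:05.359", "browseSearch", None),
    ("2021-01-07 11:48:08.778", "browseSearch", None),
    ("2021-01-07 11:48:19.136", "nowPlaying", 155),
    ("2021-01-07 11:49:51.419", "nowPlaying", 48),
    ("2021-01-07 11:50:12.778", "browseSearch", None),
    ("2021-01-07 11:50:17.936", "browseSearch", None),
    ("2021-01-07 11:50:26.264", "nowPlaying", 89),
    ("2021-01-07 11:52:21.502", "browseFind", None),
    ("2021-01-07 11:52:45.771", "browseFind", None),
    ("2021-01-07 11:52:56.433", "nowPlaying", 12),
    ("2021-01-07 11:59:38.747", "browseSearch", None),
    ("2021-01-07 11:59:39.718", "browseFind", None),
    ("2021-01-07 11:59:47.427", "nowPlaying", None),
    ("2021-01-07 19:14:53.865", "nowPlaying", 23),
    ("2021-01-07 19:40:46.806", "nowPlaying", 185),
    ("2021-01-07 19:48:47.708", "nowPlaying", 48),
    ("2021-01-07 21:30:03.102", "nowPlaying", 69),
    ("2021-01-07 21:30:05.931", "nowPlaying", 2),
]


def count_plays_after_find(rows, min_seconds=30):
    """Return (plays counted by the query, distinct browseFind events behind them)."""
    # WHERE duration_seconds >= 30 OR duration_seconds IS NULL
    kept = [r for r in rows if r[2] is None or r[2] >= min_seconds]
    kept.sort(key=lambda r: r[0])  # ORDER BY event_ts

    prev_browse = None  # what LAG(browse_type IGNORE NULLS) would see
    plays_counted = 0
    finds_used = set()
    for event_ts, page_type, duration in kept:
        # COUNT(...) only counts rows with a non-NULL duration_seconds.
        if prev_browse and prev_browse[1] == "browseFind" and duration is not None:
            plays_counted += 1
            finds_used.add(prev_browse[0])
        # Update after the check, because LAG looks strictly backwards.
        if page_type in ("browseFind", "browseSearch"):
            prev_browse = (event_ts, page_type)
    return plays_counted, len(finds_used)


print(count_plays_after_find(rows))  # prints (4, 2)
```

The three plays at 19:40:46.806, 19:48:47.708, and 21:30:03.102 all trace back to the same browseFind at 11:59:39.718, which is where the gap between 4 and 2 comes from in this emulation. If each browseFind should count at most once, counting distinct browseFind timestamps rather than plays is one way to get there.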
[Discussion]:
Thanks for the hint; the correct answer should be 2. I think I can modify the query to fit the conditions.
Glad to hear it. All the best.