Find session start and end info from logs

[Title]: Find session start and end info from logs [Posted]: 2022-01-10 02:09:33 [Question]:

We have log/time-series data, for example:

Input

Session_id  user_id session_timestamp 
S1  U1  2019-10-01 22:00:00      
S1  U1  2019-10-01 22:00:01       
S1  U1  2019-10-01 22:00:02
S1  U1  2019-10-01 22:00:03
S1  U2  2019-10-01 22:00:04     
S1  U2  2019-10-01 22:00:05       
S1  U2  2019-10-01 22:00:06
S1  U2  2019-10-01 22:00:07
S1  U3  2019-10-01 22:00:08
S1  U3  2019-10-01 22:00:09
S1  U3  2019-10-01 22:00:10
S1  U3  2019-10-01 22:00:11
S1  U3  2019-10-01 22:00:12        
S1  U1  2019-10-01 22:00:13      
S1  U1  2019-10-01 22:00:14
S1  U1  2019-10-01 22:00:15
S1  U1  2019-10-01 22:00:16 

Output

Session_id  user_id Session_start_time  Session_end_time
S1  U1  2019-10-01 22:00:00 2019-10-01 22:00:03
S1  U2  2019-10-01 22:00:04 2019-10-01 22:00:07
S1  U3  2019-10-01 22:00:08 2019-10-01 22:00:12
S1  U1  2019-10-01 22:00:13 2019-10-01 22:00:16

Explanation

A heartbeat is logged every second.

The first four rows should be treated as one session (user U1).

The last four rows are a separate session (also user U1).

I tried window functions with lag/lead, but I could not tell U1's second session apart from the first. Any flavor of SQL works for me.

Data script

create table logs(
  Session_id  varchar(10),
  user_id  varchar(10),
  session_timestamp datetime  -- datetime keeps the time of day (use timestamp on PostgreSQL)
);
insert into logs (Session_id, user_id, session_timestamp) values
  ('S1', 'U1', '2019-10-01 22:00:00'),
  ('S1', 'U1', '2019-10-01 22:00:01'),
  ('S1', 'U1', '2019-10-01 22:00:02'),
  ('S1', 'U1', '2019-10-01 22:00:03'),
  ('S1', 'U2', '2019-10-01 22:00:04'),
  ('S1', 'U2', '2019-10-01 22:00:05'),
  ('S1', 'U2', '2019-10-01 22:00:06'),
  ('S1', 'U2', '2019-10-01 22:00:07'),
  ('S1', 'U3', '2019-10-01 22:00:08'),
  ('S1', 'U3', '2019-10-01 22:00:09'),
  ('S1', 'U3', '2019-10-01 22:00:10'),
  ('S1', 'U3', '2019-10-01 22:00:11'),
  ('S1', 'U3', '2019-10-01 22:00:12'),
  ('S1', 'U1', '2019-10-01 22:00:13'),
  ('S1', 'U1', '2019-10-01 22:00:14'),
  ('S1', 'U1', '2019-10-01 22:00:15'),
  ('S1', 'U1', '2019-10-01 22:00:16');

http://sqlfiddle.com/#!18/ed396

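[Discussion]:

What do you mean by "Please ignore asking silly question"? There is no such thing as a silly question. Which DBMS are you using (MySQL != PostgreSQL), and which version? What have you tried?

Ergest, I tried window functions with lag/lead, but I could not separate U1's second session. Any version of SQL works for me.

[Answer 1]:

You can use a self-join together with a CTE: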

with cte(id, uid, t, c) as (
    -- c = number of earlier rows in the same session logged by a different user;
    -- it stays constant within one uninterrupted run of a user's heartbeats
    select l.session_id, l.user_id, l.session_timestamp,
           sum(case when l1.user_id != l.user_id then 1 else 0 end)
    from logs l
    join logs l1 on l1.session_id = l.session_id and l1.session_timestamp <= l.session_timestamp
    group by l.session_id, l.user_id, l.session_timestamp
)
select c.id, c.uid, min(c.t) as session_start_time, max(c.t) as session_end_time
from cte c group by c.c, c.id, c.uid order by session_start_time
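To see why the counter c separates U1's two sessions, the same key can be computed by hand. The following is only an illustrative Python sketch, not part of the original answer: rows hard-codes the sample data with each timestamp reduced to its seconds value, and the names rows, groups, and sessions are made up for the example.

```python
from collections import defaultdict

# Sample data from the question as (session_id, user_id, second within 22:00).
rows = (
    [("S1", "U1", s) for s in range(0, 4)]
    + [("S1", "U2", s) for s in range(4, 8)]
    + [("S1", "U3", s) for s in range(8, 13)]
    + [("S1", "U1", s) for s in range(13, 17)]
)

groups = defaultdict(list)
for sid, uid, t in rows:
    # Key: how many earlier rows of this session were logged by another user.
    c = sum(1 for s, u, ts in rows if s == sid and u != uid and ts < t)
    groups[(c, sid, uid)].append(t)

# One (start, end, session_id, user_id) tuple per group, ordered by start time.
sessions = sorted((min(ts), max(ts), sid, uid) for (c, sid, uid), ts in groups.items())
for start, end, sid, uid in sessions:
    print(sid, uid, f"22:00:{start:02d}", f"22:00:{end:02d}")
```

For the sample data the key takes the values 0, 4, 8, and 9, so U1's second run (key 9) lands in a different group from the first (key 0).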

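[Discussion]:

I tried it here: sqlfiddle.com/#!18/ed396/13, but it did not work. After fixing the case statement there was still a problem and nothing was returned.

@sandeeprawat The fiddle is running the query on MS SQL Server, whereas the code in my answer is compatible with MySQL or Postgres, per your question tags.

OK, let me try it in MySQL.

Thanks, it works in MySQL after wrapping it in a subquery.

[Answer 2]:

Using PostgreSQL: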

WITH list AS
(
SELECT session_id
     , user_id
     , session_timestamp AS current
     , session_timestamp > COALESCE(lag(session_timestamp, 1) OVER w + interval '1 second', session_timestamp - interval '1 second') AS _start
     , session_timestamp + interval '1 second' < COALESCE(lead(session_timestamp, 1) OVER w, session_timestamp + interval '2 second') AS _end
  FROM logs
WINDOW w AS (PARTITION BY session_id, user_id ORDER BY session_timestamp ASC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
)
SELECT s.session_id
     , s.user_id
     , s.session_start_time
     , e.session_end_time
  FROM 
     ( SELECT session_id, user_id, current AS session_start_time
         FROM list
        WHERE _start
     ) AS s
 CROSS JOIN LATERAL
     ( SELECT l.current AS session_end_time
         FROM list AS l
        WHERE _end
          AND l.session_id = s.session_id
          AND l.user_id = s.user_id
          AND l.current > s.session_start_time
        ORDER BY l.current ASC
        LIMIT 1          
     ) AS e
 ORDER BY s.session_id, s.user_id, s.session_start_time

See the test result in dbfiddle.
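The query above finds boundaries from gaps in time, so it depends on the one-second heartbeat interval. A variant that looks only at a change of user_id between consecutive rows is sketched below. It is an illustration under stated assumptions rather than the answerer's method: the SQL is run through Python's sqlite3 module purely to keep the example self-contained (window functions need SQLite 3.25 or later), timestamps within a session are assumed to be unique, and flagged, numbered, is_start, and run_no are names invented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (session_id TEXT, user_id TEXT, session_timestamp TEXT)")
# Sample data from the question: (user, first second, one past last second).
runs = [("U1", 0, 4), ("U2", 4, 8), ("U3", 8, 13), ("U1", 13, 17)]
conn.executemany(
    "INSERT INTO logs VALUES ('S1', ?, ?)",
    [(u, f"2019-10-01 22:00:{s:02d}") for u, lo, hi in runs for s in range(lo, hi)],
)

QUERY = """
WITH flagged AS (
    -- is_start = 1 when the previous row of the session belongs to another user
    SELECT session_id, user_id, session_timestamp,
           CASE WHEN user_id = LAG(user_id) OVER (
                    PARTITION BY session_id ORDER BY session_timestamp)
                THEN 0 ELSE 1 END AS is_start
    FROM logs
), numbered AS (
    -- the running total of start flags numbers each uninterrupted run
    SELECT session_id, user_id, session_timestamp,
           SUM(is_start) OVER (
               PARTITION BY session_id ORDER BY session_timestamp) AS run_no
    FROM flagged
)
SELECT session_id, user_id, MIN(session_timestamp), MAX(session_timestamp)
FROM numbered
GROUP BY session_id, run_no, user_id
ORDER BY session_id, MIN(session_timestamp)
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because the first row of a session has no predecessor, LAG returns NULL there, the comparison is not true, and the row is flagged as a start.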

[Discussion]:

[Answer 3]:

Solutions for the SQL Server example and for the MySQL example; the SQL Server version is:

with cte(id, uid, t, c) as (
    -- c counts earlier rows of the same session logged by a different user
    select l.session_id, l.user_id, l.session_timestamp,
           sum(case when l1.user_id != l.user_id then 1 else 0 end)
    from logs l
    join logs l1 on l1.session_id = l.session_id
                and datediff(ss, l1.session_timestamp, l.session_timestamp) >= 0
    group by l.session_id, l.user_id, l.session_timestamp
)
select c.id, c.uid, min(c.t) as session_start_time, max(c.t) as session_end_time
from cte c
group by c.c, c.id, c.uid
order by session_start_time

Thanks to alex for the hint (l1.user_id != l.user_id).
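The self-join pairs every row with all earlier rows of its session. As a hedged alternative, the same grouping key can be derived in a single pass from the difference of two ROW_NUMBER() values, a pattern often called "gaps and islands". The sketch below is not from the thread: it runs the SQL through Python's sqlite3 module only so that it is self-contained (SQLite 3.25 or later is assumed), and numbered and grp are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (session_id TEXT, user_id TEXT, session_timestamp TEXT)")
# Sample data from the question: (user, first second, one past last second).
runs = [("U1", 0, 4), ("U2", 4, 8), ("U3", 8, 13), ("U1", 13, 17)]
conn.executemany(
    "INSERT INTO logs VALUES ('S1', ?, ?)",
    [(u, f"2019-10-01 22:00:{s:02d}") for u, lo, hi in runs for s in range(lo, hi)],
)

QUERY = """
SELECT session_id, user_id, MIN(session_timestamp), MAX(session_timestamp)
FROM (
    -- position within the session minus position within the user's own rows:
    -- the difference only changes when another user's rows come in between
    SELECT session_id, user_id, session_timestamp,
           ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY session_timestamp)
         - ROW_NUMBER() OVER (PARTITION BY session_id, user_id
                              ORDER BY session_timestamp) AS grp
    FROM logs
) AS numbered
GROUP BY session_id, user_id, grp
ORDER BY session_id, MIN(session_timestamp)
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the sample data grp evaluates to 0, 4, 8, and 9 for the four runs, the same values as the counter c in the self-join.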

[Discussion]:
