MYSQL SUM durations within consecutive hours

【Posted】: 2020-03-05 20:59:30 【Question】:

I am using an old version of MySQL in which the WITH clause is not available.

Starting with this table:

+--------+---------------------+---------------------+
| person | start_time          | end_time            |
+--------+---------------------+---------------------+
| Alice  | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 |
| Alice  | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 |
| Alice  | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 |
| Alice  | 2020-02-28 01:00:00 | 2020-02-28 01:59:59 |
| Bob    | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 |
| Cindy  | 2020-02-28 02:00:00 | 2020-02-28 02:59:59 |
| Cindy  | 2020-02-28 03:00:00 | 2020-02-28 03:36:59 |
+--------+---------------------+---------------------+

I want a query that sums, per person, all durations that are within an hour of each other.

+--------+---------------------+---------------------+----------+
| person | start_time          | end_time            | duration |
+--------+---------------------+---------------------+----------+
| Alice  | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 |     3599 |
| Alice  | 2020-02-27 23:45:00 | 2020-02-28 01:59:59 |     8097 |
| Bob    | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 |      899 |
| Cindy  | 2020-02-28 02:00:00 | 2020-02-28 03:36:59 |     5818 |
+--------+---------------------+---------------------+----------+
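The grouping rule can be checked outside the database. The following is a minimal Python sketch, not part of the original question. It assumes that a row joins the same person's previous period when its start_time is no more than one hour after that period's end_time. The names `ROWS` and `merge_periods` are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (person, start_time, end_time)
ROWS = [
    ("Alice", "2020-02-27 20:00:00", "2020-02-27 20:59:59"),
    ("Alice", "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Alice", "2020-02-28 00:00:00", "2020-02-28 00:59:59"),
    ("Alice", "2020-02-28 01:00:00", "2020-02-28 01:59:59"),
    ("Bob", "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Cindy", "2020-02-28 02:00:00", "2020-02-28 02:59:59"),
    ("Cindy", "2020-02-28 03:00:00", "2020-02-28 03:36:59"),
]


def merge_periods(rows, max_gap=timedelta(hours=1)):
    """Merge each person's rows that start within max_gap of the previous end.

    Returns a list of [person, start, end, summed_duration_in_seconds].
    """
    parsed = sorted(
        (p, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
        for p, s, e in rows
    )
    merged = []
    for person, start, end in parsed:
        seconds = int((end - start).total_seconds())
        last = merged[-1] if merged else None
        if last and last[0] == person and start <= last[2] + max_gap:
            # Same person and close enough: extend the current period.
            last[2] = max(last[2], end)
            last[3] += seconds
        else:
            # Different person or gap too large: start a new period.
            merged.append([person, start, end, seconds])
    return merged


for person, start, end, duration in merge_periods(ROWS):
    print(person, start, end, duration)
```

On the sample data, summing the per-row durations this way gives 8097 seconds for Alice's second period and 5818 seconds for Cindy's.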

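【Comments】:

This kind of task is really not easy in a database that does not implement recursive queries.

Bob's end_time should be 2020-02-27 23:59:59

Thanks Peter. I corrected it in the example.

I don't think you can do this in a single query. Make a procedure and, inside it, use a cursor to loop through the rows and calculate the durations for each user. Store the results in a temporary table and return the result set at the end of the procedure. Give it a try, and if you run into problems, post a question with the code.

@slaakso Why not?

【Answer 1】:

E.g. - although, as mentioned, this solution is only for pre-8.0 versions of MySQL...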

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(person VARCHAR(12) NOT NULL
,start_time DATETIME NOT NULL
,end_time DATETIME NOT NULL
,PRIMARY KEY(person,start_time)
);

INSERT INTO my_table VALUES
('Alice','2020-02-27 20:00:00','2020-02-27 20:59:59'),
('Alice','2020-02-27 23:45:00','2020-02-27 23:59:59'),
('Alice','2020-02-28 00:00:00','2020-02-28 00:59:59'),
('Alice','2020-02-28 01:00:00','2020-02-28 01:59:59'),
('Bob','2020-02-27 23:45:00','2020-02-27 23:59:59'),
('Cindy','2020-02-28 02:00:00','2020-02-28 02:59:59'),
('Cindy','2020-02-28 03:00:00','2020-02-28 03:36:59');

SELECT person
     , MIN(start_time) start_time
     , MAX(end_time) end_time
     , SUM(TIME_TO_SEC(TIMEDIFF(end_time,start_time))) delta 
  FROM 
     ( SELECT x.*
            , CASE WHEN person = @prev_person 
                   THEN CASE WHEN start_time <= @prev_end_time + INTERVAL 1 HOUR 
                             THEN @i:=@i 
                             ELSE @i:=@i+1 END 
                   ELSE @i:=1 END i
            , @prev_person := person
            , @prev_end_time := end_time
         FROM my_table x
            , (SELECT @prev_person := null, @prev_end_time := null, @i:=0) vars 
        ORDER 
           BY person
            , start_time
     ) a
 GROUP  
    BY person,i;
+--------+---------------------+---------------------+-------+
| person | start_time          | end_time            | delta |
+--------+---------------------+---------------------+-------+
| Alice  | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 |  3599 |
| Alice  | 2020-02-27 23:45:00 | 2020-02-28 01:59:59 |  8097 |
| Bob    | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 |   899 |
| Cindy  | 2020-02-28 02:00:00 | 2020-02-28 03:36:59 |  5818 |
+--------+---------------------+---------------------+-------+

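FWIW, I think that rewriting the query this way makes it "version-proof", i.e. not subject to the fair accusation that the order of evaluation of elements cannot be guaranteed - but I could be wrong. Anyway, in MySQL 8.0+, the following can be rewritten using the extended functionality available in that version.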

SELECT person
     , MIN(start_time) start_time
     , MAX(end_time) end_time
     , SUM(TIME_TO_SEC(TIMEDIFF(end_time,start_time))) delta 
  FROM 
  ( SELECT * FROM
     ( SELECT x.*
            , CASE WHEN person = @prev_person 
                   THEN CASE WHEN start_time <= @prev_end_time + INTERVAL 1 HOUR 
                             THEN @i:=@i 
                             ELSE @i:=@i+1 END 
                   ELSE @i:=1 END i
            , @prev_person := person
            , @prev_end_time := end_time
         FROM my_table x
            , (SELECT @prev_person := null, @prev_end_time := null, @i:=0) vars 
     ) k
    ORDER 
       BY person
        , start_time
     ) a
 GROUP  
    BY person,i;
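For comparison, on a database with window functions the session variables can be avoided altogether. The sketch below is an illustration under stated assumptions, not the answerer's code. It runs through Python's built-in sqlite3 module (which needs SQLite 3.25 or newer for window functions), so the date arithmetic uses SQLite's datetime() and strftime() rather than MySQL's INTERVAL and TIME_TO_SEC. LAG() flags a row that starts a new period, and a running SUM() of those flags numbers the periods. The column names new_period, period_no, first_start and last_end are made up for this example.

```python
import sqlite3

ROWS = [
    ("Alice", "2020-02-27 20:00:00", "2020-02-27 20:59:59"),
    ("Alice", "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Alice", "2020-02-28 00:00:00", "2020-02-28 00:59:59"),
    ("Alice", "2020-02-28 01:00:00", "2020-02-28 01:59:59"),
    ("Bob", "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Cindy", "2020-02-28 02:00:00", "2020-02-28 02:59:59"),
    ("Cindy", "2020-02-28 03:00:00", "2020-02-28 03:36:59"),
]

# SQLite dialect (an assumption of this sketch), not MySQL syntax.
QUERY = """
SELECT person,
       MIN(start_time) AS first_start,
       MAX(end_time)   AS last_end,
       SUM(CAST(strftime('%s', end_time) AS INTEGER)
         - CAST(strftime('%s', start_time) AS INTEGER)) AS delta
  FROM (SELECT f.*,
               -- running total of the flags numbers each person's periods
               SUM(new_period) OVER (PARTITION BY person
                                     ORDER BY start_time) AS period_no
          FROM (SELECT m.*,
                       -- 0 when this row starts within an hour of the
                       -- previous row's end, otherwise 1 (also for the first row)
                       CASE WHEN start_time <= datetime(
                                     LAG(end_time) OVER (PARTITION BY person
                                                         ORDER BY start_time),
                                     '+1 hour')
                            THEN 0 ELSE 1 END AS new_period
                  FROM my_table m) f) g
 GROUP BY person, period_no
 ORDER BY person, first_start
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (person TEXT, start_time TEXT, end_time TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

for row in result:
    print(row)
```

MySQL 8.0 also provides LAG() and windowed SUM(), so the same two-step shape should carry over with MySQL's own date functions, though that variant is not tested here.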

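【Discussion】:

Nicely done emulating window functions, +1. But this: CASE WHEN person = @prev_person ... END i, @prev_person := person is always unsafe, because MySQL does not guarantee the order of evaluation of the elements in the SELECT clause. I think you need a more complicated CASE expression to manage it.

【Answer 2】:

A sample query that gives such a result set is: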

SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time) 
OR TIMESTAMPDIFF(SECOND,t.start_time,t1.end_time)=-1),1,0) AS continuous
FROM test t
WHERE TIMESTAMPDIFF(SECOND,t.start_time,t.end_time) 
BETWEEN 0 AND 3599 
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;

which is the same as this:

SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time) 
OR TIMESTAMPDIFF(SECOND,t1.end_time,t.start_time)=1),1,0) AS continuous
FROM test t
WHERE TIMESTAMPDIFF(SECOND,t.start_time,t.end_time) 
BETWEEN 0 AND 3599 
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;

Check both queries in this SQL Fiddle

EDIT

Following @Strawberry's comment, the queries above needed to be rewritten with a small change.

SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time) 
OR TIMESTAMPDIFF(SECOND,t.start_time,t1.end_time)=-1),1,0) AS continuous
FROM test t
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;

which is the same as this:

SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time) 
OR TIMESTAMPDIFF(SECOND,t1.end_time,t.start_time)=1),1,0) AS continuous
FROM test t
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;

Check both queries in this SQL Fiddle
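How many periods come out depends on how large a gap between rows is tolerated. The queries in this answer treat two rows as continuous only when one starts exactly one second after the other ends, whereas the question asks for rows within an hour of each other. The following Python sketch, with a hypothetical extra row for Cindy and a made-up helper `count_periods`, shows how the two readings can diverge. It is an illustration only, not a translation of the SQL above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def count_periods(intervals, max_gap):
    """Count merged periods; a row joins the previous one if the gap <= max_gap."""
    parsed = sorted(
        (datetime.strptime(s, FMT), datetime.strptime(e, FMT))
        for s, e in intervals
    )
    periods = 0
    prev_end = None
    for start, end in parsed:
        if prev_end is None or start - prev_end > max_gap:
            periods += 1  # gap too large (or first row): a new period begins
        prev_end = end if prev_end is None else max(prev_end, end)
    return periods


cindy = [
    ("2020-02-28 02:00:00", "2020-02-28 02:59:59"),
    ("2020-02-28 03:00:00", "2020-02-28 03:36:59"),
    ("2020-02-28 04:20:00", "2020-02-28 04:30:00"),  # hypothetical extra row
]

strict = count_periods(cindy, timedelta(seconds=1))  # adjacency only
loose = count_periods(cindy, timedelta(hours=1))     # within one hour

print(strict, loose)
```

With the one-second rule the extra row forms a period of its own; with the one-hour rule all three rows fall into a single period.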

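【Discussion】:

If Cindy had a record within one hour of 2020-02-28 03:36:59 but not within one hour of 2020-02-28 02:59:59, I guess that would still be meant to count as one continuous period, but I don't think your query does that.

You mean that if Cindy had another record with start_time 2020-02-28 03:37:00, it would be treated as continuous, but not if it were 2020-02-28 03:37:01. I think that's obvious.

I mean a start time between 04:00:00 and 04:37:00

@Strawberry That is not continuous; it means there is a 25-minute break. So it would be counted in a separate row, as with Alice.

OK, just so that the OP is aware of the difference; I guess it's up to them to clarify what "within consecutive hours" means, if they wish.

【Answer 3】:

Attempting this in a single query was not easy for me, but I did it with a self LEFT JOIN of the table and a bunch of conditions in the ON clause: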

SELECT A.Person,
       MIN(A.start_time) AS start_time,
       MAX(A.end_time) AS end_time,
       TIME_TO_SEC(TIMEDIFF(MAX(A.end_time),MIN(A.start_time)))  Duration,
       CASE WHEN B.person IS NULL THEN 0 ELSE 1 END AS chk 
FROM my_table A
LEFT JOIN my_table B 
ON A.person=B.person 
AND A.start_time - INTERVAL 1 HOUR < B.end_time -- when A.start_time minus 1 hour is smaller than B.end_time
AND A.end_time + INTERVAL 1 HOUR > B.start_time -- when A.end_time plus 1 hour is bigger than B.start_time 
AND A.start_time <> B.start_time -- when A.start_time is not same as B.start_time 
GROUP BY A.person,chk;

The base query is like this:

SELECT *,CASE WHEN b.person IS NULL THEN 0 ELSE 1 END AS chk
FROM my_table a LEFT JOIN my_table b 
ON a.person=b.person 
AND a.start_time - INTERVAL 1 HOUR < b.end_time
AND a.end_time + INTERVAL 1 HOUR > b.start_time
AND a.start_time <> b.start_time;

which returns the following result:

+ ------ + ------------------- + ------------------- + ------ + ------------------- + ------------------- + --- +
| person | start_time          | end_time            | person | start_time          | end_time            | chk |
+ ------ + ------------------- + ------------------- + ------ + ------------------- + ------------------- + --- +
| Alice  | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 | NULL   |        NULL         |        NULL         | 0   |
| Alice  | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | Alice  | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | 1   |
| Alice  | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | Alice  | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | 1   |
| Alice  | 2020-02-28 01:00:00 | 2020-02-28 01:59:59 | Alice  | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | 1   |
| Alice  | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | Alice  | 2020-02-28 01:00:00 | 2020-02-28 01:59:59 | 1   |
| Bob    | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | NULL   |        NULL         |        NULL         | 0   |
| Cindy  | 2020-02-28 03:00:00 | 2020-02-28 03:36:59 | Cindy  | 2020-02-28 02:00:00 | 2020-02-28 02:59:59 | 1   |
| Cindy  | 2020-02-28 02:00:00 | 2020-02-28 02:59:59 | Cindy  | 2020-02-28 03:00:00 | 2020-02-28 03:36:59 | 1   |
+ ------ + ------------------- + ------------------- + ------ + ------------------- + ------------------- + --- +

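P/S: Thanks to Strawberry for the table structure and sample data.

EDIT: After Strawberry's comment, I agree that my previous query did not actually calculate the correct duration, because I simply took the TIMEDIFF between MAX(end_time) and MIN(start_time). I made some changes, and the updated query is as follows: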

SELECT person,
       MIN(CASE WHEN starttime=0 THEN start_time ELSE starttime END) AS starttime,
       MAX(CASE WHEN endtime=0 THEN end_time ELSE endtime END) AS endtime,
       SUM(duration) AS duration,
       CASE WHEN starttime=0 THEN 0 ELSE 1 END AS chk 
FROM
 (SELECT a.person, a.start_time,a.end_time,
         ANY_VALUE(CASE WHEN b.start_time > a.end_time + INTERVAL 1 HOUR THEN 0 
                        WHEN b.start_time IS NULL THEN a.start_time
                        ELSE a.start_time END) starttime,
         ANY_VALUE(CASE WHEN b.start_time > a.end_time + INTERVAL 1 HOUR THEN 0
                        WHEN b.start_time IS NULL THEN a.end_time
                        ELSE a.end_time END) endtime,
         TIME_TO_SEC(TIMEDIFF(a.end_time,a.start_time)) duration
    FROM my_table a 
LEFT JOIN my_table b ON a.person=b.person AND b.start_time > a.end_time
GROUP BY a.person,a.start_time,a.end_time) TT
GROUP BY person,chk;

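Here is the fiddle: https://www.db-fiddle.com/f/8XHWhfhCYSj8zcFcmo2KUo/1

P/S: I added another "Bob" record in the fiddle for testing purposes.

This is somewhat similar to the previous query, except that this time I moved most of the ON conditions into the SELECT. I also used ANY_VALUE to get around sql_mode=only_full_group_by. If that sql_mode is turned off, ANY_VALUE() is not needed. Note that MariaDB does not support ANY_VALUE().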
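The difference between the two duration calculations mentioned in the edit can be seen with plain arithmetic. Here is a minimal Python sketch using Alice's three adjoining rows from the sample data (the variable names are illustrative only). Taking MAX(end_time) minus MIN(start_time) also counts the gaps between rows, whereas summing each row's own duration does not.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Alice's three adjoining rows from the sample data
alice = [
    ("2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("2020-02-28 00:00:00", "2020-02-28 00:59:59"),
    ("2020-02-28 01:00:00", "2020-02-28 01:59:59"),
]
parsed = [
    (datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in alice
]

# Span of the whole period: latest end minus earliest start
span_seconds = int(
    (max(e for _, e in parsed) - min(s for s, _ in parsed)).total_seconds()
)

# Sum of the individual row durations
summed_seconds = sum(int((e - s).total_seconds()) for s, e in parsed)

print(span_seconds, summed_seconds)  # 8099 8097
```

Here the two figures differ by only two seconds because the rows are one second apart, but the difference grows with every gap inside a merged period.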

【Discussion】:

I think this also fails to "correctly" handle the case where Cindy has another period between 4.00 and 4.30
