Count Grouped Gaps In Time For Time Range
【Posted】: 2012-01-10 08:13:16
【Question】: I am trying to find how many grouped gaps exist within a given time range.
starting range: 2012-01-12 00:00:00
ending range: 2012-01-18 23:59:59
Which roughly translates to:
type  10 11 12 13 14 15 16 17 18 19 20
a        |--========]
a                             |==------]
b                 |==============--]
c     |-----===========]
d        |--=====================------]
The same data grouped by type:
a        |--========]         |==------]
b                 |==============--]
c     |-----===========]
d        |--=====================------]
Resulting in:
type gap
---------
a 1 (yes)
b 1 (yes)
c 1 (yes)
d 0 (no)
And finally...
SUM(gap) AS gaps
----------------
3
Update for clarification:
The data is stored with a start and an end timestamp for each type. For example:
id  type  start_datetime       end_datetime
--------------------------------------------------
1   a     2012-01-11 00:00:00  2012-01-14 23:59:59
2   a     2012-01-18 00:00:00  2012-01-20 23:59:59
3   b     2012-01-14 00:00:00  2012-01-19 23:59:59
4   c     2012-01-10 00:00:00  2012-01-15 23:59:59
5   d     2012-01-11 00:00:00  2012-01-20 23:59:59
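【Comments】:
How is your data actually stored? What are the table definitions, etc.?
Start and end timestamps. I will update the question to clarify.
Can the start and end times be any time, or are they always midnight?
Can time ranges of the same type overlap?
@MarkBannister The start and end times can be any time. I simplified it for the example.
【Answer 1】: To avoid duplicated work, here is the data (I replaced the inclusive upper bounds with exclusive ones, which is more common, IMHO):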
-- CREATE SCHEMA tmp;
DROP TABLE IF EXISTS tmp.gaps CASCADE; -- IF EXISTS: do not fail on the first run
CREATE TABLE tmp.gaps
( id INTEGER NOT NULL PRIMARY KEY -- surrogate key
, ztype CHAR(1) NOT NULL
, start_datetime TIMESTAMP NOT NULL -- lower boundary := inclusive
, end_datetime TIMESTAMP NOT NULL -- upper boundary := exclusive
);
CREATE UNIQUE INDEX gaps_forward ON tmp.gaps(ztype,start_datetime);
CREATE UNIQUE INDEX gaps_backward ON tmp.gaps(ztype,end_datetime);
INSERT INTO tmp.gaps(id,ztype,start_datetime,end_datetime) VALUES
(1,'a', '2012-01-11 00:00:00', '2012-01-15 00:00:00' )
,(2,'a', '2012-01-18 00:00:00', '2012-01-21 00:00:00' )
,(3,'b', '2012-01-14 00:00:00', '2012-01-20 00:00:00' )
,(4,'c', '2012-01-10 00:00:00', '2012-01-16 00:00:00' )
,(5,'d', '2012-01-11 00:00:00', '2012-01-21 00:00:00' )
,(6,'e', '2012-01-11 00:00:00', '2012-01-15 00:00:00' ) -- added this
,(7,'e', '2012-01-15 00:00:00', '2012-01-21 00:00:00' ) -- and this
;
-- SELECT * FROM tmp.gaps;
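The switch from inclusive to exclusive upper bounds mentioned above can be sketched in a few lines. This is only an illustration, written in Python rather than SQL. It reads the question's end-of-day timestamps as 23:59:59 and assumes one-second resolution, neither of which the thread states explicitly; the helper name `to_exclusive` is made up for this sketch.

```python
from datetime import datetime, timedelta

# Assumption: timestamps have one-second resolution, so an inclusive end of
# 23:59:59 corresponds to an exclusive end exactly one second later.
RESOLUTION = timedelta(seconds=1)

def to_exclusive(end_inclusive: datetime) -> datetime:
    """Turn an inclusive upper bound into the equivalent exclusive one."""
    return end_inclusive + RESOLUTION

# Row 1 of the question's data, read as ending at 2012-01-14 23:59:59.
start = datetime(2012, 1, 11)
end = to_exclusive(datetime(2012, 1, 14, 23, 59, 59))
print(start, end)  # the half-open range [start, end)
```

With half-open ranges, two rows that follow each other without a hole share a boundary value exactly, as the two added 'e' rows do, so "touching" can be tested with a plain comparison.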
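Update: here comes the CTE. In the first UNION, I add two dummy intervals to the left and to the right of the wanted (12-Jan -- 19-Jan) interval.
For each ztype I count the total number of merged stretches. This should be one if there are no holes, two if there is one hole, and so on, which is why the final query reports COUNT(*)-1. This will also find the gaps for ztypes that have no records at all within the wanted interval.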
-- EXPLAIN ANALYZE
WITH RECURSIVE meuk(ztype,start_datetime,end_datetime) AS (
    -- For every possible "ztype" add two dummy records
    -- just before and just after our wanted interval.
    WITH plus2 AS (
        SELECT g0.ztype,g0.start_datetime,g0.end_datetime FROM tmp.gaps g0
        WHERE (g0.start_datetime <= '2012-01-12 00:00:00' AND g0.end_datetime >= '2012-01-12 00:00:00')
           OR (g0.start_datetime >= '2012-01-12 00:00:00' AND g0.end_datetime <= '2012-01-19 00:00:00')
           OR (g0.start_datetime <= '2012-01-19 00:00:00' AND g0.end_datetime >= '2012-01-19 00:00:00')
        UNION ALL SELECT DISTINCT g1.ztype, '1900-01-01 00:00:00'::timestamp, '2012-01-12 00:00:00'::timestamp FROM tmp.gaps g1
        UNION ALL SELECT DISTINCT g2.ztype, '2012-01-19 00:00:00'::timestamp, '2100-01-01 00:00:00'::timestamp FROM tmp.gaps g2
        )
    SELECT p0.ztype,p0.start_datetime,p0.end_datetime
    FROM plus2 p0
    -- the start of a stretch: there is no older overlapping
    -- (or touching) interval
    WHERE NOT EXISTS (SELECT *
        FROM plus2 nx
        WHERE nx.ztype = p0.ztype
        AND nx.start_datetime < p0.start_datetime -- older
        AND nx.end_datetime >= p0.start_datetime -- touching or overlapping
        )
    UNION
    SELECT mk.ztype
        , LEAST(mk.start_datetime,p1.start_datetime)
        , GREATEST(mk.end_datetime,p1.end_datetime)
    FROM plus2 p1
       , meuk mk
    WHERE p1.ztype = mk.ztype
    AND (p1.start_datetime >= mk.start_datetime AND p1.start_datetime <= mk.end_datetime AND p1.end_datetime > mk.end_datetime)
    )
SELECT ztype, COUNT(*)-1 AS ngap
FROM meuk mk
WHERE NOT EXISTS (SELECT *
    FROM meuk nx
    WHERE nx.ztype = mk.ztype
    AND (nx.start_datetime,nx.end_datetime) OVERLAPS( mk.start_datetime,mk.end_datetime)
    AND (nx.end_datetime - nx.start_datetime) > (mk.end_datetime - mk.start_datetime)
    )
GROUP BY ztype
ORDER BY ztype
;
Creating the final sum is left as an exercise for the reader ;-)
Result:
ztype | ngap
-------+------
a | 1
b | 1
c | 1
d | 0
e | 0
(5 rows)
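The idea behind the query (pad each ztype with two dummy intervals, merge everything that overlaps or touches, then count stretches minus one) can also be restated outside SQL. The following is a minimal Python sketch of that idea, not a translation of the CTE itself: the names `day`, `ROWS` and `count_gaps` are invented here, and `datetime.min` / `datetime.max` stand in for the 1900 and 2100 dummy bounds.

```python
from collections import defaultdict
from datetime import datetime

def day(n):
    """Midnight on 2012-01-<n>; a helper for the sample data only."""
    return datetime(2012, 1, n)

# (ztype, start, end) with exclusive ends, copied from the tmp.gaps inserts.
ROWS = [
    ("a", day(11), day(15)), ("a", day(18), day(21)),
    ("b", day(14), day(20)),
    ("c", day(10), day(16)),
    ("d", day(11), day(21)),
    ("e", day(11), day(15)), ("e", day(15), day(21)),
]

def count_gaps(rows, lo, hi):
    """Return {ztype: number of uncovered stretches inside [lo, hi)}."""
    by_type = defaultdict(list)
    for ztype, start, end in rows:
        by_type[ztype].append((start, end))

    gaps = {}
    for ztype, spans in by_type.items():
        # Two dummy intervals: one ending at lo, one starting at hi.
        spans = sorted(spans + [(datetime.min, lo), (hi, datetime.max)])
        stretches = 0
        reach = None  # right edge of the stretch currently being built
        for start, end in spans:
            if reach is None or start > reach:
                stretches += 1           # nothing reaches this start: new stretch
                reach = end
            else:
                reach = max(reach, end)  # overlapping or touching: extend
        gaps[ztype] = stretches - 1      # a single stretch means no hole
    return gaps

gaps = count_gaps(ROWS, day(12), day(19))
print(gaps)                # per-type gap counts
print(sum(gaps.values()))  # the "final sum" over all types
```

For the sample rows this yields the same per-type counts as the result table and a total of 3. Unlike the query, the sketch does not pre-filter rows to the wanted interval; the dummy intervals absorb anything that lies entirely outside it.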
【Answer 2】: Here is a variation on wildplasser's answer that uses window functions instead of a CTE. It is based on the same test fixture:
select ztype, count(*) as gaps
from (
    select ztype, datetime, sum(n) over(partition by ztype order by datetime asc) as level
    from (
        select id, ztype, start_datetime as datetime, 1 as n from tmp.gaps
        union all
        select id, ztype, end_datetime, -1 from tmp.gaps
        union all
        select 0, ztype, '2012-01-12 00:00:00', 0 from (select distinct ztype from tmp.gaps) z
        union all
        select 0, ztype, '2012-01-19 00:00:00', 0 from (select distinct ztype from tmp.gaps) z
    ) x
) x
where level = 0 and datetime >= '2012-01-12 00:00:00' and datetime < '2012-01-19 00:00:00'
group by ztype
;
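This relies on using sum() as a window aggregate, adding 1 at the start of each range and subtracting 1 at its end, and then looking for the points within the target range where the running sum drops to 0. I had to do the same thing as wildplasser and add extra entries at the boundary endpoints that contribute nothing, in order to find the groups where nothing covers a boundary...
This seems to cost less on the test data, but I suspect that may depend heavily on there not being much data in the table to go through. With some rearrangement (which would make it harder to read), it needs only two full scans of tmp.gaps (one of which is just to get the distinct ztypes).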
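The running-sum idea can be illustrated with a short Python sketch. It is an approximation of the query under stated assumptions, not a line-by-line port: events that share a timestamp are netted before the level is inspected, each timestamp is counted at most once, and types without any gap are reported with 0 instead of being left out. The names `day`, `ROWS` and `gaps_by_running_sum` are invented for the example.

```python
from datetime import datetime

def day(n):
    """Midnight on 2012-01-<n>; a helper for the sample data only."""
    return datetime(2012, 1, n)

# (ztype, start, end) with exclusive ends, copied from the tmp.gaps inserts.
ROWS = [
    ("a", day(11), day(15)), ("a", day(18), day(21)),
    ("b", day(14), day(20)),
    ("c", day(10), day(16)),
    ("d", day(11), day(21)),
    ("e", day(11), day(15)), ("e", day(15), day(21)),
]

def gaps_by_running_sum(rows, lo, hi):
    """Count, per ztype, the timestamps in [lo, hi) where coverage drops to 0."""
    events = {}  # ztype -> {timestamp: net change in open ranges}
    for ztype, start, end in rows:
        per_type = events.setdefault(ztype, {})
        per_type[start] = per_type.get(start, 0) + 1  # a range opens
        per_type[end] = per_type.get(end, 0) - 1      # a range closes

    result = {}
    for ztype, per_type in events.items():
        # Zero-weight markers at the boundaries, so that a type with
        # nothing covering lo still produces a point to inspect.
        per_type.setdefault(lo, 0)
        per_type.setdefault(hi, 0)
        level = 0  # number of ranges open after this timestamp
        count = 0
        for ts in sorted(per_type):
            level += per_type[ts]
            if level == 0 and lo <= ts < hi:
                count += 1  # an uncovered stretch starts here
        result[ztype] = count
    return result

print(gaps_by_running_sum(ROWS, day(12), day(19)))
```

Because the level is a count of open ranges, overlapping ranges of the same type simply raise it above 1 in this sketch; an uncovered stretch begins only where it returns to 0.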
【Comments】:
Did you mean to add "-1"? select ztype, count(*)-1 as gaps ... Thanks!
BTW: this does not seem to work for overlapping stretches. And: CTEs are always faster :-]