Gaps and Islands - get a list of dates unemployed over a date range with PostgreSQL
【Posted】: 2019-07-08 17:02:09
【Question】:
I have a table named Position. It holds the rows below, with dates in yyyy-mm-dd format. This is a simplified view of the employment dates:
id, person_id, start_date, end_date , title
1 , 1 , 2001-12-01, 2002-01-31, 'admin'
2 , 1 , 2002-02-11, 2002-03-31, 'admin'
3 , 1 , 2002-02-15, 2002-05-31, 'sales'
4 , 1 , 2002-06-15, 2002-12-31, 'ops'
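I want to be able to calculate the gaps in employment, assuming some of the dates overlap, to produce the following output for the person with id=1: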
person_id, start_date, end_date , last_position_id, gap_in_days
1 , 2002-02-01, 2002-02-10, 1 , 10
1 , 2002-06-01, 2002-06-14, 3 , 14
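I have looked at many solutions: UNIONs, materialized views, tables of generated calendar date ranges, and so on. I am really not sure what the best approach is. Is there a single query that would do this?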
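【Comments】:
What do you mean by "assuming some of the dates overlap"?
What is last_position_id?
A person can hold two positions at the same time, so the dates overlap.
last_position_id is the id of the position held just before the gap.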
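【Answer 1】:
First you need to find the overlapping dates (see "Determine Whether Two Date Ranges Overlap").
Then merge those ranges into a single range and keep the last id.
Finally, calculate the number of days between one end_date and the next start_date - 1.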
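The overlap test in the first step can be checked in isolation. The sketch below is not part of the original answer; it assumes PostgreSQL range types are available and uses inclusive `'[]'` bounds because the question's end dates are inclusive. The column aliases are made up for illustration.

```sql
-- Positions 2 and 3 from the question share days, positions 1 and 2 do not.
SELECT
    daterange(DATE '2002-02-11', DATE '2002-03-31', '[]')
        && daterange(DATE '2002-02-15', DATE '2002-05-31', '[]') AS overlap_2_3,  -- true
    daterange(DATE '2001-12-01', DATE '2002-01-31', '[]')
        && daterange(DATE '2002-02-11', DATE '2002-03-31', '[]') AS overlap_1_2;  -- false
```

The `&&` operator expresses the same condition as the pair of comparisons `start_date <= other end_date AND end_date >= other start_date`.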
SQL DEMO
WITH find_overlap AS (
    -- Pair each position with a later-id position of the same person that overlaps it
    SELECT t1.id AS t1_id, t1.person_id, t1.start_date, t1.end_date,
           t2.id AS t2_id, t2.start_date AS t2_start_date, t2.end_date AS t2_end_date
    FROM Table1 t1
    LEFT JOIN Table1 t2
           ON t1.person_id = t2.person_id
          AND t1.start_date <= t2.end_date
          AND t1.end_date >= t2.start_date
          AND t1.id < t2.id
), merge_overlap AS (
    -- Collapse each overlapping pair into one range and keep the later id
    SELECT
        person_id,
        start_date,
        COALESCE(t2_end_date, end_date) AS end_date,
        COALESCE(t2_id, t1_id) AS last_position_id
    FROM find_overlap
    WHERE t1_id NOT IN (SELECT t2_id FROM find_overlap WHERE t2_id IS NOT NULL)
), cte AS (
    -- Attach the start date of the next merged range
    SELECT *,
           LEAD(start_date) OVER (PARTITION BY person_id ORDER BY start_date) AS next_start
    FROM merge_overlap
)
SELECT *,
       DATE_PART('day',
           (next_start::timestamp - INTERVAL '1 DAY') - end_date::timestamp
       ) AS days
FROM cte
WHERE next_start IS NOT NULL
Output
| person_id | start_date | end_date | last_position_id | next_start | days |
|-----------|------------|------------|------------------|------------|------|
| 1 | 2001-12-01 | 2002-01-31 | 1 | 2002-02-11 | 10 |
| 1 | 2002-02-11 | 2002-05-31 | 3 | 2002-06-15 | 14 |
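As a side note, subtracting one PostgreSQL `date` from another yields an integer number of days, so the day count could probably be written without `DATE_PART` and the timestamp casts. The sketch below feeds in the two merged rows from the output above as inline sample data; the CTE name `merged` and the output column names are assumptions for illustration only.

```sql
-- Hypothetical inline copy of the merged ranges shown in the output table.
WITH merged (person_id, end_date, next_start) AS (
    VALUES
        (1, DATE '2002-01-31', DATE '2002-02-11'),
        (1, DATE '2002-05-31', DATE '2002-06-15')
)
SELECT
    person_id,
    end_date + 1              AS gap_start,    -- first day without a position
    next_start - 1            AS gap_end,      -- last day without a position
    next_start - end_date - 1 AS gap_in_days   -- date - date is an integer: 10 and 14
FROM merged;
```

This also produces the gap's own start and end dates, which is the shape the question asked for.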
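【Comments】:
This assumes only simple overlaps between two date ranges. If you have three-way overlaps, you may need a recursive query first.
Why did I get marked as the correct answer? Even though my answer is correct, S-Man's answer seems simpler.
【Answer 2】:
step-by-step demo: db<>fiddle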
You only need the lead() window function. It lets you pull a value from the next row (in this case start_date) into the current row.
SELECT
    person_id,
    end_date + 1 AS start_date,                   -- first day of the gap
    next_start - 1 AS end_date,                   -- last day of the gap
    id AS last_position_id,
    next_start - (end_date + 1) AS gap_in_days
FROM (
    SELECT
        *,
        -- start_date of this person's next position
        lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date) AS next_start
    FROM
        positions
) s
WHERE next_start - (end_date + 1) > 0
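Once you have the next start_date, you can compare it with the current end_date. If the next start_date is later than the day after the current end_date, there is a gap. These positive differences are what the WHERE clause keeps.
(If two positions overlap, the difference is negative and can be ignored.)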
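One case the plain lead() comparison may not cover is a long position that fully contains a shorter one: the shorter position's end_date could then look like the start of a gap even though the longer position is still running. A possible variation, not from either answer, is to compare the next start_date with the latest end_date seen so far. The sketch below inlines the question's rows as a CTE named `positions`; the names `covered_until`, `gap_start`, and `gap_end` are made up here, and last_position_id is left out because the row that ends last is not necessarily the current row.

```sql
WITH positions (id, person_id, start_date, end_date, title) AS (
    VALUES
        (1, 1, DATE '2001-12-01', DATE '2002-01-31', 'admin'),
        (2, 1, DATE '2002-02-11', DATE '2002-03-31', 'admin'),
        (3, 1, DATE '2002-02-15', DATE '2002-05-31', 'sales'),
        (4, 1, DATE '2002-06-15', DATE '2002-12-31', 'ops')
),
running AS (
    SELECT
        person_id,
        -- latest end_date among this row and all earlier rows of the person
        max(end_date) OVER w     AS covered_until,
        -- start_date of the person's next position
        lead(start_date) OVER w  AS next_start
    FROM positions
    WINDOW w AS (PARTITION BY person_id ORDER BY start_date, id)
)
SELECT
    person_id,
    covered_until + 1              AS gap_start,
    next_start - 1                 AS gap_end,
    next_start - covered_until - 1 AS gap_in_days
FROM running
WHERE next_start > covered_until + 1   -- also drops the last row, where next_start is NULL
ORDER BY person_id, gap_start;
```

On the question's data this returns the two expected gaps, 2002-02-01 to 2002-02-10 (10 days) and 2002-06-01 to 2002-06-14 (14 days).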