How do I split start and end dates with other start and end dates?
【Posted】: 2022-01-23 23:55:08 【Question】: I wrote the two queries below, which produce results like this:
Table: company_changes
user_id | start_at | end_at | company_id |
---|---|---|---|
189 | 2020-12-12 | 2021-03-02 | 88 |
189 | 2021-03-02 | 2050-01-01 | 169 |
Table: enablement_changes
user_id | start_at | end_at | enablement |
---|---|---|---|
189 | 2020-12-12 | 2021-10-15 | disabled |
189 | 2021-10-15 | 2050-01-01 | enabled |
What matters is knowing when a user was at a given company_id and whether they were enabled or disabled at that time.
The result I want is a table like this:
user_id | start_at | end_at | company_id | status |
---|---|---|---|---|
189 | 2020-12-12 | 2021-03-02 | 88 | disabled |
189 | 2021-03-02 | 2021-10-15 | 169 | disabled |
189 | 2021-10-15 | 2050-01-01 | 169 | enabled |
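Essentially I want to combine the results of these two queries. 2050-01-01 is an arbitrary date in the future: because the user_id has not changed status or company_id since then, the row ends at 2050-01-01 to mark it as the user's current state.
Any idea how to solve this?
Here is the fiddle: http://sqlfiddle.com/#!9/5c42b6
This is my first time asking on *** ... please let me know if my question is not formatted properly.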
【Answer 1】: If in practice your data is more complex and the time intervals may overlap, for example:
Table: enablement_changes
user_id | start_at | end_at | enablement |
---|---|---|---|
189 | 2020-12-12 | 2021-10-15 | disabled |
189 | 2020-12-20 | 2021-02-10 | enabled |
189 | 2021-10-15 | 2050-01-01 | enabled |
then I would recommend a more elaborate solution:
-- _k: two-row helper used to turn each interval into its two endpoints
WITH _k AS (
    SELECT 1 AS n
    UNION ALL
    SELECT 2 AS n
), _points AS (
    -- every start_at and end_at from both tables becomes a boundary point
    SELECT user_id, CASE WHEN n = 1 THEN start_at ELSE end_at END AS date_point, n
    FROM company_changes
    CROSS JOIN _k
    UNION
    SELECT user_id, CASE WHEN n = 1 THEN start_at ELSE end_at END AS date_point, n
    FROM enablement_changes
    CROSS JOIN _k
), _drank AS (
    -- number the distinct boundary points of each user in date order
    SELECT p.user_id, p.date_point, DENSE_RANK() OVER(PARTITION BY p.user_id ORDER BY p.date_point) AS dr
    FROM _points AS p
    GROUP BY p.user_id, p.date_point
)
-- pair each boundary point with the next one, then look up the rows that cover that segment
SELECT d1.user_id, d1.date_point AS start_at, d2.date_point AS end_at, c.company_id,
       MAX(s.status) AS status -- or MIN if disabled should win over enabled when both apply at the same time
FROM _drank AS d1
JOIN _drank AS d2 ON d1.dr = d2.dr - 1 AND d1.user_id = d2.user_id
LEFT JOIN company_changes AS c ON d1.user_id = c.user_id AND d1.date_point < c.end_at AND c.start_at < d2.date_point
LEFT JOIN enablement_changes AS s ON d1.user_id = s.user_id AND d1.date_point < s.end_at AND s.start_at < d2.date_point
GROUP BY d1.user_id, d1.date_point, d2.date_point, c.company_id
ORDER BY 1, 2, 3;
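The boundary-point idea behind this query can also be traced outside SQL. The following is a minimal Python sketch, not the answer author's code: it assumes the sample rows from the question, and the helper names covering and split_segments are made up for illustration. It collects every start and end date per user, pairs each date with the next one, and labels each resulting segment with the rows that cover it.

```python
from datetime import date

# Sample rows from the question: (user_id, start_at, end_at, value).
company_changes = [
    (189, date(2020, 12, 12), date(2021, 3, 2), 88),
    (189, date(2021, 3, 2), date(2050, 1, 1), 169),
]
enablement_changes = [
    (189, date(2020, 12, 12), date(2021, 10, 15), "disabled"),
    (189, date(2021, 10, 15), date(2050, 1, 1), "enabled"),
]


def covering(rows, user_id, seg_start, seg_end):
    """Values of the rows for user_id whose range [start, end) overlaps the segment."""
    return [v for (u, s, e, v) in rows
            if u == user_id and seg_start < e and s < seg_end]


def split_segments(companies, enablements):
    """Cut each user's timeline at every boundary date and label each piece."""
    result = []
    users = sorted({r[0] for r in companies + enablements})
    for user_id in users:
        # Distinct start/end dates for this user, in order (the _points/_drank step).
        points = sorted({d for (u, s, e, _) in companies + enablements
                         if u == user_id for d in (s, e)})
        # Pair each point with the next one (the d1.dr = d2.dr - 1 self-join).
        for seg_start, seg_end in zip(points, points[1:]):
            statuses = covering(enablements, user_id, seg_start, seg_end)
            status = max(statuses) if statuses else None  # mirrors MAX(s.status)
            # One output row per covering company, or None if no company row covers it.
            company_ids = covering(companies, user_id, seg_start, seg_end) or [None]
            for company_id in company_ids:
                result.append((user_id, seg_start, seg_end, company_id, status))
    return result


segments = split_segments(company_changes, enablement_changes)
for row in segments:
    print(row)
```

On the question's sample data this yields the three rows of the desired table. Because the segments are built from every boundary date, overlapping enablement rows simply produce more, shorter segments rather than duplicated ranges.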
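【Answer 2】: Lukasz's solution is good, but it has two issues.
First, it matches rows where the c table's end time equals the e table's start time. Date/time ranges are usually meant to include the start but exclude the end; otherwise you get two rows.
Second, it misses any join where the c row starts before and ends after the e row, even though you want to match that contained sub-range. Whether this second point matters depends on whether your matching is dense (rows always exist) or sparse (rows exist only some of the time).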
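To make the two cases concrete, here is a small Python sketch comparing an inclusive BETWEEN-style test with a half-open overlap test. The function names and the example date ranges are hypothetical, chosen only to trigger each case; they do not come from the question's data.

```python
from datetime import date


def between_match(c_start, c_end, e_start, e_end):
    """Inclusive test: c's start or c's end falls within [e_start, e_end]."""
    return (e_start <= c_start <= e_end) or (e_start <= c_end <= e_end)


def half_open_overlap(c_start, c_end, e_start, e_end):
    """Ranges treated as [start, end): true only if they share some time."""
    return c_start < e_end and e_start < c_end


# Hypothetical ranges, given as (c_start, c_end, e_start, e_end).
# Case 1: c ends exactly when e starts, so the two ranges only touch.
touching = (date(2021, 1, 1), date(2021, 3, 2), date(2021, 3, 2), date(2021, 6, 1))
# Case 2: c starts before e and ends after it, so c fully contains e.
containing = (date(2021, 1, 1), date(2021, 12, 31), date(2021, 3, 2), date(2021, 6, 1))

print(between_match(*touching), half_open_overlap(*touching))      # True False
print(between_match(*containing), half_open_overlap(*containing))  # False True
```

The inclusive test reports a match for ranges that merely touch and misses the containing range, while the half-open test gets both cases right.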
The first issue can be fixed with an extra check:
SELECT c.user_id
,GREATEST(c.start_at, e.start_at) AS start_at
,LEAST(c.end_at, e.end_at) AS end_at
,c.company_id
,e.status
FROM company_changes c
JOIN enablement_changes e
ON (c.start_at BETWEEN e.start_at AND e.end_at AND c.start_at < e.end_at
OR c.end_at BETWEEN e.start_at AND e.end_at AND c.end_at > e.start_at )
AND c.user_id = e.user_id
ORDER BY 1,2;
Whereas for sparse matching you need the general overlap test:
SELECT c.user_id
,GREATEST(c.start_at, e.start_at) AS start_at
,LEAST(c.end_at, e.end_at) AS end_at
,c.company_id
,e.status
FROM company_changes c
JOIN enablement_changes e
ON c.user_id = e.user_id
AND (c.end_at > e.start_at AND c.start_at < e.end_at)
ORDER BY 1,2;
On very large tables with wide ranges, this last query can be expensive.
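As a quick way to check the overlap join against the question's sample data, the sketch below runs it in an in-memory SQLite database from Python. It rests on several assumptions: the enablement column is named status (as the answers' queries use), dates are stored as ISO-8601 text so that string comparison orders them correctly, both range bounds are named start_at and end_at, and SQLite's two-argument MAX and MIN stand in for GREATEST and LEAST, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_changes (user_id INT, start_at TEXT, end_at TEXT, company_id INT);
CREATE TABLE enablement_changes (user_id INT, start_at TEXT, end_at TEXT, status TEXT);
INSERT INTO company_changes VALUES
    (189, '2020-12-12', '2021-03-02', 88),
    (189, '2021-03-02', '2050-01-01', 169);
INSERT INTO enablement_changes VALUES
    (189, '2020-12-12', '2021-10-15', 'disabled'),
    (189, '2021-10-15', '2050-01-01', 'enabled');
""")

rows = conn.execute("""
SELECT c.user_id
      ,MAX(c.start_at, e.start_at) AS start_at  -- scalar MAX in place of GREATEST
      ,MIN(c.end_at, e.end_at) AS end_at        -- scalar MIN in place of LEAST
      ,c.company_id
      ,e.status
FROM company_changes c
JOIN enablement_changes e
  ON c.user_id = e.user_id
 AND c.end_at > e.start_at AND c.start_at < e.end_at
ORDER BY 1, 2
""").fetchall()

for row in rows:
    print(row)
```

With these four input rows the join returns the three rows of the desired table, because each pair of ranges that shares some time produces exactly one clipped row.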
【Answer 3】: Use JOIN and BETWEEN:
SELECT c.user_id
,GREATEST(c.start_at, e.start_at) AS start_at
,LEAST(c.end_at, e.end_at) AS end_at
,c.company_id
,e.status
FROM company_changes c
JOIN enablement_changes e
ON (c.start_at BETWEEN e.start_at AND e.end_at
OR c.end_at BETWEEN e.start_at AND e.end_at)
AND c.user_id = e.user_id
ORDER BY 1,2;