SQL distinct Worked Dates across Projects excluding Break Dates
【Posted】: 2021-08-29 07:42:49
【Question】:
Consider the following schema:
CREATE TABLE `Project Assignment`
(`Employee` varchar(1), `Project Id` int, `Project Assignment Date` date, `Project Relieving Date` date)
;
INSERT INTO `Project Assignment`
(`Employee`, `Project Id`, `Project Assignment Date`, `Project Relieving Date`)
VALUES
('A', 1, '2018-04-01', '2019-12-25'),
('A', 2, '2019-06-15', '2020-03-31'),
('A', 3, '2019-09-07', '2020-05-20'),
('A', 4, '2020-07-14', '2020-12-15')
;
CREATE TABLE `Break`
(`Break Id` int, `Employee` varchar(1), `Project Id` int, `Break Start Date` date, `Break End Date` date)
;
INSERT INTO `Break`
(`Break Id`, `Employee`, `Project Id`, `Break Start Date`, `Break End Date`)
VALUES
(1, 'A', 1, '2018-09-01', '2018-09-30'),
(2, 'A', 1, '2019-10-05', '2019-11-30'),
(3, 'A', 2, '2019-10-15', '2019-11-15'),
(4, 'A', 3, '2019-11-01', '2019-11-10'),
(5, 'A', 2, '2020-01-01', '2020-01-10'),
(6, 'A', 3, '2020-01-01', '2020-01-10')
;
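During a project, an employee can take one or more breaks in each project. Breaks do not overlap within a project, but they can overlap across projects.
We want the number of days on which the employee was assigned to at least one project, minus the number of days on which the employee was on break in all assigned projects.
I was able to derive the distinct number of days the employee was assigned to projects with the following query: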
SELECT merged.employee,
SUM(DATEDIFF(merged.EndDate,merged.`Project Assignment Date`)+1) assigned_days
FROM (SELECT
s1.employee, s1.`Project Assignment Date`,
MIN(IFNULL(t1.`Project Relieving Date`,CURDATE())) AS EndDate
FROM `Project Assignment` s1
INNER JOIN `Project Assignment` t1
ON t1.employee = s1.employee
AND s1.`Project Assignment Date` <= IFNULL(t1.`Project Relieving Date`,CURDATE())
AND NOT EXISTS( SELECT * FROM `Project Assignment` t2
WHERE t2.employee = s1.employee
AND IFNULL(t1.`Project Relieving Date`,CURDATE()) >= t2.`Project Assignment Date`
AND IFNULL(t1.`Project Relieving Date`,CURDATE()) < IFNULL(t2.`Project Relieving Date`,CURDATE()))
WHERE NOT EXISTS( SELECT * FROM `Project Assignment` s2
WHERE s2.employee = s1.employee
AND s1.`Project Assignment Date` > s2.`Project Assignment Date`
AND s1.`Project Assignment Date` <= IFNULL(s2.`Project Relieving Date`,CURDATE()))
GROUP BY s1.employee, s1.`Project Assignment Date`
ORDER BY s1.`Project Assignment Date`) merged
GROUP BY merged.employee
Result:
| employee | assigned_days |
| -------- | ------------- |
| A | 936 |
But I cannot figure out a way to derive the number of days the person was on break in all assigned projects.
Expected result:
+----------+---------------+------------+-------------+
| employee | assigned_days | break_days | worked_days |
+==========+===============+============+=============+
| A | 936 | 50 | 886 |
+----------+---------------+------------+-------------+
MariaDB 10.3.29
Explanation of how break_days is worked out:
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| Employee | Project | Break Start | Break End | Days Considered | Remarks |
+==========+=========+=============+==================+=================+===================================================================================================================+
| A | 1 | 2018-09-01 | 2018-09-30 | 30 | Only one project assigned so consider whole break |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| A | 1 | 2019-10-05 | 2019-11-30 | 10 | 3 Projects were assigned during these breaks. The common days of break fall between 2019-11-01 and 2019-11-10 |
+----------+---------+-------------+------------------+ | |
| A | 2 | 2019-10-15 | 2019-11-15 | | |
+----------+---------+-------------+------------------+ | |
| A | 3 | 2019-11-01 | 2019-11-10 | | |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| A | 2 | 2020-01-01 | 2020-01-10 | 10 | 2 Projects were assigned during this time and break in both projects |
+----------+---------+-------------+------------------+ | |
| A | 3 | 2020-01-01 | 2020-01-10 | | |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
| | | | Total Break Days | 50 | |
+----------+---------+-------------+------------------+-----------------+-------------------------------------------------------------------------------------------------------------------+
DB-Fiddle link: https://www.db-fiddle.com/f/c8fMneAUkhb2P3rzjMtVZm/0
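As a sanity check on the 936 figure, the interval-merging idea behind the assigned_days query above can be sketched outside the database. This is only a minimal Python sketch, not the query itself: it assumes the four sample assignments for employee A, and the helper name merge_intervals is made up for illustration.

```python
from datetime import date

# Project assignments for employee A, copied from the sample data above.
assignments = [
    (date(2018, 4, 1), date(2019, 12, 25)),
    (date(2019, 6, 15), date(2020, 3, 31)),
    (date(2019, 9, 7), date(2020, 5, 20)),
    (date(2020, 7, 14), date(2020, 12, 15)),
]


def merge_intervals(intervals):
    """Merge overlapping inclusive date ranges into disjoint ranges."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


merged = merge_intervals(assignments)
# Inclusive day count, like DATEDIFF(end, start) + 1 in the query.
assigned_days = sum((end - start).days + 1 for start, end in merged)
print(merged)
print(assigned_days)  # 936
```

The first three projects collapse into one range (2018-04-01 to 2020-05-20, 781 days) and project 4 stays separate (155 days), which adds up to the 936 reported above.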
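【Discussion】:
Edit the question and show what you have already tried. Explain why/where it failed. Be specific (error messages, unexpected results, etc.).
Thanks @Strawberry for the link, it was very useful. I didn't realize I could make it easier for others to help me.
Please provide the desired result for the given data set.
@Strawberry: Done, along with an explanation table.
What is your MySQL version?
【Answer 1】:
Use recursive CTEs to get all the working dates and all the break dates of each employee.
Then, for each date in both cases, get all the projects as a comma-separated list by aggregating with GROUP_CONCAT().
If these lists match for a date, then that date is a break date.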
WITH RECURSIVE
working_dates AS (
SELECT `Employee`, `Project Id`, `Project Assignment Date` AS date, `Project Relieving Date`
FROM `Project Assignment`
UNION ALL
SELECT `Employee`, `Project Id`, date + INTERVAL 1 day, `Project Relieving Date`
FROM working_dates
WHERE date < `Project Relieving Date`
),
break_dates AS (
SELECT `Employee`, `Project Id`, `Break Start Date` AS date, `Break End Date`
FROM `Break`
UNION ALL
SELECT `Employee`, `Project Id`, date + INTERVAL 1 day, `Break End Date`
FROM break_dates
WHERE date < `Break End Date`
),
working AS (
SELECT `Employee`, date,
GROUP_CONCAT(`Project Id` ORDER BY `Project Id`) projects
FROM working_dates
GROUP BY `Employee`, date
),
breaks AS (
SELECT `Employee`, date,
GROUP_CONCAT(`Project Id` ORDER BY `Project Id`) projects
FROM break_dates
GROUP BY `Employee`, date
)
SELECT w.`Employee`,
COUNT(*) assigned_days,
COUNT(b.date) AS break_days,
COUNT(*) - COUNT(b.date) worked_days
FROM working w LEFT JOIN breaks b
ON w.`Employee` = b.`Employee` AND w.date = b.date AND w.projects = b.projects
GROUP BY w.`Employee`
See the demo.
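The per-date comparison used in this answer can also be checked outside the database. The following is a rough Python sketch of the same idea, assuming only the sample rows for employee A; sets of project ids stand in for the GROUP_CONCAT() lists, and the helper name projects_by_day is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# (project_id, start, end), copied from the sample data for employee A.
assignments = [
    (1, date(2018, 4, 1), date(2019, 12, 25)),
    (2, date(2019, 6, 15), date(2020, 3, 31)),
    (3, date(2019, 9, 7), date(2020, 5, 20)),
    (4, date(2020, 7, 14), date(2020, 12, 15)),
]
breaks = [
    (1, date(2018, 9, 1), date(2018, 9, 30)),
    (1, date(2019, 10, 5), date(2019, 11, 30)),
    (2, date(2019, 10, 15), date(2019, 11, 15)),
    (3, date(2019, 11, 1), date(2019, 11, 10)),
    (2, date(2020, 1, 1), date(2020, 1, 10)),
    (3, date(2020, 1, 1), date(2020, 1, 10)),
]


def projects_by_day(ranges):
    """Expand inclusive ranges to one entry per day: day -> set of project ids."""
    by_day = defaultdict(set)
    for project_id, start, end in ranges:
        for offset in range((end - start).days + 1):
            by_day[start + timedelta(days=offset)].add(project_id)
    return by_day


working = projects_by_day(assignments)
on_break = projects_by_day(breaks)

assigned_days = len(working)
# A break day is one where the projects on break equal the projects assigned.
break_days = sum(
    1 for day, projects in working.items() if on_break.get(day) == projects
)
worked_days = assigned_days - break_days
print(assigned_days, break_days, worked_days)  # 936 50 886
```

Like the recursive CTEs, this expands every range into one entry per project per day, so the amount of work grows with the total length of all assignments rather than with the number of rows.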
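【Discussion】:
While it does give the result, the query took more than 100 seconds on production data, with around 4.5k project assignments and 162 breaks. So it may not scale well. But your approach gave me a hint for another option. Will try and report back. Thanks!
Marking your answer as accepted since it works (albeit slowly), and I could derive a faster solution (runs in under 10 seconds) based on your approach. Thanks!
【Answer 2】:
After adding a `Break Id` column to the `Break` table, I was able to derive the break days using the aggregation technique suggested by @forpass:
"Then, for each date in both cases, get all the projects as a comma-separated list with GROUP_CONCAT()."
For each break, get the count and list of overlapping projects (using GROUP_CONCAT).
Then join it with `Break` again to find the count and list of overlapping breaks, and the smallest common overlap (latest start and earliest end). Use ROW_NUMBER to eliminate duplicates.
Move the query for assigned days into another CTE and join it with the breaks CTE to get the desired result.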
WITH breaks_summary AS (
SELECT `Employee`, SUM(break_days) break_days
FROM (
SELECT b.`Employee`, DATEDIFF(b.end_date, b.start_date)+1 break_days, ROW_NUMBER() OVER (PARTITION BY b.break_ids) rn, overlapping_breaks, break_ids, projects_count
FROM (
SELECT b_p_cnt.`Employee`, b_p_cnt.`Project Id`, b_p_cnt.projects_count,
COUNT(b2.`Break Id`) overlapping_breaks, GROUP_CONCAT(b2.`Break Id`) break_ids, MAX(b2.start_date) start_date, MIN(b2.end_date) end_date
FROM (
SELECT b1.`Break Id`, b1.`Employee`, b1.`Project Id`, b1.start_date, b1.end_date, GROUP_CONCAT(pa.`Project Id`) projects, count(pa.`Project Id`) projects_count
FROM (
SELECT `Break Id`, `Employee`, `Project Id`, `Break Start Date` AS start_date, `Break End Date` AS end_date
FROM `Break`
) b1
LEFT JOIN `Project Assignment` pa ON b1.`Employee` = pa.`Employee`
AND ((b1.start_date BETWEEN pa.`Project Assignment Date` AND IFNULL(pa.`Project Relieving Date`,CURDATE()))
OR (b1.end_date BETWEEN pa.`Project Assignment Date` AND IFNULL(pa.`Project Relieving Date`,CURDATE())))
GROUP BY b1.`Break Id`, b1.`Employee`, b1.`Project Id`, b1.start_date, b1.end_date) b_p_cnt
LEFT JOIN (
SELECT `Break Id`, `Employee`, `Project Id`, `Break Start Date` AS start_date, `Break End Date` AS end_date
FROM `Break`
ORDER BY `Break Id`) b2 ON b_p_cnt.`Employee` = b2.`Employee`
AND ((b_p_cnt.start_date BETWEEN b2.start_date AND b2.end_date)
OR (b_p_cnt.end_date BETWEEN b2.start_date AND b2.end_date))
GROUP BY b_p_cnt.`Break Id`, b_p_cnt.`Employee`, b_p_cnt.`Project Id`,
b_p_cnt.start_date, b_p_cnt.end_date, b_p_cnt.projects, b_p_cnt.projects_count
HAVING count(b2.`Break Id`) = b_p_cnt.projects_count
ORDER BY b_p_cnt.`Employee`, `Project Id`) b
) breaks
WHERE rn = 1
GROUP BY `Employee`),
assigned AS (
SELECT merged.`Employee`, SUM(DATEDIFF(merged.EndDate,merged.`Project Assignment Date`)+1) assigned_days
FROM (SELECT s1.`Employee`, s1.`Project Assignment Date`,
MIN(IFNULL(t1.`Project Relieving Date`,CURDATE())) AS EndDate
FROM `Project Assignment` s1
INNER JOIN `Project Assignment` t1 ON t1.`Employee` = s1.`Employee`
AND s1.`Project Assignment Date` <= IFNULL(t1.`Project Relieving Date`,CURDATE())
AND NOT EXISTS( SELECT * FROM `Project Assignment` t2
WHERE t2.`Employee` = s1.`Employee`
AND IFNULL(t1.`Project Relieving Date`,CURDATE()) >= t2.`Project Assignment Date`
AND IFNULL(t1.`Project Relieving Date`,CURDATE()) < IFNULL(t2.`Project Relieving Date`,CURDATE()))
WHERE NOT EXISTS( SELECT * FROM `Project Assignment` s2
WHERE s2.`Employee` = s1.`Employee`
AND s1.`Project Assignment Date` > s2.`Project Assignment Date`
AND s1.`Project Assignment Date` <= IFNULL(s2.`Project Relieving Date`,CURDATE()))
GROUP BY s1.`Employee`, s1.`Project Assignment Date`
ORDER BY s1.`Project Assignment Date`) merged
GROUP BY merged.`Employee`)
SELECT ad.`Employee`,
ad.assigned_days,
IFNULL(bs.break_days,0) break_days,
(ad.assigned_days - IFNULL(bs.break_days,0)) worked_days
FROM assigned ad
LEFT JOIN breaks_summary bs ON ad.`Employee` = bs.`Employee`
Updated DB-Fiddle with the query: https://www.db-fiddle.com/f/c8fMneAUkhb2P3rzjMtVZm/3
Thanks to everyone who contributed by improving the question and providing possible answers.
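The "latest start and earliest end" step is what MAX(b2.start_date) and MIN(b2.end_date) compute for a group of overlapping breaks. A minimal Python sketch of just that step, assuming breaks 2, 3 and 4 from the sample data (the function name common_overlap is made up for illustration):

```python
from datetime import date


def common_overlap(ranges):
    """Return the inclusive range shared by all ranges, or None if there is none."""
    start = max(s for s, _ in ranges)  # latest start
    end = min(e for _, e in ranges)    # earliest end
    return (start, end) if start <= end else None


# Breaks 2, 3 and 4 from the sample data (projects 1, 2 and 3).
overlapping_breaks = [
    (date(2019, 10, 5), date(2019, 11, 30)),
    (date(2019, 10, 15), date(2019, 11, 15)),
    (date(2019, 11, 1), date(2019, 11, 10)),
]

overlap = common_overlap(overlapping_breaks)
# Inclusive day count, like DATEDIFF(end_date, start_date) + 1 in the query.
break_days = (overlap[1] - overlap[0]).days + 1
print(overlap, break_days)
```

For these three breaks the common range is 2019-11-01 to 2019-11-10, i.e. the 10 days listed in the explanation table of the question.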