SQL: Find group of rows for consecutive days absent
【Posted】: 2018-03-19 01:47:23
【Question】: I have the following Attendance table in my Microsoft SQL Server 2016:
ID      StudentID  Date        AbsenceReasonID
----------------------------------------------------
430957  10158      2018-02-02  2
430958  10158      2018-02-03  2
430959  10158      2018-02-04  11
430960  12393      2018-03-15  9
430961  1          2018-03-15  9
430962  12400      2018-03-15  9
430963  5959       2018-03-15  11
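I want a query that retrieves the group of rows in which a single student (StudentID) has 3 or more absences on consecutive dates in the Date column. Ideally, running the query would return the following data: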
ID      StudentID  Date        AbsenceReasonID
----------------------------------------------------
430957  10158      2018-02-02  2
430958  10158      2018-02-03  2
430959  10158      2018-02-04  11
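Note that if a student is absent on a Friday, I want the streak to carry over the weekend to Monday (ignoring weekend dates).
Please let me know if more information would help you help me. I have used the following query as a starting point, but I know it is not what I want: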
SELECT
CONVERT(datetime, A.DateOF, 103),
A.SchoolNum, EI.FullName,
COUNT(A.SchoolNum) as 'Absences'
FROM
Attendance A
INNER JOIN
EntityInformation EI ON EI.SchoolNum = A.SchoolNum AND EI.Deleted = 0
INNER JOIN
Enrolment E ON EI.SchoolNum = E.SchoolNum AND E.Deleted = 0
GROUP BY
A.SchoolNum, A.DateOf, FullName
HAVING
COUNT(A.SchoolNum) > 1
AND A.DateOf = GETDATE()
AND A.SchoolNum in (SELECT SchoolNum FROM Attendance A1
WHERE A1.DateOf = A.DateOf -7)
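This is more of a static solution that retrieves absences where the student ID appears twice within the past 7 days. It is neither consecutive nor three days or more.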
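【Comments】:
Explain what your calendar table looks like; you know, the one that shows the valid dates. Weekends will not be the only days that get removed.
【Answer 1】:
You can use this to find your absence ranges. Here I use a recursive CTE to number all the days across a few years while recording their day of the week. Then another recursive CTE joins the absence dates of the same student day after day, taking into account that weekends should be skipped (read the CASE WHEN in the join clause). Finally, each absence streak is shown, filtered to those lasting at least N consecutive days.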
SET DATEFIRST 1 -- Monday = 1, Sunday = 7
;WITH Days AS
(
-- Recursive anchor: hard-coded first date
SELECT
GeneratedDate = CONVERT(DATE, '2017-01-01')
UNION ALL
-- Recursive expression: all days until day X
SELECT
GeneratedDate = DATEADD(DAY, 1, D.GeneratedDate)
FROM
Days AS D
WHERE
DATEADD(DAY, 1, D.GeneratedDate) <= '2020-01-01'
),
NumberedDays AS
(
SELECT
GeneratedDate = D.GeneratedDate,
DayOfWeek = DATEPART(WEEKDAY, D.GeneratedDate),
DayNumber = ROW_NUMBER() OVER (ORDER BY D.GeneratedDate ASC)
FROM
Days AS D
),
AttendancesWithNumberedDays AS
(
SELECT
A.*,
N.*
FROM
Attendance AS A
INNER JOIN NumberedDays AS N ON A.Date = N.GeneratedDate
),
AbsenceSpree AS
(
-- Recursive anchor: absence day with no previous absence, skipping weekends
SELECT
StartingAbsenceDate = A.Date,
CurrentDateNumber = A.DayNumber,
CurrentDateDayOfWeek = A.DayOfWeek,
AbsenceDays = 1,
StudentID = A.StudentID
FROM
AttendancesWithNumberedDays AS A
WHERE
NOT EXISTS (
SELECT
'no previous absence date'
FROM
AttendancesWithNumberedDays AS X
WHERE
X.StudentID = A.StudentID AND
X.DayNumber = CASE A.DayOfWeek
WHEN 1 THEN A.DayNumber - 3 -- When monday then friday (-3)
WHEN 7 THEN A.DayNumber - 2 -- When sunday then friday (-2)
ELSE A.DayNumber - 1 END)
UNION ALL
-- Recursive expression: find the next absence day, skipping weekends
SELECT
StartingAbsenceDate = S.StartingAbsenceDate,
CurrentDateNumber = A.DayNumber,
CurrentDateDayOfWeek = A.DayOfWeek,
AbsenceDays = S.AbsenceDays + 1,
StudentID = A.StudentID
FROM
AbsenceSpree AS S
INNER JOIN AttendancesWithNumberedDays AS A ON
S.StudentID = A.StudentID AND
A.DayNumber = CASE S.CurrentDateDayOfWeek
WHEN 5 THEN S.CurrentDateNumber + 3 -- When friday then monday (+3)
WHEN 6 THEN S.CurrentDateNumber + 2 -- When saturday then monday (+2)
ELSE S.CurrentDateNumber + 1 END
)
SELECT
StudentID = A.StudentID,
StartingAbsenceDate = A.StartingAbsenceDate,
EndingAbsenceDate = MAX(N.GeneratedDate),
AbsenceDays = MAX(A.AbsenceDays)
FROM
AbsenceSpree AS A
INNER JOIN NumberedDays AS N ON A.CurrentDateNumber = N.DayNumber
GROUP BY
A.StudentID,
A.StartingAbsenceDate
HAVING
MAX(A.AbsenceDays) >= 3
OPTION
(MAXRECURSION 5000)
If you want to list the original Attendance table rows, you can replace the last SELECT:
SELECT
StudentID = A.StudentID,
StartingAbsenceDate = A.StartingAbsenceDate,
EndingAbsenceDate = MAX(N.GeneratedDate),
AbsenceDays = MAX(A.AbsenceDays)
FROM
AbsenceSpree AS A
INNER JOIN NumberedDays AS N ON A.CurrentDateNumber = N.DayNumber
GROUP BY
A.StudentID,
A.StartingAbsenceDate
HAVING
MAX(A.AbsenceDays) >= 3
with this CTE + SELECT:
,
FilteredAbsenceSpree AS
(
SELECT
StudentID = A.StudentID,
StartingAbsenceDate = A.StartingAbsenceDate,
EndingAbsenceDate = MAX(N.GeneratedDate),
AbsenceDays = MAX(A.AbsenceDays)
FROM
AbsenceSpree AS A
INNER JOIN NumberedDays AS N ON A.CurrentDateNumber = N.DayNumber
GROUP BY
A.StudentID,
A.StartingAbsenceDate
HAVING
MAX(A.AbsenceDays) >= 3
)
SELECT
A.*
FROM
Attendance AS A
INNER JOIN FilteredAbsenceSpree AS F ON A.StudentID = F.StudentID
WHERE
A.Date BETWEEN F.StartingAbsenceDate AND F.EndingAbsenceDate
OPTION
(MAXRECURSION 5000)
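For comparison, the same "streak of working days" idea can be sketched without recursion, using the gaps-and-islands technique. This is not the method of the answer above but a swapped-in alternative, and it rests on several assumptions: the table variable @Attendance and its rows are hypothetical sample data, there is at most one row per student per date, weekend rows are ignored entirely, and holidays are not considered. Each weekday gets a working-day ordinal (Friday and the following Monday are adjacent), and subtracting a per-student row number leaves a value that stays constant within one streak.

```sql
-- Hypothetical sample data (not from the question), standing in for Attendance.
DECLARE @Attendance TABLE (ID int, StudentID int, [Date] date, AbsenceReasonID int);
INSERT INTO @Attendance (ID, StudentID, [Date], AbsenceReasonID) VALUES
    (1, 10158, '2018-02-01', 2),   -- Thursday
    (2, 10158, '2018-02-02', 2),   -- Friday
    (3, 10158, '2018-02-05', 11),  -- Monday: continues the streak across the weekend
    (4, 12393, '2018-03-15', 9),   -- single absence
    (5, 5959,  '2018-03-15', 11),  -- Thursday
    (6, 5959,  '2018-03-19', 11);  -- Monday: Friday is missing, so no streak

WITH Weekdays AS
(
    SELECT
        A.*,
        -- Days since 1900-01-01 (a Monday); does not depend on SET DATEFIRST.
        DayOffset = DATEDIFF(DAY, '19000101', A.[Date])
    FROM @Attendance AS A
),
Numbered AS
(
    SELECT
        W.ID, W.StudentID, W.[Date], W.AbsenceReasonID,
        -- Working-day ordinal: 5 per full week plus 0..4 for Monday..Friday.
        -- Ordinal minus row number is constant for rows in the same streak.
        StreakKey = (W.DayOffset / 7) * 5 + W.DayOffset % 7
                    - ROW_NUMBER() OVER (PARTITION BY W.StudentID ORDER BY W.[Date])
    FROM Weekdays AS W
    WHERE W.DayOffset % 7 < 5      -- keep Monday..Friday only
),
Counted AS
(
    SELECT
        N.*,
        StreakLength = COUNT(*) OVER (PARTITION BY N.StudentID, N.StreakKey)
    FROM Numbered AS N
)
SELECT ID, StudentID, [Date], AbsenceReasonID
FROM Counted
WHERE StreakLength >= 3
ORDER BY StudentID, [Date];
```

With the sample rows above, only the three rows of student 10158 are returned. If holidays or other non-school days also need to be skipped, the arithmetic ordinal would have to be replaced by a row number taken from a calendar table of valid dates.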
【Answer 2】:
If you need the absences within a certain period (say, the past 7 days), you can do this:
SELECT
ID,
StudentID,
[Date],
AbsenceReasonID
FROM(
SELECT
ID,
StudentID,
[Date],
AbsenceReasonID,
COUNT(StudentID) OVER(PARTITION BY StudentID ORDER BY StudentID) AS con,
((DATEPART(dw, [Date]) + @@DATEFIRST) % 7) AS dw
FROM attendance
) D
WHERE
D.con > 2
AND [Date] >= '2018-02-02'
AND [Date] <= GETDATE()
AND dw NOT IN(0,1)
Based on the data you have given, the output will be
| ID | StudentID | Date | AbsenceReasonID |
|--------|-----------|------------|-----------------|
| 430957 | 10158 | 2018-02-02 | 2 |
You can adjust the output as you like.
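The dw column in the query above relies on the expression (DATEPART(dw, [Date]) + @@DATEFIRST) % 7, which gives the same weekday number whatever SET DATEFIRST is in effect: 0 for Saturday and 1 for Sunday, which is why the filter is NOT IN (0, 1). A minimal check, using three hypothetical variables for known dates:

```sql
-- 2018-02-03 is a Saturday, 2018-02-04 a Sunday, 2018-02-05 a Monday.
DECLARE @Sat date = '2018-02-03',
        @Sun date = '2018-02-04',
        @Mon date = '2018-02-05';

SET DATEFIRST 1;  -- Monday is day 1
SELECT
    Sat = (DATEPART(dw, @Sat) + @@DATEFIRST) % 7,  -- 0
    Sun = (DATEPART(dw, @Sun) + @@DATEFIRST) % 7,  -- 1
    Mon = (DATEPART(dw, @Mon) + @@DATEFIRST) % 7;  -- 2

SET DATEFIRST 7;  -- Sunday is day 1 (the us_english default)
SELECT
    Sat = (DATEPART(dw, @Sat) + @@DATEFIRST) % 7,  -- 0
    Sun = (DATEPART(dw, @Sun) + @@DATEFIRST) % 7,  -- 1
    Mon = (DATEPART(dw, @Mon) + @@DATEFIRST) % 7;  -- 2
```

Both SELECT statements return 0, 1, 2, so the weekend filter does not depend on the session's language or DATEFIRST setting.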
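【Comments】:
Thanks, but I asked for a query that returns only the group of rows where a single student has 3 or more consecutive absences (skipping weekends). Also, this needs to be dynamic, whereas your query is set to a static 7-day range.
@ChristianTownsend Sorry to hear that, because I am sure this was not mentioned in the post when I posted my answer! Anyway, I have updated my answer.
【Answer 3】:
Try this. The CTE contains the absence dates on which the student was also absent on both the previous and the next day (excluding weekends). The 2 UNIONs at the end add back the first and last row of each group and eliminate duplicates.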
with cte(id, studentId, dateof , absenceReasonId)
as
(
select a.*
from attendance a
where exists (select 1 from attendance preva
where preva.studentID = a.studentID
and datediff(day, preva.dateof, a.dateof)
<= (case when datepart(dw, preva.dateof) >= 5
then 8 - datepart(dw, preva.dateof)
else 1
end)
and preva.dateof < a.dateof)
and exists (select 1 from attendance nexta
where nexta.studentID = a.studentID
and datediff(day, a.dateof, nexta.dateof)
<= (case when datepart(dw, a.dateof) >= 5
then 8 - datepart(dw, a.dateof)
else 1
end)
and nexta.dateof > a.dateof))
select cte.*
from cte
union -- use union to remove duplicates
select preva.*
from
attendance preva
inner join
cte
on preva.studentID = cte.studentID
and preva.dateof < cte.dateof
and datediff(day, preva.dateof, cte.dateof)
<= (case when datepart(dw, preva.dateof) >= 5
then 8 - datepart(dw, preva.dateof)
else 1
end)
union
select nexta.*
from attendance nexta
inner join
cte
on nexta.studentID = cte.studentID
and datediff(day, cte.dateof, nexta.dateof)
<= (case when datepart(dw, cte.dateof) >= 5
then 8 - datepart(dw, cte.dateof)
else 1
end)
and nexta.dateof > cte.dateof
order by studentId, dateof
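The CASE expression repeated in the query above decides how many calendar days may separate two absences that still count as adjacent. The sketch below evaluates it for one week of dates. It assumes SET DATEFIRST 1 (Monday = 1), which the query appears to rely on; the derived table D and the column names DayNumber and MaxGapDays are hypothetical.

```sql
SET DATEFIRST 1;  -- assumption: Monday = 1 ... Sunday = 7

SELECT
    D.d,
    DayNumber  = DATEPART(dw, D.d),
    -- Same rule as in the query above: from Friday onward the allowed gap
    -- stretches to the next Monday; otherwise only the next day counts.
    MaxGapDays = CASE WHEN DATEPART(dw, D.d) >= 5
                      THEN 8 - DATEPART(dw, D.d)
                      ELSE 1
                 END
FROM (VALUES
    (CONVERT(date, '2018-02-01')),  -- Thursday: gap 1
    (CONVERT(date, '2018-02-02')),  -- Friday:   gap 3 (reaches Monday)
    (CONVERT(date, '2018-02-03')),  -- Saturday: gap 2 (reaches Monday)
    (CONVERT(date, '2018-02-04')),  -- Sunday:   gap 1 (reaches Monday)
    (CONVERT(date, '2018-02-05'))   -- Monday:   gap 1
) AS D(d);
```

Under the us_english default (DATEFIRST 7) Friday is day 6, so the same expression would allow a gap of only 2 days and a Friday absence would no longer connect to the following Monday.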