SQL: Find group of rows for consecutive days absent

【Title】: SQL: Find group of rows for consecutive days absent 【Posted】: 2018-03-19 01:47:23 【Question】:

I have the following Attendance table in Microsoft SQL Server 2016:

ID         StudentID  Date          AbsenceReasonID
----------------------------------------------------
430957     10158      2018-02-02    2   
430958     10158      2018-02-03    2   
430959     10158      2018-02-04    11  
430960     12393      2018-03-15    9   
430961     1          2018-03-15    9   
430962     12400      2018-03-15    9   
430963     5959       2018-03-15    11  
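
For anyone who wants to reproduce the problem, the sample rows above could be loaded with a script along these lines. The column types and the primary key are assumptions; the question only shows the values.

```sql
-- Assumed schema: the question shows only values, not column types.
CREATE TABLE Attendance
(
    ID              INT  NOT NULL PRIMARY KEY,
    StudentID       INT  NOT NULL,
    [Date]          DATE NOT NULL,
    AbsenceReasonID INT  NOT NULL
);

-- The seven sample rows from the question.
INSERT INTO Attendance (ID, StudentID, [Date], AbsenceReasonID)
VALUES
    (430957, 10158, '2018-02-02', 2),
    (430958, 10158, '2018-02-03', 2),
    (430959, 10158, '2018-02-04', 11),
    (430960, 12393, '2018-03-15', 9),
    (430961, 1,     '2018-03-15', 9),
    (430962, 12400, '2018-03-15', 9),
    (430963, 5959,  '2018-03-15', 11);
```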

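I would like a query that retrieves the group of rows where a single student (StudentID) has 3 or more absences on consecutive dates in the Date column. Ideally, for the data above, the query would return: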

ID         StudentID  Date          AbsenceReasonID
----------------------------------------------------
430957     10158      2018-02-02    2   
430958     10158      2018-02-03    2   
430959     10158      2018-02-04    11  

Note that if a student is absent on a Friday, I want the streak to carry over the weekend to Monday (weekend dates are ignored).

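If you need more information to help me, please let me know. I have used the following query as a starting point, but I know it is not what I am after: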

SELECT 
    CONVERT(datetime, A.DateOF, 103),
    A.SchoolNum, EI.FullName,
    COUNT(A.SchoolNum) as 'Absences'
FROM 
    Attendance A
INNER JOIN 
    EntityInformation EI ON EI.SchoolNum = A.SchoolNum AND EI.Deleted = 0
INNER JOIN 
    Enrolment E ON EI.SchoolNum = E.SchoolNum AND E.Deleted = 0
GROUP BY 
    A.SchoolNum, A.DateOf, FullName
HAVING 
    COUNT(A.SchoolNum) > 1
    AND A.DateOf = GETDATE()
    AND A.SchoolNum in (SELECT SchoolNum FROM Attendance A1 
                        WHERE A1.DateOf = A.DateOf -7)

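This is more of a static solution: it retrieves absences where a student's ID appears twice within the past 7 days. It checks neither for consecutive days nor for three or more of them.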

【Comments】:

Explain what your calendar table looks like (you know, the one that shows the valid dates). Weekends won't be the only days that get excluded. 【Answer 1】:

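You can use this to find your absence ranges. Here I use a recursive CTE to number every day across a few years, recording each day's weekday along the way. A second recursive CTE then chains the same student's absence dates day after day, taking into account that weekends should be skipped (read the CASE WHEN in the join clause). Finally, it displays each absence spree, filtered to those lasting N consecutive days or more.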

SET DATEFIRST 1 -- Monday = 1, Sunday = 7

;WITH Days AS
(
    -- Recursive anchor: hard-coded first date
    SELECT
        GeneratedDate = CONVERT(DATE, '2017-01-01')

    UNION ALL

    -- Recursive expression: all days until day X
    SELECT
        GeneratedDate = DATEADD(DAY, 1, D.GeneratedDate)
    FROM
        Days AS D
    WHERE
        DATEADD(DAY, 1, D.GeneratedDate) <= '2020-01-01'
),
NumberedDays AS
(
    SELECT
        GeneratedDate = D.GeneratedDate,
        DayOfWeek = DATEPART(WEEKDAY, D.GeneratedDate),
        DayNumber = ROW_NUMBER() OVER (ORDER BY D.GeneratedDate ASC)
    FROM
        Days AS D
),
AttendancesWithNumberedDays AS
(
    SELECT
        A.*,
        N.*
    FROM
        Attendance AS A
        INNER JOIN NumberedDays AS N ON A.Date = N.GeneratedDate
),
AbsenceSpree AS
(
    -- Recursive anchor: absence day with no previous absence, skipping weekends
    SELECT
        StartingAbsenceDate = A.Date,
        CurrentDateNumber = A.DayNumber,
        CurrentDateDayOfWeek = A.DayOfWeek,
        AbsenceDays = 1,
        StudentID = A.StudentID
    FROM
        AttendancesWithNumberedDays AS A
    WHERE
        NOT EXISTS (
            SELECT
                'no previous absence date'
            FROM
                AttendancesWithNumberedDays AS X
            WHERE
                X.StudentID = A.StudentID AND
                X.DayNumber = CASE A.DayOfWeek 
                    WHEN 1 THEN A.DayNumber - 3 -- When monday then friday (-3)
                    WHEN 7 THEN A.DayNumber - 2 -- When sunday then friday (-2)
                    ELSE A.DayNumber - 1 END)

    UNION ALL

    -- Recursive expression: find the next absence day, skipping weekends
    SELECT
        StartingAbsenceDate = S.StartingAbsenceDate,
        CurrentDateNumber = A.DayNumber,
        CurrentDateDayOfWeek = A.DayOfWeek,
        AbsenceDays = S.AbsenceDays + 1,
        StudentID = A.StudentID
    FROM
        AbsenceSpree AS S
        INNER JOIN AttendancesWithNumberedDays AS A ON
            S.StudentID = A.StudentID AND
            A.DayNumber = CASE S.CurrentDateDayOfWeek
                WHEN 5 THEN S.CurrentDateNumber + 3 -- When friday then monday (+3)
                WHEN 6 THEN S.CurrentDateNumber + 2 -- When saturday then monday (+2)
                ELSE S.CurrentDateNumber + 1 END
)
SELECT
    StudentID = A.StudentID,
    StartingAbsenceDate = A.StartingAbsenceDate,
    EndingAbsenceDate = MAX(N.GeneratedDate),
    AbsenceDays = MAX(A.AbsenceDays)
FROM
    AbsenceSpree AS A
    INNER JOIN NumberedDays AS N ON A.CurrentDateNumber = N.DayNumber
GROUP BY
    A.StudentID,
    A.StartingAbsenceDate
HAVING
    MAX(A.AbsenceDays) >= 3
OPTION
    (MAXRECURSION 5000)

If you want to list the original Attendance table rows instead, replace the final SELECT:

SELECT
    StudentID = A.StudentID,
    StartingAbsenceDate = A.StartingAbsenceDate,
    EndingAbsenceDate = MAX(N.GeneratedDate),
    AbsenceDays = MAX(A.AbsenceDays)
FROM
    AbsenceSpree AS A
    INNER JOIN NumberedDays AS N ON A.CurrentDateNumber = N.DayNumber
GROUP BY
    A.StudentID,
    A.StartingAbsenceDate
HAVING
    MAX(A.AbsenceDays) >= 3

with this CTE + SELECT:

,
FilteredAbsenceSpree AS
(
    SELECT
        StudentID = A.StudentID,
        StartingAbsenceDate = A.StartingAbsenceDate,
        EndingAbsenceDate = MAX(N.GeneratedDate),
        AbsenceDays = MAX(A.AbsenceDays)
    FROM
        AbsenceSpree AS A
        INNER JOIN NumberedDays AS N ON A.CurrentDateNumber = N.DayNumber
    GROUP BY
        A.StudentID,
        A.StartingAbsenceDate
    HAVING
        MAX(A.AbsenceDays) >= 3
)
SELECT
    A.*
FROM
    Attendance AS A
    INNER JOIN FilteredAbsenceSpree AS F ON A.StudentID = F.StudentID
WHERE
    A.Date BETWEEN F.StartingAbsenceDate AND F.EndingAbsenceDate
OPTION
    (MAXRECURSION 5000)
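
The same weekend-skipping idea can also be sketched without recursion, using a "gaps and islands" pattern. This is a swapped-in technique, not part of the answer above, and it rests on a few assumptions: the Attendance columns are those shown in the question, there is at most one row per student per date, and weekend rows are ignored entirely (unlike the recursive version, a streak can never start on a Saturday or Sunday). The column names `WeekdayNumber`, `IslandKey` and `StreakLength` are made up for illustration.

```sql
WITH Weekdays AS
(
    SELECT
        A.ID, A.StudentID, A.[Date], A.AbsenceReasonID,
        -- 1900-01-01 was a Monday, so (days since then) % 7 is 0 = Monday .. 6 = Sunday.
        -- Counting 5 days per full week makes Friday and the following Monday adjacent.
        WeekdayNumber = (DATEDIFF(DAY, '19000101', A.[Date]) / 7) * 5
                      +  DATEDIFF(DAY, '19000101', A.[Date]) % 7
    FROM
        Attendance AS A
    WHERE
        DATEDIFF(DAY, '19000101', A.[Date]) % 7 < 5 -- drop Saturday and Sunday rows
),
Islands AS
(
    SELECT
        W.*,
        -- Within one unbroken run, WeekdayNumber and ROW_NUMBER grow in step,
        -- so their difference stays constant and identifies the run.
        IslandKey = W.WeekdayNumber
                  - ROW_NUMBER() OVER (PARTITION BY W.StudentID ORDER BY W.[Date])
    FROM
        Weekdays AS W
),
Counted AS
(
    SELECT
        I.*,
        StreakLength = COUNT(*) OVER (PARTITION BY I.StudentID, I.IslandKey)
    FROM
        Islands AS I
)
SELECT
    ID, StudentID, [Date], AbsenceReasonID
FROM
    Counted
WHERE
    StreakLength >= 3
ORDER BY
    StudentID, [Date];
```

Because the weekday number is computed from the date alone, the sketch does not depend on the DATEFIRST setting. Holidays are not handled; for that, the arithmetic would have to be replaced by a join to a calendar table of valid school days, as the comment on the question suggests.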

【Comments】:

【Answer 2】:

If you need the absences within a certain period (say, the past 7 days), you can do something like this:

 SELECT 
    ID,
    StudentID,
    [Date], 
    AbsenceReasonID
FROM(
SELECT 
    ID,
    StudentID,
    [Date], 
    AbsenceReasonID, 
    COUNT(StudentID) OVER(PARTITION BY StudentID ORDER BY StudentID) AS con, 
    ((DATEPART(dw, [Date]) + @@DATEFIRST) % 7) AS dw
FROM attendance
) D
WHERE 
     D.con > 2
AND [Date] >= '2018-02-02'
AND [Date] <= GETDATE()
AND dw NOT IN(0,1)

With the data you provided, the output will be:

|     ID | StudentID |       Date | AbsenceReasonID |
|--------|-----------|------------|-----------------|
| 430957 |     10158 | 2018-02-02 |               2 |

You can adjust the output however you like.
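
A note on the `dw` expression in the query above: adding `@@DATEFIRST` before taking the remainder makes the weekday number independent of the session's DATEFIRST setting, so Saturday always maps to 0 and Sunday to 1. A minimal sketch to check this, where the variable `@d` is introduced only for illustration:

```sql
DECLARE @d DATE = '2018-02-03'; -- a Saturday

SET DATEFIRST 7; -- Sunday is day 1 (the us_english default)
SELECT (DATEPART(dw, @d) + @@DATEFIRST) % 7 AS dw_with_datefirst_7; -- (7 + 7) % 7 = 0

SET DATEFIRST 1; -- Monday is day 1
SELECT (DATEPART(dw, @d) + @@DATEFIRST) % 7 AS dw_with_datefirst_1; -- (6 + 1) % 7 = 0
```

Both statements return 0, which is why the filter `dw NOT IN (0, 1)` removes weekend rows regardless of the setting.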

【Comments】:

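Thanks, but I need the query to return only the group of rows where a single student has 3 or more consecutive absences (skipping weekends). It also needs to be dynamic, whereas your query is fixed to a static 7-day range.

@ChristianTownsend Sorry to hear that, because I'm sure none of this was mentioned in the post when I wrote my answer! Anyway, I have updated my answer. 【Answer 3】: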

Try this:

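The CTE contains the absence dates on which the student was also absent on both the previous day and the next day (weekends excluded). The 2 UNIONs at the end add back the first and last row of each group and eliminate duplicates.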

with cte(id, studentId, dateof , absenceReasonId)
as
(
select a.* 
from attendance a
where exists (select 1 from attendance preva
              where preva.studentID = a.studentID
              and   datediff(day, preva.dateof, a.dateof)
                    <= (case when datepart(dw, preva.dateof) >= 5
                        then 8 - datepart(dw, preva.dateof)
                        else 1 
                        end)
              and preva.dateof < a.dateof)
and exists (select 1 from attendance nexta
              where nexta.studentID = a.studentID
              and   datediff(day, a.dateof, nexta.dateof)
                    <= (case when datepart(dw, a.dateof) >= 5
                        then 8 - datepart(dw, a.dateof)
                        else 1 
                        end)
              and nexta.dateof > a.dateof))              

select cte.*
from cte
union  -- use union to remove duplicates
select preva.* 
from
attendance preva
inner join
cte
on preva.studentID = cte.studentID
and preva.dateof < cte.dateof
and datediff(day, preva.dateof, cte.dateof)
                    <= (case when datepart(dw, preva.dateof) >= 5
                        then 8 - datepart(dw, preva.dateof)
                        else 1 
                        end) 
union
select nexta.*
from attendance nexta
inner join
cte
on nexta.studentID = cte.studentID
and   datediff(day, cte.dateof, nexta.dateof)
       <= (case when datepart(dw, cte.dateof) >= 5
                then 8 - datepart(dw, cte.dateof)
                else 1 
            end)
and nexta.dateof > cte.dateof  
order by studentId, dateof 
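
One thing to check before running the query above: the `CASE` expression treats weekday 5 as Friday, so it appears to assume `SET DATEFIRST 1` (Monday = 1); the answer does not state this. Under that assumption, the allowed gap to the next absence can be sketched as follows. The column aliases and the derived table `D` are made up for illustration.

```sql
SET DATEFIRST 1; -- Monday = 1 ... Sunday = 7 (assumed; not stated in the answer)

SELECT
    D.TheDate,
    DayOfWeek = DATEPART(dw, D.TheDate),
    -- Same expression as in the query above: from Friday onward,
    -- the next absence may be as late as the following Monday.
    MaxGapInDays = CASE WHEN DATEPART(dw, D.TheDate) >= 5
                        THEN 8 - DATEPART(dw, D.TheDate)
                        ELSE 1
                   END
FROM (VALUES
        (CONVERT(DATE, '2018-02-01')), -- Thursday: dw 4, gap 1
        (CONVERT(DATE, '2018-02-02')), -- Friday:   dw 5, gap 3
        (CONVERT(DATE, '2018-02-03')), -- Saturday: dw 6, gap 2
        (CONVERT(DATE, '2018-02-04'))  -- Sunday:   dw 7, gap 1
     ) AS D (TheDate);
```

With the default `DATEFIRST 7`, weekday 5 would be Thursday and the gaps would come out wrong, so the setting is worth making explicit at the top of the script.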

【Comments】:
