Query for self join with grouping

Posted: 2018-09-02 09:17:54

Question:

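Given a set of course bookings, I need to determine the total number and type of rooms required for the students attending each course. Courses can run in parallel, nested, or overlapping. The logic to implement: for each course's duration, find all other courses active during that duration, and sum the NUMBER_OF_STUDENTS of those courses grouped by ROOM_TYPE_ID. There are other complications, but a simplified version of the problem is presented below. I currently use HSQLDB, and the solution should use standard SQL syntax so that it is portable across databases.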

Bookings table

BOOKING_ID| COURSE_ID| NUMBER_OF_STUDENTS| ROOM_TYPE_ID
    10    |    2     |        1          |    1
    20    |    1     |        2          |    1
    30    |    3     |        1          |    3
    40    |    1     |        3          |    4
    50    |    5     |        1          |    2
    60    |    6     |        2          |    2
    70    |    3     |        2          |    1
    80    |    4     |        1          |    3

Courses table

COURSE_ID| START_DATE |  END_DATE
    1    | 2018-05-15 |  2018-06-14    //sample course
    2    | 2018-05-11 |  2018-05-20    //starts before ends between sample course
    3    | 2018-05-18 |  2018-05-22    //starts between ends between sample course
    4    | 2018-05-20 |  2018-06-20    //starts between ends after sample course
    5    | 2018-05-10 |  2018-06-20    //starts before ends after sample course
    6    | 2018-05-10 |  2018-05-14    //starts and ends before sample course
    7    | 2018-06-15 |  2018-06-20    //starts and ends after sample course

Rooms table (not needed here, included only for completeness)

ROOM_TYPE_ID| ROOM_CAPACITY| ROOM_LOCATION
    1       |    1         |  HILL
    2       |    2         |  HILL
    3       |    1         |  OCEAN
    4       |    2         |  OCEAN

Expected output (only course_id 1 is shown, but the result is needed for every course)

COURSE_ID | ROOMTYPE | COURSE_STUDENT | OTHER_STUDENTS 
    1     |   1      |        2       |      3           //1(course 2) + 2 (course 3)
    1     |   2      |        0       |      1           //1(course 5)
    1     |   3      |        0       |      2           //1(course 3) + 1(course 4)
    1     |   4      |        3       |      0           //no students on others
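As a reference for checking any SQL against, here is a brute-force sketch of the logic described above in plain Python (standard library only). It hard-codes the Bookings and Courses rows shown, assumes both START_DATE and END_DATE are inclusive, and uses the simple "each range starts on or before the other ends" overlap test; `room_needs` is a hypothetical helper name, not part of the original question.

```python
from collections import defaultdict
from datetime import date

# (BOOKING_ID, COURSE_ID, NUMBER_OF_STUDENTS, ROOM_TYPE_ID), copied from the Bookings table
BOOKINGS = [(10, 2, 1, 1), (20, 1, 2, 1), (30, 3, 1, 3), (40, 1, 3, 4),
            (50, 5, 1, 2), (60, 6, 2, 2), (70, 3, 2, 1), (80, 4, 1, 3)]

# COURSE_ID -> (START_DATE, END_DATE); both dates are treated as inclusive
COURSES = {
    1: (date(2018, 5, 15), date(2018, 6, 14)),
    2: (date(2018, 5, 11), date(2018, 5, 20)),
    3: (date(2018, 5, 18), date(2018, 5, 22)),
    4: (date(2018, 5, 20), date(2018, 6, 20)),
    5: (date(2018, 5, 10), date(2018, 6, 20)),
    6: (date(2018, 5, 10), date(2018, 5, 14)),
    7: (date(2018, 6, 15), date(2018, 6, 20)),
}


def room_needs(course_id):
    """Return {ROOM_TYPE_ID: (course_students, other_students)} for one course."""
    start, end = COURSES[course_id]
    totals = defaultdict(lambda: [0, 0])
    for _booking_id, other_id, students, room_type in BOOKINGS:
        other_start, other_end = COURSES[other_id]
        # inclusive date ranges overlap when each one starts on or before the other one ends
        if other_start <= end and start <= other_end:
            column = 0 if other_id == course_id else 1  # own students vs. other courses
            totals[room_type][column] += students
    return {room: tuple(counts) for room, counts in sorted(totals.items())}


print(room_needs(1))  # → {1: (2, 3), 2: (0, 1), 3: (0, 2), 4: (3, 0)}
```

The printed dictionary reproduces the four expected rows for course 1; calling `room_needs` for each course ID gives the result for all courses.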

All I could work out are the conditions that match courses overlapping a given course with dates startDate and endDate:

(Courses.START_DATE <= startDate AND Courses.END_DATE   >= endDate)   OR   -- matches any course spanning the current course
(Courses.START_DATE >= startDate AND Courses.START_DATE <= endDate)   OR   -- matches any course starting during the current course
(Courses.END_DATE   >= startDate AND Courses.END_DATE   <= endDate)        -- matches any course ending during the current course
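The three cases these conditions are meant to cover (another course spanning, starting during, or ending during the current course) together reduce to one test: the other course starts on or before the current end date and ends on or after the current start date. A small exhaustive check in Python, over every pair of inclusive integer day ranges in 0..5 (an arbitrary range chosen only for illustration), shows the two formulations agree:

```python
from itertools import product


def overlaps_three_cases(start, end, other_start, other_end):
    spans = other_start <= start and other_end >= end   # other course spans the current one
    starts_during = start <= other_start <= end         # other course starts during the current one
    ends_during = start <= other_end <= end             # other course ends during the current one
    return spans or starts_during or ends_during


def overlaps_simple(start, end, other_start, other_end):
    # inclusive ranges overlap when each one starts on or before the other one ends
    return other_start <= end and start <= other_end


# every valid (start, end) range with start <= end, using day numbers 0..5
days = range(6)
ranges = [(s, e) for s, e in product(days, days) if s <= e]
mismatches = [(a, b) for a, b in product(ranges, ranges)
              if overlaps_three_cases(*a, *b) != overlaps_simple(*a, *b)]
print(len(mismatches))  # → 0
```

In SQL the single test becomes Courses.START_DATE <= endDate AND Courses.END_DATE >= startDate, which is shorter and easier to get right than three OR-ed cases.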

Beyond that, my meagre SQL skills have failed me miserably. I could write some Java code to solve this, but that would be clumsy and inefficient.


Answer 1:

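Thanks to fredt for pointing me in the right direction: join the data with itself and then filter the pairs by the overlapping course dates criterion. The query below is inefficient but does the job for now. There may be further optimizations, and I would like to hear other opinions.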

SELECT
        THIS_COURSE.COURSE_ID, 
        OTHER_COURSE.ROOM_TYPE_ID,
        -- compare COURSE_ID, not BOOKING_ID, so that other bookings of the same course count as its own students
        SUM(CASE  WHEN THIS_COURSE.COURSE_ID =  OTHER_COURSE.COURSE_ID THEN OTHER_COURSE.NUMBER_OF_STUDENTS ELSE 0 END) AS COURSE_STUDENTS,
        SUM(CASE  WHEN THIS_COURSE.COURSE_ID <> OTHER_COURSE.COURSE_ID THEN OTHER_COURSE.NUMBER_OF_STUDENTS ELSE 0 END) AS OTHER_STUDENTS,
        SUM(OTHER_COURSE.NUMBER_OF_STUDENTS) AS TOTAL_STUDENTS
FROM

-- one row per course, so that each booking is counted only once per course
(
    SELECT 
        COURSES.COURSE_ID, 
        COURSES.START_DATE, 
        COURSES.END_DATE 
    FROM 
        COURSES 
) THIS_COURSE

JOIN 
-- one row per booking, together with the dates of the booked course
(
    SELECT 
        BOOKINGS.BOOKING_ID, 
        BOOKINGS.COURSE_ID, 
        BOOKINGS.NUMBER_OF_STUDENTS, 
        BOOKINGS.ROOM_TYPE_ID, 
        COURSES.START_DATE, 
        COURSES.END_DATE 
    FROM 
        BOOKINGS , COURSES 
    WHERE 
        BOOKINGS.COURSE_ID = COURSES.COURSE_ID
) OTHER_COURSE

-- two inclusive date ranges overlap when each one starts on or before the other one ends;
-- this single test also catches an OTHER_COURSE that spans the whole of THIS_COURSE
ON 
    THIS_COURSE.START_DATE  <= OTHER_COURSE.END_DATE AND
    OTHER_COURSE.START_DATE <= THIS_COURSE.END_DATE

GROUP BY 
    THIS_COURSE.COURSE_ID, OTHER_COURSE.ROOM_TYPE_ID

Below is the SQL to create the sample data:

CREATE TABLE Bookings(BOOKING_ID INTEGER NOT NULL PRIMARY KEY, COURSE_ID INTEGER NOT NULL, NUMBER_OF_STUDENTS INTEGER NOT NULL, ROOM_TYPE_ID INTEGER NOT NULL);
CREATE TABLE Courses(COURSE_ID INTEGER NOT NULL PRIMARY KEY, START_DATE DATE,  END_DATE  DATE);
CREATE TABLE Rooms(ROOM_TYPE_ID INTEGER NOT NULL PRIMARY KEY, ROOM_CAPACITY INTEGER NOT NULL, ROOM_LOCATION VARCHAR(25));

INSERT INTO Bookings VALUES(    10   ,    2    ,        1         ,    1 );
INSERT INTO Bookings VALUES(    20   ,    1    ,        2         ,    1 );
INSERT INTO Bookings VALUES(    30   ,    3    ,        1         ,    3 );
INSERT INTO Bookings VALUES(    40   ,    1    ,        3         ,    4 );
INSERT INTO Bookings VALUES(    50   ,    5    ,        1         ,    2 );
INSERT INTO Bookings VALUES(    60   ,    6    ,        2         ,    2 );
INSERT INTO Bookings VALUES(    70   ,    3    ,        2         ,    1 );
INSERT INTO Bookings VALUES(    80   ,    4    ,        1         ,    3 );
INSERT INTO Bookings VALUES(    90   ,    7    ,        1         ,    4 );


INSERT INTO Courses VALUES(    1    ,'2018-05-15', '2018-06-14' );
INSERT INTO Courses VALUES(    2    ,'2018-05-11', '2018-05-20' );
INSERT INTO Courses VALUES(    3    ,'2018-05-18', '2018-05-22' );
INSERT INTO Courses VALUES(    4    ,'2018-05-20', '2018-06-20' );
INSERT INTO Courses VALUES(    5    ,'2018-05-10', '2018-06-20' );
INSERT INTO Courses VALUES(    6    ,'2018-05-10', '2018-05-14' );
INSERT INTO Courses VALUES(    7    ,'2018-06-15', '2018-06-20' );


INSERT INTO Rooms VALUES(    1       ,    1        ,  'HILL');
INSERT INTO Rooms VALUES(    2       ,    2        ,  'HILL');
INSERT INTO Rooms VALUES(    3       ,    1        ,  'OCEAN');
INSERT INTO Rooms VALUES(    4       ,    2        ,  'OCEAN');
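To try the self-join idea on this sample data without an HSQLDB install, here is a minimal sketch that loads the same Bookings and Courses rows into an in-memory SQLite database via Python's standard `sqlite3` module and runs a self-join with conditional SUMs. It assumes one row per course on the left side (so each booking is counted once per course), compares COURSE_ID rather than BOOKING_ID to separate a course's own students from everyone else's, and uses the two-comparison overlap test. The Rooms table is omitted because the query does not need it, and the ISO date strings compare correctly as text in SQLite.

```python
import sqlite3

# Sample rows from the script above: (BOOKING_ID, COURSE_ID, NUMBER_OF_STUDENTS, ROOM_TYPE_ID)
BOOKINGS = [(10, 2, 1, 1), (20, 1, 2, 1), (30, 3, 1, 3), (40, 1, 3, 4), (50, 5, 1, 2),
            (60, 6, 2, 2), (70, 3, 2, 1), (80, 4, 1, 3), (90, 7, 1, 4)]
# (COURSE_ID, START_DATE, END_DATE) as ISO strings, which sort correctly as text
COURSES = [(1, "2018-05-15", "2018-06-14"), (2, "2018-05-11", "2018-05-20"),
           (3, "2018-05-18", "2018-05-22"), (4, "2018-05-20", "2018-06-20"),
           (5, "2018-05-10", "2018-06-20"), (6, "2018-05-10", "2018-05-14"),
           (7, "2018-06-15", "2018-06-20")]

QUERY = """
SELECT THIS_COURSE.COURSE_ID,
       OTHER_COURSE.ROOM_TYPE_ID,
       SUM(CASE WHEN THIS_COURSE.COURSE_ID =  OTHER_COURSE.COURSE_ID
                THEN OTHER_COURSE.NUMBER_OF_STUDENTS ELSE 0 END) AS COURSE_STUDENTS,
       SUM(CASE WHEN THIS_COURSE.COURSE_ID <> OTHER_COURSE.COURSE_ID
                THEN OTHER_COURSE.NUMBER_OF_STUDENTS ELSE 0 END) AS OTHER_STUDENTS
FROM Courses THIS_COURSE                                -- one row per course being planned
JOIN (SELECT Bookings.COURSE_ID, Bookings.NUMBER_OF_STUDENTS, Bookings.ROOM_TYPE_ID,
             Courses.START_DATE, Courses.END_DATE
      FROM Bookings JOIN Courses ON Bookings.COURSE_ID = Courses.COURSE_ID
     ) OTHER_COURSE                                     -- one row per booking, with its dates
  ON OTHER_COURSE.START_DATE <= THIS_COURSE.END_DATE   -- inclusive ranges overlap when each
 AND THIS_COURSE.START_DATE  <= OTHER_COURSE.END_DATE  -- starts on or before the other ends
GROUP BY THIS_COURSE.COURSE_ID, OTHER_COURSE.ROOM_TYPE_ID
ORDER BY THIS_COURSE.COURSE_ID, OTHER_COURSE.ROOM_TYPE_ID
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bookings(BOOKING_ID INTEGER NOT NULL PRIMARY KEY, COURSE_ID INTEGER NOT NULL,
                          NUMBER_OF_STUDENTS INTEGER NOT NULL, ROOM_TYPE_ID INTEGER NOT NULL);
    CREATE TABLE Courses(COURSE_ID INTEGER NOT NULL PRIMARY KEY, START_DATE DATE, END_DATE DATE);
""")
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?)", BOOKINGS)
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", COURSES)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    if row[0] == 1:  # show only course 1, as in the expected output
        print(row)
# (1, 1, 2, 3)
# (1, 2, 0, 1)
# (1, 3, 0, 2)
# (1, 4, 3, 0)
```

The four printed rows match the expected output for course 1, and `rows` already holds the same breakdown for every other course.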

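Comments:

The query looks fine. You can use a WITH clause to factor out the repeated inner query. You can also use the OVERLAPS predicate on the dates, for example PERIOD (THIS_COURSE.START_DATE, THIS_COURSE.END_DATE + 1 DAY) OVERLAPS PERIOD (OTHER_COURSE.START_DATE, OTHER_COURSE.END_DATE + 1 DAY). Note that the end of a period is exclusive, so you need to add 1 DAY.

Thanks fredt, I will try to incorporate the OVERLAPS predicate, but I don't understand your suggestion about using a WITH clause to remove the THIS_COURSE and OTHER_COURSE blocks. Could you elaborate?

I rewrote the query using your sample data.

Answer 2: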

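What you actually want is the number of rooms of each type required for each course. So you need to start from the COURSES table and join it with the other two tables.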

SELECT * FROM COURSES JOIN BOOKINGS USING (COURSE_ID) JOIN ROOMS USING (ROOM_TYPE_ID)

This gives you a long list of all room bookings. You can then treat it as a subquery table and join it to itself based on the date periods.

WITH ROOM_BOOKINGS AS (
  SELECT 
    BOOKINGS.BOOKING_ID, 
    BOOKINGS.COURSE_ID, 
    BOOKINGS.NUMBER_OF_STUDENTS, 
    BOOKINGS.ROOM_TYPE_ID, 
    COURSES.START_DATE, 
    COURSES.END_DATE, 
    ROOMS.ROOM_CAPACITY
  FROM 
    COURSES JOIN BOOKINGS USING (COURSE_ID) JOIN ROOMS USING (ROOM_TYPE_ID)
 ) 
 SELECT * FROM ROOM_BOOKINGS THIS_COURSE LEFT JOIN ROOM_BOOKINGS OTHER_COURSE
 ON (THIS_COURSE.START_DATE, THIS_COURSE.END_DATE + 1 DAY) OVERLAPS (OTHER_COURSE.START_DATE, OTHER_COURSE.END_DATE + 1 DAY)
 AND THIS_COURSE.ROOM_TYPE_ID = OTHER_COURSE.ROOM_TYPE_ID 
 AND THIS_COURSE.COURSE_ID  <> OTHER_COURSE.COURSE_ID

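You need to complete the above query by adding a WHERE condition so that it returns only one course. You also need GROUP BY THIS_COURSE.COURSE_ID, THIS_COURSE.ROOM_TYPE_ID, THIS_COURSE.NUMBER_OF_STUDENTS, ... and SUM(OTHER_COURSE.NUMBER_OF_STUDENTS) to produce the required output.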
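One possible way to finish that query, sketched in SQLite through Python's standard `sqlite3` module on the sample data: the CTE is the repeated inner query written once, the WHERE parameter selects one course, and COALESCE turns the NULL sum from an unmatched LEFT JOIN into 0. Two assumptions differ from the HSQLDB original: SQLite has no OVERLAPS predicate, so the ON clause spells out the half-open test that the "+ 1 DAY" form amounts to for non-empty periods (each period starts before the other one's END_DATE + 1 day, using SQLite's `date(x, '+1 day')`), and plain ON joins replace USING. The helper name `other_students_for` is hypothetical.

```python
import sqlite3

BOOKINGS = [(10, 2, 1, 1), (20, 1, 2, 1), (30, 3, 1, 3), (40, 1, 3, 4), (50, 5, 1, 2),
            (60, 6, 2, 2), (70, 3, 2, 1), (80, 4, 1, 3), (90, 7, 1, 4)]
COURSES = [(1, "2018-05-15", "2018-06-14"), (2, "2018-05-11", "2018-05-20"),
           (3, "2018-05-18", "2018-05-22"), (4, "2018-05-20", "2018-06-20"),
           (5, "2018-05-10", "2018-06-20"), (6, "2018-05-10", "2018-05-14"),
           (7, "2018-06-15", "2018-06-20")]
ROOMS = [(1, 1, "HILL"), (2, 2, "HILL"), (3, 1, "OCEAN"), (4, 2, "OCEAN")]

QUERY = """
WITH ROOM_BOOKINGS AS (                 -- the repeated inner query, written once
  SELECT Bookings.BOOKING_ID, Bookings.COURSE_ID, Bookings.NUMBER_OF_STUDENTS,
         Bookings.ROOM_TYPE_ID, Courses.START_DATE, Courses.END_DATE, Rooms.ROOM_CAPACITY
  FROM Courses
  JOIN Bookings ON Bookings.COURSE_ID = Courses.COURSE_ID
  JOIN Rooms    ON Rooms.ROOM_TYPE_ID = Bookings.ROOM_TYPE_ID
)
SELECT THIS_COURSE.COURSE_ID, THIS_COURSE.ROOM_TYPE_ID, THIS_COURSE.NUMBER_OF_STUDENTS,
       COALESCE(SUM(OTHER_COURSE.NUMBER_OF_STUDENTS), 0) AS OTHER_STUDENTS
FROM ROOM_BOOKINGS THIS_COURSE
LEFT JOIN ROOM_BOOKINGS OTHER_COURSE
  -- half-open periods [START_DATE, END_DATE + 1 day) overlap when each starts before the other ends
  ON  THIS_COURSE.START_DATE  < date(OTHER_COURSE.END_DATE, '+1 day')
  AND OTHER_COURSE.START_DATE < date(THIS_COURSE.END_DATE, '+1 day')
  AND THIS_COURSE.ROOM_TYPE_ID = OTHER_COURSE.ROOM_TYPE_ID
  AND THIS_COURSE.COURSE_ID   <> OTHER_COURSE.COURSE_ID
WHERE THIS_COURSE.COURSE_ID = ?
GROUP BY THIS_COURSE.COURSE_ID, THIS_COURSE.ROOM_TYPE_ID, THIS_COURSE.NUMBER_OF_STUDENTS
ORDER BY THIS_COURSE.ROOM_TYPE_ID
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bookings(BOOKING_ID INTEGER NOT NULL PRIMARY KEY, COURSE_ID INTEGER NOT NULL,
                          NUMBER_OF_STUDENTS INTEGER NOT NULL, ROOM_TYPE_ID INTEGER NOT NULL);
    CREATE TABLE Courses(COURSE_ID INTEGER NOT NULL PRIMARY KEY, START_DATE DATE, END_DATE DATE);
    CREATE TABLE Rooms(ROOM_TYPE_ID INTEGER NOT NULL PRIMARY KEY, ROOM_CAPACITY INTEGER NOT NULL,
                       ROOM_LOCATION VARCHAR(25));
""")
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?)", BOOKINGS)
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", COURSES)
conn.executemany("INSERT INTO Rooms VALUES (?, ?, ?)", ROOMS)


def other_students_for(course_id):
    """Rows of (COURSE_ID, ROOM_TYPE_ID, NUMBER_OF_STUDENTS, OTHER_STUDENTS) for one course."""
    return conn.execute(QUERY, (course_id,)).fetchall()


print(other_students_for(1))  # → [(1, 1, 2, 3), (1, 4, 3, 0)]
```

Because OTHER_COURSE is matched on THIS_COURSE.ROOM_TYPE_ID, this version only returns room types that the course books itself: for course 1 it yields the room type 1 and 4 rows, while the room type 2 and 3 rows of the expected output (booked only by overlapping courses) do not appear.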

As you can see, writing advanced SQL queries is not trivial and requires a good knowledge of the SQL language.
