Query for self join with grouping
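【Posted】: 2018-09-02 09:17:54
【Question】:
Given a set of course bookings, I need to determine the total number and type of rooms required for the students attending each course. Courses can run in parallel, be nested, or overlap.
The logic to implement: for each course's duration, find all other courses that are active during that period, and sum NUMBER_OF_STUDENTS for those courses, grouped by room type.
There are further complications, but a simplified version of the problem is presented below.
I currently use hsqldb, and the solution should use standard SQL syntax so that it is portable across databases.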
Bookings table
BOOKING_ID| COURSE_ID| NUMBER_OF_STUDENTS| ROOM_TYPE_ID
10 | 2 | 1 | 1
20 | 1 | 2 | 1
30 | 3 | 1 | 3
40 | 1 | 3 | 4
50 | 5 | 1 | 2
60 | 6 | 2 | 2
70 | 3 | 2 | 1
80 | 4 | 1 | 3
Courses table
COURSE_ID| START_DATE | END_DATE
1 | 2018-05-15 | 2018-06-14 //sample course
2 | 2018-05-11 | 2018-05-20 //starts before ends between sample course
3 | 2018-05-18 | 2018-05-22 //starts between ends between sample course
4 | 2018-05-20 | 2018-06-20 //starts between ends after sample course
5 | 2018-05-10 | 2018-06-20 //starts before ends after sample course
6 | 2018-05-10 | 2018-05-14 //starts and ends before sample course
7 | 2018-06-15 | 2018-06-20 //starts and ends after sample course
Rooms table (not needed here, included only for completeness)
ROOM_TYPE_ID| ROOM_CAPACITY| ROOM_LOCATION
1 | 1 | HILL
2 | 2 | HILL
3 | 1 | OCEAN
4 | 2 | OCEAN
Output (shown for course_id 1 only; it is needed for all courses)
COURSE_ID | ROOMTYPE | COURSE_STUDENT | OTHER_STUDENTS
1 | 1 | 2 | 3 //1(course 2) + 2 (course 3)
1 | 2 | 0 | 1 //1(course 5)
1 | 3 | 0 | 2 //1(course 3) + 1(course 4)
1 | 4 | 3 | 0 //no students on others
So far I have only worked out the conditions that match courses overlapping a given course's startDate and endDate:
Courses.START_DATE <= startDate AND Courses.END_DATE >= endDate OR //matches any course spanning current course
Courses.START_DATE >= startDate AND Courses.START_DATE <= endDate OR //matches any course starting during the current course
Courses.END_DATE >= startDate AND Courses.END_DATE <= endDate //matches any course ending during the current course
Beyond that, my meager SQL skills have failed me miserably. I could write some Java code to solve this, but that would be lame and inefficient.
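The three cases listed in the question (spanning, starting during, ending during) can be collapsed into one test: two date ranges with inclusive end dates overlap when each one starts on or before the other ends. The sketch below applies that test to the Courses table from the question, assuming START_DATE and END_DATE are both inclusive. The aliases THIS_COURSE and OTHER_COURSE are chosen for illustration only.

```sql
-- List the courses that overlap course 1.
-- Assumes both START_DATE and END_DATE are inclusive.
SELECT OTHER_COURSE.COURSE_ID
FROM Courses THIS_COURSE
JOIN Courses OTHER_COURSE
  ON OTHER_COURSE.START_DATE <= THIS_COURSE.END_DATE    -- other starts before this one ends
 AND OTHER_COURSE.END_DATE   >= THIS_COURSE.START_DATE  -- other ends after this one starts
WHERE THIS_COURSE.COURSE_ID = 1
  AND OTHER_COURSE.COURSE_ID <> 1
ORDER BY OTHER_COURSE.COURSE_ID
```

With the sample data this should return courses 2, 3, 4 and 5. Courses 6 and 7 fall entirely before and after course 1.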
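【Answer 1】:
Thanks to Fredt for pointing me in the right direction: join the main table to itself, every record against every record, then filter on the overlapping course date criteria. The query below is inefficient, but it gets the job done for now. There is probably room for further optimization, and I would like to hear other opinions.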
SELECT
THIS_COURSE.COURSE_ID,
OTHER_COURSE.ROOM_TYPE_ID,
SUM(CASE WHEN THIS_COURSE.BOOKING_ID = OTHER_COURSE.BOOKING_ID THEN OTHER_COURSE.NUMBER_OF_STUDENTS ELSE 0 END) AS COURSE_STUDENTS,
SUM(CASE WHEN THIS_COURSE.BOOKING_ID <> OTHER_COURSE.BOOKING_ID THEN OTHER_COURSE.NUMBER_OF_STUDENTS ELSE 0 END) AS OTHER_STUDENTS,
SUM(OTHER_COURSE.NUMBER_OF_STUDENTS) AS TOTAL_STUDENTS
FROM
(
SELECT
BOOKINGS.BOOKING_ID,
BOOKINGS.COURSE_ID,
BOOKINGS.NUMBER_OF_STUDENTS,
BOOKINGS.ROOM_TYPE_ID,
COURSES.START_DATE,
COURSES.END_DATE
FROM
BOOKINGS , COURSES
WHERE
BOOKINGS.COURSE_ID = COURSES.COURSE_ID
) THIS_COURSE
LEFT JOIN
(
SELECT
BOOKINGS.BOOKING_ID,
BOOKINGS.COURSE_ID,
BOOKINGS.NUMBER_OF_STUDENTS,
BOOKINGS.ROOM_TYPE_ID,
COURSES.START_DATE,
COURSES.END_DATE
FROM
BOOKINGS , COURSES
WHERE
BOOKINGS.COURSE_ID = COURSES.COURSE_ID
) OTHER_COURSE
ON
THIS_COURSE.BOOKING_ID <> OTHER_COURSE.BOOKING_ID OR
THIS_COURSE.BOOKING_ID = OTHER_COURSE.BOOKING_ID
WHERE
-- two inclusive date ranges overlap when each starts on or before the other ends;
-- this also matches a course that spans THIS_COURSE entirely
THIS_COURSE.START_DATE <= OTHER_COURSE.END_DATE AND
THIS_COURSE.END_DATE >= OTHER_COURSE.START_DATE
GROUP BY
THIS_COURSE.COURSE_ID, OTHER_COURSE.ROOM_TYPE_ID
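One thing to watch in the query above: THIS_COURSE has one row per booking, so a course with several bookings (course 1 has bookings 20 and 40) appears to pair with every overlapping booking once per booking of its own, which can inflate the sums. A possible variant, offered as a sketch rather than a tested replacement, drives the join from Courses so that each course appears once, and compares COURSE_ID instead of BOOKING_ID. The alias OTHER_BOOKING is introduced here for illustration.

```sql
-- Per course and room type: students on the course itself vs. on overlapping courses.
-- Assumes START_DATE and END_DATE are inclusive.
SELECT
  THIS_COURSE.COURSE_ID,
  OTHER_BOOKING.ROOM_TYPE_ID,
  SUM(CASE WHEN OTHER_BOOKING.COURSE_ID = THIS_COURSE.COURSE_ID
           THEN OTHER_BOOKING.NUMBER_OF_STUDENTS ELSE 0 END) AS COURSE_STUDENTS,
  SUM(CASE WHEN OTHER_BOOKING.COURSE_ID <> THIS_COURSE.COURSE_ID
           THEN OTHER_BOOKING.NUMBER_OF_STUDENTS ELSE 0 END) AS OTHER_STUDENTS,
  SUM(OTHER_BOOKING.NUMBER_OF_STUDENTS) AS TOTAL_STUDENTS
FROM Courses THIS_COURSE
JOIN Courses OTHER_COURSE
  -- every course overlaps itself, so its own bookings are included
  ON OTHER_COURSE.START_DATE <= THIS_COURSE.END_DATE
 AND OTHER_COURSE.END_DATE   >= THIS_COURSE.START_DATE
JOIN Bookings OTHER_BOOKING
  ON OTHER_BOOKING.COURSE_ID = OTHER_COURSE.COURSE_ID
GROUP BY THIS_COURSE.COURSE_ID, OTHER_BOOKING.ROOM_TYPE_ID
ORDER BY THIS_COURSE.COURSE_ID, OTHER_BOOKING.ROOM_TYPE_ID
```

For course 1 and the sample data, this should reproduce the four rows of the expected output in the question (room types 1 to 4 with 2/3, 0/1, 0/2 and 3/0 students).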
Below is the SQL to create the sample data:
CREATE TABLE Bookings(BOOKING_ID INTEGER NOT NULL PRIMARY KEY, COURSE_ID INTEGER NOT NULL, NUMBER_OF_STUDENTS INTEGER NOT NULL, ROOM_TYPE_ID INTEGER NOT NULL)
CREATE TABLE Courses(COURSE_ID INTEGER NOT NULL PRIMARY KEY, START_DATE DATE, END_DATE DATE)
CREATE TABLE Rooms(ROOM_TYPE_ID INTEGER NOT NULL PRIMARY KEY, ROOM_CAPACITY INTEGER NOT NULL, ROOM_LOCATION VARCHAR(25))
INSERT INTO Bookings VALUES( 10 , 2 , 1 , 1 )
INSERT INTO Bookings VALUES( 20 , 1 , 2 , 1 )
INSERT INTO Bookings VALUES( 30 , 3 , 1 , 3 )
INSERT INTO Bookings VALUES( 40 , 1 , 3 , 4 )
INSERT INTO Bookings VALUES( 50 , 5 , 1 , 2 )
INSERT INTO Bookings VALUES( 60 , 6 , 2 , 2 )
INSERT INTO Bookings VALUES( 70 , 3 , 2 , 1 )
INSERT INTO Bookings VALUES( 80 , 4 , 1 , 3 )
INSERT INTO Bookings VALUES( 90 , 7 , 1 , 4 )
INSERT INTO Courses VALUES( 1 ,'2018-05-15', '2018-06-14' )
INSERT INTO Courses VALUES( 2 ,'2018-05-11', '2018-05-20' )
INSERT INTO Courses VALUES( 3 ,'2018-05-18', '2018-05-22' )
INSERT INTO Courses VALUES( 4 ,'2018-05-20', '2018-06-20' )
INSERT INTO Courses VALUES( 5 ,'2018-05-10', '2018-06-20' )
INSERT INTO Courses VALUES( 6 ,'2018-05-10', '2018-05-14' )
INSERT INTO Courses VALUES( 7 ,'2018-06-15', '2018-06-20' )
INSERT INTO Rooms VALUES( 1 , 1 , 'HILL')
INSERT INTO Rooms VALUES( 2 , 2 , 'HILL')
INSERT INTO Rooms VALUES( 3 , 1 , 'OCEAN')
INSERT INTO Rooms VALUES( 4 , 2 , 'OCEAN')
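【Comments】:
The query looks good. You can use a WITH clause to factor out the repeated inner query. You can also use the OVERLAPS predicate on the dates, for example PERIOD (THIS_COURSE.START_DATE, THIS_COURSE.END_DATE + 1 DAY) OVERLAPS PERIOD (OTHER_COURSE.START_DATE, OTHER_COURSE.END_DATE + 1 DAY). Note that the end of a period is exclusive, so you need to add 1 DAY.
Thanks fredt, I will try to incorporate the OVERLAPS predicate. However, I could not follow your suggestion about using a WITH clause to remove the THIS_COURSE and OTHER_COURSE blocks; could you elaborate further?
I have rewritten the query using your sample data.
【Answer 2】: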
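What you actually want is the number of rooms of each type required for each course. So you need to start from the COURSES table and join it with the other two tables.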
SELECT * FROM COURSES JOIN BOOKINGS USING (COURSE_ID) JOIN ROOMS USING (ROOM_TYPE_ID)
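This gives you a long list of all the room bookings. You can then treat this result as a subquery table and join it to itself based on the date periods.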
WITH ROOM_BOOKINGS AS (
SELECT
BOOKINGS.BOOKING_ID,
BOOKINGS.COURSE_ID,
BOOKINGS.NUMBER_OF_STUDENTS,
BOOKINGS.ROOM_TYPE_ID,
COURSES.START_DATE,
COURSES.END_DATE,
ROOMS.ROOM_CAPACITY
FROM
COURSES JOIN BOOKINGS USING (COURSE_ID) JOIN ROOMS USING (ROOM_TYPE_ID)
)
SELECT * FROM ROOM_BOOKINGS THIS_COURSE LEFT JOIN ROOM_BOOKINGS OTHER_COURSE
ON (THIS_COURSE.START_DATE, THIS_COURSE.END_DATE + 1 DAY) OVERLAPS (OTHER_COURSE.START_DATE, OTHER_COURSE.END_DATE + 1 DAY)
AND THIS_COURSE.ROOM_TYPE_ID = OTHER_COURSE.ROOM_TYPE_ID
AND THIS_COURSE.COURSE_ID <> OTHER_COURSE.COURSE_ID
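You need to complete the query above, adding a condition to the SELECT so that it returns only one course. You also need GROUP BY THIS_COURSE.COURSE_ID, THIS_COURSE.ROOM_TYPE_ID, THIS_COURSE.NUMBER_OF_STUDENTS, ... and SUM(OTHER_COURSE.NUMBER_OF_STUDENTS) to produce the desired output.
As you can see, writing advanced SQL queries is not trivial and requires a good knowledge of the SQL language.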
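One possible way to complete the query, shown for course 1 only, is sketched below. It keeps the answer's WITH clause and OVERLAPS condition unchanged (including the HSQLDB-style `+ 1 DAY`, so it is not portable as written). The COALESCE, the output column names COURSE_STUDENTS and OTHER_STUDENTS, and the WHERE filter are assumptions made for this sketch.

```sql
WITH ROOM_BOOKINGS AS (
  SELECT
    BOOKINGS.BOOKING_ID,
    BOOKINGS.COURSE_ID,
    BOOKINGS.NUMBER_OF_STUDENTS,
    BOOKINGS.ROOM_TYPE_ID,
    COURSES.START_DATE,
    COURSES.END_DATE,
    ROOMS.ROOM_CAPACITY
  FROM
    COURSES JOIN BOOKINGS USING (COURSE_ID) JOIN ROOMS USING (ROOM_TYPE_ID)
)
SELECT
  THIS_COURSE.COURSE_ID,
  THIS_COURSE.ROOM_TYPE_ID,
  THIS_COURSE.NUMBER_OF_STUDENTS AS COURSE_STUDENTS,
  -- the LEFT JOIN yields NULL when no other course shares the room type
  COALESCE(SUM(OTHER_COURSE.NUMBER_OF_STUDENTS), 0) AS OTHER_STUDENTS
FROM ROOM_BOOKINGS THIS_COURSE LEFT JOIN ROOM_BOOKINGS OTHER_COURSE
  ON (THIS_COURSE.START_DATE, THIS_COURSE.END_DATE + 1 DAY) OVERLAPS (OTHER_COURSE.START_DATE, OTHER_COURSE.END_DATE + 1 DAY)
  AND THIS_COURSE.ROOM_TYPE_ID = OTHER_COURSE.ROOM_TYPE_ID
  AND THIS_COURSE.COURSE_ID <> OTHER_COURSE.COURSE_ID
WHERE THIS_COURSE.COURSE_ID = 1  -- restrict the result to a single course
GROUP BY
  THIS_COURSE.COURSE_ID, THIS_COURSE.ROOM_TYPE_ID, THIS_COURSE.NUMBER_OF_STUDENTS
```

Because the join requires matching ROOM_TYPE_ID values, this form returns rows only for room types that the course itself has booked (types 1 and 4 for course 1). The rows for room types 2 and 3 in the question's expected output would need a different join.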