DISTINCT INNER JOIN slow
Posted: 2013-11-11 11:25:54

I have written the following PostgreSQL query, and it works correctly. However, it is very slow, sometimes taking up to 10 seconds to return results. I'm sure something in my statement is slowing it down.
Can anyone help identify why this query is slow?
SELECT DISTINCT ON (school_classes.class_id,attendance_calendar.school_date)
school_classes.class_id, school_classes.class_name, school_classes.grade_id
, school_gradelevels.linked_calendar, attendance_calendars.calendar_id
, attendance_calendar.school_date, attendance_calendar.minutes
, teacher_join_classes_subjects.staff_id, staff.first_name, staff.last_name
FROM school_classes
INNER JOIN school_gradelevels ON school_gradelevels.id=school_classes.grade_id
INNER JOIN teacher_join_classes_subjects ON teacher_join_classes_subjects.class_id=school_classes.class_id
INNER JOIN staff ON staff.staff_id=teacher_join_classes_subjects.staff_id
INNER JOIN attendance_calendars ON attendance_calendars.title=school_gradelevels.linked_calendar
INNER JOIN attendance_calendar ON attendance_calendar.calendar_id=attendance_calendars.calendar_id
WHERE teacher_join_classes_subjects.syear='2013'
AND staff.syear='2013'
AND attendance_calendars.syear='2013'
AND teacher_join_classes_subjects.does_attendance='Y'
AND teacher_join_classes_subjects.subject_id IS NULL
AND attendance_calendar.school_date<CURRENT_DATE
AND attendance_calendar.school_date NOT IN (
SELECT com.school_date FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
I replaced the NOT IN with the following:
AND NOT EXISTS (
SELECT com.school_date
FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND com.school_date=attendance_calendar.school_date
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
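Why the rewrite needs the extra com.school_date=attendance_calendar.school_date line: NOT IN compares the outer value against the subquery's output column implicitly, while NOT EXISTS only asks whether the subquery returns any row at all. A minimal sketch, assuming hypothetical, stripped-down tables in an in-memory SQLite database as a stand-in for PostgreSQL (the period_id/minutes conditions are left out):

```python
import sqlite3

# Hypothetical, stripped-down tables in an in-memory SQLite database,
# used as a stand-in for the PostgreSQL schema in the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE attendance_calendar (school_date TEXT);
    CREATE TABLE attendance_completed (class_id INTEGER, school_date TEXT);
    INSERT INTO attendance_calendar
        VALUES ('2013-11-04'), ('2013-11-05'), ('2013-11-06');
    -- attendance already taken for class 1 on 2013-11-05 only
    INSERT INTO attendance_completed VALUES (1, '2013-11-05');
""")

queries = {
    # NOT IN compares a.school_date with the subquery's output column implicitly.
    "NOT IN": """
        SELECT a.school_date FROM attendance_calendar a
        WHERE a.school_date NOT IN (
            SELECT com.school_date FROM attendance_completed com
            WHERE com.class_id = 1)
        ORDER BY a.school_date""",
    # NOT EXISTS only tests whether any row comes back, so the date
    # comparison has to be written out in the correlated WHERE clause.
    "NOT EXISTS": """
        SELECT a.school_date FROM attendance_calendar a
        WHERE NOT EXISTS (
            SELECT 1 FROM attendance_completed com
            WHERE com.class_id = 1 AND com.school_date = a.school_date)
        ORDER BY a.school_date""",
    # Same NOT EXISTS without the date comparison: one completed row for the
    # class makes EXISTS true for every calendar date.
    "NOT EXISTS, no date link": """
        SELECT a.school_date FROM attendance_calendar a
        WHERE NOT EXISTS (
            SELECT 1 FROM attendance_completed com
            WHERE com.class_id = 1)
        ORDER BY a.school_date""",
}

results = {name: [row[0] for row in con.execute(sql)] for name, sql in queries.items()}
for name, dates in results.items():
    print(f"{name}: {dates}")
```

With the date comparison, both forms return 2013-11-04 and 2013-11-06 here. Without it, the check degrades to "has this class ever had attendance recorded", which filters out every date.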
EXPLAIN ANALYZE output:
Unique  (cost=2998.39..2998.41 rows=3 width=85) (actual time=10751.111..10751.118 rows=1 loops=1)
  ->  Sort  (cost=2998.39..2998.40 rows=3 width=85) (actual time=10751.110..10751.110 rows=2 loops=1)
        Sort Key: school_classes.class_id, attendance_calendar.school_date
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Join  (cost=2.03..2998.37 rows=3 width=85) (actual time=6409.471..10751.045 rows=2 loops=1)
              Hash Cond: ((teacher_join_classes_subjects.class_id = school_classes.class_id) AND (school_gradelevels.id = school_classes.grade_id))
              Join Filter: (NOT (SubPlan 1))
              ->  Nested Loop  (cost=0.00..120.69 rows=94 width=81) (actual time=2.468..1187.397 rows=26460 loops=1)
                    Join Filter: (attendance_calendars.calendar_id = attendance_calendar.calendar_id)
                    ->  Nested Loop  (cost=0.00..42.13 rows=1 width=70) (actual time=0.087..3.247 rows=735 loops=1)
                          Join Filter: ((attendance_calendars.title)::text = (school_gradelevels.linked_calendar)::text)
                          ->  Nested Loop  (cost=0.00..40.80 rows=1 width=277) (actual time=0.077..1.005 rows=245 loops=1)
                                ->  Nested Loop  (cost=0.00..39.61 rows=1 width=27) (actual time=0.064..0.572 rows=49 loops=1)
                                      ->  Seq Scan on teacher_join_classes_subjects  (cost=0.00..10.48 rows=4 width=14) (actual time=0.022..0.143 rows=49 loops=1)
                                            Filter: ((subject_id IS NULL) AND (syear = 2013::numeric) AND ((does_attendance)::text = 'Y'::text))
                                      ->  Index Scan using staff_pkey on staff  (cost=0.00..7.27 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=49)
                                            Index Cond: (staff.staff_id = teacher_join_classes_subjects.staff_id)
                                            Filter: (staff.syear = 2013::numeric)
                                ->  Seq Scan on attendance_calendars  (cost=0.00..1.18 rows=1 width=250) (actual time=0.003..0.006 rows=5 loops=49)
                                      Filter: (attendance_calendars.syear = 2013::numeric)
                          ->  Seq Scan on school_gradelevels  (cost=0.00..1.15 rows=15 width=11) (actual time=0.001..0.005 rows=15 loops=245)
                    ->  Seq Scan on attendance_calendar  (cost=0.00..55.26 rows=1864 width=18) (actual time=0.003..1.129 rows=1824 loops=735)
                          Filter: (attendance_calendar.school_date < ('now'::cstring)::date)
              ->  Hash  (cost=1.41..1.41 rows=41 width=18) (actual time=0.040..0.040 rows=41 loops=1)
                    ->  Seq Scan on school_classes  (cost=0.00..1.41 rows=41 width=18) (actual time=0.006..0.015 rows=41 loops=1)
              SubPlan 1
                ->  Seq Scan on attendance_completed com  (cost=0.00..958.28 rows=5 width=4) (actual time=0.228..5.411 rows=17 loops=1764)
                      Filter: ((class_id = $0) AND (((period_id = 101::numeric) AND ($1 >= 151::numeric)) OR ((period_id = 95::numeric) AND ($1 = 150::numeric))))
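The plan points at the NOT IN subquery: SubPlan 1 is a sequential scan of attendance_completed that is executed 1764 times. EXPLAIN ANALYZE reports "actual time" as an average per loop, so multiplying by loops gives a rough total. A quick back-of-the-envelope check, using only numbers copied from the plan:

```python
# Numbers copied from the EXPLAIN ANALYZE output.
# "actual time" is a per-loop average, so total ~ per-loop time * loops.
subplan_ms_per_loop = 5.411   # SubPlan 1: actual time=0.228..5.411
subplan_loops = 1764          # SubPlan 1: loops=1764
total_ms = 10751.118          # Unique node: actual time=10751.111..10751.118

subplan_total_ms = subplan_ms_per_loop * subplan_loops
share = subplan_total_ms / total_ms

print(f"SubPlan 1 ~ {subplan_total_ms / 1000:.1f} s of {total_ms / 1000:.1f} s ({share:.0%})")
# SubPlan 1 ~ 9.5 s of 10.8 s (89%)
```

Roughly 9.5 of the 10.8 seconds go to re-running that subquery, which is consistent with the NOT EXISTS rewrite making the whole query fast.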
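Comments:

Instead of NOT IN, if I do AND NOT EXISTS, the whole thing runs very fast, so I assume there is a problem with the NOT IN statement. Any suggestions?

I have solved this by using NOT EXISTS instead of NOT IN. It's super fast now.

Do you really get the same results? I believe NOT EXISTS only checks whether the "inner" query returns any rows. Simply changing NOT IN to NOT EXISTS in your query shouldn't even work, because it would be a syntax error. Could you paste the EXPLAIN ANALYZE output for your original query?

Thanks for the reply, Petter. I have updated the question with the EXPLAIN ANALYZE results and also included the NOT EXISTS statement that seems to help.

Answer 1:

NOT EXISTS is a good choice. It is almost always better than NOT IN. More details here.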
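Besides performance (PostgreSQL can plan NOT EXISTS as an anti-join, which it cannot do for NOT IN because of how NULLs behave), there is a correctness difference. A minimal sketch, assuming two hypothetical tables in an in-memory SQLite database as a stand-in for PostgreSQL; both follow standard SQL three-valued logic here:

```python
import sqlite3

# Hypothetical two-table example in in-memory SQLite (standing in for PostgreSQL).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE calendar (school_date TEXT);
    CREATE TABLE completed (school_date TEXT);
    INSERT INTO calendar VALUES ('2013-11-04'), ('2013-11-05');
    INSERT INTO completed VALUES ('2013-11-05'), (NULL);  -- a single stray NULL
""")

# x NOT IN (...) is TRUE only if x <> every value; x <> NULL is UNKNOWN,
# so once the subquery yields a NULL, no outer row can pass.
not_in = [r[0] for r in con.execute(
    "SELECT school_date FROM calendar "
    "WHERE school_date NOT IN (SELECT school_date FROM completed)")]

# NOT EXISTS only asks whether a matching row exists; the NULL row never
# matches anything, so it is simply ignored.
not_exists = [r[0] for r in con.execute(
    "SELECT c.school_date FROM calendar c "
    "WHERE NOT EXISTS (SELECT 1 FROM completed x "
    "WHERE x.school_date = c.school_date)")]

print(f"NOT IN: {not_in}")          # NOT IN: []
print(f"NOT EXISTS: {not_exists}")  # NOT EXISTS: ['2013-11-04']
```

If attendance_completed.school_date can never be NULL, the two forms return the same rows; NOT EXISTS stays correct either way.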
I simplified your query a bit (it generally looks fine):
SELECT DISTINCT ON (c.class_id, a.school_date)
c.class_id, c.class_name, c.grade_id
,g.linked_calendar, aa.calendar_id
,a.school_date, a.minutes
,t.staff_id, s.first_name, s.last_name
FROM school_classes c
JOIN teacher_join_classes_subjects t USING (class_id)
JOIN staff s USING (staff_id)
JOIN school_gradelevels g ON g.id = c.grade_id
JOIN attendance_calendars aa ON aa.title = g.linked_calendar
JOIN attendance_calendar a ON a.calendar_id = aa.calendar_id
WHERE t.syear = 2013
AND s.syear = 2013
AND aa.syear = 2013
AND t.does_attendance = 'Y' -- looks like it should be boolean!
AND t.subject_id IS NULL
AND a.school_date < CURRENT_DATE
AND NOT EXISTS (
SELECT 1
FROM attendance_completed x
WHERE x.class_id = c.class_id
AND x.school_date = a.school_date
AND (x.period_id = 101 AND a.minutes >= 151 OR -- actually numbers?
x.period_id = 95 AND a.minutes = 150)
)
ORDER BY c.class_id, a.school_date, ???
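What seems to be missing is an ORDER BY, which should accompany your DISTINCT ON. Add more ORDER BY items in place of ??? to define which row to pick when there are duplicates; without them, the row kept for each (class_id, school_date) is unpredictable.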
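DISTINCT ON keeps the first row of each group of equal DISTINCT ON values, where "first" is defined by ORDER BY, and PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions. A rough Python model of that rule, as a sketch only: distinct_on is a hypothetical helper, the rows are invented, and staff_id is just one possible tie-breaker for the ??? slot (two teachers attached to the same class is one way the joins can produce duplicates).

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order):
    """Hypothetical model of SELECT DISTINCT ON (key) ... ORDER BY order.

    key and order are tuples of column names; order must start with key,
    mirroring PostgreSQL's rule. Keeps the first row of each key group.
    """
    assert order[:len(key)] == key, "ORDER BY must start with the DISTINCT ON keys"
    ordered = sorted(rows, key=itemgetter(*order))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(*key))]


# Invented rows: two teachers for class 7 on the same date.
rows = [
    {"class_id": 7, "school_date": "2013-11-04", "staff_id": 12},
    {"class_id": 7, "school_date": "2013-11-04", "staff_id": 3},
    {"class_id": 7, "school_date": "2013-11-05", "staff_id": 12},
]

# Tie-break on staff_id: the lowest staff_id wins per (class_id, school_date).
picked = distinct_on(rows, key=("class_id", "school_date"),
                     order=("class_id", "school_date", "staff_id"))
print([(r["school_date"], r["staff_id"]) for r in picked])
# [('2013-11-04', 3), ('2013-11-05', 12)]
```

With ORDER BY limited to the DISTINCT ON columns, this model happens to keep whichever row came first in the input, because Python's sort is stable; PostgreSQL gives no such guarantee, so the pick is unpredictable.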
Numeric literals don't need single quotes, and boolean values should be encoded as such (a boolean column rather than 'Y'/'N' text).
You may want to revisit the chapter about data types.
Comments:

Thanks for taking the time to add the extra information and links. I didn't know about this.