Slow JOIN Query with OR in WHERE Clause - Missing Possible Indexes?
Posted: 2017-09-15 21:16:23
I'm trying to retrieve a paginated list and a total count of "notifications" about "cases" that belong to a specific user.
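A notification has to meet several conditions: "not locked", "not private" and "not yet seen". The query should return the number of matches, sorted by creation date in descending order.
The last condition is that either the notification was not created by the user themselves, or the notification's type is "conduct" (an enum) and the user's user_id is referenced in the notification's "ref_id".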
This query takes more than 5 seconds to run against 200k rows in recent_changes, fewer than 4k rows in cases, and 50 users.
+-----+
| cnt |
+-----+
| 13 |
+-----+
1 row in set (4.67 sec)
Can this query be optimized as it is, or does it need to be restructured?
SELECT count(*) as cnt
FROM recent_changes rc
LEFT JOIN `case` c on c.id = rc.case_id
LEFT JOIN `user` u on u.id = rc.user_id
WHERE
(
rc.user_id != c.user_id AND c.user_id = '25'
OR
(rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false'
ORDER BY rc.datecreated DESC;
EXPLAIN output:
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
| 1 | SIMPLE | c | ALL | PRIMARY,user_user_id_idx | NULL | NULL | NULL | 3699 | Using where; Using temporary |
| 1 | SIMPLE | rc | ref | idx_recent_changes_case | idx_recent_changes_case | 5 | xxxxxxxxxxxxx.c.id | 25 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | xxxxxxxxxxxxx.rc.user_id | 1 | Using index |
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
Indexes on recent_changes:
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| recent_changes | 0 | PRIMARY | 1 | id | A | 182807 | NULL | NULL | | BTREE | |
| recent_changes | 1 | recent_changes_user_id_idx | 1 | user_id | A | 96 | NULL | NULL | YES | BTREE | |
| recent_changes | 1 | idx_recent_changes_user_case | 1 | user_id | A | 92 | NULL | NULL | YES | BTREE | |
| recent_changes | 1 | idx_recent_changes_user_case | 2 | case_id | A | 18280 | NULL | NULL | YES | BTREE | |
| recent_changes | 1 | idx_recent_changes_case | 1 | case_id | A | 7312 | NULL | NULL | YES | BTREE | |
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Indexes on the case table:
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| case | 0 | PRIMARY | 1 | id | A | 3753 | NULL | NULL | | BTREE | |
| case | 1 | id_idx | 1 | member_id | A | 3753 | NULL | NULL | YES | BTREE | |
| case | 1 | user_user_id_idx | 1 | user_id | A | 2 | NULL | NULL | YES | BTREE | |
| case | 1 | case_ha_id | 1 | health_authority_id | A | 28 | NULL | NULL | YES | BTREE | |
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
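Conceptually, it does the following:
Find recent rows in recent_changes where:
i) the recent_changes row is joined to the case table through a case_id owned by the current user_id
ii) and the recent_changes row was not created by the current user_id
OR
i) the recent_changes row is of type "conduct" and the current user_id is in the recent_changes.ref_id column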
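Writing the two branches as a plain predicate makes it easier to check which rows each branch is meant to admit. Below is a minimal Python sketch over hypothetical in-memory rows. The dict keys mirror the question's columns, while `is_notification`, `matching_ids` and the sample data are illustrative only. The sketch deliberately ignores SQL's NULL semantics.

```python
from typing import Optional

# Hypothetical sample data: case id -> owner and lock flag, plus recent_changes rows.
CASES = {
    1: {"user_id": 25, "locked": "N"},
    2: {"user_id": 3, "locked": "N"},
    3: {"user_id": 25, "locked": "Y"},
}
CHANGES = [
    {"id": 1, "case_id": 1, "user_id": 7, "type": "note", "ref_id": None, "private": "N", "seen": "false"},
    {"id": 2, "case_id": 1, "user_id": 25, "type": "note", "ref_id": None, "private": "N", "seen": "false"},
    {"id": 3, "case_id": 2, "user_id": 3, "type": "conduct", "ref_id": 25, "private": "N", "seen": "false"},
    {"id": 4, "case_id": 2, "user_id": 3, "type": "conduct", "ref_id": 25, "private": "Y", "seen": "false"},
    {"id": 5, "case_id": 3, "user_id": 7, "type": "note", "ref_id": None, "private": "N", "seen": "false"},
]


def is_notification(rc: dict, case: Optional[dict], me: int) -> bool:
    """Mirror the question's WHERE clause for one recent_changes row (no SQL NULL logic)."""
    if case is None:  # the filter on c.locked discards rows without a matching case
        return False
    # Branch 1: a change on a case I own, made by someone else.
    on_my_case_by_other = case["user_id"] == me and rc["user_id"] != case["user_id"]
    # Branch 2: a 'conduct' change that names me in ref_id.
    conduct_about_me = rc["type"] == "conduct" and rc["ref_id"] == me
    return (
        (on_my_case_by_other or conduct_about_me)
        and case["locked"] == "N"
        and rc["private"] != "Y"
        and rc["seen"] == "false"
    )


def matching_ids(me: int) -> list:
    # Apply the predicate to every change, looking up its case first.
    return [rc["id"] for rc in CHANGES if is_notification(rc, CASES.get(rc["case_id"]), me)]


print(matching_ids(25))  # [1, 3]
```

In the sample, row 2 is excluded because it is the user's own change, row 4 because it is private, and row 5 because its case is locked.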
If I remove the "OR (rc.type = 'conduct' AND rc.ref_id = '25')" condition, I get
If I remove the "rc.user_id != c.user_id AND c.user_id = '25' OR" condition, it still takes about 5 seconds to complete.
EDIT
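Changing the join order cut half a second off the time. However, I can't join case on rc.case_id until I have joined rc first; otherwise I get: Unknown column 'rc.user_id' in 'where clause'.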
New query:
SELECT count(*) as cnt
FROM `user` u
LEFT JOIN `recent_changes` rc on u.id = rc.user_id
LEFT JOIN `case` c on c.id = rc.case_id
WHERE
(
rc.user_id != c.user_id AND c.user_id = '25'
OR
(rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false'
ORDER BY rc.datecreated DESC;
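Removing the ORDER BY clause doesn't seem to make much difference to the query with the new join order, although I'm now more aware of its impact on performance.
Using UNION wasn't any faster. Running each SELECT separately, however, shows that the first SELECT takes only 0.3 seconds while the second takes more than 4 seconds: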
select sum(x.cnt) as cnt -- sum the per-branch counts; count(*) here would only count the 2 derived rows
FROM (
SELECT count(*) as cnt FROM `user` u
LEFT JOIN `recent_changes` rc on u.id = rc.user_id
LEFT JOIN `case` c on c.id = rc.case_id
WHERE rc.user_id != c.user_id AND c.user_id = '25'
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false'
UNION ALL -- a row matching both branches is counted twice
SELECT count(*) as cnt
FROM `user` u
LEFT JOIN `recent_changes` rc on u.id = rc.user_id
LEFT JOIN `case` c on c.id = rc.case_id
WHERE rc.type = 'conduct' AND rc.ref_id = '25'
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false') x;
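The two per-branch counts need some care when you combine them. An outer COUNT(*) counts the rows of the derived table, which is always 2 here. SUM(cnt) adds the two counts, but it still counts twice any row that satisfies both branches. Counting a UNION (not UNION ALL) of row ids gives the same result as the OR query. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL. It assumes a hypothetical denormalized `rc` table whose `case_owner` column replaces the join, so it shows only the counting logic.

```python
import sqlite3

# sqlite3 stands in for MySQL; the table is hypothetical and denormalized:
# case_owner plays the role of c.user_id from the join.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rc (id INTEGER PRIMARY KEY, case_owner INTEGER, user_id INTEGER, type TEXT, ref_id INTEGER)"
)
conn.executemany(
    "INSERT INTO rc VALUES (?, ?, ?, ?, ?)",
    [
        (1, 25, 7, "note", None),   # branch 1 only
        (2, 3, 3, "conduct", 25),   # branch 2 only
        (3, 25, 7, "conduct", 25),  # matches both branches
        (4, 25, 25, "note", None),  # the user's own change: neither
        (5, 3, 7, "note", None),    # neither
    ],
)

BRANCH1 = "user_id != case_owner AND case_owner = 25"
BRANCH2 = "type = 'conduct' AND ref_id = 25"
PER_BRANCH = (
    f"SELECT COUNT(*) AS cnt FROM rc WHERE {BRANCH1} "
    f"UNION ALL SELECT COUNT(*) AS cnt FROM rc WHERE {BRANCH2}"
)


def scalar(sql: str) -> int:
    # Return the single value produced by an aggregate query.
    return conn.execute(sql).fetchone()[0]


or_count = scalar(f"SELECT COUNT(*) FROM rc WHERE ({BRANCH1}) OR ({BRANCH2})")
count_of_counts = scalar(f"SELECT COUNT(*) FROM ({PER_BRANCH}) x")  # counts the 2 derived rows
sum_of_counts = scalar(f"SELECT SUM(cnt) FROM ({PER_BRANCH}) x")    # double-counts row 3
union_count = scalar(
    f"SELECT COUNT(*) FROM (SELECT id FROM rc WHERE {BRANCH1} "
    f"UNION SELECT id FROM rc WHERE {BRANCH2}) x"
)

print(or_count, count_of_counts, sum_of_counts, union_count)  # 3 2 4 3
```

The final rewritten query avoids the double count in the same way: each branch selects row-level columns, and the branches are combined with UNION.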
Based on EXPLAIN, I think the recent_changes rc table is missing a necessary index:
EXPLAIN SELECT count(*) FROM `user` u LEFT JOIN `recent_changes` rc on u.id = rc.user_id LEFT JOIN `case` c on c.id = rc.case_id WHERE rc.user_id != c.user_id AND c.user_id = '25' AND c.locked = 'N' AND rc.private != 'Y' AND seen = 'false';
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | c | ref | PRIMARY,user_user_id_idx | user_user_id_idx | 5 | const | 383 | Using where |
| 1 | SIMPLE | rc | ref | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case | idx_recent_changes_case | 5 | hsaedmp_jason.c.id | 20 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | hsaedmp_jason.rc.user_id | 1 | Using index |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
Runs in > 4 seconds:
EXPLAIN SELECT count(*) as cnt FROM `user` u LEFT JOIN `recent_changes` rc on u.id = rc.user_id LEFT JOIN `case` c on c.id = rc.case_id WHERE rc.type = 'conduct' AND rc.ref_id = '25' AND c.locked = 'N' AND rc.private != 'Y' AND seen = 'false';
Key = NULL, which is not good.
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | c | ALL | PRIMARY | NULL | NULL | NULL | 3797 | Using where |
| 1 | SIMPLE | rc | ref | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case | idx_recent_changes_case | 5 | hsaedmp_jason.c.id | 20 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | hsaedmp_jason.rc.user_id | 1 | Using index |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
I'm confused. EXPLAIN shows that the case table isn't using a key, yet it looks as though recent_changes is the table that needs an index on the ref_id column?
Here is the EXPLAIN with that index. It looks much better, but I can't test it in production yet.
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
| 1 | SIMPLE | rc | NULL | ref | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case,idx_recent_changes_case_date,idx_recent_changes_ref | idx_recent_changes_ref | 5 | const | 2096 | 3.12 | Using where |
| 1 | SIMPLE | u | NULL | eq_ref | PRIMARY | PRIMARY | 4 | hsaedmp_jason.rc.user_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | hsaedmp_jason.rc.case_id | 1 | 50.00 | Using where |
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
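You can reproduce the effect of an index on ref_id in miniature. The sketch below uses Python's sqlite3 as a stand-in for MySQL. SQLite's EXPLAIN QUERY PLAN output is worded differently from MySQL's EXPLAIN: a full scan shows up as a SCAN line rather than type ALL. The recent_changes schema here is trimmed and hypothetical. The sketch also assumes that idx_recent_changes_ref is a single-column index on ref_id, which the question does not show.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> list:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# sqlite3 stands in for MySQL; trimmed, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_changes (id INTEGER PRIMARY KEY, user_id INTEGER, case_id INTEGER, "
    "type TEXT, ref_id INTEGER, private TEXT, seen TEXT)"
)
conn.execute("CREATE INDEX idx_recent_changes_case ON recent_changes (case_id)")

# The recent_changes part of the slow branch: filter on type and ref_id.
QUERY = (
    "SELECT COUNT(*) FROM recent_changes "
    "WHERE type = 'conduct' AND ref_id = 25 AND private != 'Y' AND seen = 'false'"
)

before = plan(conn, QUERY)  # no index contains ref_id, so the table is scanned
# Assumption: idx_recent_changes_ref is a single-column index on ref_id.
conn.execute("CREATE INDEX idx_recent_changes_ref ON recent_changes (ref_id)")
after = plan(conn, QUERY)   # the planner can now look rows up by ref_id

print(before)
print(after)
```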
UPDATE
By rewriting the query with a UNION, changing the JOIN order and adding a composite index on the recent_changes table, I brought the query's response time under 10 ms.
Here is the new query using UNION:
select count(*) as num
FROM (
(
SELECT rc1.*
FROM `user` u1
LEFT JOIN `recent_changes` rc1 on u1.id = rc1.user_id
LEFT JOIN `case` c1 on c1.id = rc1.case_id
WHERE
(rc1.user_id != c1.user_id AND c1.user_id = '1')
AND c1.locked = 'Y'
AND rc1.private != 'Y'
AND seen = 'false'
ORDER BY rc1.datecreated DESC
)
UNION
(
SELECT rc.*
FROM `user` u
LEFT JOIN `recent_changes` rc on u.id = rc.user_id
LEFT JOIN `case` c on c.id = rc.case_id
WHERE
(rc.type = 'conduct' AND rc.ref_id = '1')
AND c.locked = 'Y'
AND rc.private != 'Y'
AND seen = 'false'
ORDER BY rc.datecreated DESC
)
) x;
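Each branch of the rewrite selects rc.*, and the branches are combined with UNION rather than UNION ALL. A row that satisfies both conditions is therefore counted once, so the rewrite should return the same count as the original OR query. Below is a minimal check of that. It uses Python's sqlite3 as a stand-in for MySQL, hypothetical sample rows, and the filter values from the original query (user 25, locked = 'N').

```python
import sqlite3

# sqlite3 stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY);
    CREATE TABLE "case" (id INTEGER PRIMARY KEY, user_id INTEGER, locked TEXT);
    CREATE TABLE recent_changes (id INTEGER PRIMARY KEY, user_id INTEGER, case_id INTEGER,
        type TEXT, ref_id INTEGER, private TEXT, seen TEXT, datecreated TEXT);
    INSERT INTO user VALUES (3), (7), (25);
    INSERT INTO "case" VALUES (1, 25, 'N'), (2, 3, 'N'), (3, 25, 'Y');
    INSERT INTO recent_changes VALUES
        (1, 7, 1, 'note', NULL, 'N', 'false', '2017-09-01'),   -- branch 1
        (2, 25, 1, 'note', NULL, 'N', 'false', '2017-09-02'),  -- own change: excluded
        (3, 3, 2, 'conduct', 25, 'N', 'false', '2017-09-03'),  -- branch 2
        (4, 7, 1, 'conduct', 25, 'N', 'false', '2017-09-04'),  -- both branches
        (5, 7, 3, 'note', NULL, 'N', 'false', '2017-09-05'),   -- locked case: excluded
        (6, 3, 2, 'conduct', 25, 'Y', 'false', '2017-09-06'),  -- private: excluded
        (7, 7, 1, 'note', NULL, 'N', 'true', '2017-09-07');    -- already seen: excluded
    """
)

JOINS = (
    "FROM user u LEFT JOIN recent_changes rc ON u.id = rc.user_id "
    'LEFT JOIN "case" c ON c.id = rc.case_id'
)
COMMON = "c.locked = 'N' AND rc.private != 'Y' AND rc.seen = 'false'"
MINE = "rc.user_id != c.user_id AND c.user_id = 25"
CONDUCT = "rc.type = 'conduct' AND rc.ref_id = 25"


def scalar(sql: str) -> int:
    # Return the single value produced by an aggregate query.
    return conn.execute(sql).fetchone()[0]


def rewritten_count_sql(op: str) -> str:
    # Count the rows of the two branches combined with the given set operator.
    return (f"SELECT COUNT(*) FROM (SELECT rc.* {JOINS} WHERE {MINE} AND {COMMON} "
            f"{op} SELECT rc.* {JOINS} WHERE {CONDUCT} AND {COMMON}) x")


or_count = scalar(f"SELECT COUNT(*) {JOINS} WHERE (({MINE}) OR ({CONDUCT})) AND {COMMON}")
union_count = scalar(rewritten_count_sql("UNION"))
union_all_count = scalar(rewritten_count_sql("UNION ALL"))

print(or_count, union_count, union_all_count)  # 3 3 4
```

The UNION ALL variant returns 4 because row 4 appears in both branches. That is why the choice of UNION matters when you count rows rather than sum per-branch counts.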
And here is the index I created to match the final query:
ALTER TABLE recent_changes ADD INDEX idx_recent_changes_notification (type, ref_id, private, seen, user_id);
Thanks, everyone, for your input!
Comments on the question:
OR is a performance killer in MySQL. Try splitting the query into two queries and combining them with UNION.
Also, run EXPLAIN EXTENDED followed by SHOW WARNINGS. It will reveal useful information about how MySQL interprets your JOINs.
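Try creating a compound index on recent_changes (type, ref_id, private, user_id). This is what's called a compound covering index, and it should help speed up the second part of your UNION. Please edit your question to let us know whether it helps. When you are trying to fix performance problems, putting lots of single-column indexes on a table is often harmful.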
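An index is "covering" only if it contains every column the query reads, not just the columns in the equality filters. The sketch below compares the index suggested in this comment, which lacks seen, with the asker's final index, which includes it. It uses Python's sqlite3 as a stand-in for MySQL, with a trimmed, hypothetical schema and the hypothetical index name idx_suggested. In MySQL's EXPLAIN the corresponding signal is "Using index" in the Extra column. The check covers only the recent_changes filters. The real query also reads rc.case_id for the join to case, and neither index contains that column.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    # Join the detail text of every EXPLAIN QUERY PLAN row into one string.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# sqlite3 stands in for MySQL; trimmed, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_changes (id INTEGER PRIMARY KEY, user_id INTEGER, case_id INTEGER, "
    "type TEXT, ref_id INTEGER, private TEXT, seen TEXT, datecreated TEXT)"
)

# Roughly the second UNION branch, restricted to recent_changes columns.
QUERY = (
    "SELECT COUNT(*) FROM recent_changes "
    "WHERE type = 'conduct' AND ref_id = 1 AND private != 'Y' AND seen = 'false'"
)

# Index from the comment (hypothetical name): seen is missing, so matching rows need a table lookup.
conn.execute("CREATE INDEX idx_suggested ON recent_changes (type, ref_id, private, user_id)")
suggested_plan = plan(conn, QUERY)
conn.execute("DROP INDEX idx_suggested")

# Index from the final update: every column this query reads is in the index.
conn.execute(
    "CREATE INDEX idx_recent_changes_notification "
    "ON recent_changes (type, ref_id, private, seen, user_id)"
)
final_plan = plan(conn, QUERY)

print(suggested_plan)
print(final_plan)
```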
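Also, if you'd like suggestions on reworking the query to make it more efficient, it would help if you edited your question to explain what you are trying to count.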
Creating the key on recent_changes and reworking the query with a union has now brought this query down to
Answer 1:
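The smaller table should come first in the join clause. Which table that is depends on how many records each table has. I think your user table is the smallest, so put it first. The rc table seems to be the largest, so put it last in the join.
Here is an example: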
SELECT count(*) as cnt
FROM `user` u
-- NOTE: this ON clause references rc before rc is joined, so MySQL rejects this join order
LEFT JOIN `case` c on c.id = rc.case_id
LEFT JOIN `recent_changes` rc on u.id = rc.user_id
WHERE
(
rc.user_id != c.user_id AND c.user_id = '25'
OR
(rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false'
ORDER BY rc.datecreated DESC;
Also, see the post below. It is about MSSQL, but the same points apply to almost every DBMS:
https://www.mssqltips.com/sqlservertutorial/3201/how-join-order-can-affect-the-query-plan/
UPDATE
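I looked at your question again and found another suspect: the ORDER BY clause. When a query returns many rows, ORDER BY can add significant time. In my experience this is a common problem. Have you tried removing the ORDER BY clause? Is it much faster?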
See "Why is this INNER JOIN/ORDER BY mysql query so slow?"
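One way to act on this: keep ORDER BY out of the count query entirely, since it cannot change a single-row COUNT(*) result. Apply it only to the page query with LIMIT/OFFSET, ideally backed by an index whose column order already matches the filter and the sort. The sketch below uses Python's sqlite3 as a stand-in for MySQL. The table is trimmed to the columns needed here, and the index name idx_rc_type_ref_date is hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL; trimmed, hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_changes (id INTEGER PRIMARY KEY, type TEXT, ref_id INTEGER, datecreated TEXT)"
)
# Hypothetical index: equality columns first, then the ORDER BY column.
conn.execute("CREATE INDEX idx_rc_type_ref_date ON recent_changes (type, ref_id, datecreated)")
conn.executemany(
    "INSERT INTO recent_changes (type, ref_id, datecreated) VALUES (?, ?, ?)",
    [("conduct", 25, f"2017-09-{day:02d}") for day in range(1, 13)],
)

WHERE = "type = 'conduct' AND ref_id = ?"
PAGE_SQL = (
    f"SELECT id, datecreated FROM recent_changes WHERE {WHERE} "
    "ORDER BY datecreated DESC LIMIT ? OFFSET ?"
)


def count_notifications(user_id: int) -> int:
    # Total for the pager: no ORDER BY needed.
    return conn.execute(f"SELECT COUNT(*) FROM recent_changes WHERE {WHERE}", (user_id,)).fetchone()[0]


def fetch_page(user_id: int, page: int, per_page: int) -> list:
    # Only the page query sorts; LIMIT/OFFSET select the requested slice.
    return conn.execute(PAGE_SQL, (user_id, per_page, page * per_page)).fetchall()


# With the index above, SQLite can read rows in datecreated order instead of sorting them.
page_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + PAGE_SQL, (25, 5, 0))]

print(count_notifications(25))                   # 12
print([row[0] for row in fetch_page(25, 0, 5)])  # [12, 11, 10, 9, 8]
print([row[0] for row in fetch_page(25, 2, 5)])  # [2, 1]
print(page_plan)
```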