Optimizing slow MySQL select query
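【Title】: Optimizing slow MySQL select query 【Posted】: 2015-03-08 21:29:56 【Question】:
EDIT: After looking at some of the answers here and hours of research, my team concluded that there was most likely no way to optimize this any further than the 4.5 seconds we managed to reach (unless maybe by partitioning offers_clicks, but that would have some ugly side effects). In the end, after a lot of brainstorming, we decided to split the query in two, build two sets of user ids (one from the users table, one from offers_clicks), and compare them as sets in Python. The set of ids from the users table is still pulled from SQL, but we moved offers_clicks to Lucene and added some caching on top of it, so that is where the other set of ids now comes from. The end result is about half a second with the cache and 0.9 seconds without it.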
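For readers wondering what "compare them as sets in Python" might look like, here is a minimal sketch. The function name and the sample ids are hypothetical; in the setup described above, the first collection would come from the SQL query on users and the second from the Lucene-backed click store.

```python
def count_matching_users(active_user_ids, clicked_user_ids):
    """Count the users that appear in both collections of ids.

    Converting to sets removes duplicates, so a user with many clicks
    is counted once, mirroring count(distinct users.id).
    """
    return len(set(active_user_ids) & set(clicked_user_ids))


# Hypothetical stand-ins for the two result sets.
active_us_users = [1, 2, 3, 5, 8]      # ids returned by the users query
users_with_clicks = [2, 3, 3, 8, 13]   # user_ids from the click store (may repeat)

print(count_matching_users(active_us_users, users_with_clicks))  # 3
```

The intersection itself is cheap; the time is dominated by fetching the two id lists, which is why caching the click side pays off.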
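Start of the original post: I'm having trouble optimizing a query. The first version of the query is fine, but as soon as offers_clicks is joined in the second query, it becomes rather slow. The users table contains 10 million rows, and offers_clicks contains 53 million rows.
Acceptable performance: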
SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)
Not good:
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)
Without specifying any indexes it looks like this (even worse):
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)
EXPLAIN:
explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | country_2 | country_2 | 14 | NULL | 245014 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id_3 | user_id_3 | 4 | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
EXPLAIN without specifying any indexes:
mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
Here is a whole bunch of indexes I tried, without much success:
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | |
Simplified users schema:
+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| country | char(2) | NO | MUL | | |
| last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | |
Simplified offers_clicks schema:
+-----------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | 0 | |
| offer_id | int(11) unsigned | NO | MUL | NULL | |
| date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| ranking_score | decimal(5,2) | NO | MUL | 0.00 | |
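【Comments】:
Please post your schema!
Note that DISTINCT is not a function.
Strawberry, +1. The parentheses used after DISTINCT are simply ignored. distinct(user.id) is no better than distinct user.id, because "DISTINCT is not a function".
How many records are in the two tables?
10 million users, 53 million offers_clicks.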
【Answer 1】:
This is your query:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc JOIN
users u
ON oc.user_id = u.id
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49;
First, instead of count(distinct), you might consider writing the query as:
SELECT count(*) AS count_1
FROM users u
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND
EXISTS (SELECT 1
FROM offers_clicks oc
WHERE oc.user_id = u.id AND
oc.date > '2015-02-14' AND
oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
)
Then, the best indexes for this query are users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date).
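To check that the EXISTS form counts the same users as the count(distinct) join, here is a small sketch. It uses an in-memory SQLite database as a stand-in for MySQL, with made-up rows and hypothetical index names, so it only illustrates the semantics and says nothing about how either form performs on 53 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, user_id INTEGER,
                            date TEXT, ranking_score REAL);
-- Hypothetical index names; column order follows the answer above.
CREATE INDEX users_country_active ON users (country, last_active, id);
CREATE INDEX clicks_user_date_score ON offers_clicks (user_id, date, ranking_score);
INSERT INTO users VALUES
  (1, 'US', '2015-03-01 00:00:00'),
  (2, 'US', '2015-03-02 00:00:00'),
  (3, 'US', '2015-01-01 00:00:00'),
  (4, 'NL', '2015-03-01 00:00:00');
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20 00:00:00', 1.00),
  (1, '2015-02-21 00:00:00', 2.00),
  (2, '2015-02-20 00:00:00', 5.00),
  (3, '2015-02-20 00:00:00', 1.00),
  (4, '2015-02-20 00:00:00', 1.00);
""")

JOIN_QUERY = """
SELECT count(DISTINCT u.id)
FROM offers_clicks oc JOIN users u ON oc.user_id = u.id
WHERE u.country = 'US' AND u.last_active > '2015-02-26'
  AND oc.date > '2015-02-14'
  AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
"""

EXISTS_QUERY = """
SELECT count(*)
FROM users u
WHERE u.country = 'US' AND u.last_active > '2015-02-26'
  AND EXISTS (SELECT 1 FROM offers_clicks oc
              WHERE oc.user_id = u.id
                AND oc.date > '2015-02-14'
                AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49)
"""

join_count = conn.execute(JOIN_QUERY).fetchone()[0]
exists_count = conn.execute(EXISTS_QUERY).fetchone()[0]
print(join_count, exists_count)  # 1 1
```

Only user 1 qualifies: user 2 has a click outside the score range, user 3 has not been active recently, and user 4 is not in the US. User 1 has two qualifying clicks but is counted once by both forms.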
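【Comments】:
I tried this with users(country, last_active) and offers_clicks(user_id, date, ranking_score). The speed is about the same: 1 row in set (6.45 sec). How important is id in the composite index on the users table? I'd like to understand how it affects the query. Tomorrow I can try adding an index on (country, last_active, id) and see how that changes things.
Can you try the query with = 'US' rather than in? That might be preventing optimal use of the index. user_id is not that important. It just allows the index to be a covering index, so the engine doesn't have to fetch data from the data pages.
Thanks Gordon; tomorrow I'll try adding "id" to the composite index on the users table. I tried = 'US' before as well; it didn't seem to make much of a difference at all (I didn't fully benchmark it, but the speed seemed roughly the same).
Added an index covering country, last_active and id to the users table; unfortunately it didn't make much of a difference. SELECT count(*) AS count_1 FROM users u USE INDEX (country_3) WHERE u.country = 'US' AND u.last_active > '2015-02-26' AND EXISTS (SELECT 1 FROM offers_clicks oc USE INDEX (user_id_3) WHERE oc.user_id = u.id AND oc.date > '2015-02-14' AND oc.ranking_score > 0.24 AND oc.ranking_score
Please provide EXPLAIN SELECT... with and without id in the (country, last_active) index. If the table is InnoDB, the two will probably be the same. That is because the PRIMARY KEY is silently appended to every secondary key.
【Answer 2】: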
SELECT count(distinct u.id) AS count_1
FROM users u
STRAIGHT_JOIN offers_clicks oc
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have an index on users covering the (id, last_active, country) columns and one on offers_clicks covering (user_id, date, ranking_score).
Or you can reverse the order:
SELECT count(distinct u.id) AS count_1
FROM offers_clicks oc
STRAIGHT_JOIN users u
ON oc.user_id = u.id
WHERE
u.country IN ('US')
AND u.last_active > '2015-02-26'
AND oc.date > '2015-02-14'
AND oc.ranking_score > 0.24
AND oc.ranking_score < 3.49;
Make sure you have an index on offers_clicks covering the (user_id) column and one on users covering (id, last_active, country).
【Comments】:
【Answer 3】:
SELECT count(users.id) AS count_1
FROM users
INNER JOIN
(SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
) as clicks
ON clicks.user_id = users.id
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
Can you provide a sqlfiddle with some data?
Can you tell me the execution time of this query:
SELECT
DISTINCT user_id
FROM
offers_clicks
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
EDIT: How long does this one take?
SELECT
DISTINCT user_id
FROM
offers_clicks USE INDEX (user_id_4)
WHERE offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
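【Comments】:
I'll try to set up a sqlfiddle tomorrow. The execution time for offers_clicks alone is about 4-5 seconds, almost as slow as the query including users (which runs in about 5-6 seconds, roughly 1-2 seconds faster than the original query).
Here's the EXPLAIN for the offers_clicks query: | 1 | SIMPLE | offers_clicks | range | date,date_2,date_3,ranking_score | date_2 | 8 | NULL | 2738102 | Using where; Using temporary |
But does it return the correct result? And is it better (5-6) than your previous one (17-18)? So now I just need to improve it to get under 1 second?
@MathijsdeJong could you please provide a sqlfiddle with offers_clicks and 1,000-10,000 records? Or just a .sql file with the exported table?
Thanks for your effort, but 5-6s (yours, and a few of the other posts) or 6-7s (the original) are both very heavy. I was originally looking for sub-1s, but my guess is that the offers_clicks table is simply too big to run any meaningful query on... I'm afraid I'm asking for the impossible.
【Answer 4】:
Try it another way: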
SELECT COUNT(users.id)
FROM users, offers_clicks
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
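One thing to watch with this variant: without DISTINCT, the join returns one row per matching click, so COUNT(users.id) counts clicks rather than users. A minimal sketch of the difference, assuming an in-memory SQLite database as a stand-in for MySQL and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (user_id INTEGER, date TEXT, ranking_score REAL);
INSERT INTO users VALUES (1, 'US', '2015-03-01 00:00:00');
-- One user with three qualifying clicks.
INSERT INTO offers_clicks VALUES
  (1, '2015-02-20 00:00:00', 1.00),
  (1, '2015-02-21 00:00:00', 2.00),
  (1, '2015-02-22 00:00:00', 3.00);
""")

# Shared FROM/WHERE clause, copied from the query above.
FILTERS = """
FROM users, offers_clicks
WHERE users.country = 'US'
  AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id
  AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49
  AND offers_clicks.ranking_score > 0.24
"""

plain_count = conn.execute("SELECT COUNT(users.id) " + FILTERS).fetchone()[0]
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT users.id) " + FILTERS
).fetchone()[0]
print(plain_count, distinct_count)  # 3 1
```

So this version is only a drop-in replacement if each user has at most one qualifying click, which seems unlikely for a clicks table.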
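【Comments】:
【Answer 5】:
First of all, I also think you should use a join, and try to join only the rows you really need in the result. As for the offers_clicks table, I think you should use index user_id_2 rather than user_id_3, because the cardinality of user_id_2 is higher than that of user_id_3 (according to your index listing), so it should be faster.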
SELECT
count(distinct(users.id)) AS count_1
FROM users USE INDEX (country_2)
JOIN offers_clicks USE INDEX (user_id_2)
ON offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
;
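This query doesn't require any changes to your tables, which is why I think it's worth a try. Narrowing the date range might also help: with fewer rows in the result, it should be faster.
Not sure if this helps...
【Comments】:
【Answer 6】:
Try this: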
SELECT count(distinct users.id) AS count_1
FROM users USE index (<see below>)
JOIN offers_clicks USE index (<see below>)
ON offers_clicks.user_id = users.id
AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE
AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49
WHERE users.country = 'US'
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE
Make sure there are indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date), and USE them.
Let me know how it performs, and if it works I'll explain why.
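A caveat when swapping the original > and < comparisons for BETWEEN: BETWEEN includes both endpoints, so rows sitting exactly on a boundary are counted by this version but not by the original query. A minimal sketch of the difference, again assuming an in-memory SQLite database as a stand-in for MySQL and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript("""
CREATE TABLE offers_clicks (user_id INTEGER, ranking_score REAL);
-- Two rows sit exactly on the boundaries, one lies strictly inside.
INSERT INTO offers_clicks VALUES (1, 0.24), (2, 1.00), (3, 3.49);
""")

# Strict comparisons, as in the question.
strict = conn.execute(
    "SELECT COUNT(*) FROM offers_clicks "
    "WHERE ranking_score > 0.24 AND ranking_score < 3.49"
).fetchone()[0]

# Inclusive range, as in this answer.
between = conn.execute(
    "SELECT COUNT(*) FROM offers_clicks "
    "WHERE ranking_score BETWEEN 0.24 AND 3.49"
).fetchone()[0]

print(strict, between)  # 1 3
```

The same applies to the date conditions, and the CURRENT_DATE upper bound additionally excludes any rows dated later than the start of today, so the counts may differ slightly from the original query.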
【Comments】: