Why does this WHERE clause make my query 180 times slower?
【Posted】: 2012-11-15 20:30:48 【Question】: The following query executes in 1.6 seconds:
SET @num :=0, @current_shop_id := NULL, @current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
Adding AND shops.shop_id=86 to the final WHERE clause makes the query execute in 292 seconds:
SET @num :=0, @current_shop_id := NULL, @current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id AND
shops.shop_id=86
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
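I expected that restricting the shops table with AND shops.shop_id=86 would reduce the execution time. Instead, the execution time appears to depend on the number of rows in the products table whose products.shop_id equals the specified shops.shop_id. About 34K rows in the products table have products.shop_id=86, and the execution time is 292 seconds. For products.shop_id=50 there are about 28K rows, and the execution time is 210 seconds. For products.shop_id=175 there are about 2K rows, and the execution time is 2.8 seconds. What is going on?
EXPLAIN EXTENDED for the 1.6 second query is: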
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1203 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 1203 100.00
3 DERIVED sex ALL product_id_2,product_id NULL NULL NULL 526846 75.00 Using where; Using temporary; Using filesort
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED shops eq_ref PRIMARY PRIMARY 4 mydatabase.p1.shop_id 1 100.00
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is:
-----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(@num:=if(((@current_shop_id) = `testtable`.`shop_id`),if(((@current_product_id) = `testtable`.`product_id`),(@num),((@num) + 1)),0)) AS `row_number`,(@current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(@current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select `mydatabase`.`shops`.`shop` AS `shop`,`mydatabase`.`shops`.`shop_id` AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`fav4`.`product_id` = `mydatabase`.`sex`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`mydatabase`.`shops`.`shop_id` = `mydatabase`.`p1`.`shop_id`) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by `mydatabase`.`shops`.`shop`,`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+------
EXPLAIN EXTENDED for the 292 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 36 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 36 100.00
3 DERIVED shops const PRIMARY PRIMARY 4 1 100.00 Using temporary; Using filesort
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED sex eq_ref product_id_2,product_id product_id_2 5 mydatabase.p1.product_id 1 100.00 Using where
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is:
----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(@num:=if(((@current_shop_id) = `testtable`.`shop_id`),if(((@current_product_id) = `testtable`.`product_id`),(@num),((@num) + 1)),0)) AS `row_number`,(@current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(@current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select 'shop.nordstrom.com' AS `shop`,'86' AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`fav4`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`p1`.`shop_id` = 86) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by 'shop.nordstrom.com',`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+-----
I am running MySQL client version: 5.1.56. The shops table has a primary index on shop_id:
Action Keyname Type Unique Packed Column Cardinality Collation Null Comment
Edit Drop PRIMARY BTREE Yes No shop_id 163 A
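I have analyzed the shops table, but that did not help.
I notice that if I remove the LEFT JOIN, the difference in execution time drops to 0.12 seconds versus 0.28 seconds.
Cez's solution, which is to use the 1.6 second version of the query and remove the irrelevant results by adding rowed_results.shop_dummy=86 to the outer query (as shown below), executes in 1.7 seconds. This circumvents the problem, but why the 292 second query is so slow remains a mystery.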
SET @num :=0, @current_shop_id := NULL, @current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
INNER JOIN shops ON shops.shop_id = p1.shop_id
WHERE sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7) AND
rowed_results.shop_dummy=86;
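【Comments】:
“EXPLAIN EXTENDED .. the output of EXPLAIN, to see what differs. Also, all columns in a single WHERE clause should be placed in one key.
@Cez @feela Thanks for the pointers, I have added the EXPLAIN EXTENDED queries to the question. SHOW WARNINGS did not produce any results.
@feela Does that mean I should put the AND shops.shop_id=86 qualification in the shops INNER JOIN rather than in the WHERE clause? That is how I originally formulated the query, but the execution time was the same.
@jela Does using p1.shop_id=86 or shop.shop_id=86 in the INNER JOIN change anything? Also, please post the output of SHOW WARNINGS so that the optimized query is available.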
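【Answer 1】:
After the chat room, and after actually creating tables/columns to match the query, I came up with the following query.
I started my innermost query on the sex, products (for the shop_id), and favorites tables. Since you described that ProductX at ShopA has product ID = 1 while ProductX at ShopB has product ID = 2 (example only), each product is always unique per shop and never duplicated. That means I can get the product and its shop_id along with the favorites count (if any) in this query while grouping on product_id alone. The shop_id never changes for a given product, which is why I wrap it in MAX(). Since you are always looking up by a date of "yesterday" and by gender (sex=0, female), I would index the sex table on (date, sex, product_id). I am guessing you are not adding thousands of items every day. The products table obviously has an index on product_id (the primary key), and favorites should have an index on product_id.
From that result (alias "sxFav"), we can join directly to the sex and products tables on product_id to get any other information you may want, such as the shop name, the date the product was added, the product description, and so on. This result is then ordered by the shop_id the product is sold in, the date, and finally the product ID (but you may consider grabbing a description column in the inner query and using that as the sort key). This produces the alias "PreQuery".
Now that the rows are properly ordered by shop, we can add the @MySQLVariable references to assign each product a row number, similar to what you originally attempted. However, the counter resets to 1 only when the shop ID changes.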
SELECT
PreQuery.*,
@num := IF( @current_shop_id = PreQuery.shop_id, @num +1, 1 ) AS RowPerShop,
@current_shop_id := PreQuery.shop_id AS shop_dummy
from
( SELECT
sxFav.product_id,
sxFav.shop_id,
sxFav.Favorites_Count
from
( SELECT
sex.product_id,
MAX( p.shop_id ) shop_id,
SUM( CASE WHEN F.current = 1 AND F.closeted = 1 THEN 1
WHEN F.current = 1 AND F.closeted = 0 THEN -1
ELSE 0 END ) AS favorites_count
from
sex
JOIN products p
ON sex.Product_ID = p.Product_ID
LEFT JOIN Favorites F
ON sex.product_id = F.product_ID
where
sex.date >= subdate( now(), interval 1 day)
and sex.sex = 0
group by
sex.product_id ) sxFav
JOIN sex
ON sxFav.Product_ID = sex.Product_ID
JOIN products p
ON sxFav.Product_ID = p.Product_ID
order by
sxFav.shop_id,
sex.date,
sxFav.product_id ) PreQuery,
( select @num :=0,
@current_shop_id := 0 ) as SQLVars
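Now, if you are looking for specific "paging" information (such as 7 entries per shop), wrap the ENTIRE query above into something like...
select * from ( entire query above ) AS paged where RowPerShop between 1 and 7
(or between 8 and 14, 15 and 21, etc. as needed) or even, with PageYouAreShowing counted from 0,
RowPerShop between RowsPerPage*PageYouAreShowing + 1 and RowsPerPage*(PageYouAreShowing + 1)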
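To make the numbering and paging arithmetic concrete, here is a minimal Python sketch (not part of the original answer). It mimics the @num / @current_shop_id logic on rows that are assumed to be already sorted by shop_id, and computes inclusive BETWEEN bounds for a page index counted from 0. The names number_rows_per_shop and page_bounds and the sample rows are hypothetical.

```python
def number_rows_per_shop(rows):
    """Mimic the @num / @current_shop_id logic.

    `rows` is a list of (shop_id, product_id) tuples, assumed to be
    already sorted by shop_id, as the ORDER BY in PreQuery would do.
    """
    numbered = []
    current_shop_id = None
    num = 0
    for shop_id, product_id in rows:
        # Same shop: keep counting. New shop: restart at 1.
        num = num + 1 if shop_id == current_shop_id else 1
        current_shop_id = shop_id
        numbered.append((shop_id, product_id, num))
    return numbered


def page_bounds(rows_per_page, page_index):
    """Inclusive (first, last) RowPerShop bounds for a 0-based page index."""
    first = rows_per_page * page_index + 1
    last = rows_per_page * (page_index + 1)
    return first, last


# Hypothetical sample data: two products in shop 50, three in shop 86.
sample_rows = [(50, 1), (50, 2), (86, 3), (86, 4), (86, 5)]
numbered_rows = number_rows_per_shop(sample_rows)
```

With 7 rows per page, page_bounds(7, 0), page_bounds(7, 1) and page_bounds(7, 2) give the ranges 1 to 7, 8 to 14 and 15 to 21, so consecutive pages do not share a row.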
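【Comments】:
I have been tinkering with this. Your query is much faster than mine: 0.13 seconds versus 3.4 seconds for my original query (on a larger dataset than I was using before). I am confused by the JOIN sex ON sxFav.Product_ID = sex.Product_ID clause. Adding it seems to retrieve unexpected results: the intent of the query is to retrieve only rows with sex.sex=0, but this clause also adds rows with sex.sex=1 wherever sxFav.Product_ID = sex.Product_ID matches. I verified that it adds 11 extra rows that my original query did not return.
Removing JOIN sex ON sxFav.Product_ID = sex.Product_ID from your query makes it retrieve the same number of results as my original query (with no sex.sex=1 results). I am also confused by JOIN products p ON sxFav.Product_ID = p.Product_ID, since that JOIN already happens when sxFav is built, so it seems I could select the relevant products columns inside sxFav instead. Removing this clause does not appear to change the execution time. I may be misunderstanding what these two clauses do.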
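The extra rows described in this comment can be reproduced on a toy dataset. The sketch below is an illustration only: it uses Python's sqlite3 module as a stand-in for MySQL, leaves out the date filter and the products and favorites tables, and the three sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sex (product_id INTEGER, sex INTEGER);
    -- Hypothetical data: product 1 has rows under both sex values.
    INSERT INTO sex VALUES (1, 0), (1, 1), (2, 0);
""")

# Simplified stand-in for the sxFav subquery: only sex = 0, one row per product.
sxfav_sql = "SELECT sex.product_id FROM sex WHERE sex.sex = 0 GROUP BY sex.product_id"

# The aggregated subquery on its own.
without_rejoin = conn.execute(
    f"SELECT sxFav.product_id FROM ({sxfav_sql}) AS sxFav ORDER BY 1"
).fetchall()

# Joining back to sex on product_id alone: the sex = 1 row for product 1 reappears.
with_rejoin = conn.execute(
    f"""SELECT sxFav.product_id, sex.sex
        FROM ({sxfav_sql}) AS sxFav
        JOIN sex ON sxFav.product_id = sex.product_id
        ORDER BY 1, 2"""
).fetchall()

conn.close()
```

Because the re-join matches on product_id only, the sex.sex=0 restriction from the subquery no longer applies to the joined rows. Repeating the filter in the join condition, or dropping the re-join as the comment suggests, avoids the extra rows.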
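@jela, continue in the chat room? I will follow.
I created a chat room at chat.***.com/rooms/info/20085/…
@jela, here is the link to the complex sqlvariables question I mentioned to you in chat... ***.com/questions/9057820/…
【Answer 2】: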
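You should move shops.shop_id=86 into the JOIN condition for shops. There is no reason to put it outside the JOIN, and doing so risks MySQL JOINing first and filtering afterwards. A JOIN can do the same job as a WHERE clause, especially when you are not referencing other tables.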
....
INNER JOIN shops ON shops.shop_id = p1.shop_id AND shops.shop_id=86
....
The same goes for the sex join:
...
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
AND sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
...
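Derived tables are great, but they have no indexes. Usually this does not matter, since they are generally held in RAM. But between filtering and sorting without indexes, things can add up.
Notice that in the second query, the one that takes much longer, the table processing order changed. The shops table is at the top in the slow query, and the p1 table retrieves 11799 rows there, versus 1 row in the fast query. It also no longer uses the primary key. That is most likely where your problem lies.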
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
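【Comments】:
I changed the INNER JOIN condition and removed the WHERE clause. The query executed in 283 seconds. I would like to know how to force the slow query to use the primary key properly.
I updated the original question with the change to the JOIN condition that you suggested. I also observed that the query execution time appears to depend on the number of rows in the products table whose products.shop_id equals the specified shops.shop_id. About 34K rows in the products table have products.shop_id=86, and the execution time is 292 seconds. For products.shop_id=50 there are about 28K rows, and the execution time is 210 seconds. For products.shop_id=175 there are about 2K rows, and the execution time is 2.8 seconds. I am not sure how to modify the query to correct this behavior.
Since the speed of the same query varies so widely, my first guess is that your sort_buffer_size is too small. MySQL's performance drops sharply when it is not large enough.
【Answer 3】:
From the discussion, the query planner performs poorly when the shop is specified at the lower level.
Add rowed_results.shop_dummy=86 to the outer query to get the results you are looking for.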
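As a sanity check on this workaround, the following Python sketch (an illustration, not part of the original answer) simulates the question's row numbering and compares filtering on the shop after numbering with filtering before it. It assumes rows arrive sorted by shop, as the ORDER BY in the question arranges. The function name number_rows and the sample rows are hypothetical.

```python
def number_rows(rows):
    """Simulate the question's @num logic on (shop_id, product_id) rows.

    The counter is 0 on the first row of each shop and increases by 1
    whenever product_id changes within the same shop.
    """
    numbered = []
    current_shop_id = None
    current_product_id = None
    num = 0
    for shop_id, product_id in rows:
        if shop_id == current_shop_id:
            if product_id != current_product_id:
                num += 1
        else:
            num = 0
        current_shop_id, current_product_id = shop_id, product_id
        numbered.append((shop_id, product_id, num))
    return numbered


# Hypothetical rows, sorted by shop; product 20 appears twice in shop 86.
sample_rows = [(50, 10), (50, 11), (86, 20), (86, 20), (86, 21), (175, 30)]

# Workaround: number every shop, then keep shop 86 in the outer filter.
filtered_outside = [
    row for row in number_rows(sample_rows) if row[0] == 86 and 0 <= row[2] < 7
]

# Slow formulation: restrict to shop 86 first, then number.
filtered_inside = [
    row
    for row in number_rows([r for r in sample_rows if r[0] == 86])
    if 0 <= row[2] < 7
]
```

Both lists contain the same rows, because the counter restarts at every shop boundary. The two formulations differ in the execution plan MySQL chooses, not in the result.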
【Comments】:
This solves the problem by using the outer query to eliminate the irrelevant results returned by the inner query. I will leave the question open for now, in the hope that someone can suggest a reformulation of the inner query that returns only the desired results and executes more efficiently than my 292 second version.