Need help to improve MySQL subquery performance
Posted: 2017-07-07 05:04:37
Question: I'm just learning MySQL, and I have a MySQL subquery like this:
EXPLAIN EXTENDED SELECT brand_name, stars, hh_stock, hh_stock_value, sales_monthly_1, sales_monthly_2, sales_monthly_3, sold_monthly_1, sold_monthly_2,
sold_monthly_3, price_uvp, price_ecp, price_default, price_margin AS margin, vc_percent as vc, cogs, products_length, products_id, material_expenses,
MAX(price) AS products_price, SUM(total_sales) AS total_sales,
IFNULL(MAX(active_age), DATEDIFF(NOW(), products_date_added)) AS products_age, DATEDIFF(NOW(), products_date_added) AS jng_products_age,
AVG(sales_weekly) AS sales_weekly, AVG(sales_monthly) AS sales_monthly, SUM(total_sold) AS total_sold, SUM(total_returned) AS total_returned,
((SUM(total_returned)/SUM(total_sold)) * 100) AS returned_rate
FROM
(
SELECT p.products_id, jc.price, jc.price_end_customer AS price_ecp, jc.total_sales, jc.active_age, jc.sales_weekly,
jc.sales_monthly, jc.total_sold, jc.total_returned, jc.price_uvp, p.price_margin, p.vc_percent, p.material_expenses,
p.products_date_added, p.stars , pb.brand_name, p.family_id, p.products_price_default AS price_default, pl.sales_monthly_1,
pl.sales_monthly_2, pl.sales_monthly_3, pl.sold_monthly_1, pl.sold_monthly_2, pl.sold_monthly_3, pst.stock AS hh_stock,
(pst.stock * p.average_stock_value) AS hh_stock_value, pnc.products_length,
IF(ploc.cogs IS NULL OR ploc.cogs=0,
(CASE p.complexity
WHEN 'F' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2),2)
WHEN 'E' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2),2)
WHEN 'N' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2),2)
WHEN 'M' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2),2)
WHEN 'I' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2),2)
WHEN 'H' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2),2)
ELSE ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+5+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+5+0.25+2.2),2) END), ploc.cogs) AS cogs
FROM products p
LEFT JOIN jng_sp_catalog jc ON jc.products_id=p.products_id
LEFT JOIN products_description pd ON pd.products_id = p.products_id AND pd.language_id = 2
LEFT JOIN products_description2 pd2 ON pd2.products_id = p.products_id
LEFT JOIN products_brand pb ON pb.products_brand_id = p.products_brand_id
LEFT JOIN products_log pl ON pl.products_id = p.products_id
LEFT JOIN products_log_static pls ON pls.products_id=p.products_id
LEFT JOIN products_local ploc ON ploc.products_id = p.products_id
LEFT JOIN products_non_configurator pnc ON pnc.products_id = p.products_id
INNER JOIN
(
SELECT shp.products_id, CONCAT(',', GROUP_CONCAT(shp.styles_id), ',') AS styles_id
FROM styles_has_products shp
GROUP BY shp.products_id
HAVING styles_id NOT LIKE '%,1967,%'
) subquery_styles ON subquery_styles.products_id = p.products_id
LEFT JOIN products_stock_temp pst ON pst.products_id=p.products_id
WHERE p.active_status='1' AND p.categories_top_id = '1'
) dt
GROUP BY products_id
ORDER BY products_id;
The EXPLAIN result looks like this:
+----+-------------+------------+------------+--------+---------------------+-------------+---------+------------------------------------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------------+-------------+---------+------------------------------------+--------+----------+----------------------------------------------+
| 1 | PRIMARY | p | NULL | ALL | PRIMARY | NULL | NULL | NULL | 40458 | 1.00 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | pb | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_brand_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | ploc | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | pl | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | pls | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | pst | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | pd2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | pnc | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | pd | NULL | eq_ref | PRIMARY | PRIMARY | 8 | manobo_central.p.products_id,const | 1 | 100.00 | Using index |
| 1 | PRIMARY | jc | NULL | ref | products_id | products_id | 4 | manobo_central.p.products_id | 4 | 100.00 | Using where |
| 1 | PRIMARY | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 4 | manobo_central.p.products_id | 10 | 100.00 | Using where |
| 3 | DERIVED | shp | NULL | index | PRIMARY,products_id | PRIMARY | 8 | NULL | 208226 | 100.00 | Using index; Using filesort |
+----+-------------+------------+------------+--------+---------------------+-------------+---------+------------------------------------+--------+----------+----------------------------------------------+
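I have two options:
- Drop the subquery and use VIEWs to produce the same data as the query. Because I have a subquery in FROM, I would need views that select from other views. But some people say that hurts performance. What do you think?
- Keep the subquery, but look for ways to optimize the query. For this one, I'd like to ask: the first row of the EXPLAIN output shows type 'ALL' for table p. How can I avoid 'ALL'? I have managed to get type 'eq_ref' for the other tables, but I still don't know why the products table is 'ALL'.
Again, do you think I need to switch to VIEWs, or should I just keep trying to optimize the subquery?
Thanks a lot!
Edit: indexes on the products table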
create index family_id on products (family_id);
create index idx_products_date_added on products (products_date_added);
create index material_expenses on products (material_expenses);
create index products_brand_id on products (products_brand_id);
create index products_ean on products (products_ean);
create index products_status on products (products_status);
create index tb_status on products (tb_status);
Edit: table styles_has_products
CREATE TABLE `styles_has_products` (
`styles_id` int(10) unsigned NOT NULL DEFAULT '0',
`products_id` int(10) unsigned NOT NULL DEFAULT '0',
`date_added` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`styles_id`,`products_id`),
KEY `products_id` (`products_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
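Comments:
What indexes are there on the products table?
@Shadow I added the products indexes. Please see the main post. Thanks!
Could you please also post the tables' unique keys? I'm asking in order to know whether there can be multiple products_description2 rows per product_id, multiple products_local rows per product_id, and so on. I suppose it is products_id + language for products_description, so the query gets one description per product, right? But what about the other tables?
Views don't affect performance. They are just a convenience so that you don't have to type the same thing again and again. (That's not entirely true; views can affect performance, but only negatively. They are never faster but sometimes slower, e.g. when they are nested or aggregated, so that the optimizer has trouble applying criteria early.)
Some formatting would be nice.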
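Answer 1:
First of all, never write such a complex query for real-time use. I would suggest batch processing and maintaining a data warehouse, and running the real-time queries against that data warehouse.
There are also many things you should avoid in SQL queries meant for real-time use if you want performance: don't use many join operations, don't add many if/else conditions, and don't apply GROUP BY, especially if the table is large. Look for appropriate indexes on the table and for a partitioning structure.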
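Comments:
I disagree. I don't think this query is complex. And why not use SQL to aggregate data per group? That is one of SQL's strengths. A query often involves many tables. Again: why not join tables? That is what a relational database is about: related tables that you join in order to select data.
Answer 2:
The first thing I notice is your subquery_styles. You don't use its result for anything but filtering. In my opinion, though, criteria belong in the WHERE clause. It seems you want to exclude products for which styles_id 1967 exists, so I would use NOT EXISTS or NOT IN: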
WHERE p.active_status = 1
AND p.categories_top_id = 1
AND p.products_id NOT IN
(
SELECT products_id
FROM styles_has_products
WHERE styles_id = 1967
)
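The second thing is that your query has no suitable index. You are selecting products with active_status 1 and categories_top_id 1, but there is no index on these columns. Since the third condition is that the products_id has no match for styles_id 1967, I suggest one of the following indexes: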
create index idx1 on products (active_status, categories_top_id, products_id);
create index idx2 on products (categories_top_id, active_status, products_id);
Create both, see which one is used, then drop the other.
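A minimal sketch of that comparison, assuming both indexes above have been created. It runs EXPLAIN on just the filtering part; the `key` column of the row for table p shows which index the optimizer picked. The DROP INDEX at the end assumes idx1 was chosen, so swap the names if EXPLAIN reports idx2.

```sql
-- Check which of idx1 / idx2 the optimizer uses for the products filter.
-- Note: if active_status or categories_top_id is a string column, comparing it
-- with a number prevents index use; write the literal in the column's own type
-- (the question quotes both values as '1').
EXPLAIN
SELECT p.products_id
FROM products p
WHERE p.active_status = 1
  AND p.categories_top_id = 1
  AND p.products_id NOT IN
      (
        SELECT products_id
        FROM styles_has_products
        WHERE styles_id = 1967
      );

-- Assumption: EXPLAIN reported key = idx1, so idx2 is the unused one.
DROP INDEX idx2 ON products;
```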
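The last thing that can and probably should be optimized or changed is your aggregation. But to help with that, I need to know the tables' unique keys. Once you post them, I'll extend this answer :-)
Answer 3:
Building on Thorsten's suggestion, instead of NOT IN ( SELECT ), use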
NOT EXISTS( SELECT * FROM styles_has_products
WHERE products_id = p.products_id
AND styles_id = 1967 )
styles_has_products needs INDEX(products_id, styles_id), with the columns in either order.
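A short sketch of how this could be checked. The table definition posted in the question already has PRIMARY KEY (styles_id, products_id), which covers both columns, so an extra index may not be needed. The index name idx_products_styles is hypothetical and only matters if no such composite index exists.

```sql
-- List the indexes that already exist on the mapping table.
SHOW INDEX FROM styles_has_products;

-- Only if no composite index over both columns exists
-- (idx_products_styles is a hypothetical name):
-- ALTER TABLE styles_has_products
--   ADD INDEX idx_products_styles (products_id, styles_id);

-- Confirm that the NOT EXISTS lookup is resolved through an index.
EXPLAIN
SELECT p.products_id
FROM products p
WHERE NOT EXISTS ( SELECT *
                   FROM styles_has_products shp
                   WHERE shp.products_id = p.products_id
                     AND shp.styles_id = 1967 );
```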
Please show us SHOW CREATE TABLE styles_has_products. If it is a many:many mapping table, see the tips here.
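Indexes are needed on the table you are going to, not the one you are coming from. So the indexes listed for products will probably not be used. This composite index might be useful: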
INDEX(categories_top_id, active_status) -- in either order
VIEWs are just syntactic sugar; by themselves they do not provide any performance benefit. In some cases they hurt performance.
pd, pd2, pls, etc. are not used; remove their JOINs.
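As a rough sketch, the inner query without those three joins could look like this. The column list is shortened for illustration and is not the full list from the question. The EXPLAIN output in the question shows pd, pd2 and pls as eq_ref, meaning at most one matching row each, and all three are LEFT JOINs, so dropping them should not change the number of rows.

```sql
-- Inner query with the unused joins (pd, pd2, pls) removed.
-- Shortened column list; the NOT EXISTS filter replaces subquery_styles.
SELECT p.products_id,
       pb.brand_name,
       jc.price,
       pl.sales_monthly_1,
       ploc.cogs,
       pnc.products_length,
       pst.stock AS hh_stock
FROM products p
LEFT JOIN jng_sp_catalog jc ON jc.products_id = p.products_id
LEFT JOIN products_brand pb ON pb.products_brand_id = p.products_brand_id
LEFT JOIN products_log pl ON pl.products_id = p.products_id
LEFT JOIN products_local ploc ON ploc.products_id = p.products_id
LEFT JOIN products_non_configurator pnc ON pnc.products_id = p.products_id
LEFT JOIN products_stock_temp pst ON pst.products_id = p.products_id
WHERE p.active_status = 1
  AND p.categories_top_id = 1
  AND NOT EXISTS ( SELECT *
                   FROM styles_has_products shp
                   WHERE shp.products_id = p.products_id
                     AND shp.styles_id = 1967 );
```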
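The SUMs and AVGs may be incorrect. That is because JOIN + GROUP BY causes an "explode-implode": the JOINs multiply the rows, and the GROUP BY then collapses them again. Clean up some of the other things first; then we can discuss how to rearrange things so that the SUMs and AVGs see just one row per products_id.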
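One possible rearrangement, as a sketch only: aggregate jng_sp_catalog per product in a derived table first, then join that single row to products. This assumes jng_sp_catalog is the only joined table with several rows per product, which the EXPLAIN output suggests (jc is `ref` with about 4 rows, the others are eq_ref), but the unique keys have not been posted. The alias jc_agg is hypothetical and the column list is shortened.

```sql
-- Pre-aggregate the one-to-many table, so the outer query needs no GROUP BY
-- and every SUM/AVG is computed over jng_sp_catalog rows only.
SELECT p.products_id,
       pb.brand_name,
       jc_agg.products_price,
       jc_agg.total_sales,
       IFNULL(jc_agg.active_age,
              DATEDIFF(NOW(), p.products_date_added)) AS products_age,
       jc_agg.sales_weekly,
       jc_agg.sales_monthly,
       jc_agg.total_sold,
       jc_agg.total_returned,
       ((jc_agg.total_returned / jc_agg.total_sold) * 100) AS returned_rate
FROM products p
LEFT JOIN (
    SELECT products_id,
           MAX(price)          AS products_price,
           SUM(total_sales)    AS total_sales,
           MAX(active_age)     AS active_age,
           AVG(sales_weekly)   AS sales_weekly,
           AVG(sales_monthly)  AS sales_monthly,
           SUM(total_sold)     AS total_sold,
           SUM(total_returned) AS total_returned
    FROM jng_sp_catalog
    GROUP BY products_id
) jc_agg ON jc_agg.products_id = p.products_id
LEFT JOIN products_brand pb ON pb.products_brand_id = p.products_brand_id
WHERE p.active_status = 1
  AND p.categories_top_id = 1
  AND NOT EXISTS ( SELECT *
                   FROM styles_has_products shp
                   WHERE shp.products_id = p.products_id
                     AND shp.styles_id = 1967 )
ORDER BY p.products_id;
```

The derived table aggregates all of jng_sp_catalog, including products that the WHERE clause later filters out, so it is worth comparing the EXPLAIN output and timings against the original query before switching.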
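Comments:
Hi Rick, thanks for your help and tips. I have updated the description above with the styles_has_products table. I'll first try to understand your post and will follow up on your answer. Thanks again.
You should migrate from MyISAM to InnoDB.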
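A minimal sketch of that engine change for the mapping table from the question. Try it on a copy first: the table is rebuilt, and the zero-date default on date_added may be rejected if the server's sql_mode disallows zero dates.

```sql
-- Rebuild the table with the InnoDB storage engine.
ALTER TABLE styles_has_products ENGINE=InnoDB;

-- Verify: the Engine column should now report InnoDB.
SHOW TABLE STATUS LIKE 'styles_has_products';
```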