MySQL query optimization and EXPLAIN for a noob
【Posted】: 2011-02-08 14:59:43
【Question】: I've been working with databases for a long time, but I'm new to query optimization. I have the following query (some of it is code-generated):
SELECT DISTINCT COALESCE(gi.start_time, '') start_time,
COALESCE(b.name, '') bank,
COALESCE(a.id, '') account_id,
COALESCE(a.account_number, '') account_number,
COALESCE(at.code, '') account_type,
COALESCE(a.open_date, '') open_date,
COALESCE(a.interest_rate, '') interest_rate,
COALESCE(a.maturity_date, '') maturity_date,
COALESCE(a.opening_balance, '') opening_balance,
COALESCE(a.has_e_statement, '') has_e_statement,
COALESCE(a.has_bill_pay, '') has_bill_pay,
COALESCE(a.has_overdraft_protection, '') has_overdraft_protection,
COALESCE(a.balance, '') balance,
COALESCE(a.business_or_personal, '') business_or_personal,
COALESCE(a.cumulative_balance, '') cumulative_balance,
COALESCE(c.customer_number, '') customer_number,
COALESCE(c.social_security_number, '') social_security_number,
COALESCE(c.name, '') customer_name,
COALESCE(c.phone, '') phone,
COALESCE(c.deceased, '') deceased,
COALESCE(c.do_not_mail, '') do_not_mail,
COALESCE(cdob.date_of_birth, '') date_of_birth,
COALESCE(ad.line1, '') line1,
COALESCE(ad.line2, '') line2,
COALESCE(ad.city, '') city,
COALESCE(s.name, '') state,
COALESCE(ad.zip, '') zip,
COALESCE(o.officer_number, '') officer_number,
COALESCE(o.name, '') officer_name,
COALESCE(po.line1, '') po_box,
COALESCE(po.city, '') po_city,
COALESCE(po_state.name, '') po_state,
COALESCE(po.zip, '') zip,
COALESCE(br.number, '') branch_number,
COALESCE(cd_type.code, '') cd_type,
COALESCE(mp.product_number, '') macatawa_product_number,
COALESCE(mp.product_name, '') macatawa_product_name,
COALESCE(pt.name, '') macatawa_product_type,
COALESCE(hhsc.name, '') harte_hanks_service_category,
COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy,
COALESCE(cft.name, '') core_file_type,
COALESCE(oa.line1, '') original_address_line1,
COALESCE(oa.line2, '') original_address_line2,
COALESCE(uc.code, '') use_class
FROM account a
JOIN customer c ON a.customer_id = c.id
JOIN officer o ON a.officer_id = o.id
JOIN account_address aa ON aa.account_id = a.id
LEFT JOIN account_po_box apb ON apb.account_id = a.id
JOIN address ad ON aa.address_id = ad.id
JOIN original_address oa ON oa.address_id = ad.id
LEFT JOIN address po ON apb.address_id = po.id
JOIN state s ON s.id = ad.state_id
LEFT JOIN state po_state ON po_state.id = po.state_id
LEFT JOIN branch br ON a.branch_id = br.id
JOIN account_import ai ON a.account_import_id = ai.id
JOIN generic_import gi ON gi.id = ai.generic_import_id
JOIN import_bundle ib ON gi.import_bundle_id = ib.id
JOIN bank b ON b.id = ib.bank_id
LEFT JOIN customer_date_of_birth cdob ON cdob.customer_id = c.id
LEFT JOIN cd_type ON a.cd_type_id = cd_type.id
LEFT JOIN account_macatawa_product amp ON amp.account_id = a.id
LEFT JOIN macatawa_product mp ON mp.id = amp.macatawa_product_id
LEFT JOIN product_type pt ON pt.id = mp.product_type_id
LEFT JOIN harte_hanks_service_category hhsc
ON hhsc.id = mp.harte_hanks_service_category_id
LEFT JOIN core_file_type cft ON cft.id = mp.core_file_type_id
LEFT JOIN use_class uc ON a.use_class_id = uc.id
LEFT JOIN account_type at ON a.account_type_id = at.id
WHERE 1
AND gi.active = 1
AND b.id = 8 AND ib.is_finished = 1
ORDER BY a.id
LIMIT 10
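And it's slow. It takes about a minute to run on my development server, and on my production server, which has more data, I can't even get it to finish. Here's what the EXPLAIN looks like:
http://i.stack.imgur.com/eR6lq.png
I know the basics of EXPLAIN. I know it's good that I have something other than NULL for everything under key. But overall, I don't know how much room for improvement my query has. I know that Using temporary; Using filesort under Extra is bad, but I have no idea what to do about it.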
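【Comments】:
Do the tables you're joining have indexes?
I have indexes on certain columns of most of the tables I'm joining.
@jason swett - The relevant columns in this query are the ones you join on. See my answer below.
Is the EXPLAIN result from the production server or the development server?
Can you try EXPLAIN on the production server, after running ANALYZE TABLE on all the relevant tables? I'd guess that accurate row counts could lead to a very different execution path...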
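The suggestion in the last comment can be sketched roughly as follows. This is only an illustration: the table list is a subset of the tables in the query, and the EXPLAIN uses a cut-down version of the query to keep the example short. In practice you would list every joined table and prefix the full query with EXPLAIN.

```sql
-- Refresh the optimizer's statistics for the joined tables
-- (subset shown; extend the list to every table in the query).
ANALYZE TABLE account, customer, account_import, generic_import, import_bundle, bank;

-- Then re-run EXPLAIN on the production server.
-- Cut-down version of the query, kept only to show the shape of the command.
EXPLAIN
SELECT a.id
FROM account a
JOIN account_import ai ON a.account_import_id = ai.id
JOIN generic_import gi ON gi.id = ai.generic_import_id
JOIN import_bundle ib ON gi.import_bundle_id = ib.id
JOIN bank b ON b.id = ib.bank_id
WHERE gi.active = 1
  AND b.id = 8
  AND ib.is_finished = 1
ORDER BY a.id
LIMIT 10;
```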
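【Answer 1】:
Most of your JOIN fields don't appear to be indexed. Make sure that every field you use as a JOIN key is indexed on both tables.
With 23 joins and what looks like only 2 relevant indexes, performance will be poor.
With no index to reference, the query engine checks every row in both tables to compare them, which is obviously very inefficient.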
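Edit:
For example, in your query you have
JOIN customer c ON a.customer_id = c.id
Make sure you have an index on a.customer_id AND customer.id. Having indexes on both tables (on the JOINed fields) will speed up the query dramatically.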
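As a rough sketch of that advice, the statements below add indexes on a few of the foreign-key columns used in the query's JOIN conditions. The index names are made up for illustration, and the sketch assumes each table's id column is already its primary key (and therefore already indexed). If that is not the case, those id columns would need indexes as well.

```sql
-- Check which indexes already exist before adding new ones.
SHOW INDEX FROM account;

-- Hypothetical index names; the columns come from the query's JOIN conditions.
CREATE INDEX idx_account_customer_id ON account (customer_id);
CREATE INDEX idx_account_officer_id ON account (officer_id);
CREATE INDEX idx_account_account_import_id ON account (account_import_id);
CREATE INDEX idx_account_address_account_id ON account_address (account_id);
CREATE INDEX idx_account_address_address_id ON account_address (address_id);
CREATE INDEX idx_original_address_address_id ON original_address (address_id);
```

The same pattern would apply to the remaining join columns in the query.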
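【Comments】:
Well, clearly there's a lot of room for improvement. When you say "make sure that every field... is indexed on both tables", what exactly do you mean by indexed? In my limited knowledge, an index is something you add to a column, not to a table.
@Jason - It's on the column. I'll edit the answer with details.
Here's a guess: would I add an index on customer.id and on account.customer_id? And then apply the same idea everywhere?
Awesome. Thank you very much. Just to make sure I'm doing this right before I go and make all these changes, I think I'd do create index index_customer_id on customer (id) using btree and create index index_account_customer_id on account (customer_id) using btree. Is that the right idea?
【Answer 2】: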
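In addition to what @JNK mentioned in his answer about making sure you have indexes, I have restructured your query and added the STRAIGHT_JOIN keyword at the top, which tells the optimizer to join the tables in the order they are listed.
Since your query is driven by generic import, to import bundle, to bank, I've moved THOSE to the front of the list. The WHERE clause will pre-qualify THOSE records first, instead of looking at all the accounts that may never be part of the result. So the joins now run in reverse, from generic import back to account, following the same relationships you started with.
I've also placed each JOIN's ON condition directly under the table it joins, to make the query more readable and the table relationships easier to follow. I also wrote the ON clauses as Table1.ID = JoinedTable.ID. Some of yours were reversed, which is no big deal, but seeing which table each join leads INTO makes the query easier to read.
So, make sure the respective tables have indexes on whatever key columns they are joined on and, for this sample query, make sure your GI table (alias) has an index on Active and your IB table (alias) has an index on Is_Finished.
Lastly, your WHERE clause had WHERE 1 AND... The "1" serves no purpose, so I stripped it out.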
SELECT STRAIGHT_JOIN DISTINCT
COALESCE(gi.start_time, '') start_time,
COALESCE(b.name, '') bank,
COALESCE(a.id, '') account_id,
COALESCE(a.account_number, '') account_number,
COALESCE(at.code, '') account_type,
COALESCE(a.open_date, '') open_date,
COALESCE(a.interest_rate, '') interest_rate,
COALESCE(a.maturity_date, '') maturity_date,
COALESCE(a.opening_balance, '') opening_balance,
COALESCE(a.has_e_statement, '') has_e_statement,
COALESCE(a.has_bill_pay, '') has_bill_pay,
COALESCE(a.has_overdraft_protection, '') has_overdraft_protection,
COALESCE(a.balance, '') balance,
COALESCE(a.business_or_personal, '') business_or_personal,
COALESCE(a.cumulative_balance, '') cumulative_balance,
COALESCE(c.customer_number, '') customer_number,
COALESCE(c.social_security_number, '') social_security_number,
COALESCE(c.name, '') customer_name,
COALESCE(c.phone, '') phone,
COALESCE(c.deceased, '') deceased,
COALESCE(c.do_not_mail, '') do_not_mail,
COALESCE(cdob.date_of_birth, '') date_of_birth,
COALESCE(ad.line1, '') line1,
COALESCE(ad.line2, '') line2,
COALESCE(ad.city, '') city,
COALESCE(s.name, '') state,
COALESCE(ad.zip, '') zip,
COALESCE(o.officer_number, '') officer_number,
COALESCE(o.name, '') officer_name,
COALESCE(po.line1, '') po_box,
COALESCE(po.city, '') po_city,
COALESCE(po_state.name, '') po_state,
COALESCE(po.zip, '') zip,
COALESCE(br.number, '') branch_number,
COALESCE(cd_type.code, '') cd_type,
COALESCE(mp.product_number, '') macatawa_product_number,
COALESCE(mp.product_name, '') macatawa_product_name,
COALESCE(pt.name, '') macatawa_product_type,
COALESCE(hhsc.name, '') harte_hanks_service_category,
COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy,
COALESCE(cft.name, '') core_file_type,
COALESCE(oa.line1, '') original_address_line1,
COALESCE(oa.line2, '') original_address_line2,
COALESCE(uc.code, '') use_class
FROM
generic_import gi
JOIN import_bundle ib
ON gi.import_bundle_id = ib.id
JOIN bank b
ON ib.bank_id = b.id
JOIN account_import ai
ON gi.id = ai.generic_import_id
JOIN account a
ON ai.id = a.account_import_id
JOIN customer c
ON a.customer_id = c.id
LEFT JOIN customer_date_of_birth cdob
ON c.id = cdob.customer_id
JOIN officer o
ON a.officer_id = o.id
LEFT JOIN branch br
ON a.branch_id = br.id
LEFT JOIN cd_type
ON a.cd_type_id = cd_type.id
LEFT JOIN account_macatawa_product amp
ON a.id = amp.account_id
LEFT JOIN macatawa_product mp
ON amp.macatawa_product_id = mp.id
LEFT JOIN product_type pt
ON mp.product_type_id = pt.id
LEFT JOIN harte_hanks_service_category hhsc
ON mp.harte_hanks_service_category_id = hhsc.id
LEFT JOIN core_file_type cft
ON mp.core_file_type_id = cft.id
LEFT JOIN use_class uc
ON a.use_class_id = uc.id
LEFT JOIN account_type at
ON a.account_type_id = at.id
JOIN account_address aa
ON a.id = aa.account_id
JOIN address ad
ON aa.address_id = ad.id
JOIN original_address oa
ON ad.id = oa.address_id
JOIN state s
ON ad.state_id = s.id
LEFT JOIN account_po_box apb
ON a.id = apb.account_id
LEFT JOIN address po
ON apb.address_id = po.id
LEFT JOIN state po_state
ON po.state_id = po_state.id
WHERE
gi.active = 1
AND ib.is_finished = 1
AND b.id = 8
ORDER BY
a.id
LIMIT
10
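The two filter-column indexes this answer recommends for the GI and IB tables could be created along these lines. This is a minimal sketch: the index names are invented for illustration, and whether the optimizer actually uses these indexes depends on how selective the two flags are in your data.

```sql
-- Hypothetical index names; the columns are the ones filtered in the WHERE clause.
CREATE INDEX idx_generic_import_active ON generic_import (active);
CREATE INDEX idx_import_bundle_is_finished ON import_bundle (is_finished);
```

Running EXPLAIN on the rewritten query afterwards shows, in the key column, whether these indexes are being picked up.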
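【Comments】:
Thanks for doing this. One thing I'm confused about: if I try to run the query, it complains that "ai" is not a unique alias, which is true, because you have JOIN account_import ai twice, but I don't know what to do about it.
@Jason Swett, revised. I took out the second instance... it got duplicated when I reversed the main query elements... try this.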