MySQL query optimization and EXPLAIN for a noob

Posted: 2011-02-08 14:59:43

Question:

I've been working with databases for a long time, but I'm new to query optimization. I have the following query (some of it is code-generated):

SELECT DISTINCT COALESCE(gi.start_time, '') start_time,
COALESCE(b.name, '') bank,
COALESCE(a.id, '') account_id,
COALESCE(a.account_number, '') account_number,
COALESCE(at.code, '') account_type,
COALESCE(a.open_date, '') open_date,
COALESCE(a.interest_rate, '') interest_rate,
COALESCE(a.maturity_date, '') maturity_date,
COALESCE(a.opening_balance, '') opening_balance,
COALESCE(a.has_e_statement, '') has_e_statement,
COALESCE(a.has_bill_pay, '') has_bill_pay,
COALESCE(a.has_overdraft_protection, '') has_overdraft_protection,
COALESCE(a.balance, '') balance,
COALESCE(a.business_or_personal, '') business_or_personal,
COALESCE(a.cumulative_balance, '') cumulative_balance,
COALESCE(c.customer_number, '') customer_number,
COALESCE(c.social_security_number, '') social_security_number,
COALESCE(c.name, '') customer_name,
COALESCE(c.phone, '') phone,
COALESCE(c.deceased, '') deceased,
COALESCE(c.do_not_mail, '') do_not_mail,
COALESCE(cdob.date_of_birth, '') date_of_birth,
COALESCE(ad.line1, '') line1,
COALESCE(ad.line2, '') line2,
COALESCE(ad.city, '') city,
COALESCE(s.name, '') state,
COALESCE(ad.zip, '') zip,
COALESCE(o.officer_number, '') officer_number,
COALESCE(o.name, '') officer_name,
COALESCE(po.line1, '') po_box,
COALESCE(po.city, '') po_city,
COALESCE(po_state.name, '') po_state,
COALESCE(po.zip, '') zip,
COALESCE(br.number, '') branch_number,
COALESCE(cd_type.code, '') cd_type,
COALESCE(mp.product_number, '') macatawa_product_number,
COALESCE(mp.product_name, '') macatawa_product_name,
COALESCE(pt.name, '') macatawa_product_type,
COALESCE(hhsc.name, '') harte_hanks_service_category,
COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy,
COALESCE(cft.name, '') core_file_type,
COALESCE(oa.line1, '') original_address_line1,
COALESCE(oa.line2, '') original_address_line2,
COALESCE(uc.code, '') use_class
            FROM account a
            JOIN customer c ON a.customer_id = c.id
            JOIN officer o ON a.officer_id = o.id
            JOIN account_address aa ON aa.account_id = a.id
       LEFT JOIN account_po_box apb ON apb.account_id = a.id                
            JOIN address ad ON aa.address_id = ad.id
            JOIN original_address oa ON oa.address_id = ad.id
       LEFT JOIN address po ON apb.address_id = po.id
            JOIN state s ON s.id = ad.state_id
       LEFT JOIN state po_state ON po_state.id = po.state_id
       LEFT JOIN branch br ON a.branch_id = br.id
            JOIN account_import ai ON a.account_import_id = ai.id
            JOIN generic_import gi ON gi.id = ai.generic_import_id
            JOIN import_bundle ib ON gi.import_bundle_id = ib.id
            JOIN bank b ON b.id = ib.bank_id
       LEFT JOIN customer_date_of_birth cdob ON cdob.customer_id = c.id
       LEFT JOIN cd_type ON a.cd_type_id = cd_type.id
       LEFT JOIN account_macatawa_product amp ON amp.account_id = a.id
       LEFT JOIN macatawa_product mp ON mp.id = amp.macatawa_product_id
       LEFT JOIN product_type pt ON pt.id = mp.product_type_id
       LEFT JOIN harte_hanks_service_category hhsc
            ON hhsc.id = mp.harte_hanks_service_category_id
       LEFT JOIN core_file_type cft ON cft.id = mp.core_file_type_id
       LEFT JOIN use_class uc ON a.use_class_id = uc.id
       LEFT JOIN account_type at ON a.account_type_id = at.id

         WHERE 1
           AND gi.active = 1
           AND b.id = 8 AND ib.is_finished = 1

        ORDER BY a.id
           LIMIT 10

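And it's slow. It takes about a minute to run on my development server, and on my production server, where there is more data, I can't even get it to finish. Here is what the EXPLAIN looks like: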

http://i.stack.imgur.com/eR6lq.png

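I understand the basics of EXPLAIN. I know it's good that I have something other than NULL for everything under key. But overall, I don't know how much room for improvement my query has. I know that Using temporary; Using filesort under Extra is bad, but I have no idea what to do about it.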

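Comments:

Do the tables you are joining have indexes?

I have indexes on some columns of most of the tables I'm joining.

@jason swett - The relevant columns in this query are the ones you are joining on. See my answer below.

Is the EXPLAIN result from the production server or the development server? Could you try EXPLAIN on the production server, after running ANALYZE TABLE on all the relevant tables? I'd guess that proper row counts could lead to a very different execution path...

Answer 1: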

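It looks like most of your JOIN fields are not indexed. Make sure that every field you use as a JOIN key is indexed on both tables.

With 23 joins and what looks like only 2 relevant indexes, poor performance is to be expected.

Without an index to refer to, the query engine has to check every row in both tables to compare them, which is obviously very inefficient.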
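To see which of the join columns are already covered, the existing indexes can be listed per table. A minimal sketch for two of the tables in the query; the same statement applies to the others:

```sql
-- List the indexes that currently exist on each table.
-- Every column used in a JOIN ... ON condition should appear under Column_name
-- (a primary key shows up with Key_name = 'PRIMARY').
SHOW INDEX FROM account;
SHOW INDEX FROM customer;
```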

Edit:

For example, in your query you have

JOIN customer c ON a.customer_id = c.id

Make sure you have an index on a.customer_id AND on customer.id. Having indexes on both tables (on the JOINed fields) can speed the query up dramatically.
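As a sketch of what that could look like, assuming customer.id is the table's primary key (and is therefore already indexed) and that the foreign-key columns have no index yet. The index names are illustrative, not taken from the actual schema:

```sql
-- Assumption: customer.id is the PRIMARY KEY, so only the foreign-key side needs an index.
CREATE INDEX index_account_customer_id ON account (customer_id);

-- The same pattern for other join columns in the query (index names are hypothetical).
CREATE INDEX index_account_officer_id ON account (officer_id);
CREATE INDEX index_account_address_account_id ON account_address (account_id);
```

Re-running EXPLAIN afterwards shows, in the key column, whether the new indexes are actually being used.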

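Comments:

Good, evidently there's a lot of room for improvement. When you say "make sure every field... is indexed on both tables," what exactly do you mean by index? To my limited knowledge, an index is something that's added to a column, not to a table.

@Jason - It is on the column. I'll edit the answer with details.

Here's a guess: would I add an index to customer.id and account.customer_id? And then apply the same idea everywhere?

Awesome. Thank you so much. Just to make sure I'm doing this right before I go through all these changes, I think I'd do create index index_customer_id on customer (id) using btree and create index index_account_customer_id on account (customer_id) using btree. Is that the right idea?

Answer 2: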

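In addition to what @JNK mentioned in his answer about making sure you have indexes, I have restructured your query and added the "STRAIGHT_JOIN" keyword at the top, which tells the optimizer to join the tables in the order in which they are listed.

Since your query is based on generic import, to import bundle, to bank, I've moved THOSE to the front of the list... the WHERE clause will pre-qualify THOSE records first, instead of looking at all accounts that may never be part of the result. So the joins are now reversed, going from generic import back to account, following the same relationships you started with.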
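To check the effect of the join order in isolation, EXPLAIN can be run on just the three filtering tables, with and without STRAIGHT_JOIN. This is a cut-down sketch rather than the full query; the selected columns and their aliases are chosen for illustration only:

```sql
-- Without the keyword, the optimizer picks the join order itself.
EXPLAIN
SELECT gi.id AS generic_import_id, ib.id AS import_bundle_id, b.name AS bank
  FROM generic_import gi
  JOIN import_bundle ib ON gi.import_bundle_id = ib.id
  JOIN bank b ON ib.bank_id = b.id
 WHERE gi.active = 1 AND ib.is_finished = 1 AND b.id = 8;

-- With STRAIGHT_JOIN, the tables are read in the order they are listed.
EXPLAIN
SELECT STRAIGHT_JOIN gi.id AS generic_import_id, ib.id AS import_bundle_id, b.name AS bank
  FROM generic_import gi
  JOIN import_bundle ib ON gi.import_bundle_id = ib.id
  JOIN bank b ON ib.bank_id = b.id
 WHERE gi.active = 1 AND ib.is_finished = 1 AND b.id = 8;
```

Comparing the row order and the rows estimates of the two plans indicates whether forcing the order helps on a given data set.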

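I've also placed each JOIN / ON condition directly under the table it joins to, for readability and to make the table relationships easier to follow. I also wrote the ON clauses as Table1.ID = JoinedTable.ID... some of yours were reversed, which otherwise makes no difference, but knowing how one table joins INTO another just makes the query easier to read.

So, make sure the respective tables have indexes on whatever key columns are used in the joins, and from this sample query, make sure your GI table (alias) has an index on "Active" and your IB table (alias) has an index on "Is_Finished".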
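A sketch of the two filter indexes, assuming neither exists yet; the index names are hypothetical:

```sql
-- Indexes for the WHERE conditions gi.active = 1 and ib.is_finished = 1.
CREATE INDEX index_generic_import_active ON generic_import (active);
CREATE INDEX index_import_bundle_is_finished ON import_bundle (is_finished);

-- Possible alternative (an assumption, not part of the answer): a composite index
-- covering both the bank lookup and the finished flag on import_bundle.
-- CREATE INDEX index_import_bundle_bank_finished ON import_bundle (bank_id, is_finished);
```

Whether the optimizer actually uses an index on a flag column depends on how selective that column is, so it is worth confirming with EXPLAIN.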

Finally, your WHERE clause had WHERE 1 AND... the "1" serves no purpose, so I removed it.

SELECT STRAIGHT_JOIN DISTINCT 
      COALESCE(gi.start_time, '') start_time, 
      COALESCE(b.name, '') bank, 
      COALESCE(a.id, '') account_id, 
      COALESCE(a.account_number, '') account_number, 
      COALESCE(at.code, '') account_type, 
      COALESCE(a.open_date, '') open_date, 
      COALESCE(a.interest_rate, '') interest_rate, 
      COALESCE(a.maturity_date, '') maturity_date, 
      COALESCE(a.opening_balance, '') opening_balance, 
      COALESCE(a.has_e_statement, '') has_e_statement, 
      COALESCE(a.has_bill_pay, '') has_bill_pay, 
      COALESCE(a.has_overdraft_protection, '') has_overdraft_protection, 
      COALESCE(a.balance, '') balance, 
      COALESCE(a.business_or_personal, '') business_or_personal, 
      COALESCE(a.cumulative_balance, '') cumulative_balance, 
      COALESCE(c.customer_number, '') customer_number, 
      COALESCE(c.social_security_number, '') social_security_number, 
      COALESCE(c.name, '') customer_name, 
      COALESCE(c.phone, '') phone, 
      COALESCE(c.deceased, '') deceased, 
      COALESCE(c.do_not_mail, '') do_not_mail, 
      COALESCE(cdob.date_of_birth, '') date_of_birth, 
      COALESCE(ad.line1, '') line1, 
      COALESCE(ad.line2, '') line2, 
      COALESCE(ad.city, '') city, 
      COALESCE(s.name, '') state, 
      COALESCE(ad.zip, '') zip, 
      COALESCE(o.officer_number, '') officer_number, 
      COALESCE(o.name, '') officer_name, 
      COALESCE(po.line1, '') po_box, 
      COALESCE(po.city, '') po_city, 
      COALESCE(po_state.name, '') po_state, 
      COALESCE(po.zip, '') zip, 
      COALESCE(br.number, '') branch_number, 
      COALESCE(cd_type.code, '') cd_type, 
      COALESCE(mp.product_number, '') macatawa_product_number, 
      COALESCE(mp.product_name, '') macatawa_product_name, 
      COALESCE(pt.name, '') macatawa_product_type, 
      COALESCE(hhsc.name, '') harte_hanks_service_category, 
      COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy, 
      COALESCE(cft.name, '') core_file_type, 
      COALESCE(oa.line1, '') original_address_line1, 
      COALESCE(oa.line2, '') original_address_line2, 
      COALESCE(uc.code, '') use_class             
   FROM 
      generic_import gi 
         JOIN import_bundle ib 
            ON gi.import_bundle_id = ib.id
            JOIN bank b 
               ON ib.bank_id = b.id 
         JOIN account_import ai 
            ON gi.id = ai.generic_import_id
         JOIN  account a
            ON ai.id = a.account_import_id
            JOIN customer c 
               ON a.customer_id = c.id
               LEFT JOIN customer_date_of_birth cdob 
                  ON c.id = cdob.customer_id
            JOIN officer o 
               ON a.officer_id = o.id
            LEFT JOIN branch br 
               ON a.branch_id = br.id
            LEFT JOIN cd_type 
               ON a.cd_type_id = cd_type.id
            LEFT JOIN account_macatawa_product amp 
               ON a.id = amp.account_id
               LEFT JOIN macatawa_product mp 
                  ON amp.macatawa_product_id = mp.id
                  LEFT JOIN product_type pt 
                     ON mp.product_type_id = pt.id
                  LEFT JOIN harte_hanks_service_category hhsc 
                     ON mp.harte_hanks_service_category_id = hhsc.id
                  LEFT JOIN core_file_type cft 
                     ON mp.core_file_type_id = cft.id
            LEFT JOIN use_class uc 
               ON a.use_class_id = uc.id
            LEFT JOIN account_type at 
               ON a.account_type_id = at.id
            JOIN account_address aa 
               ON a.id = aa.account_id 
               JOIN address ad 
                  ON aa.address_id = ad.id 
                  JOIN original_address oa 
                     ON ad.id = oa.address_id
                  JOIN state s 
                     ON ad.state_id = s.id 
            LEFT JOIN account_po_box apb 
               ON a.id = apb.account_id 
               LEFT JOIN address po 
                  ON apb.address_id = po.id
                  LEFT JOIN state po_state 
                     ON po.state_id = po_state.id
      WHERE 
              gi.active = 1
          AND ib.is_finished = 1
          AND b.id = 8 
      ORDER BY 
          a.id
       LIMIT 
          10 

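Comments:

Thanks for doing this. One thing I'm confused about: if I try to run the query, it complains that "ai" is not a unique alias, which is true, since you have JOIN account_import ai twice, but I don't know what to do about it.

@Jason Swett, revised, I took out the second instance... it got duplicated when I reversed the main query elements... try this.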
