Why does removing the ORDER BY significantly speed up this query?

Asked: 2011-02-08 16:57:39

I have the following query (some of it is code-generated, so please forgive the poor formatting):

SELECT DISTINCT COALESCE(gi.start_time, '') start_time,
COALESCE(b.name, '') bank,
COALESCE(a.id, '') account_id,
COALESCE(a.account_number, '') account_number,
COALESCE(at.code, '') account_type,
COALESCE(a.open_date, '') open_date,
COALESCE(a.interest_rate, '') interest_rate,
COALESCE(a.maturity_date, '') maturity_date,
COALESCE(a.opening_balance, '') opening_balance,
COALESCE(a.has_e_statement, '') has_e_statement,
COALESCE(a.has_bill_pay, '') has_bill_pay,
COALESCE(a.has_overdraft_protection, '') has_overdraft_protection,
COALESCE(a.balance, '') balance,
COALESCE(a.business_or_personal, '') business_or_personal,
COALESCE(a.cumulative_balance, '') cumulative_balance,
COALESCE(c.customer_number, '') customer_number,
COALESCE(c.social_security_number, '') social_security_number,
COALESCE(c.name, '') customer_name,
COALESCE(c.phone, '') phone,
COALESCE(c.deceased, '') deceased,
COALESCE(c.do_not_mail, '') do_not_mail,
COALESCE(cdob.date_of_birth, '') date_of_birth,
COALESCE(ad.line1, '') line1,
COALESCE(ad.line2, '') line2,
COALESCE(ad.city, '') city,
COALESCE(s.name, '') state,
COALESCE(ad.zip, '') zip,
COALESCE(o.officer_number, '') officer_number,
COALESCE(o.name, '') officer_name,
COALESCE(po.line1, '') po_box,
COALESCE(po.city, '') po_city,
COALESCE(po_state.name, '') po_state,
COALESCE(po.zip, '') zip,
COALESCE(br.number, '') branch_number,
COALESCE(cd_type.code, '') cd_type,
COALESCE(mp.product_number, '') macatawa_product_number,
COALESCE(mp.product_name, '') macatawa_product_name,
COALESCE(pt.name, '') macatawa_product_type,
COALESCE(hhsc.name, '') harte_hanks_service_category,
COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy,
COALESCE(cft.name, '') core_file_type,
COALESCE(oa.line1, '') original_address_line1,
COALESCE(oa.line2, '') original_address_line2,
COALESCE(uc.code, '') use_class
            FROM account a
            JOIN customer c ON a.customer_id = c.id
            JOIN officer o ON a.officer_id = o.id
            JOIN account_address aa ON aa.account_id = a.id
       LEFT JOIN account_po_box apb ON apb.account_id = a.id                
            JOIN address ad ON aa.address_id = ad.id
            JOIN original_address oa ON oa.address_id = ad.id
       LEFT JOIN address po ON apb.address_id = po.id
            JOIN state s ON s.id = ad.state_id
       LEFT JOIN state po_state ON po_state.id = po.state_id
       LEFT JOIN branch br ON a.branch_id = br.id
            JOIN account_import ai ON a.account_import_id = ai.id
            JOIN generic_import gi ON gi.id = ai.generic_import_id
            JOIN import_bundle ib ON gi.import_bundle_id = ib.id
            JOIN bank b ON b.id = ib.bank_id
       LEFT JOIN customer_date_of_birth cdob ON cdob.customer_id = c.id
       LEFT JOIN cd_type ON a.cd_type_id = cd_type.id
       LEFT JOIN account_macatawa_product amp ON amp.account_id = a.id
       LEFT JOIN macatawa_product mp ON mp.id = amp.macatawa_product_id
       LEFT JOIN product_type pt ON pt.id = mp.product_type_id
       LEFT JOIN harte_hanks_service_category hhsc ON hhsc.id = mp.harte_hanks_service_category_id
       LEFT JOIN core_file_type cft ON cft.id = mp.core_file_type_id
       LEFT JOIN use_class uc ON a.use_class_id = uc.id
       LEFT JOIN account_type at ON a.account_type_id = at.id

         WHERE 1
           AND gi.active = 1
           AND b.id = 8 AND ib.is_finished = 1
      ORDER BY a.id
         LIMIT 10

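I have indexes on all the appropriate columns, including account.id, AKA a.id. Even so, if I remove the ORDER BY, my query speeds up dramatically (from 10 seconds to 0 seconds). Why is that?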

Comments:

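Sorting takes time? ORDER BY is necessary to guarantee the order of the data; without it there is no guarantee, although the order may happen to follow insertion order.

I know sorting takes time, but 10 seconds seems like a lot for my ~30,000-row result set. Maybe it is sorting before the WHERE.

The size of the result set has little to do with how long a query takes to run; the only thing it affects is I/O time. Query performance also depends on the size of the data you are querying against and on any aggregation you are applying. It is good that you have indexes, but see Ned's answer below. Also, make sure your clustered index is on your id column, since it increases monotonically.

I don't even want to know the Cartesian product of that query :)

Answer 1: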

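Because with the ORDER BY, it has to retrieve all the rows and sort them in order to get the first 10 rows by a.id. Without the ORDER BY, it can simply retrieve the first 10 rows it finds and ignore the rest.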
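
The difference can be sketched outside the database. The snippet below is only an illustration, not how MySQL is implemented: `scan_rows` and the generated `ids` list are hypothetical stand-ins for the unsorted output of the joins, and the counters record how many rows each strategy has to read before it can return 10.

```python
import heapq
from itertools import islice

def scan_rows(ids, counter):
    """Yield rows one at a time, counting how many the 'engine' had to read."""
    for row_id in ids:
        counter["read"] += 1
        yield row_id

# Hypothetical join output: 30,000 distinct ids arriving in no useful order.
ids = [(i * 7919) % 30000 for i in range(30000)]

# LIMIT 10 without ORDER BY: stop as soon as 10 rows have been produced.
no_order = {"read": 0}
first_found = list(islice(scan_rows(ids, no_order), 10))

# ORDER BY id LIMIT 10: every row must be seen before the 10 smallest are known.
with_order = {"read": 0}
first_sorted = heapq.nsmallest(10, scan_rows(ids, with_order))

print("rows read without ORDER BY:", no_order["read"])
print("rows read with ORDER BY:   ", with_order["read"])
```

This models the case where rows reach the sort step in arbitrary order, which is what the timings in the question suggest is happening.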

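Also, be careful when profiling queries: the first run can fill the cache with data, and a later query may then be faster not because its SQL is different but because it is pulling data from the cache rather than from disk.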
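
One way to keep cache warm-up from skewing a comparison is to discard the first run of each variant and time several later runs. The helper below is a rough sketch of that measurement pattern; `time_query` is a hypothetical name, and an in-memory sqlite3 table stands in for the real MySQL schema purely so the example is self-contained.

```python
import sqlite3
import time

def time_query(conn, sql, repeats=3):
    """Run sql once as a warm-up, then return the timings of `repeats` further runs."""
    conn.execute(sql).fetchall()  # warm-up run; its timing is discarded
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return timings

# Hypothetical stand-in table, not the schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account (balance) VALUES (?)",
    [(float(i),) for i in range(1000)],
)

with_order = time_query(conn, "SELECT * FROM account ORDER BY id LIMIT 10")
without_order = time_query(conn, "SELECT * FROM account LIMIT 10")
print(min(with_order), min(without_order))
```

Comparing the two variants only after both have been warmed up makes it more likely that any remaining gap comes from the query itself rather than from what was already cached.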

Comments:

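I've wondered about this. Is there some kind of flag you can use to make sure the query results are not coming from the cache?

@yc: It depends on the DBMS, but for MySQL I think you can use RESET QUERY CACHE before running your query. Note that you need the RELOAD privilege to do that.

@bitxwise That would reset the cache for the entire database, though, right? Not just for the user running the query?

@yc: I suppose it is a bit of a shotgun approach... haha

@bitxwise We are clearly getting off topic, but what about adding a no-op condition, like OR 1=2? Would that force it to run the query without the cache?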
