Why does OR in subquery make query so much slower?

[Posted]: 2021-10-02 10:59:45

[Question]:

I'm using MySQL and have the following query that I'm trying to improve:

SELECT
    *
FROM
    overpayments AS op
    JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
    AND overpayment_pa.allocatable_type = 'Overpayment'
    JOIN (
        SELECT
            pa.payment_source_type,
            pa.payment_source_id,
            ft.conversion_rate
        FROM
            payment_allocations AS pa
            LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
            LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
            LEFT JOIN financial_transactions AS ft ON (
                ft.commercial_document_id = pa.payment_source_id
                AND ft.commercial_document_type = pa.payment_source_type
            )
            OR (
                ft.commercial_document_id = cn.id
                AND ft.commercial_document_type = 'CreditNote'
            )
        WHERE
            pa.allocatable_type = 'Overpayment'
            AND pa.company_id = 14792
            AND ft.company_id = 14792
    ) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
    AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;

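It takes 10 seconds to run. By removing the OR condition from the subquery's join and using COALESCE to pick the result, I was able to bring it down to 0.047 seconds: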

SELECT
    *
FROM
    overpayments AS op
    JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
    AND overpayment_pa.allocatable_type = 'Overpayment'
    JOIN (
        SELECT
            pa.payment_source_type,
            pa.payment_source_id,
            coalesce(ft_one.conversion_rate, ft_two.conversion_rate)
        FROM
            payment_allocations AS pa
            LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
            LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
            LEFT JOIN financial_transactions AS ft_one ON (
                ft_one.commercial_document_id = pa.payment_source_id
                AND ft_one.commercial_document_type = pa.payment_source_type
                AND ft_one.company_id = 14792
            )
            LEFT JOIN financial_transactions AS ft_two ON (
                ft_two.commercial_document_id = cn.id
                AND ft_two.commercial_document_type = 'CreditNote'
                AND ft_two.company_id = 14792
            )
        WHERE
            pa.allocatable_type = 'Overpayment'
            AND pa.company_id = 14792
            
    ) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
    AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;

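However, I don't really understand why this works. The original subquery runs quickly on its own and returns only 2 rows, so why does it slow the outer query down so much? EXPLAIN for the first query returns the following: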

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | pa | | ref | index_payment_allocations_on_payment_source_id, index_payment_allocations_on_company_id | index_payment_allocations_on_company_id | 5 | const | 191 | 10.00 | Using where
1 | SIMPLE | overpayment_pa | | ref | index_payment_allocations_on_payment_source_id, index_payment_allocations_on_allocatable_id | index_payment_allocations_on_payment_source_id | 5 | rails.pa.payment_source_id | 1 | 3.42 | Using where
1 | SIMPLE | op | | eq_ref | PRIMARY | PRIMARY | 4 | rails.overpayment_pa.allocatable_id | 1 | 100.00 |
1 | SIMPLE | li | | eq_ref | PRIMARY | PRIMARY | 4 | rails.pa.payment_source_id | 1 | 100.00 |
1 | SIMPLE | cn | | eq_ref | PRIMARY | PRIMARY | 8 | rails.li.parent_document_id | 1 | 100.00 | Using where; Using index
1 | SIMPLE | ft | | ALL | transactions_unique_by_commercial_doc | | | | 12587878 | 0.00 | Range checked for each record (index map: 0x2)

For the second query I get the following:

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | pa | | ref | index_payment_allocations_on_payment_source_id, index_payment_allocations_on_company_id | index_payment_allocations_on_company_id | 5 | const | 191 | 10.00 | Using where
1 | SIMPLE | overpayment_pa | | ref | index_payment_allocations_on_payment_source_id, index_payment_allocations_on_allocatable_id | index_payment_allocations_on_payment_source_id | 5 | rails.pa.payment_source_id | 1 | 3.42 | Using where
1 | SIMPLE | op | | eq_ref | PRIMARY | PRIMARY | 4 | rails.overpayment_pa.allocatable_id | 1 | 100.00 |
1 | SIMPLE | ft_one | | ref | transactions_unique_by_commercial_doc, index_financial_transactions_on_company_id | transactions_unique_by_commercial_doc | 773 | rails.pa.payment_source_id, rails.pa.payment_source_type | 1 | 100.00 | Using where
1 | SIMPLE | li | | eq_ref | PRIMARY | PRIMARY | 4 | rails.pa.payment_source_id | 1 | 100.00 |
1 | SIMPLE | cn | | eq_ref | PRIMARY | PRIMARY | 8 | rails.li.parent_document_id | 1 | 100.00 | Using where; Using index
1 | SIMPLE | ft_two | | ref | transactions_unique_by_commercial_doc, index_financial_transactions_on_company_id | transactions_unique_by_commercial_doc | 773 | rails.cn.id, const | 1 | 100.00 | Using where

But I really don't know how to interpret these results.

[Answer 1]:

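Look at the right-hand side of the last row of your first EXPLAIN. For the ft table, type is ALL and no key is chosen: MySQL does not use an index there and estimates it has to scan more than 12 million rows (rows = 12587878). That's slow. Your second query uses an index at every step (each ft_one and ft_two lookup is a ref access on transactions_unique_by_commercial_doc with an estimated 1 row), so it is much faster.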

If your second query yields the correct results, use it and don't look back. Congratulations! You have optimized your query.

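OR operations, especially in ON clauses, are harder than usual for the query planner to satisfy, because they often mean it has to take the union of two separate subqueries. It looks like the planner chose to brute-force it in your case. (Brute force === scanning lots of rows.)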
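To make the "union of two separate subqueries" idea concrete, here is a minimal sketch of how the inner subquery from the first query could be split by hand into one branch per side of the OR. This is an illustration under assumptions, not something the asker tested. It assumes that the WHERE condition on ft.company_id already discards rows with no matching transaction, so plain JOINs are used in place of the LEFT JOINs.

```sql
-- Branch 1: the transaction belongs directly to the payment source.
SELECT
    pa.payment_source_type,
    pa.payment_source_id,
    ft.conversion_rate
FROM
    payment_allocations AS pa
    JOIN financial_transactions AS ft ON ft.commercial_document_id = pa.payment_source_id
    AND ft.commercial_document_type = pa.payment_source_type
WHERE
    pa.allocatable_type = 'Overpayment'
    AND pa.company_id = 14792
    AND ft.company_id = 14792
UNION ALL
-- Branch 2: the transaction belongs to the credit note behind the line item.
SELECT
    pa.payment_source_type,
    pa.payment_source_id,
    ft.conversion_rate
FROM
    payment_allocations AS pa
    JOIN line_items AS li ON pa.payment_source_id = li.id
    JOIN credit_notes AS cn ON li.parent_document_id = cn.id
    JOIN financial_transactions AS ft ON ft.commercial_document_id = cn.id
    AND ft.commercial_document_type = 'CreditNote'
WHERE
    pa.allocatable_type = 'Overpayment'
    AND pa.company_id = 14792
    AND ft.company_id = 14792;
```

Each branch has a single equality join on financial_transactions, which gives the planner a chance to use an index lookup per branch. One caveat: if a transaction could satisfy both conditions for the same allocation, UNION ALL returns it twice where the OR version returns it once. Switching to UNION removes such repeats, but it also collapses rows that are legitimately identical, so compare the output against the original query before relying on it.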

Without knowing your indexes, it's hard to help you further.
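For anyone in the same position, the existing index definitions can be listed with standard MySQL statements and added to the question. A minimal sketch, using the table names from the queries above:

```sql
-- Full table definition, including every index and its column order.
SHOW CREATE TABLE financial_transactions;
SHOW CREATE TABLE payment_allocations;

-- One row per indexed column, with an estimated cardinality for each.
SHOW INDEX FROM financial_transactions;
SHOW INDEX FROM payment_allocations;
```

The column order of transactions_unique_by_commercial_doc is the detail that matters most here, since that is the index the fast plan uses for both ft_one and ft_two.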

Read this to learn more: https://use-the-index-luke.com

[Answer 2]:

These indexes may speed up the second formulation further:

overpayment_pa:
    INDEX(payment_source_id, payment_source_type, allocatable_type, allocatable_id)
pa: INDEX(allocatable_type, company_id, payment_source_id,  payment_source_type)
financial_transactions:
    INDEX(commercial_document_id, commercial_document_type, company_id, conversion_rate)
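As a sketch, the suggestions above could be written as the following DDL. Both the overpayment_pa and pa aliases refer to the payment_allocations table, so the first two indexes go on that one table. The index names are hypothetical placeholders, and the statements assume that the combined width of the string columns stays within the storage engine's index key length limit.

```sql
-- Index names below are made up for illustration; pick names that fit your schema.
ALTER TABLE payment_allocations
    ADD INDEX idx_pa_source_allocatable (payment_source_id, payment_source_type, allocatable_type, allocatable_id),
    ADD INDEX idx_pa_allocatable_company_source (allocatable_type, company_id, payment_source_id, payment_source_type);

ALTER TABLE financial_transactions
    ADD INDEX idx_ft_document_company_rate (commercial_document_id, commercial_document_type, company_id, conversion_rate);
```

Because each index contains every column the query reads from that table, the lookups may be satisfied from the index alone; re-running EXPLAIN after adding them shows whether the planner actually picks them ("Using index" in the Extra column).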
