Very slow query, what can I do to improve?

【Posted】: 2020-07-22 17:20:49

【Question】:

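I have a problem with this query, which is auto-generated by the backend framework I am using. The query is very slow and is causing problems in my script.

Update: after following @The Impaler's answer, the problem with my initial query is solved; execution time went from 30 s to 300 ms.

The only remaining problem is that after adding some extra conditions to my WHERE clause, execution time went back to 28-32 s.

Old query: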

SELECT `growthservice_growthserviceaccountdata`.`id`, `growthservice_growthserviceaccountdata`.`username`, 
   `growthservice_growthserviceaccountdata`.`name`, `growthservice_growthserviceaccountdata`.`bio`, 
   `growthservice_growthserviceaccountdata`.`avatar`, `growthservice_growthserviceaccountdata`.`language`, 
   `growthservice_growthserviceaccountdata`.`gender`, `growthservice_growthserviceaccountdata`.`follower_count`,
   `growthservice_growthserviceaccountdata`.`following_count`, `growthservice_growthserviceaccountdata`.`like_count`, 
   `growthservice_growthserviceaccountdata`.`post_count`, `growthservice_growthserviceaccountdata`.`is_private`, 
   `growthservice_growthserviceaccountdata`.`is_business`, `growthservice_growthserviceaccountdata`.`is_verified`, 
   `growthservice_growthserviceaccountdata`.`is_fetched`, `growthservice_growthserviceaccountdata`.`created_date`, 
   `growthservice_growthserviceaccountdata`.`updated_date` 
FROM `growthservice_growthserviceaccountdata` 
INNER JOIN `growthservice_growthservicerelationdata` 
ON (`growthservice_growthserviceaccountdata`.`id` = `growthservice_growthservicerelationdata`.`account_id`) 
WHERE (
    `growthservice_growthservicerelationdata`.`source_id` = 6812397029810258950 
    AND NOT (`growthservice_growthserviceaccountdata`.`id` IN (
        SELECT U0.`subject_id` AS Col1 FROM `growthservice_log` U0 
        WHERE (U0.`account_id` = 6570863662218543109 AND U0.`action` = 'LIKE' AND U0.`subject_type` = 'USER')
        )
        )
        ) LIMIT 55;

New query:

SELECT `growthservice_growthserviceaccountdata`.`id`, `growthservice_growthserviceaccountdata`.`username`, 
   `growthservice_growthserviceaccountdata`.`name`, `growthservice_growthserviceaccountdata`.`bio`, 
   `growthservice_growthserviceaccountdata`.`avatar`, `growthservice_growthserviceaccountdata`.`language`, 
   `growthservice_growthserviceaccountdata`.`gender`, `growthservice_growthserviceaccountdata`.`follower_count`,
   `growthservice_growthserviceaccountdata`.`following_count`, `growthservice_growthserviceaccountdata`.`like_count`, 
   `growthservice_growthserviceaccountdata`.`post_count`, `growthservice_growthserviceaccountdata`.`is_private`, 
   `growthservice_growthserviceaccountdata`.`is_business`, `growthservice_growthserviceaccountdata`.`is_verified`, 
   `growthservice_growthserviceaccountdata`.`is_fetched`, `growthservice_growthserviceaccountdata`.`created_date`, 
   `growthservice_growthserviceaccountdata`.`updated_date` 
FROM `growthservice_growthserviceaccountdata` 
INNER JOIN `growthservice_growthservicerelationdata` 
ON (`growthservice_growthserviceaccountdata`.`id` = `growthservice_growthservicerelationdata`.`account_id`) 
WHERE (
    `growthservice_growthservicerelationdata`.`source_id` = 6812397029810258950 
    AND `growthservice_growthserviceaccountdata`.`following_count` >= 30 
    AND `growthservice_growthserviceaccountdata`.`follower_count` >= 10 
    AND NOT (`growthservice_growthserviceaccountdata`.`username` LIKE BINARY '%user%') 
    AND NOT (`growthservice_growthserviceaccountdata`.`id` IN (
        SELECT U0.`subject_id` AS Col1 FROM `growthservice_log` U0 
        WHERE (U0.`account_id` = 6570863662218543109 AND U0.`action` = 'LIKE' AND U0.`subject_type` = 'USER')
        )
        )
        ) 
ORDER BY `growthservice_growthserviceaccountdata`.`created_date` DESC
LIMIT 55;

The same query, edited to use table aliases instead of the long names for readability:

SELECT 
            gsad.id, 
            gsad.username,
            gsad.name,
            gsad.bio,
            gsad.avatar,
            gsad.`language`,
            gsad.gender, 
            gsad.follower_count,
            gsad.following_count, 
            gsad.like_count,
            gsad.post_count, 
            gsad.is_private, 
            gsad.is_business, 
            gsad.is_verified,
            gsad.is_fetched, 
            gsad.created_date,
            gsad.updated_date 
        FROM
            growthservice_growthserviceaccountdata gsad
                INNER JOIN growthservice_growthservicerelationdata gsrd
                    ON gsad.id = gsrd.account_id
        WHERE
                gsrd.source_id = 6812397029810258950
            AND gsad.following_count >= 30
            AND gsad.follower_count >= 10
            AND NOT gsad.username LIKE BINARY '%user%'
            AND NOT gsad.id IN ( SELECT U0.subject_id AS Col1
                                    FROM growthservice_log U0
                                    WHERE (U0.account_id = 6570863662218543109
                                        AND U0.action = 'LIKE'
                                        AND U0.subject_type = 'USER')
                            )
        ORDER BY
            gsad.created_date DESC
        LIMIT 55;






The old EXPLAIN output is:

| id | select_type        | table                                   | partitions | type   | possible_keys                                                                                                                                     | key                                                          | key_len | ref                                                                    | rows | filtered | Extra       |
|  1 | PRIMARY            | growthservice_growthservicerelationdata | NULL       | ref    | growthservice_growth_account_id_93684974_fk_growthser,growthservice_growth_source_id_86fb3471_fk_growthser                                        | growthservice_growth_source_id_86fb3471_fk_growthser         | 8       | const                                                                  | 5741 |   100.00 | Using where |
|  1 | PRIMARY            | growthservice_growthserviceaccountdata  | NULL       | eq_ref | PRIMARY,follower_count,following_count                                                                                                            | PRIMARY                                                      | 8       | app.growthservice_growthservicerelationdata.account_id                 | 1 |    22.22 | Using where |
|  2 | DEPENDENT SUBQUERY | U0                                      | NULL       | ref    | growthservice_log_account_id_ac95df3e_fk_accounts_account_id,growthservice_log_action_45cfd84b,growthservice_log_subject_id_17399893,subject_type | growthservice_log_account_id_ac95df3e_fk_accounts_account_id | 8       | const                                                                  | 2822 |     2.50 | Using where |

The new EXPLAIN output is:

| id | select_type        | table                                   | partitions | type   | possible_keys                                                                                              | key                                                  | key_len | ref                                                                    | rows | filtered | Extra                                        |
|  1 | PRIMARY            | growthservice_growthservicerelationdata | NULL       | ref    | growthservice_growth_account_id_93684974_fk_growthser,growthservice_growth_source_id_86fb3471_fk_growthser | growthservice_growth_source_id_86fb3471_fk_growthser | 8       | const                                                                  | 5741 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | growthservice_growthserviceaccountdata  | NULL       | eq_ref | PRIMARY,follower_count,following_count,ix1                                                                 | PRIMARY                                              | 8       | app.growthservice_growthservicerelationdata.account_id                 |    1 |    25.00 | Using where                                  |
|  2 | DEPENDENT SUBQUERY | U0                                      | NULL       | ref    | growthservice_log_action_45cfd84b,growthservice_log_subject_id_17399893,subject_type,ix1                   | ix1                                                  | 1612    | const,const,const                                                      | 2564 |    10.00 | Using where                                  |

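The query takes 30 seconds on average.

Table growthservice_growthserviceaccountdata has 7 million rows.

Table growthservice_growthservicerelationdata has 7 million rows.

Table growthservice_log has 150,000 rows.

I already have single-column indexes on every field that is filtered in the WHERE clause.

What exactly could be slowing the query down, and what can I do to fix it?

Strangely, if I remove the NOT IN from the WHERE clause, the query executes in only 300 ms instead of 30 s.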

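【Comments】:

You could use the NOT EXISTS operator (an anti-semi-join) instead of NOT IN to improve performance. I suspect NOT IN may cause the engine to perform an extra filtering step, which makes it slower.

I would get familiar with table and column aliases before going any further.

@Strawberry As I said in the OP, the query is auto-generated by the backend framework I am using.

I must ask you to fix the names with aliases to help distinguish growthservice_growthserviceaccountdata from growthservice_growthservicerelationdata -- how about AS a and AS r?

OK, you are close with gsad and gsrd. But what is sad??

【Answer 1】: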

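Single-column indexes do not help the subquery, because it has a three-column equality predicate. To improve the subquery's performance, you can add a composite index: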

create index ix1 on `growthservice_log` (`account_id`, `action`, `subject_type`);

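The rest of the query looks quite simple, since you are limiting it to only 55 rows and there is no sort operation that would force the engine to read a large number of rows.

What may be problematic is that the engine treats the subquery as a "dependent subquery". If possible, it could be useful to rephrase this query to avoid any correlation.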
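
One way to remove the correlation is to express the exclusion as an anti-join. The following is only a sketch built from the original (first) query: the aliases gsad, gsrd and lg are shorthand chosen here, the column list is abbreviated to gsad.*, and it assumes growthservice_log.subject_id is never NULL (NOT IN and LEFT JOIN ... IS NULL behave differently when NULLs are present). Whether the optimizer actually picks a better plan should be confirmed with EXPLAIN.

```sql
-- Sketch: the NOT IN subquery rewritten as LEFT JOIN ... IS NULL (anti-join).
-- Assumes subject_id is NOT NULL; the aliases gsad/gsrd/lg are illustrative.
SELECT gsad.*
FROM growthservice_growthserviceaccountdata AS gsad
INNER JOIN growthservice_growthservicerelationdata AS gsrd
        ON gsad.id = gsrd.account_id
LEFT JOIN growthservice_log AS lg
       ON lg.subject_id   = gsad.id
      AND lg.account_id   = 6570863662218543109
      AND lg.action       = 'LIKE'
      AND lg.subject_type = 'USER'
WHERE gsrd.source_id = 6812397029810258950
  AND lg.subject_id IS NULL      -- keep only accounts with no matching log row
LIMIT 55;
```

An account with several matching log rows is dropped entirely by the IS NULL test, so the join does not duplicate the rows that remain.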

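【Comments】:

Hey man, thanks a lot for answering my question! I have added the index and it solved the problem for the initial query (execution time is now 0.5 ms instead of 30 s), but I noticed that the real query I need has more conditions in the WHERE clause, and once I add those conditions the execution time goes back up to 30 s. Could you take a look and let me know what I am missing? I edited my question to add the final WHERE clause I am using.

@EduardoAlves I still see the same query. Did you change it? Now that the index is in place, can you update the execution plan?

I have edited the question to show both the old and the new query, so the differences are easier to read and spot. I have also posted the new EXPLAIN output.

【Answer 2】: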

For the first variant, try changing:

              AND  NOT (a.`id` IN (
                        SELECT  U0.`subject_id` AS Col1
                            FROM  `growthservice_log` U0
                            WHERE  (U0.`account_id` = 6570863662218543109
                               AND  U0.`action` = 'LIKE'
                               AND  U0.`subject_type` = 'USER') ) )

to

     AND NOT EXISTS ( SELECT 1
           FROM growthservice_log AS U0
           WHERE  U0.`account_id` = 6570863662218543109
             AND  U0.`action` = 'LIKE'
             AND  U0.`subject_type` = 'USER'
             AND  U0.`subject_id` = a.`id` )

and have these indexes (a and r are aliases for growthservice_growthserviceaccountdata and growthservice_growthservicerelationdata):

U0:  (subject_type, action, account_id, subject_id)  -- The order does not matter
r:   (source_id, account_id)  -- In this order

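"I already have single-column indexes on every field that is filtered in the WHERE clause."

A common newbie mistake. "Composite indexes" (as above) are sometimes far better than single-column indexes.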
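
The two composite indexes listed above could be created roughly as follows. This is a sketch: the index names ix_log_lookup and ix_relation_source_account are made up for illustration, and building an index on a table with millions of rows can take a while, so it is worth trying on a copy first.

```sql
-- Hypothetical index names; the column lists come from the suggestion above.
-- All four columns are tested with "=", so their order is not critical here.
CREATE INDEX ix_log_lookup
    ON growthservice_log (subject_type, action, account_id, subject_id);

-- Column order matters: source_id (the filter) first, then account_id (the join column).
CREATE INDEX ix_relation_source_account
    ON growthservice_growthservicerelationdata (source_id, account_id);
```

Re-running EXPLAIN afterwards shows, in the key column, whether the optimizer actually chose the new indexes.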

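Without an ORDER BY, a LIMIT is free to return any rows it likes.

Second query...

This is very inefficient, because the leading wildcard prevents any index on username from being used:

NOT gsad.username LIKE BINARY '%user%'

What is its purpose?

For a, one of these may be useful: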

INDEX(follower_count),
INDEX(following_count)

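Another thing to try is moving the NOT IN (or NOT EXISTS) from the WHERE clause to a HAVING clause. If that does not speed things up as much as removing it did, then we can discuss a subquery:

SELECT *
    FROM ( SELECT ... all of the query but the NOT IN ... ) AS x
    WHERE NOT EXISTS ( ... )
    ORDER BY ...;   -- Yes, repeated

Caveat: because the LIMIT is applied inside the subquery before the NOT EXISTS filter, you may get fewer than 55 rows (54, say), but that may be "good enough" and worth the speedup?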
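
Spelled out against the second query, the derived-table idea might look like the sketch below. The aliases x, gsad and gsrd are chosen here for illustration and the column list is shortened to gsad.*; the inner LIMIT 55 is what makes it possible to end up with fewer than 55 rows once the outer filter runs.

```sql
-- Sketch: pick 55 candidate rows first, then apply the expensive exclusion.
SELECT x.*
FROM (
    SELECT gsad.*
    FROM growthservice_growthserviceaccountdata AS gsad
    INNER JOIN growthservice_growthservicerelationdata AS gsrd
            ON gsad.id = gsrd.account_id
    WHERE gsrd.source_id = 6812397029810258950
      AND gsad.following_count >= 30
      AND gsad.follower_count >= 10
      AND NOT (gsad.username LIKE BINARY '%user%')
    ORDER BY gsad.created_date DESC
    LIMIT 55
) AS x                               -- MySQL requires an alias on a derived table
WHERE NOT EXISTS (
    SELECT 1
    FROM growthservice_log AS U0
    WHERE U0.account_id   = 6570863662218543109
      AND U0.action       = 'LIKE'
      AND U0.subject_type = 'USER'
      AND U0.subject_id   = x.id
)
ORDER BY x.created_date DESC;        -- repeated: the outer query has no guaranteed order otherwise
```

If a full 55 rows are required, the inner LIMIT could be raised and a LIMIT 55 added to the outer query, at the cost of reading more candidate rows.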
