When ORDER BY is performed on values aggregated by COUNT, it takes time to issue the query

【Posted】: 2021-12-31 10:08:24 【Question】:

I am trying to retrieve videos ordered by the number of tags they share with a specific video.

The following query takes about 800 ms, even though it appears to use indexes. If I remove COUNT, GROUP BY, and ORDER BY from the SQL query, it runs very fast (1-5 ms).

In this case, is it impossible to speed things up by improving the SQL query alone, and do I need to use a MATERIALIZED VIEW?

SELECT "videos_video"."id",
       "videos_video"."title",
       "videos_video"."thumbnail_url",
       "videos_video"."preview_url",
       "videos_video"."embed_url",
       "videos_video"."duration",
       "videos_video"."views",
       "videos_video"."is_public",
       "videos_video"."published_at",
       "videos_video"."created_at",
       "videos_video"."updated_at",
       COUNT("videos_video"."id") AS "n"
FROM "videos_video"
INNER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
WHERE ("videos_video_tags"."tag_id" IN
       (SELECT U0."id"
        FROM "videos_tag" U0
        INNER JOIN "videos_video_tags" U1 ON (U0."id" = U1."tag_id")
        WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'))
GROUP BY "videos_video"."id"
ORDER BY "n" DESC
LIMIT 20;

EXPLAIN ANALYZE output for this query:

                                                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1040.69..1040.74 rows=20 width=24) (actual time=738.648..738.654 rows=20 loops=1)
   ->  Sort  (cost=1040.69..1044.29 rows=1441 width=24) (actual time=738.646..738.650 rows=20 loops=1)
         Sort Key: (count(videos_video.id)) DESC
         Sort Method: top-N heapsort  Memory: 27kB
         ->  HashAggregate  (cost=987.93..1002.34 rows=1441 width=24) (actual time=671.006..714.322 rows=188818 loops=1)
               Group Key: videos_video.id
               Batches: 1  Memory Usage: 28689kB
               ->  Nested Loop  (cost=35.20..980.73 rows=1441 width=16) (actual time=0.341..559.034 rows=240293 loops=1)
                     ->  Nested Loop  (cost=34.78..340.88 rows=1441 width=16) (actual time=0.278..92.806 rows=240293 loops=1)
                           ->  HashAggregate  (cost=34.35..34.41 rows=6 width=32) (actual time=0.188..0.200 rows=4 loops=1)
                                 Group Key: u0.id
                                 Batches: 1  Memory Usage: 24kB
                                 ->  Nested Loop  (cost=0.71..34.33 rows=6 width=32) (actual time=0.161..0.185 rows=4 loops=1)
                                       ->  Index Only Scan using videos_video_tags_video_id_tag_id_f8d6ba70_uniq on videos_video_tags u1  (cost=0.43..4.53 rows=6 width=16) (actual time=0.039..0.040 rows=4 loops=1)
                                             Index Cond: (video_id = '748b1814-f311-48da-a1f5-6bf8fe229c7f'::uuid)
                                             Heap Fetches: 0
                                       ->  Index Only Scan using videos_tag_pkey on videos_tag u0  (cost=0.28..4.97 rows=1 width=16) (actual time=0.035..0.035 rows=1 loops=4)
                                             Index Cond: (id = u1.tag_id)
                                             Heap Fetches: 0
                           ->  Index Scan using videos_video_tags_tag_id_2673cfc8 on videos_video_tags  (cost=0.43..35.90 rows=1518 width=32) (actual time=0.029..16.728 rows=60073 loops=4)
                                 Index Cond: (tag_id = u0.id)
                     ->  Index Only Scan using videos_video_pkey on videos_video  (cost=0.42..0.44 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=240293)
                           Index Cond: (id = videos_video_tags.video_id)
                           Heap Fetches: 46
 Planning Time: 1.980 ms
 Execution Time: 739.446 ms
(26 rows)

Time: 742.145 ms

--------- Execution plan for the query from Edouard's answer ----------

                                                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=30043.90..30212.53 rows=20 width=746) (actual time=239.142..239.219 rows=20 loops=1)
   ->  Limit  (cost=30043.48..30043.53 rows=20 width=24) (actual time=239.089..239.093 rows=20 loops=1)
         ->  Sort  (cost=30043.48..30607.15 rows=225467 width=24) (actual time=239.087..239.090 rows=20 loops=1)
               Sort Key: (count(*)) DESC
               Sort Method: top-N heapsort  Memory: 26kB
               ->  HashAggregate  (cost=21789.21..24043.88 rows=225467 width=24) (actual time=185.710..219.211 rows=188818 loops=1)
                     Group Key: vt.video_id
                     Batches: 1  Memory Usage: 22545kB
                     ->  Nested Loop  (cost=20.62..20187.24 rows=320395 width=16) (actual time=4.975..106.839 rows=240293 loops=1)
                           ->  Index Only Scan using videos_video_tags_video_id_tag_id_f8d6ba70_uniq on videos_video_tags vvt  (cost=0.43..4.53 rows=6 width=16) (actual time=0.033..0.043 rows=4 loops=1)
                                 Index Cond: (video_id = '748b1814-f311-48da-a1f5-6bf8fe229c7f'::uuid)
                                 Heap Fetches: 0
                           ->  Bitmap Heap Scan on videos_video_tags vt  (cost=20.19..3348.60 rows=1518 width=32) (actual time=4.311..20.663 rows=60073 loops=4)
                                 Recheck Cond: (tag_id = vvt.tag_id)
                                 Heap Blocks: exact=34757
                                 ->  Bitmap Index Scan on videos_video_tags_tag_id_2673cfc8  (cost=0.00..19.81 rows=1518 width=0) (actual time=3.017..3.017 rows=60073 loops=4)
                                       Index Cond: (tag_id = vvt.tag_id)
   ->  Index Scan using videos_video_pkey on videos_video v  (cost=0.42..8.44 rows=1 width=738) (actual time=0.005..0.005 rows=1 loops=20)
         Index Cond: (id = vt.video_id)
 Planning Time: 0.854 ms
 Execution Time: 241.392 ms
(21 rows)

Time: 242.909 ms

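【Comments】:

1) It looks like you can drop videos_video and videos_tag, since you only need the primary keys; 2) after that, you can try inner-joining the two videos_video_tags.

Sorry, the SELECT actually looks like this. How would you handle this case? I have edited the question.

(a) Can you actually execute the query? You select many columns from the "videos_video" table while grouping rows only on "id", which should raise an error. (b) I do think you can simplify the query considerably, with a potentially significant impact on performance. (c) I am not sure a materialized view is relevant in your case, mainly because of the refresh issue.

@Edouard If it is the primary key, grouping by id is enough.

No particular error occurs. What would the simplified query be?

【Answer 1】: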

Here are some ideas for simplifying the query. EXPLAIN ANALYSE will then confirm the potential impact on query performance.

Start with the subquery:

SELECT U0."id"
  FROM "videos_tag" U0
 INNER JOIN "videos_video_tags" U1 ON (U0."id" = U1."tag_id")
 WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'

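According to the JOIN clause, U0."id" = U1."tag_id", so SELECT U0."id" can be replaced with SELECT U1."tag_id".

In that case, the table "videos_tag" U0 is no longer used in the subquery, which can be simplified to: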

SELECT U1."tag_id"
  FROM "videos_video_tags" U1
 WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'

The WHERE clause of the main query then becomes:

WHERE "videos_video_tags"."tag_id" IN
      ( SELECT U1."tag_id"
          FROM "videos_video_tags" U1
         WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
      )

This can be converted into a self-join on the table "videos_video_tags", added to the FROM clause of the main query:

 FROM "videos_video" AS v
INNER JOIN "videos_video_tags" AS vt
   ON v."id" = vt."video_id"
INNER JOIN "videos_video_tags" AS vvt
   ON vvt."tag_id" = vt."tag_id"
WHERE vvt."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
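
The IN form and the self-join form return the same counts only when each (video_id, tag_id) pair occurs at most once; the index name videos_video_tags_video_id_tag_id_f8d6ba70_uniq in the plan suggests that this holds here. The following is a minimal sketch on hypothetical toy data that compares the two forms. The table demo_video_tags and its integer ids are assumptions made for illustration, not the real schema.

```sql
-- Hypothetical toy table, used only to compare the two query forms.
CREATE TEMP TABLE demo_video_tags (
    video_id int NOT NULL,
    tag_id   int NOT NULL,
    UNIQUE (video_id, tag_id)  -- assumed to mirror the unique index seen in the plan
);

INSERT INTO demo_video_tags (video_id, tag_id) VALUES
    (1, 10), (1, 20),           -- reference video: tags 10 and 20
    (2, 10), (2, 20), (2, 30),  -- shares two tags with video 1
    (3, 20),                    -- shares one tag with video 1
    (4, 30);                    -- shares no tag with video 1

-- Form 1: IN subquery
SELECT vt.video_id, count(*) AS n
  FROM demo_video_tags AS vt
 WHERE vt.tag_id IN (SELECT u1.tag_id
                       FROM demo_video_tags AS u1
                      WHERE u1.video_id = 1)
 GROUP BY vt.video_id
 ORDER BY n DESC, vt.video_id;

-- Form 2: self-join
SELECT vt.video_id, count(*) AS n
  FROM demo_video_tags AS vt
 INNER JOIN demo_video_tags AS vvt
    ON vvt.tag_id = vt.tag_id
 WHERE vvt.video_id = 1
 GROUP BY vt.video_id
 ORDER BY n DESC, vt.video_id;

-- Both forms are expected to return: (1, 2), (2, 2), (3, 1)
```

Note that the reference video itself appears in the result, as it does in the original query; video 4 is absent because it shares no tag.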

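Finally, the GROUP BY "videos_video"."id" clause can be replaced with GROUP BY "videos_video_tags"."video_id", according to the JOIN clause between the two tables. This new GROUP BY clause, together with the ORDER BY and LIMIT clauses, can then be applied in a subquery that involves only the table "videos_video_tags", before joining the table "videos_video":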

SELECT v."id",
       v."title",
       v."thumbnail_url",
       v."preview_url",
       v."embed_url",
       v."duration",
       v."views",
       v."is_public",
       v."published_at",
       v."created_at",
       v."updated_at",
       w."n"
 FROM "videos_video" AS v
INNER JOIN
    ( SELECT vt."video_id"
           , count(*) AS "n"
        FROM "videos_video_tags" AS vt
       INNER JOIN "videos_video_tags" AS vvt
          ON vvt."tag_id" = vt."tag_id"
       WHERE vvt."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
       GROUP BY vt."video_id"
       ORDER BY "n" DESC
       LIMIT 20
    ) AS w
   ON v."id" = w."video_id"
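
To check where the time goes, one option is to run the inner subquery on its own under EXPLAIN. The sketch below reuses the subquery above unchanged; the BUFFERS option is an addition that was not used in the thread, and it reports how many pages each plan node read.

```sql
-- Sketch: measure only the aggregation step, before the join to videos_video.
-- ANALYZE executes the query for real; BUFFERS adds page-access counts per node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT vt."video_id",
       count(*) AS "n"
  FROM "videos_video_tags" AS vt
 INNER JOIN "videos_video_tags" AS vvt
    ON vvt."tag_id" = vt."tag_id"
 WHERE vvt."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
 GROUP BY vt."video_id"
 ORDER BY "n" DESC
 LIMIT 20;
```

SQL does not guarantee that the outer join preserves the row order of the subquery, so adding ORDER BY w."n" DESC to the outer query would make the final ordering explicit.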

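【Comments】:

Thank you! Your query improved the execution time! However, I would like to get it down to double-digit milliseconds; is there no way to make it faster? I have also added the execution plan for this query to the question.

@Jvn - Dividing the execution time by 3 is not bad :-) The new execution plan is interesting, but I am not sure there is much room left to improve performance; maybe one idea about indexes and another about configuration parameters. Could you share the definitions of the indexes on the "videos_video_tags" table? Thanks.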
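
The index definitions requested in the last comment can be read from the pg_indexes system view. This is a minimal sketch that assumes the table lives in a schema on the current search path; in psql, \d videos_video_tags shows the same information.

```sql
-- List every index defined on videos_video_tags together with its CREATE INDEX statement.
SELECT indexname,
       indexdef
  FROM pg_indexes
 WHERE tablename = 'videos_video_tags'
 ORDER BY indexname;
```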
