When ORDER BY is performed on values aggregated by COUNT, it takes time to issue the query
【Posted】: 2021-12-31 10:08:24
【Question】:
I am trying to retrieve videos in descending order of the number of tags they share with a specific video.
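The following query takes about 800 ms, even though it appears to use indexes. If I remove COUNT, GROUP BY, and ORDER BY from the SQL query, it runs very fast (1-5 ms).
In this case, is it impossible to speed things up by improving the SQL query alone, and do I need to use a MATERIALIZED VIEW?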
SELECT "videos_video"."id",
       "videos_video"."title",
       "videos_video"."thumbnail_url",
       "videos_video"."preview_url",
       "videos_video"."embed_url",
       "videos_video"."duration",
       "videos_video"."views",
       "videos_video"."is_public",
       "videos_video"."published_at",
       "videos_video"."created_at",
       "videos_video"."updated_at",
       COUNT("videos_video"."id") AS "n"
FROM "videos_video"
INNER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
WHERE ("videos_video_tags"."tag_id" IN
         (SELECT U0."id"
          FROM "videos_tag" U0
          INNER JOIN "videos_video_tags" U1 ON (U0."id" = U1."tag_id")
          WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'))
GROUP BY "videos_video"."id"
ORDER BY "n" DESC
LIMIT 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1040.69..1040.74 rows=20 width=24) (actual time=738.648..738.654 rows=20 loops=1)
  -> Sort (cost=1040.69..1044.29 rows=1441 width=24) (actual time=738.646..738.650 rows=20 loops=1)
        Sort Key: (count(videos_video.id)) DESC
        Sort Method: top-N heapsort Memory: 27kB
        -> HashAggregate (cost=987.93..1002.34 rows=1441 width=24) (actual time=671.006..714.322 rows=188818 loops=1)
              Group Key: videos_video.id
              Batches: 1 Memory Usage: 28689kB
              -> Nested Loop (cost=35.20..980.73 rows=1441 width=16) (actual time=0.341..559.034 rows=240293 loops=1)
                    -> Nested Loop (cost=34.78..340.88 rows=1441 width=16) (actual time=0.278..92.806 rows=240293 loops=1)
                          -> HashAggregate (cost=34.35..34.41 rows=6 width=32) (actual time=0.188..0.200 rows=4 loops=1)
                                Group Key: u0.id
                                Batches: 1 Memory Usage: 24kB
                                -> Nested Loop (cost=0.71..34.33 rows=6 width=32) (actual time=0.161..0.185 rows=4 loops=1)
                                      -> Index Only Scan using videos_video_tags_video_id_tag_id_f8d6ba70_uniq on videos_video_tags u1 (cost=0.43..4.53 rows=6 width=16) (actual time=0.039..0.040 rows=4 loops=1)
                                            Index Cond: (video_id = '748b1814-f311-48da-a1f5-6bf8fe229c7f'::uuid)
                                            Heap Fetches: 0
                                      -> Index Only Scan using videos_tag_pkey on videos_tag u0 (cost=0.28..4.97 rows=1 width=16) (actual time=0.035..0.035 rows=1 loops=4)
                                            Index Cond: (id = u1.tag_id)
                                            Heap Fetches: 0
                          -> Index Scan using videos_video_tags_tag_id_2673cfc8 on videos_video_tags (cost=0.43..35.90 rows=1518 width=32) (actual time=0.029..16.728 rows=60073 loops=4)
                                Index Cond: (tag_id = u0.id)
                    -> Index Only Scan using videos_video_pkey on videos_video (cost=0.42..0.44 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=240293)
                          Index Cond: (id = videos_video_tags.video_id)
                          Heap Fetches: 46
Planning Time: 1.980 ms
Execution Time: 739.446 ms
(26 rows)
Time: 742.145 ms
--------- Execution plan for the query from Edouard's answer ----------
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=30043.90..30212.53 rows=20 width=746) (actual time=239.142..239.219 rows=20 loops=1)
  -> Limit (cost=30043.48..30043.53 rows=20 width=24) (actual time=239.089..239.093 rows=20 loops=1)
        -> Sort (cost=30043.48..30607.15 rows=225467 width=24) (actual time=239.087..239.090 rows=20 loops=1)
              Sort Key: (count(*)) DESC
              Sort Method: top-N heapsort Memory: 26kB
              -> HashAggregate (cost=21789.21..24043.88 rows=225467 width=24) (actual time=185.710..219.211 rows=188818 loops=1)
                    Group Key: vt.video_id
                    Batches: 1 Memory Usage: 22545kB
                    -> Nested Loop (cost=20.62..20187.24 rows=320395 width=16) (actual time=4.975..106.839 rows=240293 loops=1)
                          -> Index Only Scan using videos_video_tags_video_id_tag_id_f8d6ba70_uniq on videos_video_tags vvt (cost=0.43..4.53 rows=6 width=16) (actual time=0.033..0.043 rows=4 loops=1)
                                Index Cond: (video_id = '748b1814-f311-48da-a1f5-6bf8fe229c7f'::uuid)
                                Heap Fetches: 0
                          -> Bitmap Heap Scan on videos_video_tags vt (cost=20.19..3348.60 rows=1518 width=32) (actual time=4.311..20.663 rows=60073 loops=4)
                                Recheck Cond: (tag_id = vvt.tag_id)
                                Heap Blocks: exact=34757
                                -> Bitmap Index Scan on videos_video_tags_tag_id_2673cfc8 (cost=0.00..19.81 rows=1518 width=0) (actual time=3.017..3.017 rows=60073 loops=4)
                                      Index Cond: (tag_id = vvt.tag_id)
  -> Index Scan using videos_video_pkey on videos_video v (cost=0.42..8.44 rows=1 width=738) (actual time=0.005..0.005 rows=1 loops=20)
        Index Cond: (id = vt.video_id)
Planning Time: 0.854 ms
Execution Time: 241.392 ms
(21 rows)
Time: 242.909 ms
【Comments】:
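1) It looks like you can drop videos_video and videos_tag, because you only need their primary keys; 2) after that, you can try an inner join between two instances of videos_video_tags.
Sorry, the actual SELECT looks like this. How would you handle this case? I have edited the question.
(a) Does the query actually execute correctly? You select many columns of the "videos_video" table while grouping the rows only on "id", which should raise an error. (b) I do think you can simplify the query considerably, with a potentially significant impact on performance. (c) I am not sure a materialized view is relevant in your case, mainly because of the refresh issue.
@Edouard If it is the primary key, grouping by id is sufficient.
No particular error occurred. What would the simplified query look like?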
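The point about the primary key can be checked in isolation. The sketch below uses two hypothetical temporary tables (demo_video and demo_video_tags, which are not part of the question's schema). Because demo_video.id is the primary key, PostgreSQL treats title as functionally dependent on it and accepts the query even though only id appears in GROUP BY:

```sql
-- Hypothetical tables, used only to illustrate grouping by a primary key.
CREATE TEMP TABLE demo_video (
    id    integer PRIMARY KEY,
    title text NOT NULL
);
CREATE TEMP TABLE demo_video_tags (
    video_id integer NOT NULL REFERENCES demo_video (id),
    tag_id   integer NOT NULL
);

INSERT INTO demo_video VALUES (1, 'first'), (2, 'second');
INSERT INTO demo_video_tags VALUES (1, 10), (1, 11), (2, 10);

-- title is not listed in GROUP BY, but it depends on the primary key id.
SELECT v.id, v.title, COUNT(*) AS n
FROM demo_video AS v
INNER JOIN demo_video_tags AS vt ON vt.video_id = v.id
GROUP BY v.id
ORDER BY n DESC;
```

If id were not declared as the primary key, PostgreSQL would reject the same query because title would have to appear in GROUP BY or inside an aggregate function.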
【Answer 1】:
Here are some ideas for simplifying the query. EXPLAIN ANALYSE will then confirm the potential impact on query performance.
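As a minimal sketch of how each step can be measured, the statement below wraps the subquery from the question in EXPLAIN. ANALYZE (PostgreSQL also accepts the spelling ANALYSE) actually executes the statement and reports real row counts and timings. The BUFFERS option is an addition that the question does not use; it adds page-access counts to the output:

```sql
-- Execute the statement and report actual timings plus buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT U0."id"
FROM "videos_tag" U0
INNER JOIN "videos_video_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f';
```

The same prefix can be put in front of any of the rewritten queries to compare their plans on the real data.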
Start with the subquery:
SELECT U0."id"
FROM "videos_tag" U0
INNER JOIN "videos_video_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
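According to the JOIN clause, U0."id" = U1."tag_id", so SELECT U0."id" can be replaced with SELECT U1."tag_id".
In that case, the table "videos_tag" U0 is no longer used in the subquery. Assuming every "tag_id" in "videos_video_tags" references an existing row in "videos_tag" (for example through a foreign key), the join can be dropped without changing the result, and the subquery simplifies to: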
SELECT U1."tag_id"
FROM "videos_video_tags" U1
WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
The WHERE clause of the main query then becomes:
WHERE "videos_video_tags"."tag_id" IN
( SELECT U1."tag_id"
FROM "videos_video_tags" U1
WHERE U1."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
)
This can be converted into a self-join on the table "videos_video_tags", added to the FROM clause of the main query:
FROM "videos_video" AS v
INNER JOIN "videos_video_tags" AS vt
ON v."id" = vt."video_id"
INNER JOIN "videos_video_tags" AS vvt
ON vvt."tag_id" = vt."tag_id"
WHERE vvt."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
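Finally, the GROUP BY "videos_video"."id" clause can be replaced with GROUP BY "videos_video_tags"."video_id", according to the JOIN clause between the two tables. This new GROUP BY clause, together with the ORDER BY and LIMIT clauses, can be applied in a subquery that involves only the table "videos_video_tags", before joining with the table "videos_video":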
SELECT v."id",
       v."title",
       v."thumbnail_url",
       v."preview_url",
       v."embed_url",
       v."duration",
       v."views",
       v."is_public",
       v."published_at",
       v."created_at",
       v."updated_at",
       w."n"
FROM "videos_video" AS v
INNER JOIN
  ( SELECT vt."video_id"
         , count(*) AS "n"
    FROM "videos_video_tags" AS vt
    INNER JOIN "videos_video_tags" AS vvt
       ON vvt."tag_id" = vt."tag_id"
    WHERE vvt."video_id" = '748b1814-f311-48da-a1f5-6bf8fe229c7f'
    GROUP BY vt."video_id"
    ORDER BY "n" DESC
    LIMIT 20
  ) AS w
   ON v."id" = w."video_id"
-- The join does not guarantee the subquery's row order, so sort the 20 rows again.
ORDER BY w."n" DESC;
【Comments】:
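Thank you! The execution time has improved with your query! However, I would like to get it down to a two-digit number of milliseconds. Is there no way to make it faster? I have also added the execution plan for this query to the question.
@Jvn - Dividing the execution time by 3 is not bad :-) The new execution plan is interesting, but I am not sure there is much room left to improve performance. Maybe one idea about indexes and another about configuration parameters. Could you share the definitions of the indexes on the "videos_video_tags" table? Thanks.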
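One possible form of the index idea, offered as an untested assumption rather than a measured fix: in the second plan, the lookup by tag_id goes through a Bitmap Heap Scan that reports Heap Blocks: exact=34757, apparently just to read video_id from the table. If the index videos_video_tags_tag_id_2673cfc8 covers only tag_id (its definition is not shown in the question), a composite index on (tag_id, video_id) might let PostgreSQL answer that step with an Index Only Scan. The index name below is hypothetical:

```sql
-- Hypothetical index name; it holds both columns the aggregating subquery reads.
CREATE INDEX videos_video_tags_tag_id_video_id_idx
    ON "videos_video_tags" ("tag_id", "video_id");

-- Refresh the visibility map and planner statistics so that
-- index-only scans can avoid visiting the heap.
VACUUM (ANALYZE) "videos_video_tags";
```

Whether the planner picks the new index, and how much time it saves, would have to be confirmed with EXPLAIN ANALYSE on the real data. The HashAggregate over roughly 240000 joined rows remains either way.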