Postgres: forcing analyzer to use bitmap scan instead of index scan
Posted: 2018-03-02 06:41:02
Question:
In this scenario, a table has a lot of text columns and I need to perform an (ilike) search against each of them. I started by creating a gin index (gin_trgm_ops from the pg_trgm extension) on each of these columns to speed up the search, and it did help a lot. These are in addition to the regular indexes (some queries use simple equality conditions).
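As an illustration of the setup described above, the per-column trigram indexes might have been created roughly like this. This is only a sketch: the index names are taken from the query plans shown in this post, and the exact DDL the poster used is not given.

```sql
-- pg_trgm provides the gin_trgm_ops operator class used below.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One trigram index per searched text column (two shown; the post implies one per column).
CREATE INDEX t_topics_header_idx_subject_gin_trgm
    ON t_topics_header USING gin (subject gin_trgm_ops);
CREATE INDEX t_topics_header_idx_ext_col_01_gin_trgm
    ON t_topics_header USING gin (ext_col_01 gin_trgm_ops);
```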
However, testing shows that Postgres fails to choose the right plan when too many conditions are ORed together.
Query with the right plan:
db=# explain analyze select topics_id from t_topics_header a
join t_topics_group b using (topics_group_id)
where a.subject ilike '%aaa%' or a.contents ilike '%aaa%' or a.topics_group_id = 7 and
( ( a.ext_col_01 ilike '%aaa%') or ( a.ext_col_50 ilike '%aaa%') or
( a.ext_col_57 ilike '%aaa%') or ( a.ext_col_56 ilike '%aaa%') or
( a.ext_col_63 ilike '%aaa%') or ( a.ext_col_64 ilike '%aaa%') or
( a.ext_col_54 ilike '%aaa%') or ( a.ext_col_69 ilike '%aaa%') or
( a.ext_col_31 ilike '%aaa%') or ( a.ext_col_32 ilike '%aaa%') or
( a.ext_col_41 ilike '%aaa%') or ( a.ext_col_91 ilike '%aaa%') or
( a.ext_col_42 ilike '%aaa%') or ( a.ext_col_92 ilike '%aaa%') or
( a.ext_col_43 ilike '%aaa%') or ( a.ext_col_93 ilike '%aaa%') or
( a.ext_col_44 ilike '%aaa%') or ( a.ext_col_94 ilike '%aaa%') or
( a.ext_col_45 ilike '%aaa%') or ( a.ext_col_95 ilike '%aaa%')
) order by topics_id desc OFFSET 0 LIMIT 10;
Query plan:
Limit (cost=2739.84..2739.87 rows=10 width=4) (actual time=0.437..0.437 rows=4 loops=1)
-> Sort (cost=2739.84..2741.03 rows=473 width=4) (actual time=0.436..0.436 rows=4 loops=1)
Sort Key: a.topics_id DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=708.03..2729.62 rows=473 width=4) (actual time=0.416..0.426 rows=4 loops=1)
Hash Cond: (a.topics_group_id = b.topics_group_id)
-> Bitmap Heap Scan on t_topics_header a (cost=706.72..2721.80 rows=473 width=8) (actual time=0.381..0.391 rows=4 loops=1)
Recheck Cond: ((subject ~~* '%aaa%'::text) OR (contents ~~* '%aaa%'::text) OR ((ext_col_01 ~~* '%aaa%'::text) OR (ext_col_50 ~~* '%aaa%'::text) OR (ext_col_57 ~~* '%aaa%'::text) OR (ext_col_56 ~~* '%aaa%'::text) OR (ext_col_63 ~~* '%aaa%'::text) OR (ext_col_64 ~~* '%aaa%'::text) OR (ext_col_54 ~~* '%aaa%'::text) OR (ext_col_69 ~~* '%aaa%'::text) OR (ext_col_31 ~~* '%aaa%'::text) OR (ext_col_32 ~~* '%aaa%'::text) OR (ext_col_41 ~~* '%aaa%'::text) OR (ext_col_91 ~~* '%aaa%'::text) OR (ext_col_42 ~~* '%aaa%'::text) OR (ext_col_92 ~~* '%aaa%'::text) OR (ext_col_43 ~~* '%aaa%'::text) OR (ext_col_93 ~~* '%aaa%'::text) OR (ext_col_44 ~~* '%aaa%'::text) OR (ext_col_94 ~~* '%aaa%'::text) OR (ext_col_45 ~~* '%aaa%'::text) OR (ext_col_95 ~~* '%aaa%'::text)))
Filter: ((subject ~~* '%aaa%'::text) OR (contents ~~* '%aaa%'::text) OR ((topics_group_id = 7) AND ((ext_col_01 ~~* '%aaa%'::text) OR (ext_col_50 ~~* '%aaa%'::text) OR (ext_col_57 ~~* '%aaa%'::text) OR (ext_col_56 ~~* '%aaa%'::text) OR (ext_col_63 ~~* '%aaa%'::text) OR (ext_col_64 ~~* '%aaa%'::text) OR (ext_col_54 ~~* '%aaa%'::text) OR (ext_col_69 ~~* '%aaa%'::text) OR (ext_col_31 ~~* '%aaa%'::text) OR (ext_col_32 ~~* '%aaa%'::text) OR (ext_col_41 ~~* '%aaa%'::text) OR (ext_col_91 ~~* '%aaa%'::text) OR (ext_col_42 ~~* '%aaa%'::text) OR (ext_col_92 ~~* '%aaa%'::text) OR (ext_col_43 ~~* '%aaa%'::text) OR (ext_col_93 ~~* '%aaa%'::text) OR (ext_col_44 ~~* '%aaa%'::text) OR (ext_col_94 ~~* '%aaa%'::text) OR (ext_col_45 ~~* '%aaa%'::text) OR (ext_col_95 ~~* '%aaa%'::text))))
Heap Blocks: exact=4
-> BitmapOr (cost=706.72..706.72 rows=516 width=0) (actual time=0.375..0.375 rows=0 loops=1)
-> Bitmap Index Scan on t_topics_header_idx_subject_gin_trgm (cost=0.00..12.91 rows=122 width=0) (actual time=0.066..0.066 rows=4 loops=1)
Index Cond: (subject ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_contents_gin_trgm (cost=0.00..8.00 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)
Index Cond: (contents ~~* '%aaa%'::text)
-> BitmapOr (cost=685.32..685.32 rows=394 width=0) (actual time=0.292..0.292 rows=0 loops=1)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_01_gin_trgm (cost=0.00..8.85 rows=113 width=0) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: (ext_col_01 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_50_gin_trgm (cost=0.00..12.06 rows=8 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_50 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_57_gin_trgm (cost=0.00..12.12 rows=16 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_57 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_56_gin_trgm (cost=0.00..80.00 rows=1 width=0) (actual time=0.023..0.023 rows=0 loops=1)
Index Cond: (ext_col_56 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_63_gin_trgm (cost=0.00..84.05 rows=7 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (ext_col_63 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_64_gin_trgm (cost=0.00..72.04 rows=6 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (ext_col_64 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_54_gin_trgm (cost=0.00..68.00 rows=1 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_54 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_69_gin_trgm (cost=0.00..12.02 rows=3 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_69 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_31_gin_trgm (cost=0.00..12.04 rows=6 width=0) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: (ext_col_31 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_32_gin_trgm (cost=0.00..16.76 rows=101 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (ext_col_32 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_41_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: (ext_col_41 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_91_gin_trgm (cost=0.00..20.58 rows=78 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (ext_col_91 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_42_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: (ext_col_42 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_92_gin_trgm (cost=0.00..12.19 rows=25 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (ext_col_92 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_43_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_43 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_93_gin_trgm (cost=0.00..8.20 rows=26 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (ext_col_93 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_44_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_44 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_94_gin_trgm (cost=0.00..8.04 rows=5 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (ext_col_94 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_45_gin_trgm (cost=0.00..36.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_45 ~~* '%aaa%'::text)
-> Bitmap Index Scan on t_topics_header_idx_ext_col_95_gin_trgm (cost=0.00..76.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (ext_col_95 ~~* '%aaa%'::text)
-> Hash (cost=1.14..1.14 rows=14 width=4) (actual time=0.027..0.027 rows=14 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t_topics_group b (cost=0.00..1.14 rows=14 width=4) (actual time=0.013..0.019 rows=14 loops=1)
Planning time: 3.767 ms
Execution time: 0.687 ms
Query with the wrong plan:
db=# explain analyze select topics_id from t_topics_header a
join t_topics_group b using (topics_group_id)
where a.subject ilike '%aaa%' or a.contents ilike '%aaa%' or a.topics_group_id = 7 and
( ( a.ext_col_01 ilike '%aaa%') or ( a.ext_col_50 ilike '%aaa%') or
( a.ext_col_57 ilike '%aaa%') or ( a.ext_col_56 ilike '%aaa%') or
( a.ext_col_63 ilike '%aaa%') or ( a.ext_col_64 ilike '%aaa%') or
( a.ext_col_54 ilike '%aaa%') or ( a.ext_col_69 ilike '%aaa%') or
( a.ext_col_31 ilike '%aaa%') or ( a.ext_col_32 ilike '%aaa%') or
( a.ext_col_41 ilike '%aaa%') or ( a.ext_col_91 ilike '%aaa%') or
( a.ext_col_42 ilike '%aaa%') or ( a.ext_col_92 ilike '%aaa%') or
( a.ext_col_43 ilike '%aaa%') or ( a.ext_col_93 ilike '%aaa%') or
( a.ext_col_44 ilike '%aaa%') or ( a.ext_col_94 ilike '%aaa%') or
( a.ext_col_45 ilike '%aaa%') or ( a.ext_col_95 ilike '%aaa%') or
( a.ext_col_70 ilike '%aaa%')
) order by topics_id desc OFFSET 0 LIMIT 10;
Note that the only difference is the added ( a.ext_col_70 ilike '%aaa%').
And the new query plan becomes:
Limit (cost=0.43..2036.67 rows=10 width=4) (actual time=626.343..2784.151 rows=4 loops=1)
-> Nested Loop (cost=0.43..418447.61 rows=2055 width=4) (actual time=626.341..2784.147 rows=4 loops=1)
Join Filter: (a.topics_group_id = b.topics_group_id)
Rows Removed by Join Filter: 52
-> Index Scan Backward using t_report_master_pkey on t_topics_header a (cost=0.43..418014.89 rows=2055 width=8) (actual time=626.328..2784.119 rows=4 loops=1)
Filter: ((subject ~~* '%aaa%'::text) OR (contents ~~* '%aaa%'::text) OR ((topics_group_id = 7) AND ((ext_col_01 ~~* '%aaa%'::text) OR (ext_col_50 ~~* '%aaa%'::text) OR (ext_col_57 ~~* '%aaa%'::text) OR (ext_col_56 ~~* '%aaa%'::text) OR (ext_col_63 ~~* '%aaa%'::text) OR (ext_col_64 ~~* '%aaa%'::text) OR (ext_col_54 ~~* '%aaa%'::text) OR (ext_col_69 ~~* '%aaa%'::text) OR (ext_col_31 ~~* '%aaa%'::text) OR (ext_col_32 ~~* '%aaa%'::text) OR (ext_col_41 ~~* '%aaa%'::text) OR (ext_col_91 ~~* '%aaa%'::text) OR (ext_col_42 ~~* '%aaa%'::text) OR (ext_col_92 ~~* '%aaa%'::text) OR (ext_col_43 ~~* '%aaa%'::text) OR (ext_col_93 ~~* '%aaa%'::text) OR (ext_col_44 ~~* '%aaa%'::text) OR (ext_col_94 ~~* '%aaa%'::text) OR (ext_col_45 ~~* '%aaa%'::text) OR (ext_col_95 ~~* '%aaa%'::text) OR (ext_col_70 ~~* '%aaa%'::text))))
Rows Removed by Filter: 1237807
-> Materialize (cost=0.00..1.21 rows=14 width=4) (actual time=0.002..0.003 rows=14 loops=4)
-> Seq Scan on t_topics_group b (cost=0.00..1.14 rows=14 width=4) (actual time=0.004..0.007 rows=14 loops=1)
Planning time: 3.704 ms
Execution time: 2784.262 ms
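I have about twice as many ext_cols as shown here (I reduced them to the smallest set at which the plan changes). Every ext_col has a gin pg_trgm index (double-checked), in addition to the regular indexes on some of them. I also ran VACUUM ANALYZE on both tables.
The database version is PostgreSQL 9.6.6.
So is there a way to hint the analyzer to use the bitmap index scan? Any other ideas?
Edit: apparently the length of the search string matters. For strings of 3 letters or fewer, the analyzer chooses the (wrong) index scan. For strings of 4 or more characters, it chooses the good (bitmap index) plan.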
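Core PostgreSQL has no per-query hint syntax, but the planner's `enable_*` settings can discourage a plan type for the duration of a transaction. The sketch below is one way to test whether the bitmap plan comes back. It was not part of the original thread, the predicate is shortened for readability, and the settings only penalize a plan type rather than forbid it, so the outcome is not guaranteed.

```sql
BEGIN;
-- SET LOCAL lasts only until COMMIT/ROLLBACK, so other sessions and queries are unaffected.
SET LOCAL enable_indexscan = off;

EXPLAIN ANALYZE
SELECT topics_id
FROM t_topics_header a
JOIN t_topics_group b USING (topics_group_id)
WHERE a.subject ILIKE '%aaa%'
   OR a.contents ILIKE '%aaa%'
   OR a.topics_group_id = 7
      AND (a.ext_col_01 ILIKE '%aaa%' OR a.ext_col_70 ILIKE '%aaa%')
ORDER BY topics_id DESC
OFFSET 0 LIMIT 10;
COMMIT;
```

If the bitmap plan reappears and is fast, the slow plan is most likely a costing issue: the planner expects the backward primary-key scan to find 10 matching rows quickly, whereas the plan above shows it discarding over a million rows.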
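Comments:
Have you considered normalizing your design so that you only need to apply the ilike condition to a single column, and then join that to the main table?
I realize the design could be better; if I could start from scratch I would use a single json column for all of this. Unfortunately this is one of those times when you have to fix old stuff (built more than a decade ago) that you can't really change because so much is built on top of it :/
I am not talking about a single JSON column. I am talking about a proper one-to-many relationship.
Could you elaborate? I don't understand what you have in mind.
Move the single row with 100 columns into 100 rows with a single column in a different table: a one-to-many relationship.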
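The one-to-many layout suggested in the comments could look roughly like the sketch below. The table `t_topics_ext`, its columns, and the index name are hypothetical, and it assumes `topics_id` is the integer primary key of `t_topics_header`, which the thread does not state explicitly.

```sql
-- Hypothetical child table: one row per (topic, former ext column) pair.
CREATE TABLE t_topics_ext (
    topics_id integer NOT NULL REFERENCES t_topics_header (topics_id),
    col_name  text    NOT NULL,  -- e.g. 'ext_col_01'
    col_value text,
    PRIMARY KEY (topics_id, col_name)
);

-- A single trigram index replaces the per-column indexes.
CREATE INDEX t_topics_ext_idx_value_gin_trgm
    ON t_topics_ext USING gin (col_value gin_trgm_ops);

-- The search then touches one column only.
SELECT DISTINCT e.topics_id
FROM t_topics_ext e
WHERE e.col_value ILIKE '%aaa%'
ORDER BY e.topics_id DESC
LIMIT 10;
```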
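Answer 1:
If you are stuck with this design, you can create an index on the concatenated values of all the columns and use that expression in your where condition.
Something like: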
create function query_columns(p_rec t_topics_header)
returns text
as
$$
select concat_ws(' ', p_rec.ext_col_01, p_rec.ext_col_50, p_rec.ext_col_57, p_rec.ext_col_56);
$$
language sql
immutable;
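Adjust the columns in the concat_ws() function to the ones you want to query. I am not sure what the maximum length of an indexed expression is for a GIN index. Maybe that many columns will exceed the limit.
Then you can create an index on that function: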
create index on t_topics_header using gin ( (query_columns(t_topics_header)) public.gin_trgm_ops);
Then the following should use the index:
select topics_id
from t_topics_header a
join t_topics_group b using (topics_group_id)
where (a.subject ilike '%aaa%' or a.contents ilike '%aaa%'
or a.topics_group_id = 7 )
and query_columns(a) ilike '%aaa%';
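Note that the where clause above is not equivalent to the one in the question, where the ext_col conditions apply only when topics_group_id = 7. A variant that keeps the question's original logic might look like the following sketch. It assumes query_columns() concatenates every ext_col that the original query searched.

```sql
select topics_id
from t_topics_header a
  join t_topics_group b using (topics_group_id)
where a.subject ilike '%aaa%'
   or a.contents ilike '%aaa%'
   -- the whole list of ext_col conditions collapses into one indexed expression
   or (a.topics_group_id = 7 and query_columns(a) ilike '%aaa%')
order by topics_id desc
offset 0 limit 10;
```

Because the columns are joined with a space, a search pattern that itself contains a space could match text spanning two adjacent columns. A separator that cannot appear in the data would avoid that.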
Comments:
Fortunately, my hundred columns did not exceed the maximum length for a gin index. It works great, thanks for the nice answer! (Small typo in your query: there should be no = after ilike.)