Why is this query taking so long on a JSONB GIN-indexed field? Can I fix it so it actually uses the index?
Posted: 2016-10-31 23:19:18

Question:
We recently changed the format of one of our tables from a single value per column to a JSONB column holding arrays like ["key1","key2","key3"]. Although we built a GIN index on the JSONB field, the query we run against it is very slow (in the range of 50 minutes in the explain plan). I'm trying to find a way to optimize the query so that it makes proper use of the index. The query and its explain plan are pasted below. The indexed fields are visit.visitor, launch.campaign_key, launch.launch_key and visit.store_key, and the JSONB field visit.stops has a GIN index. We are using PostgreSQL 9.4.
explain (analyze on) select count(subselect.visitors) as visitors,
       subselect.campaign as campaign
from (
    select distinct visit.visitor as visitors,
           launch.campaign_key as campaign
    from visit
    join launch on (jsonb_exists(visit.stops, launch.launch_key))
    where visit.store_key = 'ahBzfmdlYXJsYXVuY2gtaHVi'
      and launch.state = 'PRODUCTION'
) as subselect
group by subselect.campaign
EXPLAIN output:
HashAggregate (cost=63873548.47..63873550.47 rows=200 width=88) (actual time=248617.348..248617.365 rows=58 loops=1)
  Group Key: launch.campaign_key
  -> HashAggregate (cost=63519670.22..63661221.52 rows=14155130 width=88) (actual time=248587.320..248616.558 rows=1938 loops=1)
        Group Key: visit.visitor, launch.campaign_key
        -> HashAggregate (cost=63307343.27..63448894.57 rows=14155130 width=88) (actual time=248553.278..248584.868 rows=1938 loops=1)
              Group Key: visit.visitor, launch.campaign_key
              -> Nested Loop (cost=4903.09..56997885.96 rows=1261891461 width=88) (actual time=180648.410..248550.249 rows=2085 loops=1)
                    Join Filter: jsonb_exists(visit.stops, (launch.launch_key)::text)
                    Rows Removed by Join Filter: 624114512
                    -> Bitmap Heap Scan on launch (cost=3213.19..126084.38 rows=169389 width=123) (actual time=32.082..317.561 rows=166121 loops=1)
                          Recheck Cond: ((state)::text = 'PRODUCTION'::text)
                          Heap Blocks: exact=56635
                          -> Bitmap Index Scan on launch_state_idx (cost=0.00..3170.85 rows=169389 width=0) (actual time=21.172..21.172 rows=166121 loops=1)
                                Index Cond: ((state)::text = 'PRODUCTION'::text)
                    -> Materialize (cost=1689.89..86736.04 rows=22349 width=117) (actual time=0.000..0.487 rows=3757 loops=166121)
                          -> Bitmap Heap Scan on visit (cost=1689.89..86624.29 rows=22349 width=117) (actual time=1.324..14.381 rows=3757 loops=1)
                                Recheck Cond: ((store_key)::text = 'ahBzfmdlYXJsYXVuY2gtaHVicg8LEgVTdG9yZRinzbKcDQw'::text)
                                Heap Blocks: exact=3672
                                -> Bitmap Index Scan on visit_store_key_idx (cost=0.00..1684.31 rows=22349 width=0) (actual time=0.780..0.780 rows=3757 loops=1)
                                      Index Cond: ((store_key)::text = 'ahBzfmdlYXJsYXVuY2gtaHVicg8LEgVTdG9yZRinzbKcDQw'::text)
Planning time: 0.232 ms
Execution time: 248708.088 ms
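The row counts in the plan show where the time goes. With no usable index condition on stops, the Nested Loop evaluates the join filter jsonb_exists() once for every pair of a PRODUCTION launch row (166,121 rows) and a matching visit row (3,757 rows, rescanned from the Materialize node, hence loops=166121), and almost every pair is discarded:

$$
166{,}121 \times 3{,}757 = 624{,}116{,}597 = \underbrace{624{,}114{,}512}_{\text{removed by join filter}} + \underbrace{2{,}085}_{\text{rows kept}}
$$

In other words, roughly 624 million jsonb_exists() calls are made to find 2,085 matching pairs.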
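I should mention that the stops index was built with CREATE INDEX ON visit USING GIN (stops).
I'm wondering whether switching to CREATE INDEX ON visit USING GIN ((stops->'value'))
would solve the problem.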
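One way to check whether an expression index on stops->'value' could help is to look at what that expression returns for the array format described in the question. A minimal sketch, using the question's example array as a literal (no tables needed):

```sql
-- stops holds a top-level JSONB array of keys, e.g. ["key1","key2","key3"].

-- ? tests whether a string is a top-level element of the array:
SELECT '["key1","key2","key3"]'::jsonb ? 'key2';    -- true

-- -> with a text key looks up an object field; applied to an array it returns NULL:
SELECT '["key1","key2","key3"]'::jsonb -> 'value';  -- NULL
```

Since stops->'value' is NULL for every row in this format, a GIN index on that expression would contain none of the keys, and the planner would only consider it for queries written against stops->'value' in the first place. The existing GIN (stops) index uses the default jsonb_ops operator class, which already supports the ? operator.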
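Comments:
See PostgreSQL operator uses index but underlying function does not

Answer 1:
The wrapper function jsonb_exists() prevents the use of the GIN index on visit.stops: the planner matches index conditions by operator, so it can use the index for the ? operator but not for a direct call to the function that implements it. Instead of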
from visit
join launch on (jsonb_exists(visit.stops, launch.launch_key))
try
from visit
join launch on visit.stops ? launch.launch_key::text
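The operator/function distinction can be checked in the system catalogs. A minimal sketch, assuming PostgreSQL 9.4 or later: the first query lists the operators that the GIN jsonb_ops operator family (the default one used by CREATE INDEX ... USING GIN (stops)) can accept as index conditions, and the second looks up which function implements ?(jsonb, text).

```sql
-- Operators the default GIN operator family for jsonb can use as index conditions.
-- Filtering on the access method matters: btree and hash also have a jsonb_ops.
SELECT amop.amopopr::regoperator AS indexable_operator
FROM pg_amop AS amop
JOIN pg_opfamily AS fam ON fam.oid = amop.amopfamily
JOIN pg_am AS am ON am.oid = fam.opfmethod
WHERE am.amname = 'gin'
  AND fam.opfname = 'jsonb_ops';

-- The function behind the ? operator for (jsonb, text); expected to be jsonb_exists.
SELECT oprcode
FROM pg_operator
WHERE oprname = '?'
  AND oprleft = 'jsonb'::regtype
  AND oprright = 'text'::regtype;
```

The first result should include ?(jsonb,text) while jsonb_exists itself never appears, which is why the original join condition ends up as a Join Filter. Note that the non-default jsonb_path_ops operator class does not support ? at all, so an index built with it would not help this query either.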