Improving performance of Postgres jsonb queries combined with relational queries
Posted: 2021-01-21 02:28:52
Question:
I have a SELECT that queries regular Postgres tables plus a jsonb column. When I select the entire jsonb column, the query is fast (574 ms). However, when I instead select a top-level path of that same jsonb column, the query becomes 6x slower (3241 ms). My final query needs to read string-array values from 4 of these top-level jsonb paths, which slows it down to 5 seconds.
There are about 50K records in the cfiles table, and the jsonb column cfiles.property_values is structured like this:
{
  "Sample Names": ["up to 200 short strings..."],
  "Project IDs": ["up to 10 short strings..."],
  "Run IDs": ["up to 10 short strings..."],
  "Data Type": ["up to 10 short strings..."]
}
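Following this answer, I tried adding the GIN index below, but it made very little difference (run times are in the comments in the query below). I assume that is because my query does not use the @> operator on the json alone; it is combined with a relational query.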
CREATE INDEX ON cfiles USING GIN (property_values jsonb_path_ops);
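A jsonb_path_ops GIN index helps with containment searches such as @> in a WHERE clause; it is not used to compute values extracted in the SELECT list, which fits the near-identical timings with and without the index. A minimal sketch on a hypothetical temp table gin_demo (the table, index name and data are made up for illustration) of the kind of filter such an index can serve:

```sql
-- Hypothetical demo table; the name, index name and data are illustrative only.
DROP TABLE IF EXISTS gin_demo;
CREATE TEMP TABLE gin_demo (
    id              bigint PRIMARY KEY,
    tid             bigint NOT NULL,
    property_values jsonb
);

-- 1000 rows spread over 10 made-up project IDs (P0 .. P9)
INSERT INTO gin_demo (id, tid, property_values)
SELECT i,
       5,
       jsonb_build_object(
           'Project IDs', jsonb_build_array('P' || (i % 10)::text),
           'Data Type',   jsonb_build_array('fastq')
       )
FROM generate_series(1, 1000) AS i;

CREATE INDEX gin_demo_pv_idx ON gin_demo USING GIN (property_values jsonb_path_ops);
ANALYZE gin_demo;

-- A containment filter in WHERE is what a jsonb_path_ops index can serve.
-- Values extracted in the SELECT list are still computed row by row.
SELECT count(*)
FROM gin_demo
WHERE property_values @> '{"Project IDs": ["P3"]}';
```

On a table this small the planner may still prefer a sequential scan; running EXPLAIN against the real table shows whether the index is chosen.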
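I'm surprised by the huge difference between fetching the whole column and querying even just a single top-level json key. At this point it seems faster to fetch the whole jsonb column as a string, split it on commas and strip the quotes, which is a hack I'd like to avoid.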
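Splitting the JSON text by hand is not needed to get string arrays out: jsonb_array_elements_text expands a JSON array into text values, and an ARRAY(...) subquery collects them into a native text[]. A sketch on a hypothetical table arr_demo; one sample name deliberately contains a comma, which a comma-splitting hack would get wrong:

```sql
-- Hypothetical demo table; the name and data are illustrative only.
DROP TABLE IF EXISTS arr_demo;
CREATE TEMP TABLE arr_demo (id bigint PRIMARY KEY, property_values jsonb);

INSERT INTO arr_demo (id, property_values) VALUES
  (1, '{"Sample Names": ["s1", "s2, with comma"], "Run IDs": ["r1"]}');

SELECT
    -- #>> returns the array as JSON text: ["s1", "s2, with comma"]
    property_values #>> '{"Sample Names"}' AS json_text,
    -- this returns a real text[] whose second element keeps its comma
    ARRAY(SELECT jsonb_array_elements_text(property_values -> 'Sample Names')) AS sample_names
FROM arr_demo;
```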
My goal is …
Update: using PostgreSQL version 12.
SELECT
-- FAST OPTION: getting all of json: no GIN=579ms; with GIN=574ms
cfiles.property_values as "1907",
-- == vs ==
-- SLOW OPTION: getting a json path: no GIN=3273ms; with GIN=3241ms
cfiles.property_values #>> '{"Sample Names"}' as "1907",
-- adding another path: with GIN=4028ms
cfiles.property_values #>> '{"Project IDs"}' as "1908",
-- adding yet another path: with GIN=4774ms
cfiles.property_values #>> '{"Run IDs"}' as "1909",
-- adding yet another path: with GIN=5558ms
cfiles.property_values #>> '{"Data Type"}' as "1910",
-- ==== rest of query below I can't change ====
user_permissions.notified_at::text as "111",
group_permissions.notified_at::text as "112",
user_permissions.task_id::text as "113",
group_permissions.task_id::text as "114",
datasets.id as "151",
datasets.name as "154",
datasets.path as "155",
datasets.last_modified as "156",
datasets.file_count as "157",
datasets.locked as "158",
datasets.content_types as "159",
cfiles.name as "105",
cfiles.last_modified as "107",
pg_size_pretty(cfiles.size::bigint) as "106",
cfiles.id as "101",
cfiles.tid as "102",
cfiles.uuid as "103",
cfiles.path as "104",
cfiles.content_type as "108",
cfiles.locked as "109",
cfiles.checksum as "110"
FROM cfiles
JOIN datasets ON datasets.id=cfiles.dataset_id
LEFT JOIN user_permissions ON (user_permissions.cfile_id=cfiles.id OR user_permissions.dataset_id=datasets.id)
LEFT JOIN users on users.id=user_permissions.user_id
LEFT JOIN group_permissions ON (group_permissions.cfile_id=cfiles.id OR group_permissions.dataset_id=datasets.id)
LEFT JOIN groups ON groups.id=group_permissions.group_id
LEFT JOIN user_groups ON groups.id=user_groups.group_id
LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id=cfiles.id
WHERE
cfiles.tid=5
ORDER BY "107" desc
LIMIT 20
OFFSET 0
Table "public.cfiles"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+------------------------------------
id | bigint | | not null | nextval('cfiles_id_seq'::regclass)
tid | bigint | | not null |
uuid | uuid | | not null | gen_random_uuid()
dataset_id | bigint | | not null |
path | character varying | | not null |
name | character varying | | |
checksum | character varying | | |
size | bigint | | |
last_modified | timestamp without time zone | | |
content_type | character varying | | |
locked | boolean | | not null | false
property_values | jsonb | | |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
Indexes:
"cfiles_pkey" PRIMARY KEY, btree (id)
"cfiles_property_values_idx" gin (property_values jsonb_path_ops)
"index_cfiles_dataset_id_path" UNIQUE, btree (dataset_id, path)
"index_cfiles_name" btree (name)
"index_cfiles_tid" btree (tid)
"index_cfiles_uuid_id_path" UNIQUE, btree (uuid)
Foreign-key constraints:
"cfiles_datasets_fk" FOREIGN KEY (dataset_id) REFERENCES datasets(id)
"cfiles_tenants_fk" FOREIGN KEY (tid) REFERENCES tenants(id)
Referenced by:
TABLE "group_permissions" CONSTRAINT "group_permissions_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)
TABLE "picklist_cfiles" CONSTRAINT "picklist_cfiles_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)
TABLE "user_permissions" CONSTRAINT "user_permissions_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)
Slow query plan:
--------------------------------------------------------------------------------
Limit (cost=13700.06..13700.11 rows=20 width=662) (actual time=5702.511..5702.521 rows=20 loops=1)
Output: ((cfiles.property_values #>> '{"Sample Names"}'::text[])), ((cfiles.property_values #>> '{"Project IDs"}'::text[])), ((cfiles.property_values #>> '{"Run IDs"}'::text[])), ((cfiles.property_values #>> '{"Data Type"}'::text[])), ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
-> Sort (cost=13700.06..13810.61 rows=44219 width=662) (actual time=5702.508..5702.512 rows=20 loops=1)
Output: ((cfiles.property_values #>> '{"Sample Names"}'::text[])), ((cfiles.property_values #>> '{"Project IDs"}'::text[])), ((cfiles.property_values #>> '{"Run IDs"}'::text[])), ((cfiles.property_values #>> '{"Data Type"}'::text[])), ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
Sort Key: cfiles.last_modified DESC
Sort Method: top-N heapsort Memory: 344kB
-> Hash Left Join (cost=39.53..12523.41 rows=44219 width=662) (actual time=2.535..5526.409 rows=44255 loops=1)
Output: (cfiles.property_values #>> '{"Sample Names"}'::text[]), (cfiles.property_values #>> '{"Project IDs"}'::text[]), (cfiles.property_values #>> '{"Run IDs"}'::text[]), (cfiles.property_values #>> '{"Data Type"}'::text[]), (user_permissions.notified_at)::text, (group_permissions.notified_at)::text, (user_permissions.task_id)::text, (group_permissions.task_id)::text, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, pg_size_pretty(cfiles.size), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
Hash Cond: (cfiles.id = picklist_cfiles.cfile_id)
-> Nested Loop Left Join (cost=38.19..10918.99 rows=44219 width=867) (actual time=1.639..632.739 rows=44255 loops=1)
Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, user_permissions.notified_at, user_permissions.task_id, group_permissions.notified_at, group_permissions.task_id
Join Filter: ((user_permissions.cfile_id = cfiles.id) OR (user_permissions.dataset_id = datasets.id))
Rows Removed by Join Filter: 177020
-> Nested Loop Left Join (cost=38.19..7822.61 rows=44219 width=851) (actual time=1.591..464.449 rows=44255 loops=1)
Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, group_permissions.notified_at, group_permissions.task_id
Join Filter: ((group_permissions.cfile_id = cfiles.id) OR (group_permissions.dataset_id = datasets.id))
Rows Removed by Join Filter: 354040
-> Hash Join (cost=35.75..4723.32 rows=44219 width=835) (actual time=1.301..163.411 rows=44255 loops=1)
Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
Inner Unique: true
Hash Cond: (cfiles.dataset_id = datasets.id)
-> Seq Scan on public.cfiles (cost=0.00..4570.70 rows=44219 width=644) (actual time=0.044..49.425 rows=44255 loops=1)
Output: cfiles.id, cfiles.tid, cfiles.uuid, cfiles.dataset_id, cfiles.path, cfiles.name, cfiles.checksum, cfiles.size, cfiles.last_modified, cfiles.content_type, cfiles.locked, cfiles.property_values, cfiles.created_at, cfiles.updated_at
Filter: (cfiles.tid = 5)
Rows Removed by Filter: 1561
-> Hash (cost=28.11..28.11 rows=611 width=199) (actual time=1.234..1.235 rows=611 loops=1)
Output: datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
Buckets: 1024 Batches: 1 Memory Usage: 149kB
-> Seq Scan on public.datasets (cost=0.00..28.11 rows=611 width=199) (actual time=0.012..0.571 rows=611 loops=1)
Output: datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
-> Materialize (cost=2.44..3.97 rows=4 width=32) (actual time=0.000..0.002 rows=8 loops=44255)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
-> Hash Right Join (cost=2.44..3.95 rows=4 width=32) (actual time=0.170..0.248 rows=8 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
Hash Cond: (user_groups.group_id = groups.id)
-> Seq Scan on public.user_groups (cost=0.00..1.34 rows=34 width=8) (actual time=0.022..0.056 rows=34 loops=1)
Output: user_groups.id, user_groups.tid, user_groups.user_id, user_groups.group_id, user_groups.created_at, user_groups.updated_at
-> Hash (cost=2.39..2.39 rows=4 width=40) (actual time=0.121..0.121 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=1.09..2.39 rows=4 width=40) (actual time=0.063..0.092 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
Hash Cond: (groups.id = group_permissions.group_id)
-> Seq Scan on public.groups (cost=0.00..1.19 rows=19 width=8) (actual time=0.010..0.017 rows=19 loops=1)
Output: groups.id, groups.tid, groups.name, groups.description, groups.default_uview, groups.created_at, groups.updated_at
-> Hash (cost=1.04..1.04 rows=4 width=40) (actual time=0.032..0.033 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on public.group_permissions (cost=0.00..1.04 rows=4 width=40) (actual time=0.017..0.022 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
-> Materialize (cost=0.00..1.06 rows=4 width=40) (actual time=0.000..0.001 rows=4 loops=44255)
Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
-> Seq Scan on public.user_permissions (cost=0.00..1.04 rows=4 width=40) (actual time=0.021..0.025 rows=4 loops=1)
Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
-> Hash (cost=1.15..1.15 rows=15 width=8) (actual time=0.040..0.040 rows=15 loops=1)
Output: picklist_cfiles.cfile_id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on public.picklist_cfiles (cost=0.00..1.15 rows=15 width=8) (actual time=0.010..0.017 rows=15 loops=1)
Output: picklist_cfiles.cfile_id
Planning Time: 3.141 ms
Execution Time: 5702.799 ms
(61 rows)
Update: refactoring the query into a CTE got me down to 20 ms:
WITH T as (
select cfiles.property_values as prop_vals,
user_permissions.notified_at::text as "111",
group_permissions.notified_at::text as "112",
user_permissions.task_id::text as "113",
group_permissions.task_id::text as "114",
datasets.id as "151",
datasets.name as "154",
datasets.path as "155",
datasets.last_modified as "156",
datasets.file_count as "157",
datasets.locked as "158",
datasets.content_types as "159",
cfiles.name as "105",
cfiles.last_modified as "107",
pg_size_pretty(cfiles.size::bigint) as "106",
cfiles.id as "101",
cfiles.tid as "102",
cfiles.uuid as "103",
cfiles.path as "104",
cfiles.content_type as "108",
cfiles.locked as "109",
cfiles.checksum as "110"
FROM cfiles
JOIN datasets ON datasets.id=cfiles.dataset_id
LEFT JOIN user_permissions ON (user_permissions.cfile_id=cfiles.id OR user_permissions.dataset_id=datasets.id)
LEFT JOIN users on users.id=user_permissions.user_id
LEFT JOIN group_permissions ON (group_permissions.cfile_id=cfiles.id OR group_permissions.dataset_id=datasets.id)
LEFT JOIN groups ON groups.id=group_permissions.group_id
LEFT JOIN user_groups ON groups.id=user_groups.group_id
LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id=cfiles.id
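WHERE
cfiles.tid=5
-- without an ORDER BY here, LIMIT 20 keeps 20 arbitrary matching rows, not the 20 most recently modified ones
LIMIT 20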
)
SELECT
prop_vals ->> 'Sample Names' as "1907",
prop_vals ->> 'Project IDs' as "1908",
prop_vals ->> 'Run IDs' as "1909",
prop_vals ->> 'Data Type' as "1910",
"111", "112", "113", "114", "151", "154", "155", "156", "157",
"158", "159", "105", "107", "106", "101", "102", "103", "104",
"108", "109", "110"
FROM T
ORDER BY "107" desc;
CTE query plan:
--------------------------------------------------------------------------------
Sort (cost=16.18..16.23 rows=20 width=662) (actual time=18.771..18.779 rows=20 loops=1)
Output: ((t.prop_vals ->> 'Sample Names'::text)), ((t.prop_vals ->> 'Project IDs'::text)), ((t.prop_vals ->> 'Run IDs'::text)), ((t.prop_vals ->> 'Data Type'::text)), t."111", t."112", t."113", t."114", t."151", t."154", t."155", t."156", t."157", t."158", t."159", t."105", t."107", t."106", t."101", t."102", t."103", t."104", t."108", t."109", t."110"
Sort Key: t."107" DESC
Sort Method: quicksort Memory: 368kB
-> Subquery Scan on t (cost=4.05..15.74 rows=20 width=662) (actual time=1.091..18.412 rows=20 loops=1)
Output: (t.prop_vals ->> 'Sample Names'::text), (t.prop_vals ->> 'Project IDs'::text), (t.prop_vals ->> 'Run IDs'::text), (t.prop_vals ->> 'Data Type'::text), t."111", t."112", t."113", t."114", t."151", t."154", t."155", t."156", t."157", t."158", t."159", t."105", t."107", t."106", t."101", t."102", t."103", t."104", t."108", t."109", t."110"
-> Limit (cost=4.05..15.34 rows=20 width=987) (actual time=0.320..1.241 rows=20 loops=1)
Output: cfiles.property_values, ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
-> Nested Loop Left Join (cost=4.05..24965.23 rows=44219 width=987) (actual time=0.318..1.224 rows=20 loops=1)
Output: cfiles.property_values, (user_permissions.notified_at)::text, (group_permissions.notified_at)::text, (user_permissions.task_id)::text, (group_permissions.task_id)::text, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, pg_size_pretty(cfiles.size), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
Join Filter: ((user_permissions.cfile_id = cfiles.id) OR (user_permissions.dataset_id = datasets.id))
Rows Removed by Join Filter: 80
-> Nested Loop Left Join (cost=4.05..20873.92 rows=44219 width=851) (actual time=0.273..1.056 rows=20 loops=1)
Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, group_permissions.notified_at, group_permissions.task_id
Join Filter: ((group_permissions.cfile_id = cfiles.id) OR (group_permissions.dataset_id = datasets.id))
Rows Removed by Join Filter: 160
-> Nested Loop (cost=1.61..17774.63 rows=44219 width=835) (actual time=0.125..0.745 rows=20 loops=1)
Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
Inner Unique: true
-> Hash Left Join (cost=1.34..4738.00 rows=44219 width=644) (actual time=0.094..0.475 rows=20 loops=1)
Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, cfiles.dataset_id
Hash Cond: (cfiles.id = picklist_cfiles.cfile_id)
-> Seq Scan on public.cfiles (cost=0.00..4570.70 rows=44219 width=644) (actual time=0.046..0.360 rows=20 loops=1)
Output: cfiles.id, cfiles.tid, cfiles.uuid, cfiles.dataset_id, cfiles.path, cfiles.name, cfiles.checksum, cfiles.size, cfiles.last_modified, cfiles.content_type, cfiles.locked, cfiles.property_values, cfiles.created_at, cfiles.updated_at
Filter: (cfiles.tid = 5)
Rows Removed by Filter: 629
-> Hash (cost=1.15..1.15 rows=15 width=8) (actual time=0.034..0.035 rows=15 loops=1)
Output: picklist_cfiles.cfile_id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on public.picklist_cfiles (cost=0.00..1.15 rows=15 width=8) (actual time=0.010..0.018 rows=15 loops=1)
Output: picklist_cfiles.cfile_id
-> Index Scan using datasets_pkey on public.datasets (cost=0.28..0.29 rows=1 width=199) (actual time=0.008..0.008 rows=1 loops=20)
Output: datasets.id, datasets.tid, datasets.bucket_path_id, datasets.path, datasets.name, datasets.last_modified, datasets.file_count, datasets.size, datasets.content_types, datasets.locked, datasets.created_at, datasets.updated_at
Index Cond: (datasets.id = cfiles.dataset_id)
-> Materialize (cost=2.44..3.97 rows=4 width=32) (actual time=0.005..0.009 rows=8 loops=20)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
-> Hash Right Join (cost=2.44..3.95 rows=4 width=32) (actual time=0.088..0.122 rows=8 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
Hash Cond: (user_groups.group_id = groups.id)
-> Seq Scan on public.user_groups (cost=0.00..1.34 rows=34 width=8) (actual time=0.007..0.016 rows=34 loops=1)
Output: user_groups.id, user_groups.tid, user_groups.user_id, user_groups.group_id, user_groups.created_at, user_groups.updated_at
-> Hash (cost=2.39..2.39 rows=4 width=40) (actual time=0.069..0.069 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=1.09..2.39 rows=4 width=40) (actual time=0.043..0.064 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
Hash Cond: (groups.id = group_permissions.group_id)
-> Seq Scan on public.groups (cost=0.00..1.19 rows=19 width=8) (actual time=0.006..0.011 rows=19 loops=1)
Output: groups.id, groups.tid, groups.name, groups.description, groups.default_uview, groups.created_at, groups.updated_at
-> Hash (cost=1.04..1.04 rows=4 width=40) (actual time=0.022..0.022 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on public.group_permissions (cost=0.00..1.04 rows=4 width=40) (actual time=0.009..0.014 rows=4 loops=1)
Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
-> Materialize (cost=0.00..1.06 rows=4 width=40) (actual time=0.001..0.003 rows=4 loops=20)
Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
-> Seq Scan on public.user_permissions (cost=0.00..1.04 rows=4 width=40) (actual time=0.018..0.022 rows=4 loops=1)
Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
Planning Time: 4.049 ms
Execution Time: 19.128 ms
(60 rows)
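A minimal sketch of the same late-extraction pattern on a hypothetical table late_demo, with ORDER BY moved inside the CTE so that the 20 rows kept are the 20 most recently modified ones for tid = 5, as in the original query. On PostgreSQL 12 a CTE referenced once is normally inlined, but because it contains ORDER BY and LIMIT it stays a separate subquery level (the "Subquery Scan on t" above "Limit" in the plan above), so the jsonb extraction runs only on the rows that survive the limit:

```sql
-- Hypothetical demo table; the name and data are illustrative only.
DROP TABLE IF EXISTS late_demo;
CREATE TEMP TABLE late_demo (
    id              bigint PRIMARY KEY,
    tid             bigint NOT NULL,
    last_modified   timestamp,
    property_values jsonb
);

INSERT INTO late_demo (id, tid, last_modified, property_values)
SELECT i,
       CASE WHEN i % 10 = 0 THEN 7 ELSE 5 END,        -- every 10th row belongs to another tenant
       timestamp '2021-01-01' + i * interval '1 minute',
       jsonb_build_object('Sample Names', jsonb_build_array('s' || i::text))
FROM generate_series(1, 500) AS i;

WITH t AS (
    -- sort and limit while carrying the whole jsonb value untouched
    SELECT property_values AS prop_vals, id, last_modified
    FROM late_demo
    WHERE tid = 5
    ORDER BY last_modified DESC
    LIMIT 20
)
-- only the 20 surviving rows have their jsonb values parsed here
SELECT prop_vals ->> 'Sample Names' AS "1907", id, last_modified
FROM t
ORDER BY last_modified DESC;
```

Because every matching row now has to be sorted before the limit, this version will likely take longer than the 20 ms measured above; in exchange it returns the same rows as the original query.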
Comments on the question:
@LaurenzAlbe Thanks for taking a look - yes, I tried the -> and ->> operators, but it made no difference. I just added the fast execution plan above.
Answer 1:
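Your slow query is deTOASTing the large jsonb data for all 44255 rows and then carrying the parsed-out values through the sort just to pick out the top 20 rows. (I don't know why it deTOASTs so eagerly.) So 44235 JSONB values are deTOASTed only to be thrown away.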
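Your fast query is (presumably) returning TOAST pointers from the hash join, sorting the rows while carrying those small pointers, and then deTOASTing only the 20 survivors. Under EXPLAIN ANALYZE it doesn't even deTOAST the survivors; it just throws the pointers away.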
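That's the "why". As for what to do about it: if you really can't change anything below the very top of the query, I doubt there is anything you can do on the server side.
If you can modify the query more substantially, you can improve the run time with a CTE. Have the CTE select the whole jsonb column, then have the SELECT on the CTE pull the values out of it: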
WITH T as (select cfiles.property_values as "1907", <rest of query>)
SELECT "1907"->>'name1', "1907"->>'name2', <rest of select list> from T;
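Whether TOAST is in play depends on the size of the stored values: PostgreSQL tries to compress or move values out of line once a row exceeds roughly 2 kB (TOAST_TUPLE_THRESHOLD in a default build). On the real table, SELECT avg(pg_column_size(property_values)) FROM cfiles WHERE tid = 5; would report the average stored size. The self-contained sketch below builds one hypothetical row with 200 short, made-up sample names, as described in the question, and compares its JSON text length (about 3 kB here) with its stored size:

```sql
-- Hypothetical demo table with one row shaped like the question's data.
DROP TABLE IF EXISTS toast_demo;
CREATE TEMP TABLE toast_demo (id bigint PRIMARY KEY, property_values jsonb);

-- 200 made-up sample names: "sample_0001" .. "sample_0200"
INSERT INTO toast_demo (id, property_values)
SELECT 1,
       jsonb_build_object(
           'Sample Names',
           (SELECT jsonb_agg('sample_' || lpad(g::text, 4, '0'))
            FROM generate_series(1, 200) AS g)
       );

SELECT octet_length(property_values::text) AS json_text_bytes, -- length of the JSON text
       pg_column_size(property_values)      AS stored_bytes     -- size as stored, after any compression
FROM toast_demo;
```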
Comments:
Maybe the problem is an old PostgreSQL version. Since 9.6, SELECT list entries are evaluated after sorting.
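@jjanes Thanks, I hadn't heard of TOAST. If I'm able to edit any or all of the query, is there a way to deTOAST only the survivors but still assign them to column names?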
@LaurenzAlbe Just updated - this is PostgreSQL version 12.
@jjanes Thanks - using the CTE format cut my time down to 17 ms with minimal changes to the original query.
Answer 2:
In addition to what @jjanes already said, you can first limit the result to 20 records and only then do the rest of the work. Something like this:
WITH i(id) AS (
-- core piece of SQL to select the records you're looking for
SELECT
cfiles.ID
FROM
cfiles
JOIN datasets ON datasets.ID = cfiles.dataset_id
WHERE
cfiles.tid = 5
ORDER BY
cfiles.last_modified DESC
LIMIT 20 OFFSET 0
)
SELECT
-- the path extractions now run only for the rows that survive the join with i (about 20)
cfiles.property_values #>> '{"Sample Names"}' AS "1907",
cfiles.property_values #>> '{"Project IDs"}' AS "1908",
cfiles.property_values #>> '{"Run IDs"}' AS "1909",
cfiles.property_values #>> '{"Data Type"}' AS "1910",
-- ==== rest of query below I can't change ====
user_permissions.notified_at :: TEXT AS "111",
group_permissions.notified_at :: TEXT AS "112",
user_permissions.task_id :: TEXT AS "113",
group_permissions.task_id :: TEXT AS "114",
datasets.ID AS "151",
datasets.NAME AS "154",
datasets.PATH AS "155",
datasets.last_modified AS "156",
datasets.file_count AS "157",
datasets.locked AS "158",
datasets.content_types AS "159",
cfiles.NAME AS "105",
cfiles.last_modified AS "107",
pg_size_pretty ( cfiles.SIZE :: BIGINT ) AS "106",
cfiles.ID AS "101",
cfiles.tid AS "102",
cfiles.uuid AS "103",
cfiles.PATH AS "104",
cfiles.content_type AS "108",
cfiles.locked AS "109",
cfiles.checksum AS "110"
FROM
cfiles
JOIN i USING(id) -- should match just 20 records
JOIN datasets ON datasets.ID = cfiles.dataset_id
LEFT JOIN user_permissions ON ( user_permissions.cfile_id = cfiles.ID OR user_permissions.dataset_id = datasets.ID )
LEFT JOIN users ON users.ID = user_permissions.user_id
LEFT JOIN group_permissions ON ( group_permissions.cfile_id = cfiles.ID OR group_permissions.dataset_id = datasets.ID )
LEFT JOIN groups ON groups.ID = group_permissions.group_id
LEFT JOIN user_groups ON groups.ID = user_groups.group_id
LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id = cfiles.ID
ORDER BY
"107" DESC;
You may also want to rewrite the two LEFT JOINs that have OR conditions as subqueries combined with UNION ALL. That might speed things up further.
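One possible shape for that rewrite, sketched on hypothetical tables perm_files and perm_rows rather than the real schema: each OR branch becomes its own subquery, the branches are combined with UNION ALL inside a LEFT JOIN LATERAL, and the second branch skips rows already matched by the first, so a row satisfying both conditions is not returned twice. Whether it is actually faster should be checked with EXPLAIN ANALYZE; it mainly helps when each branch can use an index, for example on cfile_id and on dataset_id.

```sql
-- Hypothetical demo tables; names and data are illustrative only.
DROP TABLE IF EXISTS perm_files, perm_rows;
CREATE TEMP TABLE perm_files (id bigint PRIMARY KEY, dataset_id bigint NOT NULL);
CREATE TEMP TABLE perm_rows  (cfile_id bigint, dataset_id bigint, task_id bigint);

INSERT INTO perm_files VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO perm_rows VALUES
    (1,    NULL, 100),   -- matches file 1 by cfile_id
    (NULL, 10,   200),   -- matches files 1 and 2 by dataset_id
    (3,    20,   300);   -- matches file 3 by both conditions: must appear once

-- Original form: the OR condition can only be checked as a join filter
SELECT f.id, p.task_id
FROM perm_files f
LEFT JOIN perm_rows p ON (p.cfile_id = f.id OR p.dataset_id = f.dataset_id)
ORDER BY 1, 2;

-- Rewrite: one branch per condition, combined with UNION ALL.
-- The second branch skips rows the first branch already returned.
SELECT f.id, p.task_id
FROM perm_files f
LEFT JOIN LATERAL (
    SELECT r.task_id FROM perm_rows r WHERE r.cfile_id = f.id
    UNION ALL
    SELECT r.task_id FROM perm_rows r
    WHERE r.dataset_id = f.dataset_id
      AND (r.cfile_id = f.id) IS NOT TRUE
) p ON true
ORDER BY 1, 2;
```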
Comments:
Wow, that got it down to 60 ms, thanks - I'll have to try reworking the whole query.
@simj: Can you show us the EXPLAIN ANALYZE output? It's always interesting to see what the database is doing.