PostgreSQL: optimizing the filter produced by OR EXISTS (part 2)
Posted by robinson1988
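PostgreSQL can optimize the filter that an OR EXISTS condition produces. The previous article did not test the case where the EXISTS subquery contains large tables, so this article tests that case.
Note: no indexes were added to any of the tables during the tests.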
orcl=> select * from version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
create table a as select * from dba_objects;
create table b as select * from a;
create table c as select * from a;
create table d as select * from a;
insert into c select * from c;
..... repeat until c reaches about 600MB .....
insert into d select * from d;
..... repeat until d reaches about 600MB .....
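The plans further down report 72,585 rows in a and b, and 4,645,440 rows in c and d. Each `insert into c select * from c` doubles the table, so we can work out how many times the insert was repeated. The sketch below does that calculation. Its inputs are the row counts from the plans and the 10,192 kB size of a from the `\d+` listing. It assumes that table size grows linearly with the row count.

```python
BASE_ROWS = 72_585        # rows in a (Seq Scan on a, actual rows)
TARGET_ROWS = 4_645_440   # rows in c and d (Seq Scan on c / d, actual rows)
BASE_KB = 10_192          # size of a reported by \d+

def doublings_needed(base, target):
    # Count how many self-inserts (each doubles the row count) reach the target
    n, rows = 0, base
    while rows < target:
        rows *= 2
        n += 1
    return n, rows

n, rows = doublings_needed(BASE_ROWS, TARGET_ROWS)
print(n, rows)                  # 6 4645440
print(BASE_KB * 2 ** n / 1024)  # 637.0 (MB), close to the 635 MB shown by \d+
```

The result is six doublings. The extrapolated size agrees with the listing to within a couple of MB.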
orcl=> \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+-------+------------+-------------
public | a | table | scott | 10192 kB |
public | b | table | scott | 10192 kB |
public | c | table | scott | 635 MB |
public | d | table | scott | 635 MB |
orcl=> show work_mem;
work_mem
----------
64MB
orcl=> explain select count(*)
from a
where owner = 'SCOTT'
or exists (select null
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
and a.object_id = b.data_object_id
and a.object_name = c.object_name
and a.object_type = d.object_type);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1658265602.93..1658265602.94 rows=1 width=8)
-> Seq Scan on a (cost=0.00..1658265512.19 rows=36296 width=0)
Filter: (((owner)::text = 'SCOTT'::text) OR (SubPlan 1))
SubPlan 1
-> Nested Loop (cost=0.00..479762.07 rows=21 width=0)
-> Nested Loop (cost=0.00..477403.03 rows=21 width=24)
Join Filter: (c.data_object_id = d.data_object_id)
-> Seq Scan on d (cost=0.00..139367.54 rows=116181 width=6)
Filter: ((a.object_type)::text = (object_type)::text)
-> Materialize (cost=0.00..139366.27 rows=114 width=30)
-> Seq Scan on c (cost=0.00..139365.70 rows=114 width=30)
Filter: ((object_name)::text = (a.object_name)::text)
-> Materialize (cost=0.00..2358.78 rows=1 width=24)
-> Seq Scan on b (cost=0.00..2358.77 rows=1 width=24)
Filter: (((object_name)::text = (a.object_name)::text) AND (a.object_id = data_object_id))
With work_mem = 64MB, PG did not optimize the filter automatically. I kept increasing work_mem, and only after it reached 6GB did PG optimize the filter automatically:
orcl=> set work_mem='6GB';
SET
orcl=> explain analyze select count(*)
from a
where owner = 'SCOTT'
or exists (select null
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
and a.object_id = b.data_object_id
and a.object_name = c.object_name
and a.object_type = d.object_type);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1658265602.93..1658265602.94 rows=1 width=8) (actual time=26938.024..26938.028 rows=1 loops=1)
-> Seq Scan on a (cost=0.00..1658265512.19 rows=36296 width=0) (actual time=26916.210..26937.781 rows=7141 loops=1)
Filter: (((owner)::text = 'SCOTT'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
Rows Removed by Filter: 65444
SubPlan 1
-> Nested Loop (cost=0.00..479762.07 rows=21 width=0) (never executed)
-> Nested Loop (cost=0.00..477403.03 rows=21 width=24) (never executed)
Join Filter: (c.data_object_id = d.data_object_id)
-> Seq Scan on d (cost=0.00..139367.54 rows=116181 width=6) (never executed)
Filter: ((a.object_type)::text = (object_type)::text)
-> Materialize (cost=0.00..139366.27 rows=114 width=30) (never executed)
-> Seq Scan on c (cost=0.00..139365.70 rows=114 width=30) (never executed)
Filter: ((object_name)::text = (a.object_name)::text)
-> Materialize (cost=0.00..2358.78 rows=1 width=24) (never executed)
-> Seq Scan on b (cost=0.00..2358.77 rows=1 width=24) (never executed)
Filter: (((object_name)::text = (a.object_name)::text) AND (a.object_id = data_object_id))
SubPlan 2
-> Merge Join (cost=2066731.77..3002199.73 rows=62196959 width=70) (actual time=3085.476..13208.771 rows=67854336 loops=1)
Merge Cond: (d_1.data_object_id = c_1.data_object_id)
-> Sort (cost=642383.81..654001.92 rows=4647243 width=14) (actual time=853.094..912.878 rows=498689 loops=1)
Sort Key: d_1.data_object_id
Sort Method: quicksort Memory: 415522kB
-> Seq Scan on d d_1 (cost=0.00..127749.43 rows=4647243 width=14) (actual time=0.016..445.795 rows=4645440 loops=1)
-> Sort (cost=1424346.01..1445449.02 rows=8441205 width=36) (actual time=2232.371..4183.017 rows=67854337 loops=1)
Sort Key: c_1.data_object_id
Sort Method: quicksort Memory: 1058170kB
-> Hash Join (cost=2903.16..453226.84 rows=8441205 width=36) (actual time=12.636..1409.609 rows=8841152 loops=1)
Hash Cond: ((c_1.object_name)::text = (b_1.object_name)::text)
-> Seq Scan on c c_1 (cost=0.00..127747.96 rows=4647096 width=30) (actual time=0.012..316.364 rows=4645440 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=30) (actual time=12.557..12.558 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5073kB
-> Seq Scan on b b_1 (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.006..5.736 rows=72585 loops=1)
Planning Time: 0.222 ms
Execution Time: 26965.614 ms
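Here is what the "optimization" amounts to. At 64MB the plan contains only SubPlan 1. That is a correlated subplan, which re-runs the b/c/d join for every row of a. At 6GB PG adds "hashed SubPlan 2" as an alternative. That subplan runs the join once without the correlation predicates and hashes the resulting (data_object_id, object_name, object_type) keys. It then probes the hash table once per row of a. The Python sketch below mimics the two strategies on plain lists. The function names and tuple layouts are made up for illustration and are not PostgreSQL internals. SQL NULL semantics are ignored; the sample data contains no NULLs.

```python
# Hypothetical row layouts (illustration only):
#   a: (owner, object_id, object_name, object_type)
#   b: (object_name, data_object_id)
#   c: (object_name, data_object_id)
#   d: (data_object_id, object_type)

def bcd_join(b, c, d):
    # Uncorrelated part of the subquery:
    # b.object_name = c.object_name and c.data_object_id = d.data_object_id
    for b_name, b_dobj in b:
        for c_name, c_dobj in c:
            if b_name != c_name:
                continue
            for d_dobj, d_type in d:
                if c_dobj == d_dobj:
                    yield b_dobj, c_name, d_type

def correlated_count(a, b, c, d):
    # SubPlan 1 style: re-evaluate the whole join for each row of a
    total = 0
    for owner, obj_id, name, typ in a:
        if owner == "SCOTT" or any(k == (obj_id, name, typ) for k in bcd_join(b, c, d)):
            total += 1
    return total

def hashed_count(a, b, c, d):
    # Hashed SubPlan style: run the join once, keep its keys in a hash set,
    # then each row of a costs a single probe
    keys = set(bcd_join(b, c, d))
    return sum(1 for owner, obj_id, name, typ in a
               if owner == "SCOTT" or (obj_id, name, typ) in keys)

a = [("SCOTT", 1, "EMP", "TABLE"), ("SYS", 2, "DEPT", "TABLE"), ("SYS", 3, "IDX", "INDEX")]
b = [("DEPT", 2), ("IDX", 9)]
c = [("DEPT", 7), ("IDX", 8)]
d = [(7, "TABLE"), (8, "INDEX")]
print(correlated_count(a, b, c, d), hashed_count(a, b, c, d))  # 2 2
```

The hashed strategy replaces one join per outer row with one join in total. The catch is that the key set must be held in memory, and that memory requirement is what work_mem controls.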
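The tables inside the EXISTS subquery (b, c and d) add up to only about 1.2GB, yet work_mem has to be set to 6GB before PG optimizes the filter automatically.
In production it is common for the tables inside an EXISTS subquery to add up to several GB, or even tens of GB. How large would work_mem have to be then for PG to optimize automatically?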
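Why is 6GB needed when the tables total only 1.2GB? My reading of the PG12 source may be wrong in detail, but here is how it appears to work. The hashed alternative is kept only if `subplan_is_hashable()` (in src/backend/optimizer/plan/subselect.c) finds that the subquery's estimated output fits in work_mem. The estimate is `plan_rows * (MAXALIGN(plan_width) + MAXALIGN(SizeofHeapTupleHeader))`. What matters is therefore the estimated join result, which is 62,196,959 rows of width 70 for SubPlan 2, not the size of the tables. The sketch below redoes that calculation. It assumes 8-byte alignment and a 23-byte heap tuple header, as on a typical x86_64 build. As far as I know, PG13 and later compare against hash_mem (work_mem × hash_mem_multiplier) instead of plain work_mem.

```python
ALIGN = 8               # assumed MAXIMUM_ALIGNOF on x86_64
HEAP_TUPLE_HEADER = 23  # assumed SizeofHeapTupleHeader

def maxalign(n: int) -> int:
    # Round n up to the next multiple of ALIGN, like PostgreSQL's MAXALIGN
    return (n + ALIGN - 1) // ALIGN * ALIGN

def subquery_size(plan_rows: int, plan_width: int) -> int:
    # Estimated bytes needed to hash the subplan output (PG12 formula as assumed above)
    return plan_rows * (maxalign(plan_width) + maxalign(HEAP_TUPLE_HEADER))

def work_mem_kb(setting: str) -> int:
    # Parse a few work_mem spellings; a bare number means kB
    for suffix, mult in (("kB", 1), ("MB", 1024), ("GB", 1024 * 1024)):
        if setting.endswith(suffix):
            return int(setting[: -len(suffix)]) * mult
    return int(setting)

def is_hashable(plan_rows: int, plan_width: int, work_mem: str) -> bool:
    # Hashable unless the estimate exceeds work_mem (in bytes)
    return subquery_size(plan_rows, plan_width) <= work_mem_kb(work_mem) * 1024

size = subquery_size(62_196_959, 70)   # SubPlan 2 estimate from the PG12 plan
print(size)                            # 5970908064 bytes, about 5.56 GiB
print((size + 1023) // 1024)           # smallest work_mem that passes, in kB: 5830965
for setting in ("64MB", "1GB", "5GB", "6GB"):
    print(setting, is_hashable(62_196_959, 70, setting))  # only 6GB prints True
```

The PG14 estimate further down (60,595,987 rows of width 70) gives about 5.4 GiB, which is consistent with 6GB still being required there. Because the threshold grows with the estimated join cardinality rather than with table size, a subquery over tens of GB may need a work_mem that is not realistic to set.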
The practical answer is still SQL review and equivalent SQL rewriting. Let's see how long the rewritten query takes.
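Before timing the rewrite, it is worth checking that it returns the same count as the original. The GROUP BY makes (data_object_id, object_name, object_type) unique. Each row of a therefore matches at most one row of the derived table, so the LEFT JOIN cannot duplicate rows of a, and any matched row has non-NULL keys. The sketch below runs both statements against small random tables in SQLite, using Python's built-in sqlite3 module, as a sanity check. SQLite is only a stand-in for PostgreSQL here, and the random data is invented.

```python
import random
import sqlite3

COLUMNS = "owner TEXT, object_name TEXT, object_id INTEGER, data_object_id INTEGER, object_type TEXT"

ORIGINAL_SQL = """
select count(*) from a
where owner = 'SCOTT'
   or exists (select null from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
                and a.object_id = b.data_object_id
                and a.object_name = c.object_name
                and a.object_type = d.object_type)
"""

REWRITTEN_SQL = """
select count(*) from a
left join (select b.data_object_id, c.object_name, d.object_type
           from b, c, d
           where b.object_name = c.object_name
             and c.data_object_id = d.data_object_id
           group by b.data_object_id, c.object_name, d.object_type) b
  on a.object_id = b.data_object_id
 and a.object_name = b.object_name
 and a.object_type = b.object_type
where a.owner = 'SCOTT'
   or (b.data_object_id is not null and b.object_name is not null
       and b.object_type is not null)
"""

def random_row(rng):
    # Tiny value domains so the joins actually match; None plays the role of SQL NULL
    return (
        rng.choice(["SCOTT", "SYS", None]),
        rng.choice(["T1", "T2", "T3", "I1"]),
        rng.randint(1, 8),
        rng.choice([None, 1, 2, 3, 4, 5, 6, 7, 8]),
        rng.choice(["TABLE", "INDEX", None]),
    )

def counts(seed, n=25):
    """Build random tables a, b, c, d in memory; return (original, rewritten) counts."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    try:
        for name in "abcd":
            con.execute(f"create table {name} ({COLUMNS})")
            con.executemany(f"insert into {name} values (?, ?, ?, ?, ?)",
                            [random_row(rng) for _ in range(n)])
        original = con.execute(ORIGINAL_SQL).fetchone()[0]
        rewritten = con.execute(REWRITTEN_SQL).fetchone()[0]
        return original, rewritten
    finally:
        con.close()

if __name__ == "__main__":
    for seed in range(20):
        print(seed, *counts(seed))  # the two counts should agree for every seed
```

A randomized check like this cannot prove equivalence. It is a cheap way to catch an incorrect rewrite, for example one that loses the GROUP BY and double-counts rows of a.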
orcl=> show work_mem;
work_mem
----------
64MB
orcl=> explain analyze select count(*)
orcl-> from a
orcl-> left join (select b.data_object_id, c.object_name, d.object_type
orcl(> from b, c, d
orcl(> where b.object_name = c.object_name
orcl(> and c.data_object_id = d.data_object_id
orcl(> group by b.data_object_id, c.object_name, d.object_type) b
orcl-> on a.object_id = b.data_object_id
orcl-> and a.object_name = b.object_name
orcl-> and a.object_type = b.object_type
orcl-> where a.owner = 'SCOTT'
orcl-> or (b.data_object_id is not null and b.object_name is not null and
orcl(> b.object_type is not null);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12957110.90..12957110.91 rows=1 width=8) (actual time=16782.213..16844.980 rows=1 loops=1)
-> Merge Right Join (cost=3720449.80..12956932.14 rows=71502 width=0) (actual time=12482.389..16844.339 rows=7141 loops=1)
Merge Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
Rows Removed by Filter: 65444
-> Group (cost=3712593.67..11845927.77 rows=62995559 width=38) (actual time=12445.613..16805.434 rows=7872 loops=1)
Group Key: b.data_object_id, c.object_name, d.object_type
-> Gather Merge (cost=3712593.67..11373461.07 rows=62995560 width=38) (actual time=12445.611..16800.049 rows=39356 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Group (cost=3711593.61..3869082.51 rows=15748890 width=38) (actual time=11649.700..15711.736 rows=8070 loops=5)
Group Key: b.data_object_id, c.object_name, d.object_type
-> Sort (cost=3711593.61..3750965.83 rows=15748890 width=38) (actual time=11649.696..14017.939 rows=13445299 loops=5)
Sort Key: b.data_object_id, c.object_name, d.object_type
Sort Method: external merge Disk: 627720kB
Worker 0: Sort Method: external merge Disk: 538864kB
Worker 1: Sort Method: external merge Disk: 597080kB
Worker 2: Sort Method: external merge Disk: 660520kB
Worker 3: Sort Method: external merge Disk: 553536kB
-> Merge Join (cost=1149297.01..1398275.99 rows=15748890 width=38) (actual time=1983.411..4767.583 rows=13570867 loops=5)
Merge Cond: (c.data_object_id = d.data_object_id)
-> Sort (cost=427707.49..433060.49 rows=2141199 width=36) (actual time=735.984..793.578 rows=206721 loops=5)
Sort Key: c.data_object_id
Sort Method: external merge Disk: 63976kB
Worker 0: Sort Method: external merge Disk: 54928kB
Worker 1: Sort Method: external merge Disk: 62872kB
Worker 2: Sort Method: external merge Disk: 68224kB
Worker 3: Sort Method: external merge Disk: 59152kB
-> Parallel Hash Join (cost=2230.68..144009.05 rows=2141199 width=36) (actual time=8.979..402.091 rows=1768230 loops=5)
Hash Cond: ((c.object_name)::text = (b.object_name)::text)
-> Parallel Seq Scan on c (cost=0.00..92890.60 rows=1161360 width=30) (actual time=0.013..91.337 rows=929088 loops=5)
-> Parallel Hash (cost=1696.97..1696.97 rows=42697 width=30) (actual time=6.640..6.640 rows=14517 loops=5)
Buckets: 131072 Batches: 1 Memory Usage: 5376kB
-> Parallel Seq Scan on b (cost=0.00..1696.97 rows=42697 width=30) (actual time=0.007..1.990 rows=14517 loops=5)
-> Materialize (cost=721588.23..744816.90 rows=4645734 width=14) (actual time=1247.415..1879.497 rows=13570868 loops=5)
-> Sort (cost=721588.23..733202.57 rows=4645734 width=14) (actual time=1247.409..1354.090 rows=498689 loops=5)
Sort Key: d.data_object_id
Sort Method: external merge Disk: 87904kB
Worker 0: Sort Method: external merge Disk: 87904kB
Worker 1: Sort Method: external merge Disk: 87912kB
Worker 2: Sort Method: external merge Disk: 87896kB
Worker 3: Sort Method: external merge Disk: 87888kB
-> Seq Scan on d (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.034..568.742 rows=4645440 loops=5)
-> Sort (cost=7856.14..8037.60 rows=72585 width=43) (actual time=23.339..26.246 rows=72585 loops=1)
Sort Key: a.object_id, a.object_name, a.object_type
Sort Method: quicksort Memory: 10790kB
-> Seq Scan on a (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.016..7.983 rows=72585 loops=1)
Planning Time: 0.653 ms
Execution Time: 16908.216 ms
After the rewrite the query runs in about 17 seconds, but PG automatically launched 4 parallel workers (work_mem is 64MB).
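The parallel plan hides one detail. Add up the external-merge spill of the big Sort across the leader and the four workers, and you get almost exactly what the serial plan further down spills by itself (2,977,784 kB). Parallelism therefore split the disk sort across five processes rather than shrinking it. The figures below are copied from the "Sort Method: external merge" lines above.

```python
# Disk usage of the top-level Sort, from the "Sort Method: external merge" lines
spill_kb = {
    "leader": 627_720,
    "worker 0": 538_864,
    "worker 1": 597_080,
    "worker 2": 660_520,
    "worker 3": 553_536,
}
SERIAL_SPILL_KB = 2_977_784  # the same Sort with max_parallel_workers_per_gather = 0

total = sum(spill_kb.values())
print(total)                          # 2977720
print(total / SERIAL_SPILL_KB)        # very close to 1.0
print(round(total / 1024 / 1024, 2))  # 2.84 (GB spilled in total)
```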
Now disable automatic parallelism:
orcl=> set max_parallel_workers_per_gather=0;
SET
orcl=> explain analyze select count(*)
from a
left join (select b.data_object_id, c.object_name, d.object_type
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
group by b.data_object_id, c.object_name, d.object_type) b
on a.object_id = b.data_object_id
and a.object_name = b.object_name
and a.object_type = b.object_type
where a.owner = 'SCOTT'
or (b.data_object_id is not null and b.object_name is not null and
b.object_type is not null);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=14952868.49..14952868.50 rows=1 width=8) (actual time=55497.082..55497.086 rows=1 loops=1)
-> Merge Right Join (cost=13219585.91..14952689.74 rows=71502 width=0) (actual time=39422.940..55496.187 rows=7141 loops=1)
Merge Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
Rows Removed by Filter: 65444
-> Group (cost=13211729.77..13841685.36 rows=62995559 width=38) (actual time=39322.008..55460.899 rows=7872 loops=1)
Group Key: b.data_object_id, c.object_name, d.object_type
-> Sort (cost=13211729.77..13369218.67 rows=62995559 width=38) (actual time=39322.004..48948.069 rows=64897025 loops=1)
Sort Key: b.data_object_id, c.object_name, d.object_type
Sort Method: external merge Disk: 2977784kB
-> Merge Join (cost=2396381.36..3328514.34 rows=62995559 width=38) (actual time=3435.220..13449.670 rows=67854336 loops=1)
Merge Cond: (d.data_object_id = c.data_object_id)
-> Sort (cost=721588.23..733202.57 rows=4645734 width=14) (actual time=888.259..976.106 rows=498689 loops=1)
Sort Key: d.data_object_id
Sort Method: external merge Disk: 87912kB
-> Seq Scan on d (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.026..412.211 rows=4645440 loops=1)
-> Materialize (cost=1674792.54..1717616.52 rows=8564796 width=36) (actual time=2546.954..4554.523 rows=67854337 loops=1)
-> Sort (cost=1674792.54..1696204.53 rows=8564796 width=36) (actual time=2546.951..2709.503 rows=1033601 loops=1)
Sort Key: c.data_object_id
Sort Method: external merge Disk: 309128kB
-> Hash Join (cost=2903.16..454361.32 rows=8564796 width=36) (actual time=13.798..1468.422 rows=8841152 loops=1)
Hash Cond: ((c.object_name)::text = (b.object_name)::text)
-> Seq Scan on c (cost=0.00..127731.40 rows=4645440 width=30) (actual time=0.015..309.911 rows=4645440 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=30) (actual time=13.719..13.719 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5073kB
-> Seq Scan on b (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.005..6.523 rows=72585 loops=1)
-> Sort (cost=7856.14..8037.60 rows=72585 width=43) (actual time=18.869..21.682 rows=72585 loops=1)
Sort Key: a.object_id, a.object_name, a.object_type
Sort Method: quicksort Memory: 10790kB
-> Seq Scan on a (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.011..5.940 rows=72585 loops=1)
Planning Time: 0.174 ms
Execution Time: 55748.392 ms
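With parallelism disabled the query took 55 seconds, and the plan is a pile of sort-merge joins. Now disable merge joins so that everything goes through hash joins, and see how long it takes: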
orcl=> set enable_mergejoin=false;
SET
orcl=> explain analyze select count(*)
from a
left join (select b.data_object_id, c.object_name, d.object_type
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
group by b.data_object_id, c.object_name, d.object_type) b
on a.object_id = b.data_object_id
and a.object_name = b.object_name
and a.object_type = b.object_type
where a.owner = 'SCOTT'
or (b.data_object_id is not null and b.object_name is not null and
b.object_type is not null);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=83011075.70..83011075.71 rows=1 width=8) (actual time=53408.678..53408.683 rows=1 loops=1)
-> Hash Right Join (cost=81042285.63..83010896.95 rows=71502 width=0) (actual time=32888.138..53407.748 rows=7141 loops=1)
Hash Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
Rows Removed by Filter: 65444
-> Group (cost=81039019.54..81668975.13 rows=62995559 width=38) (actual time=32789.399..53374.444 rows=8119 loops=1)
Group Key: b.data_object_id, c.object_name, d.object_type
-> Sort (cost=81039019.54..81196508.44 rows=62995559 width=38) (actual time=32789.396..46520.761 rows=67854336 loops=1)
Sort Key: b.data_object_id, c.object_name, d.object_type
Sort Method: external merge Disk: 2977688kB
-> Hash Join (cost=220458.56..71155804.11 rows=62995559 width=38) (actual time=586.236..11237.270 rows=67854336 loops=1)
Hash Cond: ((c.object_name)::text = (b.object_name)::text)
-> Hash Join (cost=217555.40..68771834.50 rows=34168017 width=32) (actual time=573.644..4750.182 rows=33587200 loops=1)
Hash Cond: (d.data_object_id = c.data_object_id)
-> Seq Scan on d (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.038..441.587 rows=4645440 loops=1)
-> Hash (cost=127731.40..127731.40 rows=4645440 width=30) (actual time=569.515..569.516 rows=498688 loops=1)
Buckets: 1048576 Batches: 8 Memory Usage: 11678kB
-> Seq Scan on c (cost=0.00..127731.40 rows=4645440 width=30) (actual time=0.007..406.891 rows=4645440 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=30) (actual time=12.529..12.529 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5073kB
-> Seq Scan on b (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.005..5.511 rows=72585 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=43) (actual time=17.795..17.796 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 6526kB
-> Seq Scan on a (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.009..6.135 rows=72585 loops=1)
Planning Time: 0.110 ms
Execution Time: 53629.598 ms
With merge joins disabled and everything on hash joins, the query takes 53 seconds, and a large part of that time is spent in the Sort:
Sort (cost=10081039019.54..10081196508.44 rows=62995559 width=38) (actual time=32381.031..46111.439 rows=67854336 loops=1)
Sort Key: b.data_object_id, c.object_name, d.object_type
Sort Method: external merge Disk: 2977688kB
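So work_mem has to be increased after all. Compared with Oracle, PG12's GROUP BY algorithm is just trash... it really needs to improve!!!
HASH JOIN and SORT MERGE JOIN need improvement as well.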
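One reason PG12 sorts instead of hashing here is the group-count estimate. The planner expects 62,995,559 groups, but only 8,119 come out of the Group/HashAggregate nodes in the hash-join plans. As I read the PG12 planner, a HashAggregate path is generated only when the estimated hash table fits in work_mem, or when no other path exists. At that estimate, even a lower bound that ignores all per-entry overhead (groups × the 38-byte row width) is far beyond 64MB. The sketch below only redoes this lower-bound arithmetic; the planner's real estimate adds per-entry overhead on top.

```python
EST_GROUPS = 62_995_559       # Group node estimate (rows=...) in the PG12 plans
ACTUAL_GROUPS = 8_119         # actual rows out of Group / HashAggregate
WIDTH = 38                    # plan width of a grouped row, in bytes
SORT_INPUT_ROWS = 67_854_336  # actual rows fed into the Sort

lower_bound_bytes = EST_GROUPS * WIDTH
print(lower_bound_bytes)                       # 2393831242
print(lower_bound_bytes / (64 * 1024 * 1024))  # roughly 35.7 times a 64MB work_mem
print(EST_GROUPS // ACTUAL_GROUPS)             # overestimate factor: 7759
print(SORT_INPUT_ROWS // ACTUAL_GROUPS)        # input rows per output group: 8357
```

With a 6GB work_mem, the same PG12 query does get a HashAggregate, as the next plan shows.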
orcl=> set work_mem='6GB';
SET
orcl=> explain analyze select count(*)
from a
left join (select b.data_object_id, c.object_name, d.object_type
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
group by b.data_object_id, c.object_name, d.object_type) b
on a.object_id = b.data_object_id
and a.object_name = b.object_name
and a.object_type = b.object_type
where a.owner = 'SCOTT'
or (b.data_object_id is not null and b.object_name is not null and
b.object_type is not null);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=73491444.96..73491444.97 rows=1 width=8) (actual time=24071.535..24071.539 rows=1 loops=1)
-> Hash Right Join (cost=71522654.89..73491266.20 rows=71502 width=0) (actual time=23838.264..24071.158 rows=7141 loops=1)
Hash Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
Rows Removed by Filter: 65444
-> HashAggregate (cost=71519388.80..72149344.39 rows=62995559 width=38) (actual time=23819.981..24040.569 rows=8119 loops=1)
Group Key: b.data_object_id, c.object_name, d.object_type
-> Hash Join (cost=188702.56..71046922.11 rows=62995559 width=38) (actual time=574.309..11306.731 rows=67854336 loops=1)
Hash Cond: ((c.object_name)::text = (b.object_name)::text)
-> Hash Join (cost=185799.40..68662952.50 rows=34168017 width=32) (actual time=561.869..4806.970 rows=33587200 loops=1)
Hash Cond: (d.data_object_id = c.data_object_id)
-> Seq Scan on d (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.026..350.947 rows=4645440 loops=1)
-> Hash (cost=127731.40..127731.40 rows=4645440 width=30) (actual time=551.953..551.953 rows=498688 loops=1)
Buckets: 8388608 Batches: 1 Memory Usage: 94476kB
-> Seq Scan on c (cost=0.00..127731.40 rows=4645440 width=30) (actual time=0.045..397.380 rows=4645440 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=30) (actual time=12.369..12.370 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5073kB
-> Seq Scan on b (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.004..5.643 rows=72585 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=43) (actual time=18.198..18.198 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 6526kB
-> Seq Scan on a (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.016..6.175 rows=72585 loops=1)
Planning Time: 0.113 ms
Execution Time: 24363.224 ms
Now test PG14 (I couldn't be bothered to test PG13):
orcl=> select * from version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 14beta2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
orcl=> show work_mem;
work_mem
----------
64MB
work_mem still has to be set to 6GB before the filter is optimized automatically:
orcl=> set work_mem='6GB';
SET
orcl=> explain select count(*)
orcl-> from a
orcl-> where owner = 'SCOTT'
orcl-> or exists (select null
orcl(> from b, c, d
orcl(> where b.object_name = c.object_name
orcl(> and c.data_object_id = d.data_object_id
orcl(> and a.object_id = b.data_object_id
orcl(> and a.object_name = c.object_name
orcl(> and a.object_type = d.object_type);
QUERY PLAN
-----------------------------------------------------------------------------------------------
Aggregate (cost=1752552267.57..1752552267.58 rows=1 width=8)
-> Seq Scan on a (cost=0.00..1752552176.83 rows=36297 width=0)
Filter: (((owner)::text = 'SCOTT'::text) OR (hashed SubPlan 2))
SubPlan 2
-> Merge Join (cost=2072764.75..2984181.18 rows=60595987 width=70)
Merge Cond: (d.data_object_id = c.data_object_id)
-> Sort (cost=642048.93..653660.57 rows=4644657 width=14)
Sort Key: d.data_object_id
-> Seq Scan on d (cost=0.00..127719.57 rows=4644657 width=14)
-> Sort (cost=1430710.80..1452265.14 rows=8621734 width=36)
Sort Key: c.data_object_id
-> Hash Join (cost=2903.16..437506.59 rows=8621734 width=36)
Hash Cond: ((c.object_name)::text = (b.object_name)::text)
-> Seq Scan on c (cost=0.00..127727.45 rows=4645445 width=30)
-> Hash (cost=1995.85..1995.85 rows=72585 width=30)
-> Seq Scan on b (cost=0.00..1995.85 rows=72585 width=30)
Set work_mem back to 64MB and look at the rewritten SQL:
orcl=> set work_mem='64MB';
SET
orcl=> explain analyze select count(*)
orcl-> from a
orcl-> left join (select b.data_object_id, c.object_name, d.object_type
orcl(> from b, c, d
orcl(> where b.object_name = c.object_name
orcl(> and c.data_object_id = d.data_object_id
orcl(> group by b.data_object_id, c.object_name, d.object_type) b
orcl-> on a.object_id = b.data_object_id
orcl-> and a.object_name = b.object_name
orcl-> and a.object_type = b.object_type
orcl-> where a.owner = 'SCOTT'
orcl-> or (b.data_object_id is not null and b.object_name is not null and
orcl(> b.object_type is not null);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11617963.19..11617963.20 rows=1 width=8) (actual time=24227.425..24227.437 rows=1 loops=1)
-> Hash Right Join (cost=8777347.46..11617784.44 rows=71502 width=0) (actual time=24204.902..24227.210 rows=7141 loops=1)
Hash Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
Rows Removed by Filter: 65444
-> HashAggregate (cost=8774081.37..10326853.54 rows=60595987 width=38) (actual time=24181.780..24183.680 rows=8119 loops=1)
Group Key: b.data_object_id, c.object_name, d.object_type
Planned Partitions: 128 Batches: 1 Memory Usage: 13329kB
-> Merge Join (cost=2387890.48..3320442.54 rows=60595987 width=38) (actual time=3486.442..12655.561 rows=67854336 loops=1)
Merge Cond: (c.data_object_id = d.data_object_id)
-> Sort (cost=1666463.80..1688018.14 rows=8621734 width=36) (actual time=2566.337..2728.647 rows=1033601 loops=1)
Sort Key: c.data_object_id
Sort Method: external merge Disk: 309136kB
-> Hash Join (cost=2903.16..437506.59 rows=8621734 width=36) (actual time=17.774..1459.719 rows=8841152 loops=1)
Hash Cond: ((c.object_name)::text = (b.object_name)::text)
-> Seq Scan on c (cost=0.00..127727.45 rows=4645445 width=30) (actual time=0.022..300.843 rows=4645440 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=30) (actual time=17.526..17.527 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5073kB
-> Seq Scan on b (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.006..8.267 rows=72585 loops=1)
-> Materialize (cost=721425.43..744648.71 rows=4644657 width=14) (actual time=920.094..2794.691 rows=67854337 loops=1)
-> Sort (cost=721425.43..733037.07 rows=4644657 width=14) (actual time=920.090..986.429 rows=498689 loops=1)
Sort Key: d.data_object_id
Sort Method: external merge Disk: 87896kB
-> Seq Scan on d (cost=0.00..127719.57 rows=4644657 width=14) (actual time=0.017..418.305 rows=4645440 loops=1)
-> Hash (cost=1995.85..1995.85 rows=72585 width=43) (actual time=22.874..22.875 rows=72585 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 6526kB
-> Seq Scan on a (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.015..9.023 rows=72585 loops=1)
Planning Time: 0.280 ms
Execution Time: 24276.652 ms
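Even without parallelism, PG14 finishes in about 24 seconds. Its GROUP BY has been improved further, switching from SORT GROUP BY to HASH GROUP BY. Strictly speaking, the disk-spilling HashAggregate that makes this possible was introduced in PostgreSQL 13; it shows up in the plan as "Planned Partitions: 128". PG13 would therefore probably behave the same, though it was not tested here: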
-> HashAggregate (cost=8774081.37..10326853.54 rows=60595987 width=38) (actual time=24181.780..24183.680 rows=8119 loops=1)
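The difference between the two GROUP BY strategies can be sketched in a few lines of Python. A sort-based GROUP BY (Sort + Group) has to order its whole input before it can emit any groups, so its working set is every input row. Here that is 67,854,336 rows, which is why it spills about 2.9GB at 64MB. A hash-based GROUP BY (HashAggregate) keeps one entry per distinct key, about 8,119 here, so it fits comfortably. This is a toy model with invented data, not PostgreSQL's executor. In PG13 and later, HashAggregate can also spill partitions to disk when its table does outgrow memory.

```python
import random
from itertools import groupby

def sort_group(rows):
    # Sort-based GROUP BY: materialize and sort all input rows, then collapse runs
    ordered = sorted(rows)
    keys = [key for key, _ in groupby(ordered)]
    return keys, len(ordered)  # (distinct keys, rows held at peak)

def hash_group(rows):
    # Hash-based GROUP BY: stream the input, keep one entry per distinct key
    table = {}
    for row in rows:
        table.setdefault(row, None)
    return sorted(table), len(table)  # (distinct keys, entries held at peak)

rng = random.Random(42)
# Many input rows but few distinct (data_object_id, object_name, object_type) keys
rows = [(rng.randint(1, 20), f"OBJ{rng.randint(1, 5)}", "TABLE") for _ in range(100_000)]

sort_keys, sort_peak = sort_group(rows)
hash_keys, hash_peak = hash_group(rows)
print(sort_keys == hash_keys, sort_peak, hash_peak)
```

Both strategies return the same groups. The sort holds all 100,000 rows, while the hash table holds at most 100 entries (20 ids × 5 names).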
Now test the same SQL in Oracle:
SQL> show parameter sga_target
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
sga_target big integer 596M
SQL> show parameter pga_aggregate_target
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 199M
SQL> show parameter optimizer_feature
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable string 11.2.0.1
SQL> set timi on autot trace
SQL> select count(*)
from a
left join (select b.data_object_id, c.object_name, d.object_type
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
group by b.data_object_id, c.object_name, d.object_type) b
on a.object_id = b.data_object_id
and a.object_name = b.object_name
and a.object_type = b.object_type
where a.owner = 'SCOTT'
or (b.data_object_id is not null and b.object_name is not null and
b.object_type is not null);
Elapsed: 00:00:06.50
Execution Plan
----------------------------------------------------------
Plan hash value: 3005719873
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 197 | | 342M (25)|999:59:59 |
| 1 | SORT AGGREGATE | | 1 | 197 | | | |
|* 2 | FILTER | | | | | | |
|* 3 | HASH JOIN OUTER | | 30T| 5484T| 9616K| 342M (25)|999:59:59 |
| 4 | TABLE ACCESS FULL | A | 82734 | 8645K| | 308 (1)| 00:00:04 |
| 5 | VIEW | | 53G| 4480G| | 2913K (84)| 09:42:43 |
| 6 | HASH GROUP BY | | 53G| 9060G| | 2913K (84)| 09:42:43 |
|* 7 | HASH JOIN | | 53G| 9060G| 166M| 611K (24)| 02:02:19 |
| 8 | TABLE ACCESS FULL | D | 4838K| 110M| | 17975 (1)| 00:03:36 |
|* 9 | HASH JOIN | | 50M| 7616M| 6552K| 35687 (1)| 00:07:09 |
| 10 | TABLE ACCESS FULL| B | 73673 | 5683K| | 308 (1)| 00:00:04 |
| 11 | TABLE ACCESS FULL| C | 3940K| 296M| | 17967 (1)| 00:03:36 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."OWNER"='SCOTT' OR "B"."DATA_OBJECT_ID" IS NOT NULL AND
"B"."OBJECT_NAME" IS NOT NULL AND "B"."OBJECT_TYPE" IS NOT NULL)
3 - access("A"."OBJECT_TYPE"="B"."OBJECT_TYPE"(+) AND
"A"."OBJECT_NAME"="B"."OBJECT_NAME"(+) AND
"A"."OBJECT_ID"="B"."DATA_OBJECT_ID"(+))
7 - access("C"."DATA_OBJECT_ID"="D"."DATA_OBJECT_ID")
9 - access("B"."OBJECT_NAME"="C"."OBJECT_NAME")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL> alter session set optimizer_features_enable='19.1.0';
Session altered.
SQL> select count(*)
from a
left join (select b.data_object_id, c.object_name, d.object_type
from b, c, d
where b.object_name = c.object_name
and c.data_object_id = d.data_object_id
group by b.data_object_id, c.object_name, d.object_type) b
on a.object_id = b.data_object_id
and a.object_name = b.object_name
and a.object_type = b.object_type
where a.owner = 'SCOTT'
or (b.data_object_id is not null and b.object_name is not null and
b.object_type is not null);
Elapsed: 00:00:03.53
Execution Plan
----------------------------------------------------------
Plan hash value: 998875269
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 197 | | 1668K (2)| 00:01:06 |
| 1 | SORT AGGREGATE | | 1 | 197 | | | |
|* 2 | FILTER | | | | | | |
|* 3 | HASH JOIN OUTER | | 327M| 60G| 9616K| 1668K (2)| 00:01:06 |
| 4 | JOIN FILTER CREATE | :BF0000 | 82734 | 8645K| | 308 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL | A | 82734 | 8645K| | 308 (1)| 00:00:01 |
| 6 | VIEW | | 327M| 27G| | 86111 (28)| 00:00:04 |
| 7 | HASH GROUP BY | | 327M| 47G| | 86111 (28)| 00:00:04 |
| 8 | JOIN FILTER USE | :BF0000 | 327M| 47G| | 74905 (17)| 00:00:03 |
| 9 | MERGE JOIN | | 327M| 47G| | 74905 (17)| 00:00:03 |
| 10 | SORT JOIN | | 327M| 23G| | 73240 (17)| 00:00:03 |
| 11 | VIEW | VW_GBF_20 | 327M| 23G| | 73240 (17)| 00:00:03 |
| 12 | HASH GROUP BY | | 327M| 31G| | 73240 (17)| 00:00:03 |
|* 13 | HASH JOIN | | 327M| 31G| 166M| 62034 (2)| 00:00:03 |
| 14 | TABLE ACCESS FULL| D | 4838K| 110M| | 17975 (1)| 00:00:01 |
| 15 | TABLE ACCESS FULL| C | 3940K| 296M| | 17967 (1)| 00:00:01 |
|* 16 | SORT JOIN | | 73673 | 5683K| 12M| 1664 (1)| 00:00:01 |
| 17 | VIEW | VW_GBC_19 | 73673 | 5683K| | 310 (1)| 00:00:01 |
| 18 | HASH GROUP BY | | 73673 | 5683K| | 310 (1)| 00:00:01 |
| 19 | TABLE ACCESS FULL | B | 73673 | 5683K| | 308 (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."OWNER"='SCOTT' OR "B"."DATA_OBJECT_ID" IS NOT NULL AND
"B"."OBJECT_NAME" IS NOT NULL AND "B"."OBJECT_TYPE" IS NOT NULL)
3 - access("A"."OBJECT_ID"="B"."DATA_OBJECT_ID"(+) AND
"A"."OBJECT_NAME"="B"."OBJECT_NAME"(+) AND "A"."OBJECT_TYPE"="B"."OBJECT_TYPE"(+))
13 - access("C"."DATA_OBJECT_ID"="D"."DATA_OBJECT_ID")
16 - access("ITEM_1"="ITEM_1")
filter("ITEM_1"="ITEM_1")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
181 recursive calls
0 db block gets
134108 consistent gets
135022 physical reads
0 redo size
551 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed
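With optimizer_features_enable = 11.2.0.1 (the 11g optimizer), Oracle needs only 6.5 seconds. With 19.1.0 it needs only 3.5 seconds, thanks to the JOIN FILTER. SGA is 596MB and PGA 199MB.
PG 12.7 still needs 24 seconds even with work_mem set to 6GB... FUCK
PG13 was not tested; I couldn't be bothered.
PG14 needs 24 seconds with work_mem at 64MB.
Summary: compared with PG12, PG14's GROUP BY moved from SORT GROUP BY to HASH GROUP BY, which saves a great deal of work_mem. Compared with Oracle, though, there is still a lot of room for improvement.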
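The JOIN FILTER CREATE / JOIN FILTER USE steps (:BF0000) in the 19c plan are a Bloom filter. It is built from the rows of a and then used to discard rows on the b/c/d side before the HASH GROUP BY, so far fewer rows reach the grouping step. The minimal Python sketch below shows the idea. Several details are my own assumptions rather than Oracle's implementation: the bit-array size, the number of hash functions, the use of `hashlib.blake2b`, and keying the filter on the three join columns.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        data = repr(key).encode()
        for i in range(self.num_hashes):
            # Derive independent hash functions by salting blake2b with i
            digest = hashlib.blake2b(data, digest_size=8, salt=i.to_bytes(16, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# "JOIN FILTER CREATE": build the filter from a's join keys (hypothetical data)
a_keys = [(i, f"OBJ{i}", "TABLE") for i in range(100)]
bf = BloomFilter()
for key in a_keys:
    bf.add(key)

# "JOIN FILTER USE": drop b/c/d-side rows that cannot join to a before grouping
bcd_rows = [(i, f"OBJ{i}", "TABLE") for i in range(0, 10_000, 7)]
survivors = [row for row in bcd_rows if bf.might_contain(row)]
print(len(bcd_rows), len(survivors))
```

Every row that really joins to a survives. Apart from rare false positives, everything else is dropped before it reaches the grouping step, which is the work the PG plans above cannot avoid.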