How PostgreSQL Optimizes the Filter Produced by OR EXISTS (Part 1)
Posted by robinson1988
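In Oracle, when a WHERE clause contains OR EXISTS, the optimizer generally cannot unnest the subquery and falls back to a FILTER operation. SQL like this usually has to be rewritten.
First, create two test tables, a and b: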
create table a as select * from dba_objects;
create table b as select * from a;
The following SQL produces a FILTER in Oracle and runs slowly:
select count(*) from a
where owner='SCOTT' or exists (select null from b where a.object_id=b.data_object_id);
SQL> select count(*) from a
2 where owner='SCOTT' or exists (select null from b where a.object_id=b.data_object_id);
Elapsed: 00:01:33.95
Execution Plan
----------------------------------------------------------
Plan hash value: 1064440332
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 290 (1)| 00:00:04 |
| 1 | SORT AGGREGATE | | 1 | 30 | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL| A | 74940 | 2195K| 290 (1)| 00:00:04 |
|* 4 | TABLE ACCESS FULL| B | 800 | 10400 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OWNER"='SCOTT' OR EXISTS (SELECT 0 FROM "B" "B" WHERE
"B"."DATA_OBJECT_ID"=:B1))
4 - filter("B"."DATA_OBJECT_ID"=:B1)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
72687576 consistent gets
0 physical reads
0 redo size
424 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
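It needs to be rewritten as a left join against the de-duplicated keys of b, after which it finishes almost instantly: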
select count(*)
from a left join (select data_object_id from b group by data_object_id) b
on a.object_id = b.data_object_id where a.owner = 'SCOTT' or b.data_object_id is not null;
SQL> select count(*)
2 from a left join (select data_object_id from b group by data_object_id) b
3 on a.object_id = b.data_object_id where a.owner = 'SCOTT' or b.data_object_id is not null;
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
Plan hash value: 2430901245
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 43 | | 831 (1)| 00:00:10 |
| 1 | SORT AGGREGATE | | 1 | 43 | | | |
|* 2 | FILTER | | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 80038 | 3360K| 1960K| 831 (1)| 00:00:10 |
| 4 | VIEW | | 80038 | 1016K| | 294 (2)| 00:00:04 |
| 5 | HASH GROUP BY | | 80038 | 1016K| | 294 (2)| 00:00:04 |
| 6 | TABLE ACCESS FULL | B | 80038 | 1016K| | 290 (1)| 00:00:04 |
| 7 | TABLE ACCESS FULL | A | 74940 | 2195K| | 290 (1)| 00:00:04 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"."OWNER"='SCOTT' OR "B"."DATA_OBJECT_ID" IS NOT NULL)
3 - access("A"."OBJECT_ID"="B"."DATA_OBJECT_ID"(+))
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
28 recursive calls
0 db block gets
2714 consistent gets
0 physical reads
0 redo size
424 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
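The equivalence of the two statements can be checked on a small data set. The following is only a sketch: it uses Python's built-in sqlite3 module instead of Oracle, and the table contents are made up for illustration (duplicate and NULL values of data_object_id are included on purpose). It also runs a naive left join without the GROUP BY to show why the de-duplication is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table a (object_id integer, owner text)")
cur.execute("create table b (object_id integer, data_object_id integer)")

# Toy rows (hypothetical data, not taken from dba_objects).
cur.executemany("insert into a values (?, ?)",
                [(1, "SCOTT"), (2, "SYS"), (3, "SYS"), (4, "SCOTT"), (5, "HR")])
# b contains a duplicate key (2 twice) and a NULL key.
cur.executemany("insert into b values (?, ?)",
                [(10, 2), (11, 2), (12, 4), (13, None), (14, 99)])

original = """
select count(*) from a
where owner = 'SCOTT'
   or exists (select null from b where a.object_id = b.data_object_id)
"""

rewritten = """
select count(*)
from a left join (select data_object_id from b group by data_object_id) b
  on a.object_id = b.data_object_id
where a.owner = 'SCOTT' or b.data_object_id is not null
"""

# Same join without de-duplicating b: rows of a that match several
# rows of b are counted more than once.
naive = """
select count(*)
from a left join b on a.object_id = b.data_object_id
where a.owner = 'SCOTT' or b.data_object_id is not null
"""

count_original = cur.execute(original).fetchone()[0]
count_rewritten = cur.execute(rewritten).fetchone()[0]
count_naive = cur.execute(naive).fetchone()[0]

print("or exists        :", count_original)
print("left join + group:", count_rewritten)
print("left join only   :", count_naive)
```

On this toy data the first two counts agree, while the join without GROUP BY over-counts because object_id 2 matches two rows in b.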
Now migrate a and b to PostgreSQL 12:
orcl=> select * from version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
orcl=> show work_mem;
work_mem
----------
64MB
orcl=> explain analyze select count(*) from a
where owner='SCOTT' or exists (select null from b where a.object_id=b.data_object_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=158042495.87..158042495.88 rows=1 width=8) (actual time=20.602..20.603 rows=1 loops=1)
-> Seq Scan on a (cost=0.00..158042405.13 rows=36296 width=0) (actual time=8.705..20.384 rows=7144 loops=1)
Filter: (((owner)::text = 'SCOTT'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
Rows Removed by Filter: 65441
SubPlan 1
-> Seq Scan on b (cost=0.00..2177.31 rows=1 width=0) (never executed)
Filter: (a.object_id = data_object_id)
SubPlan 2
-> Seq Scan on b b_1 (cost=0.00..1995.85 rows=72585 width=6) (actual time=0.004..5.449 rows=72585 loops=1)
Planning Time: 0.058 ms
Execution Time: 20.642 ms
(11 rows)
In PostgreSQL 12 the same SQL finishes in about 20 ms, whereas Oracle needed 1 minute 33 seconds.
The PG plan contains alternatives: SubPlan 1 or hashed SubPlan 2, which means one of SubPlan 1 and hashed SubPlan 2 is chosen.
SubPlan 1 is marked never executed, so it was not chosen; hashed SubPlan 2 was executed instead.
SubPlan 2 shows loops=1, so table b was scanned only once.
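The difference between the two alternatives can be illustrated with a small simulation. This is a conceptual sketch in Python, not PostgreSQL's actual executor code; the function names and the sample rows are hypothetical. The first function rescans b for every row of a, the second builds a hash set from b once and probes it.

```python
def count_per_row_scan(a_rows, b_keys):
    """SubPlan 1 style: rescan b for each row of a that needs the EXISTS check."""
    matched = 0
    comparisons = 0
    for owner, object_id in a_rows:
        if owner == "SCOTT":
            # The OR is already true, so the subquery is not evaluated.
            matched += 1
            continue
        found = False
        for key in b_keys:
            comparisons += 1
            if key == object_id:
                found = True
                break  # EXISTS can stop at the first match
        if found:
            matched += 1
    return matched, comparisons


def count_hashed(a_rows, b_keys):
    """hashed SubPlan 2 style: read b once into a hash set, then probe it."""
    hashed = {key for key in b_keys if key is not None}  # single pass over b
    matched = 0
    for owner, object_id in a_rows:
        if owner == "SCOTT" or object_id in hashed:
            matched += 1
    return matched, len(b_keys)  # rows of b read in total


# Hypothetical sample data: (owner, object_id) for a, data_object_id for b.
a_rows = [("SCOTT", 1), ("SYS", 2), ("SYS", 3), ("HR", 4)]
b_keys = [7, None, 2, 9, 4]

per_row_result = count_per_row_scan(a_rows, b_keys)
hashed_result = count_hashed(a_rows, b_keys)
print("per-row scan:", per_row_result)
print("hashed      :", hashed_result)
```

Both functions return the same match count, but the per-row variant does work proportional to rows(a) × rows(b), while the hashed variant reads b once.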
Now reduce work_mem to 64kB:
orcl=> set work_mem='64kB';
SET
orcl=> explain analyze select count(*) from a
where owner='SCOTT' or exists (select null from b where a.object_id=b.data_object_id);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=158042495.87..158042495.88 rows=1 width=8) (actual time=222323.048..222323.049 rows=1 loops=1)
-> Seq Scan on a (cost=0.00..158042405.13 rows=36296 width=0) (actual time=4.212..222321.298 rows=7144 loops=1)
Filter: (((owner)::text = 'SCOTT'::text) OR (SubPlan 1))
Rows Removed by Filter: 65441
SubPlan 1
-> Seq Scan on b (cost=0.00..2177.31 rows=1 width=0) (actual time=3.062..3.062 rows=0 loops=72574)
Filter: (a.object_id = data_object_id)
Rows Removed by Filter: 69980
Planning Time: 0.057 ms
Execution Time: 222323.070 ms
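With work_mem set to 64kB, the hashed alternative disappears: SubPlan 1 scans b in full 72574 times (loops=72574) and the query takes about 222 seconds. The PG plan is now essentially the same as the Oracle FILTER plan.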
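A quick calculation with the figures from the plan above shows that nearly all of the runtime is spent in SubPlan 1. The numbers are copied from the EXPLAIN ANALYZE output; the interpretation of the 11 skipped rows (presumably the rows where owner = 'SCOTT' already satisfied the OR) is an assumption.

```python
# Figures copied from the EXPLAIN ANALYZE output with work_mem = 64kB.
loops = 72574                 # executions of SubPlan 1
avg_ms_per_loop = 3.062       # actual time per execution
reported_execution_ms = 222323.070

rows_returned = 7144
rows_removed = 65441
total_a_rows = rows_returned + rows_removed

subplan_ms = loops * avg_ms_per_loop
share_of_total = subplan_ms / reported_execution_ms
rows_without_subplan = total_a_rows - loops  # rows where the subplan never ran

print(f"time in SubPlan 1: {subplan_ms / 1000:.1f} s")
print(f"share of total   : {share_of_total:.2%}")
print(f"rows of a        : {total_a_rows}, subplan skipped for {rows_without_subplan}")
```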
Next, test a more complex case that combines OR EXISTS with AND NOT EXISTS:
select *
from a
where exists (select 1
from b
where a.object_id = b.object_id
and object_type like '%TABLE%')
or (object_id > 50000 and object_name like 'DBA%')
and not exists (select 1
from b
where a.data_object_id = b.data_object_id
and object_type like '%INDEX%');
orcl=> explain analyze select *
orcl-> from a
orcl-> where exists (select 1
orcl(> from b
orcl(> where a.object_id = b.object_id
orcl(> and object_type like '%TABLE%')
orcl-> or (object_id > 50000 and object_name like 'DBA%')
orcl-> and not exists (select 1
orcl(> from b
orcl(> where a.data_object_id = b.data_object_id
orcl(> and object_type like '%INDEX%');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..342425725.52 rows=36467 width=198) (actual time=5.813..22.910 rows=3285 loops=1)
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR ((object_id > '50000'::numeric) AND ((object_name)::text ~~ 'DBA%'::text) AND (NOT (alternatives: SubPlan 3 or hashed SubPlan 4))))
Rows Removed by Filter: 69300
SubPlan 1
-> Seq Scan on b (cost=0.00..2358.77 rows=1 width=0) (never executed)
Filter: (((object_type)::text ~~ '%TABLE%'::text) AND (a.object_id = object_id))
SubPlan 2
-> Seq Scan on b b_1 (cost=0.00..2177.31 rows=3075 width=6) (actual time=0.002..5.436 rows=3052 loops=1)
Filter: ((object_type)::text ~~ '%TABLE%'::text)
Rows Removed by Filter: 69533
SubPlan 3
-> Seq Scan on b b_2 (cost=0.00..2358.77 rows=1 width=0) (never executed)
Filter: (((object_type)::text ~~ '%INDEX%'::text) AND (a.data_object_id = data_object_id))
SubPlan 4
-> Seq Scan on b b_3 (cost=0.00..2177.31 rows=4154 width=6) (actual time=0.015..5.368 rows=4260 loops=1)
Filter: ((object_type)::text ~~ '%INDEX%'::text)
Rows Removed by Filter: 68325
Planning Time: 0.099 ms
Execution Time: 23.021 ms
(19 rows)
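The plan shows that PG 12 optimized the filter for this query as well: both subqueries ran as hashed subplans.
Conclusion: when work_mem is large enough, PG optimizes the filter by hashing the table in the subquery, so that table is scanned only once.
If work_mem is not large enough, the filter is not optimized.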
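How large is "large enough"? As far as I understand the PostgreSQL 12 planner, it treats a subplan as hashable only if estimated rows × (aligned row width + aligned tuple header) fits in work_mem. The sketch below reproduces that estimate in Python. The formula, the 8-byte alignment and the 23-byte header size are my assumptions about the planner's internals, not something shown in the plans above, so treat the result as a rough guide only.

```python
def maxalign(n, alignment=8):
    """Round n up to the next multiple of alignment (8 bytes assumed)."""
    return (n + alignment - 1) // alignment * alignment


# Assumed heap tuple header size in bytes.
HEAP_TUPLE_HEADER_BYTES = 23


def estimated_hash_bytes(plan_rows, plan_width):
    """Rough estimate of the hashed subplan size, under the assumptions above."""
    return plan_rows * (maxalign(plan_width) + maxalign(HEAP_TUPLE_HEADER_BYTES))


def fits_in_work_mem(plan_rows, plan_width, work_mem_kb):
    return estimated_hash_bytes(plan_rows, plan_width) <= work_mem_kb * 1024


# rows=72585 width=6 is the planner estimate for hashed SubPlan 2 in the first plan.
estimate = estimated_hash_bytes(72585, 6)
fits_64mb = fits_in_work_mem(72585, 6, 64 * 1024)
fits_64kb = fits_in_work_mem(72585, 6, 64)

print(f"estimated size : {estimate} bytes ({estimate / 1024 / 1024:.2f} MB)")
print(f"fits in 64MB   : {fits_64mb}")
print(f"fits in 64kB   : {fits_64kb}")
```

Under these assumptions the estimate for table b is a little over 2 MB, which is consistent with the hashed subplan being used at 64MB and dropped at 64kB.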
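Food for thought: PG can optimize the filter automatically only when work_mem is large enough. What happens if the table inside EXISTS is very large?
As a DBA, I recommend not relying too heavily on this feature. Thorough SQL review, including equivalent SQL rewrites, is still the right approach.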