PostgreSQL's optimization of the filter produced by OR EXISTS, part 2

Posted by robinson1988

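PostgreSQL can optimize the filter produced by OR EXISTS. The previous article did not cover the case where the EXISTS subquery contains large tables, so today I test exactly that.

Note: no indexes were added to the tables during the test.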

orcl=> select * from version();
                                                 version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

create table a as select * from dba_objects;
create table b as select * from a;
create table c as select * from a;
create table d as select * from a;
insert into c select * from c;
.....repeat until c reaches 600MB.....
insert into d select * from d;
.....repeat until d reaches 600MB.....
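
The repeated inserts can also be scripted. This is a minimal sketch, not what the original test ran: it assumes the 600MB target is measured with pg_relation_size, which the post does not state. Each pass doubles the table, so starting from about 10MB the loop stops after six passes.

```sql
-- Keep doubling c and d until each heap is at least 600MB.
-- Assumption: size is measured with pg_relation_size (main fork only).
do $$
begin
  while pg_relation_size('c') < 600 * 1024 * 1024 loop
    insert into c select * from c;   -- each pass doubles the row count
  end loop;
  while pg_relation_size('d') < 600 * 1024 * 1024 loop
    insert into d select * from d;
  end loop;
end $$;

-- Check the result without relying on \d+.
select pg_size_pretty(pg_relation_size('c')) as c_size,
       pg_size_pretty(pg_relation_size('d')) as d_size;
```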

orcl=> \d+
                    List of relations
 Schema | Name | Type  | Owner |    Size    | Description
--------+------+-------+-------+------------+-------------
 public | a    | table | scott | 10192 kB   |
 public | b    | table | scott | 10192 kB   |
 public | c    | table | scott | 635 MB     |
 public | d    | table | scott | 635 MB     |

orcl=> show work_mem;
 work_mem
----------
 64MB

orcl=> explain select count(*)
  from a
 where owner = 'SCOTT'
    or exists (select null
          from b, c, d
         where b.object_name = c.object_name
           and c.data_object_id = d.data_object_id
           and a.object_id = b.data_object_id
           and a.object_name = c.object_name
           and a.object_type = d.object_type);
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1658265602.93..1658265602.94 rows=1 width=8)
   ->  Seq Scan on a  (cost=0.00..1658265512.19 rows=36296 width=0)
         Filter: (((owner)::text = 'SCOTT'::text) OR (SubPlan 1))
         SubPlan 1
           ->  Nested Loop  (cost=0.00..479762.07 rows=21 width=0)
                 ->  Nested Loop  (cost=0.00..477403.03 rows=21 width=24)
                       Join Filter: (c.data_object_id = d.data_object_id)
                       ->  Seq Scan on d  (cost=0.00..139367.54 rows=116181 width=6)
                             Filter: ((a.object_type)::text = (object_type)::text)
                       ->  Materialize  (cost=0.00..139366.27 rows=114 width=30)
                             ->  Seq Scan on c  (cost=0.00..139365.70 rows=114 width=30)
                                   Filter: ((object_name)::text = (a.object_name)::text)
                 ->  Materialize  (cost=0.00..2358.78 rows=1 width=24)
                       ->  Seq Scan on b  (cost=0.00..2358.77 rows=1 width=24)
                             Filter: (((object_name)::text = (a.object_name)::text) AND (a.object_id = data_object_id))

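With work_mem=64MB, PG did not optimize the filter automatically: the plan runs SubPlan 1 once per row of a. I kept increasing work_mem, and only at 6GB did PG optimize the filter automatically.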
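
Since the planner's choice shows up in a plain EXPLAIN (no ANALYZE), different work_mem values can be tried without running the query. The sketch below is one way to do that; wrapping it in a transaction with SET LOCAL is my own choice, not something the original test did, so that the setting is discarded afterwards.

```sql
begin;
-- SET LOCAL lasts only until the end of this transaction.
set local work_mem = '6GB';
-- Plain EXPLAIN plans the query without executing it.
explain select count(*)
  from a
 where owner = 'SCOTT'
    or exists (select null
          from b, c, d
         where b.object_name = c.object_name
           and c.data_object_id = d.data_object_id
           and a.object_id = b.data_object_id
           and a.object_name = c.object_name
           and a.object_type = d.object_type);
-- Look for "hashed SubPlan" in the Filter line;
-- change the work_mem value above and repeat to find the threshold.
rollback;
```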

orcl=> set work_mem='6GB';
SET

orcl=> explain analyze select count(*)
  from a
 where owner = 'SCOTT'
    or exists (select null
          from b, c, d
         where b.object_name = c.object_name
           and c.data_object_id = d.data_object_id
           and a.object_id = b.data_object_id
           and a.object_name = c.object_name
           and a.object_type = d.object_type);
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1658265602.93..1658265602.94 rows=1 width=8) (actual time=26938.024..26938.028 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..1658265512.19 rows=36296 width=0) (actual time=26916.210..26937.781 rows=7141 loops=1)
         Filter: (((owner)::text = 'SCOTT'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
         Rows Removed by Filter: 65444
         SubPlan 1
           ->  Nested Loop  (cost=0.00..479762.07 rows=21 width=0) (never executed)
                 ->  Nested Loop  (cost=0.00..477403.03 rows=21 width=24) (never executed)
                       Join Filter: (c.data_object_id = d.data_object_id)
                       ->  Seq Scan on d  (cost=0.00..139367.54 rows=116181 width=6) (never executed)
                             Filter: ((a.object_type)::text = (object_type)::text)
                       ->  Materialize  (cost=0.00..139366.27 rows=114 width=30) (never executed)
                             ->  Seq Scan on c  (cost=0.00..139365.70 rows=114 width=30) (never executed)
                                   Filter: ((object_name)::text = (a.object_name)::text)
                 ->  Materialize  (cost=0.00..2358.78 rows=1 width=24) (never executed)
                       ->  Seq Scan on b  (cost=0.00..2358.77 rows=1 width=24) (never executed)
                             Filter: (((object_name)::text = (a.object_name)::text) AND (a.object_id = data_object_id))
         SubPlan 2
           ->  Merge Join  (cost=2066731.77..3002199.73 rows=62196959 width=70) (actual time=3085.476..13208.771 rows=67854336 loops=1)
                 Merge Cond: (d_1.data_object_id = c_1.data_object_id)
                 ->  Sort  (cost=642383.81..654001.92 rows=4647243 width=14) (actual time=853.094..912.878 rows=498689 loops=1)
                       Sort Key: d_1.data_object_id
                       Sort Method: quicksort  Memory: 415522kB
                       ->  Seq Scan on d d_1  (cost=0.00..127749.43 rows=4647243 width=14) (actual time=0.016..445.795 rows=4645440 loops=1)
                 ->  Sort  (cost=1424346.01..1445449.02 rows=8441205 width=36) (actual time=2232.371..4183.017 rows=67854337 loops=1)
                       Sort Key: c_1.data_object_id
                       Sort Method: quicksort  Memory: 1058170kB
                       ->  Hash Join  (cost=2903.16..453226.84 rows=8441205 width=36) (actual time=12.636..1409.609 rows=8841152 loops=1)
                             Hash Cond: ((c_1.object_name)::text = (b_1.object_name)::text)
                             ->  Seq Scan on c c_1  (cost=0.00..127747.96 rows=4647096 width=30) (actual time=0.012..316.364 rows=4645440 loops=1)
                             ->  Hash  (cost=1995.85..1995.85 rows=72585 width=30) (actual time=12.557..12.558 rows=72585 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 5073kB
                                   ->  Seq Scan on b b_1  (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.006..5.736 rows=72585 loops=1)
 Planning Time: 0.222 ms
 Execution Time: 26965.614 ms

Tables b, c and d in the EXISTS subquery add up to only about 1.2GB, yet work_mem has to be set to 6GB before PG optimizes the filter automatically.
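
A likely explanation for the 6GB figure: the hashed subplan has to hold the estimated join result, not the base tables. The sketch below rests on an assumption about planner internals that I have not verified against this exact build: as far as I know, the PG12 planner only considers a hashed subplan when estimated rows times per-row size fits in work_mem, with per-row size taken as the width rounded up to a multiple of 8 bytes plus a 24-byte tuple header. The inputs come from the SubPlan 2 Merge Join line in the plan above (rows=62196959 width=70).

```sql
-- Assumed planner check: rows * (aligned width + aligned header) <= work_mem
-- width 70 rounds up to 72 bytes; the tuple header is taken as 24 bytes.
select pg_size_pretty(62196959::bigint * (72 + 24)) as estimated_hash_input;
```

That works out to roughly 5.6GB, which would be consistent with 6GB being the first setting that worked.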

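In production it is common for the tables inside an EXISTS to total several GB or even tens of GB. How large would work_mem have to be before PG optimizes the filter automatically then?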

Better to just do proper SQL review and rewrite the SQL into an equivalent form. Let's see how long the rewritten query takes.
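
The shape of the rewrite is easier to see on a toy example. The tables t_outer and t_inner below are hypothetical and not part of the test schema. The idea is to deduplicate the join keys of the subquery with GROUP BY, outer join the result, and replace EXISTS with an IS NOT NULL test on the joined key.

```sql
-- Hypothetical toy tables, for illustration only.
create temp table t_outer (id int, owner text);
create temp table t_inner (outer_id int);
insert into t_outer values (1, 'SCOTT'), (2, 'SYS'), (3, 'SYS');
insert into t_inner values (2), (2), (null);

-- Original shape: OR EXISTS with a correlated subquery.
select count(*)
  from t_outer o
 where o.owner = 'SCOTT'
    or exists (select null from t_inner i where i.outer_id = o.id);

-- Rewritten shape: GROUP BY removes duplicate keys so the outer join
-- cannot multiply rows of t_outer; IS NOT NULL stands in for EXISTS.
select count(*)
  from t_outer o
  left join (select outer_id from t_inner group by outer_id) i
    on o.id = i.outer_id
 where o.owner = 'SCOTT'
    or i.outer_id is not null;
```

Both queries count 2 rows here: id 1 qualifies through the owner predicate and id 2 through the match in t_inner.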

orcl=> show work_mem;
 work_mem
----------
 64MB

orcl=> explain analyze select count(*)
orcl->   from a
orcl->   left join (select b.data_object_id, c.object_name, d.object_type
orcl(>                from b, c, d
orcl(>               where b.object_name = c.object_name
orcl(>                 and c.data_object_id = d.data_object_id
orcl(>               group by b.data_object_id, c.object_name, d.object_type) b
orcl->     on a.object_id = b.data_object_id
orcl->    and a.object_name = b.object_name
orcl->    and a.object_type = b.object_type
orcl->  where a.owner = 'SCOTT'
orcl->     or (b.data_object_id is not null and b.object_name is not null and
orcl(>        b.object_type is not null);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12957110.90..12957110.91 rows=1 width=8) (actual time=16782.213..16844.980 rows=1 loops=1)
   ->  Merge Right Join  (cost=3720449.80..12956932.14 rows=71502 width=0) (actual time=12482.389..16844.339 rows=7141 loops=1)
         Merge Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
         Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
         Rows Removed by Filter: 65444
         ->  Group  (cost=3712593.67..11845927.77 rows=62995559 width=38) (actual time=12445.613..16805.434 rows=7872 loops=1)
               Group Key: b.data_object_id, c.object_name, d.object_type
               ->  Gather Merge  (cost=3712593.67..11373461.07 rows=62995560 width=38) (actual time=12445.611..16800.049 rows=39356 loops=1)
                     Workers Planned: 4
                     Workers Launched: 4
                     ->  Group  (cost=3711593.61..3869082.51 rows=15748890 width=38) (actual time=11649.700..15711.736 rows=8070 loops=5)
                           Group Key: b.data_object_id, c.object_name, d.object_type
                           ->  Sort  (cost=3711593.61..3750965.83 rows=15748890 width=38) (actual time=11649.696..14017.939 rows=13445299 loops=5)
                                 Sort Key: b.data_object_id, c.object_name, d.object_type
                                 Sort Method: external merge  Disk: 627720kB
                                 Worker 0:  Sort Method: external merge  Disk: 538864kB
                                 Worker 1:  Sort Method: external merge  Disk: 597080kB
                                 Worker 2:  Sort Method: external merge  Disk: 660520kB
                                 Worker 3:  Sort Method: external merge  Disk: 553536kB
                                 ->  Merge Join  (cost=1149297.01..1398275.99 rows=15748890 width=38) (actual time=1983.411..4767.583 rows=13570867 loops=5)
                                       Merge Cond: (c.data_object_id = d.data_object_id)
                                       ->  Sort  (cost=427707.49..433060.49 rows=2141199 width=36) (actual time=735.984..793.578 rows=206721 loops=5)
                                             Sort Key: c.data_object_id
                                             Sort Method: external merge  Disk: 63976kB
                                             Worker 0:  Sort Method: external merge  Disk: 54928kB
                                             Worker 1:  Sort Method: external merge  Disk: 62872kB
                                             Worker 2:  Sort Method: external merge  Disk: 68224kB
                                             Worker 3:  Sort Method: external merge  Disk: 59152kB
                                             ->  Parallel Hash Join  (cost=2230.68..144009.05 rows=2141199 width=36) (actual time=8.979..402.091 rows=1768230 loops=5)
                                                   Hash Cond: ((c.object_name)::text = (b.object_name)::text)
                                                   ->  Parallel Seq Scan on c  (cost=0.00..92890.60 rows=1161360 width=30) (actual time=0.013..91.337 rows=929088 loops=5)
                                                   ->  Parallel Hash  (cost=1696.97..1696.97 rows=42697 width=30) (actual time=6.640..6.640 rows=14517 loops=5)
                                                         Buckets: 131072  Batches: 1  Memory Usage: 5376kB
                                                         ->  Parallel Seq Scan on b  (cost=0.00..1696.97 rows=42697 width=30) (actual time=0.007..1.990 rows=14517 loops=5)
                                       ->  Materialize  (cost=721588.23..744816.90 rows=4645734 width=14) (actual time=1247.415..1879.497 rows=13570868 loops=5)
                                             ->  Sort  (cost=721588.23..733202.57 rows=4645734 width=14) (actual time=1247.409..1354.090 rows=498689 loops=5)
                                                   Sort Key: d.data_object_id
                                                   Sort Method: external merge  Disk: 87904kB
                                                   Worker 0:  Sort Method: external merge  Disk: 87904kB
                                                   Worker 1:  Sort Method: external merge  Disk: 87912kB
                                                   Worker 2:  Sort Method: external merge  Disk: 87896kB
                                                   Worker 3:  Sort Method: external merge  Disk: 87888kB
                                                   ->  Seq Scan on d  (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.034..568.742 rows=4645440 loops=5)
         ->  Sort  (cost=7856.14..8037.60 rows=72585 width=43) (actual time=23.339..26.246 rows=72585 loops=1)
               Sort Key: a.object_id, a.object_name, a.object_type
               Sort Method: quicksort  Memory: 10790kB
               ->  Seq Scan on a  (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.016..7.983 rows=72585 loops=1)
 Planning Time: 0.653 ms
 Execution Time: 16908.216 ms

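The rewritten query runs in about 17 seconds, but PG automatically started 4 parallel workers; work_mem is 64MB.

Now disable automatic parallelism.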

orcl=> set max_parallel_workers_per_gather=0;
SET
orcl=> explain analyze select count(*)
  from a
  left join (select b.data_object_id, c.object_name, d.object_type
               from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
              group by b.data_object_id, c.object_name, d.object_type) b
    on a.object_id = b.data_object_id
   and a.object_name = b.object_name
   and a.object_type = b.object_type
 where a.owner = 'SCOTT'
    or (b.data_object_id is not null and b.object_name is not null and
       b.object_type is not null);
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14952868.49..14952868.50 rows=1 width=8) (actual time=55497.082..55497.086 rows=1 loops=1)
   ->  Merge Right Join  (cost=13219585.91..14952689.74 rows=71502 width=0) (actual time=39422.940..55496.187 rows=7141 loops=1)
         Merge Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
         Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
         Rows Removed by Filter: 65444
         ->  Group  (cost=13211729.77..13841685.36 rows=62995559 width=38) (actual time=39322.008..55460.899 rows=7872 loops=1)
               Group Key: b.data_object_id, c.object_name, d.object_type
               ->  Sort  (cost=13211729.77..13369218.67 rows=62995559 width=38) (actual time=39322.004..48948.069 rows=64897025 loops=1)
                     Sort Key: b.data_object_id, c.object_name, d.object_type
                     Sort Method: external merge  Disk: 2977784kB
                     ->  Merge Join  (cost=2396381.36..3328514.34 rows=62995559 width=38) (actual time=3435.220..13449.670 rows=67854336 loops=1)
                           Merge Cond: (d.data_object_id = c.data_object_id)
                           ->  Sort  (cost=721588.23..733202.57 rows=4645734 width=14) (actual time=888.259..976.106 rows=498689 loops=1)
                                 Sort Key: d.data_object_id
                                 Sort Method: external merge  Disk: 87912kB
                                 ->  Seq Scan on d  (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.026..412.211 rows=4645440 loops=1)
                           ->  Materialize  (cost=1674792.54..1717616.52 rows=8564796 width=36) (actual time=2546.954..4554.523 rows=67854337 loops=1)
                                 ->  Sort  (cost=1674792.54..1696204.53 rows=8564796 width=36) (actual time=2546.951..2709.503 rows=1033601 loops=1)
                                       Sort Key: c.data_object_id
                                       Sort Method: external merge  Disk: 309128kB
                                       ->  Hash Join  (cost=2903.16..454361.32 rows=8564796 width=36) (actual time=13.798..1468.422 rows=8841152 loops=1)
                                             Hash Cond: ((c.object_name)::text = (b.object_name)::text)
                                             ->  Seq Scan on c  (cost=0.00..127731.40 rows=4645440 width=30) (actual time=0.015..309.911 rows=4645440 loops=1)
                                             ->  Hash  (cost=1995.85..1995.85 rows=72585 width=30) (actual time=13.719..13.719 rows=72585 loops=1)
                                                   Buckets: 131072  Batches: 1  Memory Usage: 5073kB
                                                   ->  Seq Scan on b  (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.005..6.523 rows=72585 loops=1)
         ->  Sort  (cost=7856.14..8037.60 rows=72585 width=43) (actual time=18.869..21.682 rows=72585 loops=1)
               Sort Key: a.object_id, a.object_name, a.object_type
               Sort Method: quicksort  Memory: 10790kB
               ->  Seq Scan on a  (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.011..5.940 rows=72585 loops=1)
 Planning Time: 0.174 ms
 Execution Time: 55748.392 ms

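With parallelism disabled it ran for about 55.7 seconds, and the plan is full of sort merge joins. Now disable merge joins so that everything uses hash joins, and see how long it takes.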

orcl=> set enable_mergejoin=false;
SET
orcl=> explain analyze select count(*)
  from a
  left join (select b.data_object_id, c.object_name, d.object_type
               from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
              group by b.data_object_id, c.object_name, d.object_type) b
    on a.object_id = b.data_object_id
   and a.object_name = b.object_name
   and a.object_type = b.object_type
 where a.owner = 'SCOTT'
    or (b.data_object_id is not null and b.object_name is not null and
       b.object_type is not null);
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=83011075.70..83011075.71 rows=1 width=8) (actual time=53408.678..53408.683 rows=1 loops=1)
   ->  Hash Right Join  (cost=81042285.63..83010896.95 rows=71502 width=0) (actual time=32888.138..53407.748 rows=7141 loops=1)
         Hash Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
         Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
         Rows Removed by Filter: 65444
         ->  Group  (cost=81039019.54..81668975.13 rows=62995559 width=38) (actual time=32789.399..53374.444 rows=8119 loops=1)
               Group Key: b.data_object_id, c.object_name, d.object_type
               ->  Sort  (cost=81039019.54..81196508.44 rows=62995559 width=38) (actual time=32789.396..46520.761 rows=67854336 loops=1)
                     Sort Key: b.data_object_id, c.object_name, d.object_type
                     Sort Method: external merge  Disk: 2977688kB
                     ->  Hash Join  (cost=220458.56..71155804.11 rows=62995559 width=38) (actual time=586.236..11237.270 rows=67854336 loops=1)
                           Hash Cond: ((c.object_name)::text = (b.object_name)::text)
                           ->  Hash Join  (cost=217555.40..68771834.50 rows=34168017 width=32) (actual time=573.644..4750.182 rows=33587200 loops=1)
                                 Hash Cond: (d.data_object_id = c.data_object_id)
                                 ->  Seq Scan on d  (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.038..441.587 rows=4645440 loops=1)
                                 ->  Hash  (cost=127731.40..127731.40 rows=4645440 width=30) (actual time=569.515..569.516 rows=498688 loops=1)
                                       Buckets: 1048576  Batches: 8  Memory Usage: 11678kB
                                       ->  Seq Scan on c  (cost=0.00..127731.40 rows=4645440 width=30) (actual time=0.007..406.891 rows=4645440 loops=1)
                           ->  Hash  (cost=1995.85..1995.85 rows=72585 width=30) (actual time=12.529..12.529 rows=72585 loops=1)
                                 Buckets: 131072  Batches: 1  Memory Usage: 5073kB
                                 ->  Seq Scan on b  (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.005..5.511 rows=72585 loops=1)
         ->  Hash  (cost=1995.85..1995.85 rows=72585 width=43) (actual time=17.795..17.796 rows=72585 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 6526kB
               ->  Seq Scan on a  (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.009..6.135 rows=72585 loops=1)
 Planning Time: 0.110 ms
 Execution Time: 53629.598 ms

With merge joins disabled and hash joins everywhere, it takes about 53.6 seconds, and a large share of that time is spent in the Sort:

Sort  (cost=10081039019.54..10081196508.44 rows=62995559 width=38) (actual time=32381.031..46111.439 rows=67854336 loops=1)
                     Sort Key: b.data_object_id, c.object_name, d.object_type
                     Sort Method: external merge  Disk: 2977688kB

So it looks like work_mem still has to be increased. PG12's GROUP BY algorithm is junk compared with Oracle's... it needs work!!!

HASH JOIN and SORT MERGE JOIN need work too.

orcl=> set work_mem='6GB';
SET
orcl=> explain analyze select count(*)
  from a
  left join (select b.data_object_id, c.object_name, d.object_type
               from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
              group by b.data_object_id, c.object_name, d.object_type) b
    on a.object_id = b.data_object_id
   and a.object_name = b.object_name
   and a.object_type = b.object_type
 where a.owner = 'SCOTT'
    or (b.data_object_id is not null and b.object_name is not null and
       b.object_type is not null);
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=73491444.96..73491444.97 rows=1 width=8) (actual time=24071.535..24071.539 rows=1 loops=1)
   ->  Hash Right Join  (cost=71522654.89..73491266.20 rows=71502 width=0) (actual time=23838.264..24071.158 rows=7141 loops=1)
         Hash Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
         Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
         Rows Removed by Filter: 65444
         ->  HashAggregate  (cost=71519388.80..72149344.39 rows=62995559 width=38) (actual time=23819.981..24040.569 rows=8119 loops=1)
               Group Key: b.data_object_id, c.object_name, d.object_type
               ->  Hash Join  (cost=188702.56..71046922.11 rows=62995559 width=38) (actual time=574.309..11306.731 rows=67854336 loops=1)
                     Hash Cond: ((c.object_name)::text = (b.object_name)::text)
                     ->  Hash Join  (cost=185799.40..68662952.50 rows=34168017 width=32) (actual time=561.869..4806.970 rows=33587200 loops=1)
                           Hash Cond: (d.data_object_id = c.data_object_id)
                           ->  Seq Scan on d  (cost=0.00..127734.34 rows=4645734 width=14) (actual time=0.026..350.947 rows=4645440 loops=1)
                           ->  Hash  (cost=127731.40..127731.40 rows=4645440 width=30) (actual time=551.953..551.953 rows=498688 loops=1)
                                 Buckets: 8388608  Batches: 1  Memory Usage: 94476kB
                                 ->  Seq Scan on c  (cost=0.00..127731.40 rows=4645440 width=30) (actual time=0.045..397.380 rows=4645440 loops=1)
                     ->  Hash  (cost=1995.85..1995.85 rows=72585 width=30) (actual time=12.369..12.370 rows=72585 loops=1)
                           Buckets: 131072  Batches: 1  Memory Usage: 5073kB
                           ->  Seq Scan on b  (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.004..5.643 rows=72585 loops=1)
         ->  Hash  (cost=1995.85..1995.85 rows=72585 width=43) (actual time=18.198..18.198 rows=72585 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 6526kB
               ->  Seq Scan on a  (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.016..6.175 rows=72585 loops=1)
 Planning Time: 0.113 ms
 Execution Time: 24363.224 ms

Now let's test PG14; I couldn't be bothered to test PG13.

orcl=> select * from version();
                                                  version
------------------------------------------------------------------------------------------------------------
PostgreSQL 14beta2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

orcl=> show work_mem;
 work_mem
----------
 64MB

work_mem still has to be set to 6GB before the filter is optimized automatically.

orcl=> set work_mem='6GB';
SET
orcl=> explain  select count(*)
orcl->   from a
orcl->  where owner = 'SCOTT'
orcl->     or exists (select null
orcl(>           from b, c, d
orcl(>          where b.object_name = c.object_name
orcl(>            and c.data_object_id = d.data_object_id
orcl(>            and a.object_id = b.data_object_id
orcl(>            and a.object_name = c.object_name
orcl(>            and a.object_type = d.object_type);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=1752552267.57..1752552267.58 rows=1 width=8)
   ->  Seq Scan on a  (cost=0.00..1752552176.83 rows=36297 width=0)
         Filter: (((owner)::text = 'SCOTT'::text) OR (hashed SubPlan 2))
         SubPlan 2
           ->  Merge Join  (cost=2072764.75..2984181.18 rows=60595987 width=70)
                 Merge Cond: (d.data_object_id = c.data_object_id)
                 ->  Sort  (cost=642048.93..653660.57 rows=4644657 width=14)
                       Sort Key: d.data_object_id
                       ->  Seq Scan on d  (cost=0.00..127719.57 rows=4644657 width=14)
                 ->  Sort  (cost=1430710.80..1452265.14 rows=8621734 width=36)
                       Sort Key: c.data_object_id
                       ->  Hash Join  (cost=2903.16..437506.59 rows=8621734 width=36)
                             Hash Cond: ((c.object_name)::text = (b.object_name)::text)
                             ->  Seq Scan on c  (cost=0.00..127727.45 rows=4645445 width=30)
                             ->  Hash  (cost=1995.85..1995.85 rows=72585 width=30)
                                   ->  Seq Scan on b  (cost=0.00..1995.85 rows=72585 width=30)

Set work_mem back to 64MB and look at the rewritten SQL.

orcl=> set work_mem='64MB';
SET
orcl=> explain analyze select count(*)
orcl->   from a
orcl->   left join (select b.data_object_id, c.object_name, d.object_type
orcl(>                from b, c, d
orcl(>               where b.object_name = c.object_name
orcl(>                 and c.data_object_id = d.data_object_id
orcl(>               group by b.data_object_id, c.object_name, d.object_type) b
orcl->     on a.object_id = b.data_object_id
orcl->    and a.object_name = b.object_name
orcl->    and a.object_type = b.object_type
orcl->  where a.owner = 'SCOTT'
orcl->     or (b.data_object_id is not null and b.object_name is not null and
orcl(>        b.object_type is not null);
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11617963.19..11617963.20 rows=1 width=8) (actual time=24227.425..24227.437 rows=1 loops=1)
   ->  Hash Right Join  (cost=8777347.46..11617784.44 rows=71502 width=0) (actual time=24204.902..24227.210 rows=7141 loops=1)
         Hash Cond: ((b.data_object_id = a.object_id) AND ((c.object_name)::text = (a.object_name)::text) AND ((d.object_type)::text = (a.object_type)::text))
         Filter: (((a.owner)::text = 'SCOTT'::text) OR ((b.data_object_id IS NOT NULL) AND (c.object_name IS NOT NULL) AND (d.object_type IS NOT NULL)))
         Rows Removed by Filter: 65444
         ->  HashAggregate  (cost=8774081.37..10326853.54 rows=60595987 width=38) (actual time=24181.780..24183.680 rows=8119 loops=1)
               Group Key: b.data_object_id, c.object_name, d.object_type
               Planned Partitions: 128  Batches: 1  Memory Usage: 13329kB
               ->  Merge Join  (cost=2387890.48..3320442.54 rows=60595987 width=38) (actual time=3486.442..12655.561 rows=67854336 loops=1)
                     Merge Cond: (c.data_object_id = d.data_object_id)
                     ->  Sort  (cost=1666463.80..1688018.14 rows=8621734 width=36) (actual time=2566.337..2728.647 rows=1033601 loops=1)
                           Sort Key: c.data_object_id
                           Sort Method: external merge  Disk: 309136kB
                           ->  Hash Join  (cost=2903.16..437506.59 rows=8621734 width=36) (actual time=17.774..1459.719 rows=8841152 loops=1)
                                 Hash Cond: ((c.object_name)::text = (b.object_name)::text)
                                 ->  Seq Scan on c  (cost=0.00..127727.45 rows=4645445 width=30) (actual time=0.022..300.843 rows=4645440 loops=1)
                                 ->  Hash  (cost=1995.85..1995.85 rows=72585 width=30) (actual time=17.526..17.527 rows=72585 loops=1)
                                       Buckets: 131072  Batches: 1  Memory Usage: 5073kB
                                       ->  Seq Scan on b  (cost=0.00..1995.85 rows=72585 width=30) (actual time=0.006..8.267 rows=72585 loops=1)
                     ->  Materialize  (cost=721425.43..744648.71 rows=4644657 width=14) (actual time=920.094..2794.691 rows=67854337 loops=1)
                           ->  Sort  (cost=721425.43..733037.07 rows=4644657 width=14) (actual time=920.090..986.429 rows=498689 loops=1)
                                 Sort Key: d.data_object_id
                                 Sort Method: external merge  Disk: 87896kB
                                 ->  Seq Scan on d  (cost=0.00..127719.57 rows=4644657 width=14) (actual time=0.017..418.305 rows=4645440 loops=1)
         ->  Hash  (cost=1995.85..1995.85 rows=72585 width=43) (actual time=22.874..22.875 rows=72585 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 6526kB
               ->  Seq Scan on a  (cost=0.00..1995.85 rows=72585 width=43) (actual time=0.015..9.023 rows=72585 loops=1)
 Planning Time: 0.280 ms
 Execution Time: 24276.652 ms

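Even without parallelism, PG14 finishes in about 24 seconds. GROUP BY here switched from the earlier sort-based grouping to hash-based grouping. Hash aggregation that can spill to disk was actually introduced in PostgreSQL 13, which is why the planner can choose HashAggregate with only 64MB of work_mem (note the "Planned Partitions: 128" line in the plan).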
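
To see how much of the difference comes from the aggregation strategy, the inner GROUP BY can be run on its own with hash aggregation switched off and on. This is a sketch of a comparison that was not run for this article, so no timings are claimed; enable_hashagg is a standard planner setting.

```sql
-- Force sort-based grouping for comparison, then restore the default.
set enable_hashagg = off;
explain analyze
select b.data_object_id, c.object_name, d.object_type
  from b, c, d
 where b.object_name = c.object_name
   and c.data_object_id = d.data_object_id
 group by b.data_object_id, c.object_name, d.object_type;
reset enable_hashagg;
```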

 ->  HashAggregate  (cost=8774081.37..10326853.54 rows=60595987 width=38) (actual time=24181.780..24183.680 rows=8119 loops=1)

Now test it in Oracle.

SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer 596M
SQL> show parameter pga_aggregate_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 199M

SQL> show parameter optimizer_feature

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.1

SQL> set timi on autot trace
SQL> select count(*)
  from a
  left join (select b.data_object_id, c.object_name, d.object_type
               from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
              group by b.data_object_id, c.object_name, d.object_type) b
    on a.object_id = b.data_object_id
   and a.object_name = b.object_name
   and a.object_type = b.object_type
 where a.owner = 'SCOTT'
    or (b.data_object_id is not null and b.object_name is not null and
       b.object_type is not null);

Elapsed: 00:00:06.50

Execution Plan
----------------------------------------------------------
Plan hash value: 3005719873

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     1 |   197 |       |   342M (25)|999:59:59 |
|   1 |  SORT AGGREGATE          |      |     1 |   197 |       |            |          |
|*  2 |   FILTER                 |      |       |       |       |            |          |
|*  3 |    HASH JOIN OUTER       |      |    30T|  5484T|  9616K|   342M (25)|999:59:59 |
|   4 |     TABLE ACCESS FULL    | A    | 82734 |  8645K|       |   308   (1)| 00:00:04 |
|   5 |     VIEW                 |      |    53G|  4480G|       |  2913K (84)| 09:42:43 |
|   6 |      HASH GROUP BY       |      |    53G|  9060G|       |  2913K (84)| 09:42:43 |
|*  7 |       HASH JOIN          |      |    53G|  9060G|   166M|   611K (24)| 02:02:19 |
|   8 |        TABLE ACCESS FULL | D    |  4838K|   110M|       | 17975   (1)| 00:03:36 |
|*  9 |        HASH JOIN         |      |    50M|  7616M|  6552K| 35687   (1)| 00:07:09 |
|  10 |         TABLE ACCESS FULL| B    | 73673 |  5683K|       |   308   (1)| 00:00:04 |
|  11 |         TABLE ACCESS FULL| C    |  3940K|   296M|       | 17967   (1)| 00:03:36 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"."OWNER"='SCOTT' OR "B"."DATA_OBJECT_ID" IS NOT NULL AND
              "B"."OBJECT_NAME" IS NOT NULL AND "B"."OBJECT_TYPE" IS NOT NULL)
   3 - access("A"."OBJECT_TYPE"="B"."OBJECT_TYPE"(+) AND
              "A"."OBJECT_NAME"="B"."OBJECT_NAME"(+) AND
              "A"."OBJECT_ID"="B"."DATA_OBJECT_ID"(+))
   7 - access("C"."DATA_OBJECT_ID"="D"."DATA_OBJECT_ID")
   9 - access("B"."OBJECT_NAME"="C"."OBJECT_NAME")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


SQL> alter session set optimizer_features_enable='19.1.0';

Session altered.

SQL> select count(*)
  from a
  left join (select b.data_object_id, c.object_name, d.object_type
               from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
              group by b.data_object_id, c.object_name, d.object_type) b
    on a.object_id = b.data_object_id
   and a.object_name = b.object_name
   and a.object_type = b.object_type
 where a.owner = 'SCOTT'
    or (b.data_object_id is not null and b.object_name is not null and
       b.object_type is not null);

Elapsed: 00:00:03.53

Execution Plan
----------------------------------------------------------
Plan hash value: 998875269

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |   197 |       |  1668K  (2)| 00:01:06 |
|   1 |  SORT AGGREGATE              |           |     1 |   197 |       |            |          |
|*  2 |   FILTER                     |           |       |       |       |            |          |
|*  3 |    HASH JOIN OUTER           |           |   327M|    60G|  9616K|  1668K  (2)| 00:01:06 |
|   4 |     JOIN FILTER CREATE       | :BF0000   | 82734 |  8645K|       |   308   (1)| 00:00:01 |
|   5 |      TABLE ACCESS FULL       | A         | 82734 |  8645K|       |   308   (1)| 00:00:01 |
|   6 |     VIEW                     |           |   327M|    27G|       | 86111  (28)| 00:00:04 |
|   7 |      HASH GROUP BY           |           |   327M|    47G|       | 86111  (28)| 00:00:04 |
|   8 |       JOIN FILTER USE        | :BF0000   |   327M|    47G|       | 74905  (17)| 00:00:03 |
|   9 |        MERGE JOIN            |           |   327M|    47G|       | 74905  (17)| 00:00:03 |
|  10 |         SORT JOIN            |           |   327M|    23G|       | 73240  (17)| 00:00:03 |
|  11 |          VIEW                | VW_GBF_20 |   327M|    23G|       | 73240  (17)| 00:00:03 |
|  12 |           HASH GROUP BY      |           |   327M|    31G|       | 73240  (17)| 00:00:03 |
|* 13 |            HASH JOIN         |           |   327M|    31G|   166M| 62034   (2)| 00:00:03 |
|  14 |             TABLE ACCESS FULL| D         |  4838K|   110M|       | 17975   (1)| 00:00:01 |
|  15 |             TABLE ACCESS FULL| C         |  3940K|   296M|       | 17967   (1)| 00:00:01 |
|* 16 |         SORT JOIN            |           | 73673 |  5683K|    12M|  1664   (1)| 00:00:01 |
|  17 |          VIEW                | VW_GBC_19 | 73673 |  5683K|       |   310   (1)| 00:00:01 |
|  18 |           HASH GROUP BY      |           | 73673 |  5683K|       |   310   (1)| 00:00:01 |
|  19 |            TABLE ACCESS FULL | B         | 73673 |  5683K|       |   308   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"."OWNER"='SCOTT' OR "B"."DATA_OBJECT_ID" IS NOT NULL AND
              "B"."OBJECT_NAME" IS NOT NULL AND "B"."OBJECT_TYPE" IS NOT NULL)
   3 - access("A"."OBJECT_ID"="B"."DATA_OBJECT_ID"(+) AND
              "A"."OBJECT_NAME"="B"."OBJECT_NAME"(+) AND "A"."OBJECT_TYPE"="B"."OBJECT_TYPE"(+))
  13 - access("C"."DATA_OBJECT_ID"="D"."DATA_OBJECT_ID")
  16 - access("ITEM_1"="ITEM_1")
       filter("ITEM_1"="ITEM_1")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
        181  recursive calls
          0  db block gets
     134108  consistent gets
     135022  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        910  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

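With optimizer_features_enable set to 11.2.0.1 (Oracle 11g behavior), Oracle needs only 6.5 seconds; with 19.1.0 (19c behavior) it needs only 3.5 seconds, helped by the JOIN FILTER (the Bloom filter :BF0000 in the plan). Both runs used sga_target = 596M and pga_aggregate_target = 199M.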
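The autotrace plans above show only optimizer estimates (for example the 327M-row figures), whereas the PostgreSQL plans show actual row counts and timings. A rough sketch of how actual row-source statistics could be collected on the Oracle side follows; it assumes the session has SELECT access to the V$ views that DBMS_XPLAN.DISPLAY_CURSOR reads, and it was not part of the original test.

```sql
-- SQL*Plus sketch; tables a, b, c, d are assumed to exist as in this post.
set autotrace off
set serveroutput off   -- otherwise display_cursor may report a DBMS_OUTPUT call instead

select /*+ gather_plan_statistics */ count(*)
  from a
  left join (select b.data_object_id, c.object_name, d.object_type
               from b, c, d
              where b.object_name = c.object_name
                and c.data_object_id = d.data_object_id
              group by b.data_object_id, c.object_name, d.object_type) b
    on a.object_id = b.data_object_id
   and a.object_name = b.object_name
   and a.object_type = b.object_type
 where a.owner = 'SCOTT'
    or (b.data_object_id is not null and b.object_name is not null and
       b.object_type is not null);

-- Show the plan of the last statement with estimated (E-Rows) and actual (A-Rows) counts.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

The A-Rows column for the JOIN FILTER USE step would show how many rows survive the Bloom filter before the final hash join, which makes the comparison with the PostgreSQL EXPLAIN ANALYZE output more direct.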

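PG 12.7 still needs 24 seconds even after work_mem is raised to 6GB, which is frustrating.

PG13 was not tested.

PG14 needs 24 seconds with work_mem set to 64MB.

Summary: compared with PG12, PG14 runs the GROUP BY as a hash aggregate instead of a sort-based aggregate, which greatly reduces the work_mem required. Compared with Oracle, however, there is still a lot of room for improvement: in this test PG14 took about 24.3 seconds against 6.5 and 3.5 seconds for Oracle, roughly 3.7x and 6.9x slower.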
