Query performance is really bad with multiple joins and unions, is there any other way to improve execution time for this?

Posted: 2021-04-15 08:56:12

Question:

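I am running this query inside a stored procedure, but I want to improve its performance: it takes hours to execute, with 13.2 million rows affected. Is there any way to speed it up?

I am using PostgreSQL with pgAdmin.

Query: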

INSERT INTO t_temporary_id_table ( 
    ref_date,
    id,
    client, 
    slcunda,
    count_single_1,         
    count_double_2,         
    count_e, 
    count_g, 
    count_m, 
    active, 
    valid_till_max, 
    id_2, 
    created ,
    lastmodified 
    )
with 
cte_tmp as (
    select 
        a.id,               
        mm.tenant as client,
        c.slcunda,              
        d.single,           
        d.count_single,   
        e.double_1,         
        e.count_double_1,     
        SUM (CASE WHEN id_role = 'E' THEN 1 ELSE 0 END) AS "count_e",  
        SUM (CASE WHEN id_role = 'G' THEN 1 ELSE 0 END) AS "count_g",  
        SUM (CASE WHEN id_role = 'M' THEN 1 ELSE 0 END) AS "count_m",  
        case when min(status)='active' then 1 else 0 end active, 
        MAX(valid_till) as valid_till_max
        
    from schema1.struct a
    inner join 
    (
    select 
        id,
        max(valid_till) valid_till_max
    from 
        schema1.struct a
    group by 
        id
    ) b
    on 
    a.id=b.id and a.valid_till = b.valid_till_max
    left outer join 
    schema2.tenants mm on a.tsl_1_2 = mm.tenant 
    left outer join (
        select 
            id,
            key_1 as slcunda
        from    
            schema1.t_id 
        where 
            id in (Select id from schema1.t_id group by id having count(id)=1) 
    ) c
    on a.id=c.id
    left outer join(
        select 
        id,
        count_single,
        single 
        from (
            select
                id, 
                id_2 as single,
                id_2_role,
                count(id) over(partition by id) as count_single,
                row_number() over(partition by id order by id, id_2_role desc) as rn
            from 
                schema1.different_id_2
            where 
                id_2_role in ('03','08','17')   
            ) a
        where rn=1
    ) d
    on a.id=d.id
    left outer join(
        select 
        id,
        count_double_1,
        double_1
        from (
            select
                id, 
                id_2 as double_1,
                id_2_role,
                count(id) over(partition by id) as count_double_1,
                row_number() over(partition by id order by id, id_2_role desc) as rn 
            from 
                schema1.different_id_2
            where 
                id_2_role in ('06','19')    
        ) a
    where rn=1
    ) e 
    on a.id=e.id    
    group by a.id,mm.tenant,c.slcunda,d.single,d.count_single,e.double_1,e.count_double_1
),
y as ( 
    select * 
    from (
        SELECT 
            cte_tmp.id,
            client,
            slcunda,
            count_single as count_single_1, 
            count_double_1 as count_double_2, 
            count_e,
            count_g,
            count_m,
            active,
            valid_till_max,
            b.id_2 
        FROM 
            cte_tmp
        inner join (
            select 
                id,
                id_2
            from (
                select 
                    id,
                    id_theory as id_2,
                    row_number() over(partition by id order by id) rn
                from
                    schema1.struct
            ) a 
        where rn=1  
        ) b
        on cte_tmp.id=b.id
        where
            count_e=1 and count_g=0 and count_m=0 and count_single=0 
        union all   
        SELECT 
            id,
            client,
            slcunda,
            count_single as count_single_1, 
            count_double_1 as count_double_2, 
            count_e,
            count_g,
            count_m,
            active,
            valid_till_max,
            single as id_2 
        FROM 
            cte_tmp
        where 
            count_e=1 and count_g=0 and count_m=0 and count_single>=1
        union all
        SELECT
            id,
            client,
            slcunda,
            count_single as count_single_1, 
            count_double_1 as count_double_2, 
            count_e,
            count_g,
            count_m,
            active,
            valid_till_max,
            double_1 as id_2 
        FROM 
            cte_tmp
        where
            count_e=0 and count_g=1 and count_m>=1 and active=1 and count_double_1>=1
        union all
        SELECT 
            id,
            client,
            slcunda,
            count_single as count_single_1, 
            count_double_1 as count_double_2, 
            count_e,
            count_g,
            count_m,
            active,
            valid_till_max,
            double_1 as id_2 
        FROM 
            cte_tmp
        where 
            count_e<>1 and count_g<>0 and count_m<>0 and active=0 and count_double_1>=1
    ) a 
), 


z as (
    SELECT 
        cte_tmp.id,
        client,
        slcunda,
        count_single as count_single_1, 
        count_double_1 as count_double_2,
        count_e,
        count_g,
        count_m,
        active,
        valid_till_max
    FROM cte_tmp
    except
    select 
        id,
        client,
        slcunda,
        count_single_1,
        count_double_2,
        count_e,
        count_g,
        count_m,
        active,
        valid_till_max
    from y
),

temporary_result as (
    select 
        id::bigint,
        client,
        slcunda,
        count_single_1,
        count_double_2,
        count_e,
        count_g,
        count_m,
        active,
        valid_till_max,
        '' as id_2
    from z
    union all
    select 
        id::bigint,
        client,
        slcunda,
        count_single_1,
        count_double_2,
        count_e,
        count_g,
        count_m,
        active,
        valid_till_max,
        id_2
    from y
)
select 
    now(),
    id,
    client,
    slcunda,
    count_single_1, 
    count_double_2, 
    count_e,
    count_g,
    count_m,
    active,
    valid_till_max,
    id_2::bigint,
    now(),
    now()
from temporary_result

Query plan:

Insert on t_temporary_id_table  (cost=841905.48..842027.08 rows=206 width=48) (actual time=28853.074..28853.111 rows=0 loops=1)
  Buffers: shared hit=111871, local hit=1010060 read=2 dirtied=9372 written=18742, temp read=114117 written=92895
  I/O Timings: read=0.007
  ->  Subquery Scan on "*SELECT*"  (cost=841905.48..842027.08 rows=206 width=48) (actual time=26620.987..28062.291 rows=991322 loops=1)
        Buffers: shared hit=111871, temp read=114117 written=92895
        ->  Result  (cost=841905.48..842019.35 rows=206 width=102) (actual time=26620.981..27612.767 rows=991322 loops=1)
              Buffers: shared hit=111871, temp read=114117 written=92895
              CTE cte_tmp
                ->  GroupAggregate  (cost=655151.28..655242.89 rows=1745 width=97) (actual time=20685.859..22471.231 rows=991322 loops=1)
                      Group Key: a.id, mm.tenant, t_id.key_1, a_1.single, a_1.count_single, a_2.double_1, a_2.count_double_1
                      Buffers: shared hit=91523, temp read=77862 written=77948
                      ->  Sort  (cost=655151.28..655155.64 rows=1745 width=78) (actual time=20685.833..21282.869 rows=999585 loops=1)
                            Sort Key: a.id, mm.tenant, t_id.key_1, a_1.single, a_1.count_single, a_2.double_1, a_2.count_double_1
                            Sort Method: external merge  Disk: 52920kB
                            Buffers: shared hit=91523, temp read=77862 written=77948
                            ->  Hash Left Join  (cost=605059.82..655057.32 rows=1745 width=78) (actual time=14404.197..19018.340 rows=999585 loops=1)
                                  Hash Cond: (a.tsl_1_2 = (mm.tenant)::numeric)
                                  Buffers: shared hit=91523, temp read=67869 written=67928
                                  ->  Hash Right Join  (cost=605057.61..655028.93 rows=1745 width=79) (actual time=14404.146..18397.854 rows=999585 loops=1)
                                        Hash Cond: (a_2.id = a.id)
                                        Buffers: shared hit=91522, temp read=67869 written=67928
                                        ->  Subquery Scan on a_2  (cost=143319.32..193271.82 rows=4995 width=33) (actual time=1706.052..3753.365 rows=999246 loops=1)
                                              Filter: (a_2.rn = 1)
                                              Buffers: shared hit=7358, temp read=7362 written=7387
                                              ->  WindowAgg  (cost=143319.32..180783.69 rows=999050 width=46) (actual time=1706.049..3584.785 rows=999246 loops=1)
                                                    Buffers: shared hit=7358, temp read=7362 written=7387
                                                    ->  WindowAgg  (cost=143319.32..165797.94 rows=999050 width=38) (actual time=1706.034..2812.484 rows=999246 loops=1)
                                                          Buffers: shared hit=7358, temp read=7362 written=7387
                                                          ->  Sort  (cost=143319.32..145816.94 rows=999050 width=30) (actual time=1706.021..2170.216 rows=999246 loops=1)
                                                                Sort Key: different_id_2.id, different_id_2.id_2_role DESC
                                                                Sort Method: external merge  Disk: 40200kB
                                                                Buffers: shared hit=7358, temp read=7362 written=7387
                                                                ->  Seq Scan on different_id_2  (cost=0.00..19858.00 rows=999050 width=30) (actual time=0.029..419.960 rows=999246 loops=1)
                                                                      Filter: (id_2_role = ANY ('6,19'::numeric[]))
                                                                      Rows Removed by Filter: 754
                                                                      Buffers: shared hit=7358
                                        ->  Hash  (cost=461716.48..461716.48 rows=1745 width=54) (actual time=12696.847..12696.862 rows=999585 loops=1)
                                              Buckets: 65536 (originally 2048)  Batches: 32 (originally 1)  Memory Usage: 3585kB
                                              Buffers: shared hit=84164, temp read=45823 written=51723
                                              ->  Hash Left Join  (cost=407582.29..461716.48 rows=1745 width=54) (actual time=8198.967..12034.422 rows=999585 loops=1)
                                                    Hash Cond: (a.id = a_1.id)
                                                    Buffers: shared hit=84164, temp read=45823 written=45857
                                                    ->  Hash Left Join  (cost=386461.36..440588.99 rows=1745 width=29) (actual time=7848.284..11342.813 rows=999585 loops=1)
                                                          Hash Cond: (a.id = t_id.id)
                                                          Buffers: shared hit=76806, temp read=45823 written=45857
                                                          ->  Hash Join  (cost=194219.00..248340.00 rows=1745 width=26) (actual time=2108.327..4049.878 rows=999585 loops=1)
                                                                Hash Cond: ((a.id = struct.id) AND (a.valid_till = (max(struct.valid_till))))
                                                                Buffers: shared hit=40696, temp read=13666 written=13683
                                                                ->  Seq Scan on struct a  (cost=0.00..30348.00 rows=1000000 width=26) (actual time=0.016..323.130 rows=1000000 loops=1)
                                                                      Buffers: shared hit=20348
                                                                ->  Hash  (cost=174465.86..174465.86 rows=993476 width=12) (actual time=2107.791..2107.793 rows=991322 loops=1)
                                                                      Buckets: 131072  Batches: 16  Memory Usage: 3929kB
                                                                      Buffers: shared hit=20348, temp read=3902 written=7995
                                                                      ->  GroupAggregate  (cost=147096.34..164531.10 rows=993476 width=12) (actual time=884.285..1782.278 rows=991322 loops=1)
                                                                            Group Key: struct.id
                                                                            Buffers: shared hit=20348, temp read=3902 written=3919
                                                                            ->  Sort  (cost=147096.34..149596.34 rows=1000000 width=12) (actual time=884.272..1368.369 rows=1000000 loops=1)
                                                                                  Sort Key: struct.id
                                                                                  Sort Method: external merge  Disk: 25496kB
                                                                                  Buffers: shared hit=20348, temp read=3902 written=3919
                                                                                  ->  Seq Scan on struct  (cost=0.00..30348.00 rows=1000000 width=12) (actual time=0.007..308.301 rows=1000000 loops=1)
                                                                                        Buffers: shared hit=20348
                                                          ->  Hash  (cost=192179.85..192179.85 rows=5000 width=11) (actual time=5738.432..5738.436 rows=1000000 loops=1)
                                                                Buckets: 131072 (originally 8192)  Batches: 16 (originally 1)  Memory Usage: 3748kB
                                                                Buffers: shared hit=36110, temp read=22791 written=26489
                                                                ->  Hash Join  (cost=161499.84..192179.85 rows=5000 width=11) (actual time=1913.550..5318.612 rows=1000000 loops=1)
                                                                      Hash Cond: (t_id.id = t_id_1.id)
                                                                      Buffers: shared hit=36110, temp read=22791 written=22808
                                                                      ->  Seq Scan on t_id (cost=0.00..28055.00 rows=1000000 width=11) (actual time=0.008..182.515 rows=1000000 loops=1)
                                                                            Buffers: shared hit=18055
                                                                      ->  Hash  (cost=161437.34..161437.34 rows=5000 width=8) (actual time=1913.370..1913.372 rows=1000000 loops=1)
                                                                            Buckets: 131072 (originally 8192)  Batches: 16 (originally 1)  Memory Usage: 3548kB
                                                                            Buffers: shared hit=18055, temp read=2861 written=6190
                                                                            ->  GroupAggregate  (cost=141387.34..161387.34 rows=5000 width=8) (actual time=726.580..1619.438 rows=1000000 loops=1)
                                                                                  Group Key: t_id_1.id
                                                                                  Filter: (count(t_id_1.id) = 1)
                                                                                  Buffers: shared hit=18055, temp read=2861 written=2878
                                                                                  ->  Sort  (cost=141387.34..143887.34 rows=1000000 width=8) (actual time=726.568..1204.865 rows=1000000 loops=1)
                                                                                        Sort Key: t_id_1.id
                                                                                        Sort Method: external merge  Disk: 18704kB
                                                                                        Buffers: shared hit=18055, temp read=2861 written=2878
                                                                                        ->  Seq Scan on t_id t_id_1  (cost=0.00..28055.00 rows=1000000 width=8) (actual time=0.004..156.615 rows=1000000 loops=1)
                                                                                              Buffers: shared hit=18055
                                                    ->  Hash  (cost=21120.92..21120.92 rows=1 width=33) (actual time=350.644..350.647 rows=143 loops=1)
                                                          Buckets: 1024  Batches: 1  Memory Usage: 19kB
                                                          Buffers: shared hit=7358
                                                          ->  Subquery Scan on a_1  (cost=21113.42..21120.92 rows=1 width=33) (actual time=350.339..350.604 rows=143 loops=1)
                                                                Filter: (a_1.rn = 1)
                                                                Buffers: shared hit=7358
                                                                ->  WindowAgg  (cost=21113.42..21119.05 rows=150 width=46) (actual time=350.335..350.578 rows=143 loops=1)
                                                                      Buffers: shared hit=7358
                                                                      ->  WindowAgg  (cost=21113.42..21116.80 rows=150 width=38) (actual time=350.322..350.440 rows=143 loops=1)
                                                                            Buffers: shared hit=7358
                                                                            ->  Sort  (cost=21113.42..21113.80 rows=150 width=30) (actual time=350.302..350.318 rows=143 loops=1)
                                                                                  Sort Key: different_id_2_1.id, different_id_2_1.id_2_role DESC
                                                                                  Sort Method: quicksort  Memory: 36kB
                                                                                  Buffers: shared hit=7358
                                                                                  ->  Seq Scan on different_id_2 different_id_2_1  (cost=0.00..21108.00 rows=150 width=30) (actual time=0.987..350.004 rows=143 loops=1)
                                                                                        Filter: (id_2_role = ANY ('03,08,17'::numeric[]))
                                                                                        Rows Removed by Filter: 999857
                                                                                        Buffers: shared hit=7358
                                  ->  Hash  (cost=1.54..1.54 rows=54 width=8) (actual time=0.037..0.038 rows=54 loops=1)
                                        Buckets: 1024  Batches: 1  Memory Usage: 11kB
                                        Buffers: shared hit=1
                                        ->  Seq Scan on tenants mm  (cost=0.00..1.54 rows=54 width=8) (actual time=0.015..0.023 rows=54 loops=1)
                                              Buffers: shared hit=1
              CTE y
                ->  Append  (cost=153984.20..186662.59 rows=6 width=122) (actual time=1196.616..1645.006 rows=3588 loops=1)
                      Buffers: shared hit=20348, temp read=36255 written=6487
                      ->  Merge Join  (cost=153984.20..186496.72 rows=1 width=95) (actual time=1195.226..1195.229 rows=0 loops=1)
                            Merge Cond: (a_3.id = cte_tmp_1.id)
                            Buffers: shared hit=20348, temp read=10872 written=6487
                            ->  Subquery Scan on a_3  (cost=153931.84..186431.84 rows=5000 width=25) (actual time=1034.841..1034.842 rows=1 loops=1)
                                  Filter: (a_3.rn = 1)
                                  Buffers: shared hit=20348, temp read=2411 written=6486
                                  ->  WindowAgg  (cost=153931.84..173931.84 rows=1000000 width=33) (actual time=1034.839..1034.840 rows=1 loops=1)
                                        Buffers: shared hit=20348, temp read=2411 written=6486
                                        ->  Sort  (cost=153931.84..156431.84 rows=1000000 width=25) (actual time=1034.824..1034.826 rows=2 loops=1)
                                              Sort Key: struct_1.id
                                              Sort Method: external merge  Disk: 35272kB
                                              Buffers: shared hit=20348, temp read=2411 written=6486
                                              ->  Seq Scan on struct struct_1  (cost=0.00..30348.00 rows=1000000 width=25) (actual time=0.019..229.951 rows=1000000 loops=1)
                                                    Buffers: shared hit=20348
                            ->  Sort  (cost=52.36..52.37 rows=1 width=78) (actual time=160.379..160.379 rows=0 loops=1)
                                  Sort Key: cte_tmp_1.id
                                  Sort Method: quicksort  Memory: 25kB
                                  Buffers: temp read=8461 written=1
                                  ->  CTE Scan on cte_tmp cte_tmp_1  (cost=0.00..52.35 rows=1 width=78) (actual time=160.369..160.369 rows=0 loops=1)
                                        Filter: ((count_e = 1) AND (count_g = 0) AND (count_m = 0) AND (count_single = 0))
                                        Rows Removed by Filter: 991322
                                        Buffers: temp read=8461 written=1
                      ->  CTE Scan on cte_tmp cte_tmp_2  (cost=0.00..52.35 rows=1 width=128) (actual time=1.387..145.031 rows=5 loops=1)
                            Filter: ((count_single >= 1) AND (count_e = 1) AND (count_g = 0) AND (count_m = 0))
                            Rows Removed by Filter: 991317
                            Buffers: temp read=8461
                      ->  CTE Scan on cte_tmp cte_tmp_3  (cost=0.00..56.71 rows=1 width=128) (actual time=0.054..151.483 rows=3579 loops=1)
                            Filter: ((count_m >= 1) AND (count_double >= 1) AND (count_e = 0) AND (count_g = 1) AND (active = 1))
                            Rows Removed by Filter: 987743
                            Buffers: temp read=8461
                      ->  CTE Scan on cte_tmp cte_tmp_4  (cost=0.00..56.71 rows=3 width=128) (actual time=31.046..152.850 rows=4 loops=1)
                            Filter: ((count_e <> 1) AND (count_g <> 0) AND (count_m <> 0) AND (count_double >= 1) AND (active = 0))
                            Rows Removed by Filter: 991318
                            Buffers: temp read=8461
              ->  Append  (cost=0.00..108.73 rows=206 width=103) (actual time=26620.976..27439.782 rows=991322 loops=1)
                    Buffers: shared hit=111871, temp read=114117 written=92895
                    ->  Subquery Scan on z  (cost=0.00..107.56 rows=200 width=102) (actual time=26620.975..27356.994 rows=987734 loops=1)
                          Buffers: shared hit=111871, temp read=114117 written=92895
                          ->  HashSetOp Except  (cost=0.00..105.06 rows=200 width=82) (actual time=26620.971..27011.650 rows=987734 loops=1)
                                Buffers: shared hit=111871, temp read=114117 written=92895
                                ->  Append  (cost=0.00..61.28 rows=1751 width=82) (actual time=20685.869..25367.475 rows=994910 loops=1)
                                      Buffers: shared hit=111871, temp read=114117 written=92895
                                      ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..52.35 rows=1745 width=82) (actual time=20685.867..23595.424 rows=991322 loops=1)
                                            Buffers: shared hit=91523, temp read=77862 written=86408
                                            ->  CTE Scan on cte_tmp  (cost=0.00..34.90 rows=1745 width=78) (actual time=20685.866..23423.809 rows=991322 loops=1)
                                                  Buffers: shared hit=91523, temp read=77862 written=86408
                                      ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.18 rows=6 width=82) (actual time=1196.623..1648.210 rows=3588 loops=1)
                                            Buffers: shared hit=20348, temp read=36255 written=6487
                                            ->  CTE Scan on y y_1  (cost=0.00..0.12 rows=6 width=78) (actual time=1196.621..1647.533 rows=3588 loops=1)
                                                  Buffers: shared hit=20348, temp read=36255 written=6487
                    ->  CTE Scan on y  (cost=0.00..0.14 rows=6 width=120) (actual time=0.004..1.137 rows=3588 loops=1)
Planning Time: 1.647 ms
Execution Time: 28921.102 ms

Edit: I have added the EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) output, but how can I use it to improve performance?

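Comments:

Why are you using COALESCE() in the first place? You do not need it.

Maybe the constant is sometimes -1.

I have removed the cases where I got -1, so I have now dropped the coalesce from the query.

The execution plan is unreadable. Connect with psql, run EXPLAIN (ANALYZE, BUFFERS) /* your query */ and copy and paste the result.

Tuning a complex statement without access to the real system is very hard. But since some of the plan steps spill to disk, you might save a few seconds by increasing work_mem, e.g. set work_mem = '256MB'; obviously this depends on how much memory your server has and how many concurrent queries are running.

Answer 1: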
AND count_d >= 1

A NULL count_d fails that test just as 0 does, so COALESCE(count_d, 0) is not needed at all.
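
A small sketch of that point, using a hypothetical temporary table demo_counts (the table and its rows are assumptions for illustration, not part of the question). In a WHERE clause, NULL >= 1 evaluates to NULL rather than true, so the row is filtered out just as a row holding 0 would be.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration only.
CREATE TEMP TABLE demo_counts (id int, count_d int);
INSERT INTO demo_counts VALUES (1, NULL), (2, 0), (3, 2);

-- Plain comparison: the NULL row and the 0 row are both rejected.
SELECT id FROM demo_counts WHERE count_d >= 1;

-- The same rows come back with COALESCE, so the wrapper adds nothing here.
SELECT id FROM demo_counts WHERE COALESCE(count_d, 0) >= 1;
```

Both statements should return the same set of ids, which is why the wrapper can be dropped when the comparison is >= 1.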

Answer 2:

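You did not show the execution plan, but provided you have the right index, coalesce is not necessarily the problem.

Assuming that all of these conditions are selective, that is, they substantially narrow down the result set, the ideal index would be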

CREATE INDEX ON table_a (count_e, count_f, count_g, coalesce(count_d, 0));

If one of the conditions is not selective, omit it from the index.

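Comments:

I have added the execution plan.

It is unreadable. The function you quote in the question does not seem to contain coalesce.

I have removed the coalesce, because the -1 cases have been removed from the data. EXPLAIN (ANALYZE, BUFFERS) on my query took more than 2 hours. Plain EXPLAIN works fine, though; would that be useful? I can add it to the question. @Laurenz

EXPLAIN (ANALYZE, BUFFERS) would be sufficient, but it has to be readable.

I have added the EXPLAIN result, is it readable? @Laurenz Albe

Answer 3: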

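This answers the original question, which was about a single table. The question has changed dramatically since then.

coalesce() is redundant, because NULL makes the condition fail anyway, so: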

where count_e = 1 and count_f = 0 and count_g = 0 and count_d >= 1

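However, it is doubtful whether removing coalesce() has any effect on performance.

What would change performance is writing the query this way and then adding an index on (count_e, count_f, count_g, count_d). The order of the first three columns in the index does not matter, but count_d should come after them because of the inequality.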
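
As a rough sketch of that index, assuming the single table is called table_a (the name used in the other answer) and that all four count columns are integers. The column types, the id column and the index name table_a_counts_idx are assumptions made for illustration.

```sql
-- Assumed shape of the single table from the original version of the question.
CREATE TEMP TABLE table_a (
    id      bigint,
    count_e int,
    count_f int,
    count_g int,
    count_d int
);

-- Equality-tested columns first (their relative order is interchangeable),
-- the range-tested column count_d last.
CREATE INDEX table_a_counts_idx
    ON table_a (count_e, count_f, count_g, count_d);

-- Refresh statistics so the planner can judge whether the index pays off.
ANALYZE table_a;

-- Inspect the chosen plan; on a tiny or empty table a sequential scan
-- can still be the cheaper choice.
EXPLAIN
SELECT id
FROM table_a
WHERE count_e = 1 AND count_f = 0 AND count_g = 0 AND count_d >= 1;
```

Whether the planner actually picks the index depends on how selective the conditions are on the real data, so the EXPLAIN output is worth checking against the production table rather than this toy one.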

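Comments:

I have added the index, and the full query still takes more than an hour to complete. More than 13 million rows are affected.

@SaadMustafa . . . Your execution plan suggests that the query is much more complicated than the one you have shown.

I will add the full query so that it is easier to understand @Gordon Linoff

I have added the full query @Gordon Linoff

You are almost there. Use FORMAT TEXT instead of FORMAT JSON and add the ANALYZE and BUFFERS options.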
