Query performance is really bad with multiple joins and unions, is there any other way to improve execution time?
Posted: 2021-04-15 08:56:12

Problem description: I'm running this query in a stored procedure, but I want to improve its performance because it takes hours to execute against the roughly 13.2 million affected rows. Is there any way to make it faster?
I'm using PostgreSQL + pgAdmin.
Query:
INSERT INTO t_temporary_id_table (
ref_date,
id,
client,
slcunda,
count_single_1,
count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
id_2,
created,
lastmodified
)
with
cte_tmp as (
select
a.id,
mm.tenant as client,
c.slcunda,
d.single,
d.count_single,
e.double_1,
e.count_double_1,
SUM (CASE WHEN id_role = 'E' THEN 1 ELSE 0 END) AS "count_e",
SUM (CASE WHEN id_role = 'G' THEN 1 ELSE 0 END) AS "count_g",
SUM (CASE WHEN id_role = 'M' THEN 1 ELSE 0 END) AS "count_m",
case when min(status)='active' then 1 else 0 end active,
MAX(valid_till) as valid_till_max
from schema1.struct a
inner join
(
select
id,
max(valid_till) valid_till_max
from
schema1.struct a
group by
id
) b
on
a.id=b.id and a.valid_till = b.valid_till_max
left outer join
schema2.tenants mm on a.tsl_1_2 = mm.tenant
left outer join (
select
id,
key_1 as slcunda
from
schema1.t_id
where
id in (Select id from schema1.t_id group by id having count(id)=1)
) c
on a.id=c.id
left outer join(
select
id,
count_single,
single
from (
select
id,
id_2 as single,
id_2_role,
count(id) over(partition by id) as count_single,
row_number() over(partition by id order by id, id_2_role desc) as rn
from
schema1.different_id_2
where
id_2_role in ('03','08','17')
) a
where rn=1
) d
on a.id=d.id
left outer join(
select
id,
count_double_1,
double_1
from (
select
id,
id_2 as double_1,
id_2_role,
count(id) over(partition by id) as count_double_1,
row_number() over(partition by id order by id, id_2_role desc) as rn
from
schema1.different_id_2
where
id_2_role in ('06','19')
) a
where rn=1
) e
on a.id=e.id
group by a.id, mm.tenant, c.slcunda, d.single, d.count_single, e.double_1, e.count_double_1
),
y as (
select *
from (
SELECT
cte_tmp.id,
client,
slcunda,
count_single as count_single_1,
count_double_1 as count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
b.id_2
FROM
cte_tmp
inner join (
select
id,
id_2
from (
select
id,
id_theory as id_2,
row_number() over(partition by id order by id) rn
from
schema1.struct
) a
where rn=1
) b
on cte_tmp.id=b.id
where
count_e=1 and count_g=0 and count_m=0 and count_single=0
union all
SELECT
id,
client,
slcunda,
count_single as count_single_1,
count_double_1 as count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
single as id_2
FROM
cte_tmp
where
count_e=1 and count_g=0 and count_m=0 and count_single>=1
union all
SELECT
id,
client,
slcunda,
count_single as count_single_1,
count_double_1 as count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
double_1 as id_2
FROM
cte_tmp
where
count_e=0 and count_g=1 and count_m>=1 and active=1 and count_double_1>=1
union all
SELECT
id,
client,
slcunda,
count_single as count_single_1,
count_double_1 as count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
double_1 as id_2
FROM
cte_tmp
where
count_e<>1 and count_g<>0 and count_m<>0 and active=0 and count_double_1>=1
) a
),
z as (
SELECT
cte_tmp.id,
client,
slcunda,
count_single as count_single_1,
count_double_1 as count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max
FROM cte_tmp
except
select
id,
client,
slcunda,
count_single_1,
count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max
from y
),
temporary_result as (
select
id::bigint,
client,
slcunda,
count_single_1,
count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
'' as id_2
from z
union all
select
id::bigint,
client,
slcunda,
count_single_1,
count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
id_2
from y
)
select
now(),
id,
client,
slcunda,
count_single_1,
count_double_2,
count_e,
count_g,
count_m,
active,
valid_till_max,
id_2::bigint,
now(),
now()
from temporary_result
QUERY PLAN
Insert on t_temporary_id_table (cost=841905.48..842027.08 rows=206 width=48) (actual time=28853.074..28853.111 rows=0 loops=1)
Buffers: shared hit=111871, local hit=1010060 read=2 dirtied=9372 written=18742, temp read=114117 written=92895
I/O Timings: read=0.007
-> Subquery Scan on "*SELECT*" (cost=841905.48..842027.08 rows=206 width=48) (actual time=26620.987..28062.291 rows=991322 loops=1)
Buffers: shared hit=111871, temp read=114117 written=92895
-> Result (cost=841905.48..842019.35 rows=206 width=102) (actual time=26620.981..27612.767 rows=991322 loops=1)
Buffers: shared hit=111871, temp read=114117 written=92895
CTE cte_tmp
-> GroupAggregate (cost=655151.28..655242.89 rows=1745 width=97) (actual time=20685.859..22471.231 rows=991322 loops=1)
Group Key: a.id, mm.tenant, t_id.key_1, a_1.single, a_1.count_single, a_2.double_1, a_2.count_double_1
Buffers: shared hit=91523, temp read=77862 written=77948
-> Sort (cost=655151.28..655155.64 rows=1745 width=78) (actual time=20685.833..21282.869 rows=999585 loops=1)
Sort Key: a.id, mm.tenant, t_id.key_1, a_1.single, a_1.count_single, a_2.double_1, a_2.count_double_1
Sort Method: external merge Disk: 52920kB
Buffers: shared hit=91523, temp read=77862 written=77948
-> Hash Left Join (cost=605059.82..655057.32 rows=1745 width=78) (actual time=14404.197..19018.340 rows=999585 loops=1)
Hash Cond: (a.tsl_1_2 = (mm.tenant)::numeric)
Buffers: shared hit=91523, temp read=67869 written=67928
-> Hash Right Join (cost=605057.61..655028.93 rows=1745 width=79) (actual time=14404.146..18397.854 rows=999585 loops=1)
Hash Cond: (a_2.id = a.id)
Buffers: shared hit=91522, temp read=67869 written=67928
-> Subquery Scan on a_2 (cost=143319.32..193271.82 rows=4995 width=33) (actual time=1706.052..3753.365 rows=999246 loops=1)
Filter: (a_2.rn = 1)
Buffers: shared hit=7358, temp read=7362 written=7387
-> WindowAgg (cost=143319.32..180783.69 rows=999050 width=46) (actual time=1706.049..3584.785 rows=999246 loops=1)
Buffers: shared hit=7358, temp read=7362 written=7387
-> WindowAgg (cost=143319.32..165797.94 rows=999050 width=38) (actual time=1706.034..2812.484 rows=999246 loops=1)
Buffers: shared hit=7358, temp read=7362 written=7387
-> Sort (cost=143319.32..145816.94 rows=999050 width=30) (actual time=1706.021..2170.216 rows=999246 loops=1)
Sort Key: different_id_2.id, different_id_2.id_2_role DESC
Sort Method: external merge Disk: 40200kB
Buffers: shared hit=7358, temp read=7362 written=7387
-> Seq Scan on different_id_2 (cost=0.00..19858.00 rows=999050 width=30) (actual time=0.029..419.960 rows=999246 loops=1)
Filter: (id_2_role = ANY ('{6,19}'::numeric[]))
Rows Removed by Filter: 754
Buffers: shared hit=7358
-> Hash (cost=461716.48..461716.48 rows=1745 width=54) (actual time=12696.847..12696.862 rows=999585 loops=1)
Buckets: 65536 (originally 2048) Batches: 32 (originally 1) Memory Usage: 3585kB
Buffers: shared hit=84164, temp read=45823 written=51723
-> Hash Left Join (cost=407582.29..461716.48 rows=1745 width=54) (actual time=8198.967..12034.422 rows=999585 loops=1)
Hash Cond: (a.id = a_1.id)
Buffers: shared hit=84164, temp read=45823 written=45857
-> Hash Left Join (cost=386461.36..440588.99 rows=1745 width=29) (actual time=7848.284..11342.813 rows=999585 loops=1)
Hash Cond: (a.id = t_id.id)
Buffers: shared hit=76806, temp read=45823 written=45857
-> Hash Join (cost=194219.00..248340.00 rows=1745 width=26) (actual time=2108.327..4049.878 rows=999585 loops=1)
Hash Cond: ((a.id = struct.id) AND (a.valid_till = (max(struct.valid_till))))
Buffers: shared hit=40696, temp read=13666 written=13683
-> Seq Scan on struct a (cost=0.00..30348.00 rows=1000000 width=26) (actual time=0.016..323.130 rows=1000000 loops=1)
Buffers: shared hit=20348
-> Hash (cost=174465.86..174465.86 rows=993476 width=12) (actual time=2107.791..2107.793 rows=991322 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3929kB
Buffers: shared hit=20348, temp read=3902 written=7995
-> GroupAggregate (cost=147096.34..164531.10 rows=993476 width=12) (actual time=884.285..1782.278 rows=991322 loops=1)
Group Key: struct.id
Buffers: shared hit=20348, temp read=3902 written=3919
-> Sort (cost=147096.34..149596.34 rows=1000000 width=12) (actual time=884.272..1368.369 rows=1000000 loops=1)
Sort Key: struct.id
Sort Method: external merge Disk: 25496kB
Buffers: shared hit=20348, temp read=3902 written=3919
-> Seq Scan on struct (cost=0.00..30348.00 rows=1000000 width=12) (actual time=0.007..308.301 rows=1000000 loops=1)
Buffers: shared hit=20348
-> Hash (cost=192179.85..192179.85 rows=5000 width=11) (actual time=5738.432..5738.436 rows=1000000 loops=1)
Buckets: 131072 (originally 8192) Batches: 16 (originally 1) Memory Usage: 3748kB
Buffers: shared hit=36110, temp read=22791 written=26489
-> Hash Join (cost=161499.84..192179.85 rows=5000 width=11) (actual time=1913.550..5318.612 rows=1000000 loops=1)
Hash Cond: (t_id.id = t_id_1.id)
Buffers: shared hit=36110, temp read=22791 written=22808
-> Seq Scan on t_id (cost=0.00..28055.00 rows=1000000 width=11) (actual time=0.008..182.515 rows=1000000 loops=1)
Buffers: shared hit=18055
-> Hash (cost=161437.34..161437.34 rows=5000 width=8) (actual time=1913.370..1913.372 rows=1000000 loops=1)
Buckets: 131072 (originally 8192) Batches: 16 (originally 1) Memory Usage: 3548kB
Buffers: shared hit=18055, temp read=2861 written=6190
-> GroupAggregate (cost=141387.34..161387.34 rows=5000 width=8) (actual time=726.580..1619.438 rows=1000000 loops=1)
Group Key: t_id_1.id
Filter: (count(t_id_1.id) = 1)
Buffers: shared hit=18055, temp read=2861 written=2878
-> Sort (cost=141387.34..143887.34 rows=1000000 width=8) (actual time=726.568..1204.865 rows=1000000 loops=1)
Sort Key: t_id_1.id
Sort Method: external merge Disk: 18704kB
Buffers: shared hit=18055, temp read=2861 written=2878
-> Seq Scan on t_id t_id_1 (cost=0.00..28055.00 rows=1000000 width=8) (actual time=0.004..156.615 rows=1000000 loops=1)
Buffers: shared hit=18055
-> Hash (cost=21120.92..21120.92 rows=1 width=33) (actual time=350.644..350.647 rows=143 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 19kB
Buffers: shared hit=7358
-> Subquery Scan on a_1 (cost=21113.42..21120.92 rows=1 width=33) (actual time=350.339..350.604 rows=143 loops=1)
Filter: (a_1.rn = 1)
Buffers: shared hit=7358
-> WindowAgg (cost=21113.42..21119.05 rows=150 width=46) (actual time=350.335..350.578 rows=143 loops=1)
Buffers: shared hit=7358
-> WindowAgg (cost=21113.42..21116.80 rows=150 width=38) (actual time=350.322..350.440 rows=143 loops=1)
Buffers: shared hit=7358
-> Sort (cost=21113.42..21113.80 rows=150 width=30) (actual time=350.302..350.318 rows=143 loops=1)
Sort Key: different_id_2_1.id, different_id_2_1.id_2_role DESC
Sort Method: quicksort Memory: 36kB
Buffers: shared hit=7358
-> Seq Scan on different_id_2 different_id_2_1 (cost=0.00..21108.00 rows=150 width=30) (actual time=0.987..350.004 rows=143 loops=1)
Filter: (id_2_role = ANY ('{03,08,17}'::numeric[]))
Rows Removed by Filter: 999857
Buffers: shared hit=7358
-> Hash (cost=1.54..1.54 rows=54 width=8) (actual time=0.037..0.038 rows=54 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
Buffers: shared hit=1
-> Seq Scan on tenants mm (cost=0.00..1.54 rows=54 width=8) (actual time=0.015..0.023 rows=54 loops=1)
Buffers: shared hit=1
CTE y
-> Append (cost=153984.20..186662.59 rows=6 width=122) (actual time=1196.616..1645.006 rows=3588 loops=1)
Buffers: shared hit=20348, temp read=36255 written=6487
-> Merge Join (cost=153984.20..186496.72 rows=1 width=95) (actual time=1195.226..1195.229 rows=0 loops=1)
Merge Cond: (a_3.id = cte_tmp_1.id)
Buffers: shared hit=20348, temp read=10872 written=6487
-> Subquery Scan on a_3 (cost=153931.84..186431.84 rows=5000 width=25) (actual time=1034.841..1034.842 rows=1 loops=1)
Filter: (a_3.rn = 1)
Buffers: shared hit=20348, temp read=2411 written=6486
-> WindowAgg (cost=153931.84..173931.84 rows=1000000 width=33) (actual time=1034.839..1034.840 rows=1 loops=1)
Buffers: shared hit=20348, temp read=2411 written=6486
-> Sort (cost=153931.84..156431.84 rows=1000000 width=25) (actual time=1034.824..1034.826 rows=2 loops=1)
Sort Key: struct_1.id
Sort Method: external merge Disk: 35272kB
Buffers: shared hit=20348, temp read=2411 written=6486
-> Seq Scan on struct struct_1 (cost=0.00..30348.00 rows=1000000 width=25) (actual time=0.019..229.951 rows=1000000 loops=1)
Buffers: shared hit=20348
-> Sort (cost=52.36..52.37 rows=1 width=78) (actual time=160.379..160.379 rows=0 loops=1)
Sort Key: cte_tmp_1.id
Sort Method: quicksort Memory: 25kB
Buffers: temp read=8461 written=1
-> CTE Scan on cte_tmp cte_tmp_1 (cost=0.00..52.35 rows=1 width=78) (actual time=160.369..160.369 rows=0 loops=1)
Filter: ((count_e = 1) AND (count_g = 0) AND (count_m = 0) AND (count_single = 0))
Rows Removed by Filter: 991322
Buffers: temp read=8461 written=1
-> CTE Scan on cte_tmp cte_tmp_2 (cost=0.00..52.35 rows=1 width=128) (actual time=1.387..145.031 rows=5 loops=1)
Filter: ((count_single >= 1) AND (count_e = 1) AND (count_g = 0) AND (count_m = 0))
Rows Removed by Filter: 991317
Buffers: temp read=8461
-> CTE Scan on cte_tmp cte_tmp_3 (cost=0.00..56.71 rows=1 width=128) (actual time=0.054..151.483 rows=3579 loops=1)
Filter: ((count_m >= 1) AND (count_double >= 1) AND (count_e = 0) AND (count_g = 1) AND (active = 1))
Rows Removed by Filter: 987743
Buffers: temp read=8461
-> CTE Scan on cte_tmp cte_tmp_4 (cost=0.00..56.71 rows=3 width=128) (actual time=31.046..152.850 rows=4 loops=1)
Filter: ((count_e <> 1) AND (count_g <> 0) AND (count_m <> 0) AND (count_double >= 1) AND (active = 0))
Rows Removed by Filter: 991318
Buffers: temp read=8461
-> Append (cost=0.00..108.73 rows=206 width=103) (actual time=26620.976..27439.782 rows=991322 loops=1)
Buffers: shared hit=111871, temp read=114117 written=92895
-> Subquery Scan on z (cost=0.00..107.56 rows=200 width=102) (actual time=26620.975..27356.994 rows=987734 loops=1)
Buffers: shared hit=111871, temp read=114117 written=92895
-> HashSetOp Except (cost=0.00..105.06 rows=200 width=82) (actual time=26620.971..27011.650 rows=987734 loops=1)
Buffers: shared hit=111871, temp read=114117 written=92895
-> Append (cost=0.00..61.28 rows=1751 width=82) (actual time=20685.869..25367.475 rows=994910 loops=1)
Buffers: shared hit=111871, temp read=114117 written=92895
-> Subquery Scan on "*SELECT* 1" (cost=0.00..52.35 rows=1745 width=82) (actual time=20685.867..23595.424 rows=991322 loops=1)
Buffers: shared hit=91523, temp read=77862 written=86408
-> CTE Scan on cte_tmp (cost=0.00..34.90 rows=1745 width=78) (actual time=20685.866..23423.809 rows=991322 loops=1)
Buffers: shared hit=91523, temp read=77862 written=86408
-> Subquery Scan on "*SELECT* 2" (cost=0.00..0.18 rows=6 width=82) (actual time=1196.623..1648.210 rows=3588 loops=1)
Buffers: shared hit=20348, temp read=36255 written=6487
-> CTE Scan on y y_1 (cost=0.00..0.12 rows=6 width=78) (actual time=1196.621..1647.533 rows=3588 loops=1)
Buffers: shared hit=20348, temp read=36255 written=6487
-> CTE Scan on y (cost=0.00..0.14 rows=6 width=120) (actual time=0.004..1.137 rows=3588 loops=1)
Planning Time: 1.647 ms
Execution Time: 28921.102 ms
Edit: I've added the EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) output, but how do I use it to improve performance?
Comments:
Why are you using COALESCE() in the first place? You don't need it.
Maybe the constant is sometimes -1.
I've removed the cases where I got -1, so the coalesce is now gone from the query.
The execution plan is unreadable. Connect with psql, run EXPLAIN (ANALYZE, BUFFERS) /* your query */ and copy and paste the result.
Tuning a complex statement without access to the real system is very hard. But since some plan steps spill to disk, you can save a few seconds by increasing work_mem, e.g. set work_mem = '256MB'; obviously that depends on how much memory your server has and on the number of concurrent queries running.
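For reference, applying that work_mem suggestion to just one session could look like the rough sketch below; the 256MB value is only the commenter's example and has to fit the server's available RAM and the number of concurrent queries:

-- raise the per-sort/per-hash memory budget for the current session only
SET work_mem = '256MB';

-- ... run the INSERT ... WITH ... statement from the question here, in the same session ...

-- optionally go back to the configured default afterwards
RESET work_mem;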
Answer 1:
AND count_d >= 1
A NULL count_d will fail that test just like 0 does, so the COALESCE(count_d, 0) is not needed at all.
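The NULL behaviour is easy to verify in psql; this tiny self-contained sketch just shows that a NULL comparison yields NULL, which WHERE treats the same as false:

-- NULL >= 1 is NULL, and WHERE only keeps rows where the condition is true,
-- so a NULL count_d is filtered out with or without COALESCE
SELECT NULL::int >= 1              AS null_comparison,       -- NULL
       coalesce(NULL::int, 0) >= 1 AS coalesced_comparison;  -- false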
Comments: (none)
Answer 2: You haven't shown an execution plan, but given the right index, the coalesce is not necessarily the problem.
Assuming all of these conditions are selective, that is, they narrow the result set down considerably, the ideal index would be:
CREATE INDEX ON table_a (count_e, count_f, count_g, coalesce(count_d, 0));
If one of the conditions is not selective, leave it out of the index.
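One detail about the expression index above: PostgreSQL only considers it when the query filters on the very same coalesce(count_d, 0) expression, not on bare count_d. A rough sketch of how to verify that the planner picks it up, with the table and column names taken from the answer's example (placeholders, not the real schema):

-- re-check the plan after creating the index; the predicate must match the indexed expression
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM table_a
WHERE count_e = 1
  AND count_f = 0
  AND count_g = 0
  AND coalesce(count_d, 0) >= 1;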
Comments:
I have added the execution plan.
It can't be read. The function you quote in the question doesn't seem to contain a coalesce.
I have removed the coalesce, because the -1 cases have been removed from the data. EXPLAIN (ANALYZE, BUFFERS) of my query takes more than 2 hours. However, a plain EXPLAIN works fine; would that be useful? I could add it to the question. @Laurenz
EXPLAIN (ANALYZE, BUFFERS) is enough, but it has to be readable.
I have added the EXPLAIN output, is it readable? @Laurenz Albe

Answer 3: This answered the original question, which was about a single table. The question has changed drastically since then.
coalesce() is redundant, because a NULL makes the condition fail anyway, so:
where count_e = 1 and count_f = 0 and count_g = 0 and count_d >= 1
However, it is doubtful that removing the coalesce() makes any difference to performance.
What would change performance is writing the query this way and then adding an index on (count_e, count_f, count_g, count_d). The order of the first three columns in the index doesn't matter, but count_d should come after them because of the inequality.
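Spelled out as a statement, that index would look roughly like this (the table name table_a is carried over from the earlier answer's example and is an assumption, not the real table):

-- the inequality column goes last; the order of the three equality columns does not matter
CREATE INDEX ON table_a (count_e, count_f, count_g, count_d);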
Comments:
I have added the index, and the full query still takes more than an hour to finish. More than 13 million rows are affected.
@SaadMustafa . . . Your execution plan suggests the query is much more complex than what you showed.
I'll add the full query so it's easier to understand @Gordon Linoff
I have added the full query @Gordon Linoff
You're almost there. Use FORMAT TEXT instead of FORMAT JSON, and add the ANALYZE and BUFFERS options.