Postgres Partition Performance Tuning

Posted 2023-04-15

【Title】: Postgres Partition Performance Tuning 【Posted】: 2020-07-22 11:04:39 【Question】:

A SELECT query scans all child tables. Can we make the query planner scan only the correct child table?

Example:

Using table inheritance in Postgres 9.6, create a parent table and two child tables, omitting constraints to keep it simple:

create table student(id INTEGER, name varchar(10), result varchar(1) );
create table student_pass() inherits (student);
create table student_fail() inherits (student);

Indexes

create index student_result_idx on student (result);
create index student_result_idx2 on student_pass (result) where result='P';
create index student_result_idx3 on student_fail (result) where result='F';

Function

CREATE OR REPLACE FUNCTION student_partition()
RETURNS TRIGGER AS $$
BEGIN
    IF (new.result = 'P')THEN
        INSERT INTO student_pass VALUES (NEW.*);
    ELSE
        INSERT INTO student_fail VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

Trigger

CREATE TRIGGER insert_trigger BEFORE INSERT ON student
    FOR EACH ROW EXECUTE procedure student_partition();

Insert

insert into student values
(1,'aaa','P'),
(2,'bbb','F');

The rows are inserted into their respective child tables, as expected.
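One way to check where each row actually ended up is the `tableoid` system column. The following is a minimal sketch that assumes the tables, trigger, and rows defined above; the column aliases are illustrative only.

```sql
-- tableoid identifies the physical table each row is stored in;
-- casting it to regclass prints the table name.
SELECT tableoid::regclass AS stored_in, id, name, result
FROM student
ORDER BY id;

-- ONLY restricts the scan to the parent table itself, excluding children.
-- If the trigger routed every insert, this count is 0; rows inserted
-- before the trigger existed would still be counted here.
SELECT count(*) AS rows_in_parent FROM ONLY student;
```

Because the trigger function returns NULL, the INSERT on `student` itself reports zero affected rows even though the child tables received the data.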

Select

 select * from student where result='P';

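The problem is that this SELECT scans all of the tables. How can I make the query planner smart enough to pick only the correct child table?

Do we need the WHERE condition in the indexes, given that each child table will contain only 'P' or only 'F'?

Output of EXPLAIN (analyze, buffers) select * from student where result='P':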

Append  (cost=0.00..37.94 rows=11 width=50) (actual time=0.016..0.042 rows=2 loops=1)
  Buffers: shared hit=4
  ->  Seq Scan on student  (cost=0.00..2.30 rows=1 width=50) (actual time=0.015..0.017 rows=1 loops=1)
        Filter: ((result)::text = 'P'::text)
        Rows Removed by Filter: 1
        Buffers: shared hit=1
  ->  Bitmap Heap Scan on student_pass  (cost=4.17..12.64 rows=5 width=50) (actual time=0.013..0.014 rows=1 loops=1)
        Recheck Cond: ((result)::text = 'P'::text)
        Heap Blocks: exact=1
        Buffers: shared hit=2
        ->  Bitmap Index Scan on student_result_idx2  (cost=0.00..4.17 rows=5 width=0) (actual time=0.007..0.007 rows=1 loops=1)
              Buffers: shared hit=1
  ->  Seq Scan on student_fail  (cost=0.00..23.00 rows=5 width=50) (actual time=0.007..0.007 rows=0 loops=1)
        Filter: ((result)::text = 'P'::text)
        Rows Removed by Filter: 1
        Buffers: shared hit=1
Planning time: 0.447 ms
Execution time: 0.120 ms

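【Comments】:

"Omitting constraints to keep it simple": constraints are very important to the query planner.

If you really think you need partitioning for performance reasons, upgrade to Postgres 12 and use declarative partitioning. But I seriously doubt that a table named student would benefit from partitioning in the first place. How many rows could it contain? A million? 5 million? Neither is a size at which partitioning would help.

student is only the example I used; the actual table grows by about 1 billion rows per month. We are archiving data, but at any given time there will be about 3 billion records. We are trying to improve SELECT performance without making changes at the application level.

If you expect 3 billion rows, then you should definitely upgrade to Postgres 12 and use declarative partitioning. Forget the old inheritance-based partitioning. But partitioning only helps performance if the query includes the partition key in the WHERE clause.

【Answer 1】: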

Adding CHECK constraints to the child tables helps:

alter table student_pass add constraint pass_cst check (result ='P');
alter table student_fail add constraint fail_cst check (result not in ('P'));
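The planner can only use these CHECK constraints to skip a child table when constraint exclusion is active. The following sketch assumes an otherwise default configuration and shows how to inspect and, if necessary, set the `constraint_exclusion` parameter for the current session.

```sql
-- Show the current setting. The default, 'partition', applies CHECK
-- constraints when planning queries over inheritance children and
-- UNION ALL subqueries; 'off' disables the optimization entirely.
SHOW constraint_exclusion;

-- Restore the default for this session if it was turned off.
SET constraint_exclusion = partition;

-- A child whose CHECK constraint contradicts the WHERE clause
-- should no longer appear in the plan.
EXPLAIN SELECT * FROM student WHERE result = 'P';
```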

Output of EXPLAIN (analyze, buffers) select * from student where result='P':

Append  (cost=0.00..23.00 rows=6 width=50) (actual time=0.299..0.303 rows=2 loops=1)
  Buffers: shared read=1
  ->  Seq Scan on student  (cost=0.00..0.00 rows=1 width=50) (actual time=0.004..0.004 rows=0 loops=1)
        Filter: ((result)::text = 'P'::text)
  ->  Seq Scan on student_pass  (cost=0.00..23.00 rows=5 width=50) (actual time=0.294..0.296 rows=2 loops=1)
        Filter: ((result)::text = 'P'::text)
        Buffers: shared read=1
Planning time: 10.488 ms
Execution time: 0.361 ms

The query planner skipped the student_fail table.
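The comments above recommend declarative partitioning on Postgres 12 instead of inheritance. A minimal sketch of what that could look like for this example follows. The `student_part*` table names are hypothetical, chosen to avoid clashing with the inheritance tables, and it assumes `result` only ever holds 'P' or 'F'.

```sql
-- Parent table partitioned by the value of result.
CREATE TABLE student_part (
    id     integer,
    name   varchar(10),
    result varchar(1)
) PARTITION BY LIST (result);

-- One partition per result value; no trigger or CHECK constraint is
-- needed, since rows are routed by the partition bounds.
CREATE TABLE student_part_pass PARTITION OF student_part FOR VALUES IN ('P');
CREATE TABLE student_part_fail PARTITION OF student_part FOR VALUES IN ('F');

INSERT INTO student_part VALUES
(1, 'aaa', 'P'),
(2, 'bbb', 'F');

-- With the partition key in the WHERE clause, partition pruning should
-- leave only student_part_pass in the plan.
EXPLAIN SELECT * FROM student_part WHERE result = 'P';
```

With this layout, a row whose `result` is neither 'P' nor 'F' (including NULL) is rejected with an error unless a DEFAULT partition is added, which differs from the trigger above, where everything other than 'P' goes to `student_fail`.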
