Unpredictable query performance in Postgresql

Posted 2023-04-14

【Title】: Unpredictable query performance in Postgresql 【Posted】: 2013-10-30 19:00:11 【Question】:

I have tables like this in a Postgres 9.3 database:

A <1---n B n---1> C

Table A contains ~10^7 rows, table B is rather big with ~10^9 rows, and C contains ~100 rows.

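I use the following query to find all (distinct) As that match certain criteria in B and C (the real query is more complex: it joins more tables and checks more attributes within the subquery):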

Query 1:

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-12' and
                    C.Name = '00000015')
limit 200;

This query takes about 500 ms (note that C.Name = '00000015' exists in the table):

Limit  (cost=119656.37..120234.06 rows=200 width=9) (actual time=427.799..465.485 rows=200 loops=1)
  ->  Hash Semi Join  (cost=119656.37..483518.78 rows=125971 width=9) (actual time=427.797..465.460 rows=200 loops=1)
        Hash Cond: (a.id = b.aid)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.010..15.058 rows=133470 loops=1)
        ->  Hash  (cost=117588.73..117588.73 rows=125971 width=4) (actual time=427.233..427.233 rows=190920 loops=1)
              Buckets: 4096  Batches: 8  Memory Usage: 838kB
              ->  Nested Loop  (cost=0.57..117588.73 rows=125971 width=4) (actual time=0.176..400.326 rows=190920 loops=1)
                    ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.015..0.030 rows=1 loops=1)
                          Filter: (name = '00000015'::text)
                          Rows Removed by Filter: 149
                    ->  Index Only Scan using cid_aid on b  (cost=0.57..116291.64 rows=129422 width=8) (actual time=0.157..382.896 rows=190920 loops=1)
                          Index Cond: ((cid = c.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-12 00:00:00'::timestamp without time zone))
                          Heap Fetches: 0
Total runtime: 476.173 ms

Query 2: Changing C.Name to something that does not exist (C.Name = 'foo') takes 0.1 ms:

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-12' and
                    C.Name = 'foo')
limit 200;

Limit  (cost=119656.37..120234.06 rows=200 width=9) (actual time=0.063..0.063 rows=0 loops=1)
  ->  Hash Semi Join  (cost=119656.37..483518.78 rows=125971 width=9) (actual time=0.062..0.062 rows=0 loops=1)
        Hash Cond: (a.id = b.aid)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.010..0.010 rows=1 loops=1)
        ->  Hash  (cost=117588.73..117588.73 rows=125971 width=4) (actual time=0.038..0.038 rows=0 loops=1)
              Buckets: 4096  Batches: 8  Memory Usage: 0kB
              ->  Nested Loop  (cost=0.57..117588.73 rows=125971 width=4) (actual time=0.038..0.038 rows=0 loops=1)
                    ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.037..0.037 rows=0 loops=1)
                          Filter: (name = 'foo'::text)
                          Rows Removed by Filter: 150
                    ->  Index Only Scan using cid_aid on b  (cost=0.57..116291.64 rows=129422 width=8) (never executed)
                          Index Cond: ((cid = c.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-12 00:00:00'::timestamp without time zone))
                          Heap Fetches: 0
Total runtime: 0.120 ms

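Query 3: Resetting C.Name to something that exists (as in the first query) and extending the timestamp range by 3 days produces a different query plan, but it is still fast (200 ms):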

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-15' and
                    C.Name = '00000015')
limit 200;

Limit  (cost=0.57..112656.93 rows=200 width=9) (actual time=4.404..227.569 rows=200 loops=1)
  ->  Nested Loop Semi Join  (cost=0.57..90347016.34 rows=160394 width=9) (actual time=4.403..227.544 rows=200 loops=1)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.008..1.046 rows=12250 loops=1)
        ->  Nested Loop  (cost=0.57..7.49 rows=1 width=4) (actual time=0.017..0.017 rows=0 loops=12250)
              ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.005..0.015 rows=1 loops=12250)
                    Filter: (name = '00000015'::text)
                    Rows Removed by Filter: 147
              ->  Index Only Scan using cid_aid on b  (cost=0.57..4.60 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12250)
                    Index Cond: ((cid = c.id) AND (aid = a.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-15 00:00:00'::timestamp without time zone))
                    Heap Fetches: 0
Total runtime: 227.632 ms

Query 4: But the new query plan fails badly when searching for a C.Name that does not exist:

explain analyze
select A.SNr from A
where exists (select 1 from B, C
              where B.AId = A.Id and
                    B.CId = C.Id and
                    B.Timestamp >= '2013-01-01' and
                    B.Timestamp <= '2013-01-15' and
                    C.Name = 'foo')
limit 200;

Now it takes 170 seconds (vs. 0.1 ms before!) to return the same 0 rows:

Limit  (cost=0.57..112656.93 rows=200 width=9) (actual time=170184.979..170184.979 rows=0 loops=1)
  ->  Nested Loop Semi Join  (cost=0.57..90347016.34 rows=160394 width=9) (actual time=170184.977..170184.977 rows=0 loops=1)
        ->  Seq Scan on a  (cost=0.00..196761.34 rows=12020034 width=13) (actual time=0.008..794.626 rows=12020034 loops=1)
        ->  Nested Loop  (cost=0.57..7.49 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=12020034)
              ->  Seq Scan on c  (cost=0.00..2.88 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=12020034)
                    Filter: (name = 'foo'::text)
                    Rows Removed by Filter: 150
              ->  Index Only Scan using cid_aid on b  (cost=0.57..4.60 rows=1 width=8) (never executed)
                    Index Cond: ((cid = c.id) AND (aid = a.id) AND ("timestamp" >= '2013-01-01 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-01-15 00:00:00'::timestamp without time zone))
                    Heap Fetches: 0
Total runtime: 170185.033 ms

All queries were run after an "alter table set statistics" with a value of 10000 on all columns, and after running analyze on the whole database.
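For readers who want to reproduce that setup, it might look like the sketch below. The two columns of b are picked only as an illustration; the question says the same was done for every column.

```sql
-- Raise the per-column statistics target (10000 is the maximum allowed).
ALTER TABLE b ALTER COLUMN cid SET STATISTICS 10000;
ALTER TABLE b ALTER COLUMN "timestamp" SET STATISTICS 10000;

-- Re-collect statistics so that the new target actually takes effect.
ANALYZE b;
```

Note that the row estimates in the plans above are already reasonably close (about 126,000 estimated vs. about 191,000 actual rows in query 1), so better statistics alone are unlikely to change the outcome.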

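It now looks as if the slightest change to a parameter (not even to the SQL) can make Postgres choose a bad plan (0.1 ms vs. 170 seconds in this case!). I always try to check the query plans when changing things, but it is hard to be sure that something will work when such a small change to a parameter can make such a huge difference. I have similar problems with other queries, too.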

What can I do to get more predictable results?

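(I have already tried changing some query planner parameters (set enable_... = on/off) and some different SQL statements, such as join + distinct/group by instead of "exists", but nothing seems to make Postgres choose a "stable" query plan while still providing acceptable performance.)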
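An experiment of that kind could be run as in the following sketch. The choice of enable_nestloop is an assumption made for illustration (the question does not say which settings were tried); it targets the nested loop semi join seen in query 4.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of this transaction.
-- Setting it to off discourages nested loops; it does not forbid them.
SET LOCAL enable_nestloop = off;

EXPLAIN ANALYZE
SELECT a.snr
FROM a
WHERE EXISTS (SELECT 1 FROM b, c
              WHERE b.aid = a.id
                AND b.cid = c.id
                AND b."timestamp" >= '2013-01-01'
                AND b."timestamp" <= '2013-01-15'
                AND c.name = 'foo')
LIMIT 200;

ROLLBACK;
```

Such switches are useful for comparing plans, but they apply to every join in the query, which is probably why they did not give a stable result for the more complex real query.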

Edit #1: table + index definitions

test=# \d a
                         Table "public.a"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('a_id_seq'::regclass)
 anr    | integer |
 snr    | text    |
Indexes:
    "a_pkey" PRIMARY KEY, btree (id)
    "anr_snr_index" UNIQUE, btree (anr, snr)
    "anr_index" btree (anr)
Foreign-key constraints:
    "anr_fkey" FOREIGN KEY (anr) REFERENCES pt(id)
Referenced by:
    TABLE "b" CONSTRAINT "aid_fkey" FOREIGN KEY (aid) REFERENCES a(id)


test=# \d b
                  Table "public.b"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 id        | uuid                        | not null
 timestamp | timestamp without time zone |
 cid       | integer                     |
 aid       | integer                     |
 prop1     | text                        |
 propn     | integer                     |
Indexes:
    "b_pkey" PRIMARY KEY, btree (id)
    "aid_cid" btree (aid, cid)
    "cid_aid" btree (cid, aid, "timestamp")
    "timestamp_index" btree ("timestamp")
Foreign-key constraints:
    "aid_fkey" FOREIGN KEY (aid) REFERENCES a(id)
    "cid_fkey" FOREIGN KEY (cid) REFERENCES c(id)


test=# \d c
                         Table "public.c"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('c_id_seq'::regclass)
 name   | text    |
Indexes:
    "c_pkey" PRIMARY KEY, btree (id)
    "c_name_index" UNIQUE, btree (name)
Referenced by:
    TABLE "b" CONSTRAINT "cid_fkey" FOREIGN KEY (cid) REFERENCES c(id)

【Comments】:

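What do your indexes look like? What happens if you remove the LIMIT 200 and use SELECT COUNT(*) instead of SELECT A.SNr in the outer query?

It looks like table A has no usable index (or PK) on a.SNr. It could also be a lack of statistics.

Please add the table definitions (including PK/FK and secondary indexes) to your question.

I added the table + index definitions, see Edit #1 at the bottom.

Is some combination of aid, cid, timestamp a natural key for table b? In that case you could drop the surrogate key id and rely on the composite key (provided the columns are not nullable, of course). (By the way: timestamp is a bad name for a column.)

【Answer 1】: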

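Your problem is that the query has to evaluate the correlated subquery for the entire table a. When Postgres quickly finds 200 rows that fit (which seems to be the case when c.name exists), it yields them and stops, which is reasonably fast if there are plenty to choose from. But when no such rows exist, it evaluates everything inside the exists() clause once per row of table a, hence the performance issue you are seeing.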

Adding an uncorrelated where clause will definitely fix a number of edge cases:

and exists(select 1 from c where name = ?)
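
Applied to query 4 from the question, that extra clause could look like the sketch below. The literal values are copied from the question; in the snippet above they are placeholders.

```sql
SELECT a.snr
FROM a
WHERE EXISTS (SELECT 1 FROM b, c
              WHERE b.aid = a.id
                AND b.cid = c.id
                AND b."timestamp" >= '2013-01-01'
                AND b."timestamp" <= '2013-01-15'
                AND c.name = 'foo')
  -- Uncorrelated guard: it does not reference a, so the planner is free
  -- to evaluate it once instead of once per row of a.
  AND EXISTS (SELECT 1 FROM c WHERE name = 'foo')
LIMIT 200;
```

It is worth confirming with EXPLAIN ANALYZE that the guard is really evaluated once and that the scan of a is skipped when no matching name exists.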

It might also work if you join the latter with b and write it as a CTE:

with bc as (
select aid
from b join c on b.cid = c.id
and b.timestamp between ? and ?
and c.name = ?
)
select a.id
from a
where exists (select 1 from bc)
and exists (select 1 from bc where a.id = bc.aid)
limit 200

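If not, just put the bc query in verbatim instead of using a CTE. The point here is to force Postgres to treat the bc lookup as independent, and to bail out early if its result set yields no rows at all.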
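
Written out without the CTE, that variant might look like the following sketch, again with the literal values from query 4 filled in as an assumption:

```sql
SELECT a.id
FROM a
-- Independent check: is there any matching b/c row at all?
WHERE EXISTS (SELECT 1
              FROM b JOIN c ON b.cid = c.id
              WHERE b."timestamp" BETWEEN '2013-01-01' AND '2013-01-15'
                AND c.name = 'foo')
  -- Correlated check: is there a matching b/c row for this a?
  AND EXISTS (SELECT 1
              FROM b JOIN c ON b.cid = c.id
              WHERE b."timestamp" BETWEEN '2013-01-01' AND '2013-01-15'
                AND c.name = 'foo'
                AND b.aid = a.id)
LIMIT 200;
```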

I assume your real query is more complex, but note that the above could be rewritten as:

with bc as (...)
select aid
from bc
limit 200

Or:

with bc as (...)
select a.id
from a
where a.id in (select aid from bc)
limit 200

Either should yield better plans in edge cases.
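
One caveat for readers on newer releases: these rewrites rely on the CTE being computed separately from the outer query. That is always the case in the 9.3 release used in the question, but from PostgreSQL 12 on, a side-effect-free CTE that is referenced only once may be folded into the outer query. A sketch, assuming PostgreSQL 12 or later, that requests the separate evaluation explicitly:

```sql
-- MATERIALIZED (PostgreSQL 12+) keeps the CTE as a separate step.
WITH bc AS MATERIALIZED (
    SELECT b.aid
    FROM b JOIN c ON b.cid = c.id
    WHERE b."timestamp" BETWEEN '2013-01-01' AND '2013-01-15'
      AND c.name = '00000015'
)
SELECT a.id
FROM a
WHERE a.id IN (SELECT aid FROM bc)
LIMIT 200;
```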

(Side note: it is usually inadvisable to use limit without an order by.)
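
Without an order by, which 200 rows come back depends on the plan, so two runs with different plans can return different rows. A minimal sketch of a deterministic variant; ordering by a.id is an assumption, and any column set that defines a total order would do:

```sql
WITH bc AS (
    SELECT b.aid
    FROM b JOIN c ON b.cid = c.id
    WHERE b."timestamp" BETWEEN '2013-01-01' AND '2013-01-15'
      AND c.name = '00000015'
)
SELECT a.id
FROM a
WHERE a.id IN (SELECT aid FROM bc)
ORDER BY a.id   -- makes the choice of the 200 rows repeatable
LIMIT 200;
```

The ordering is not free: the planner either has to sort the matching rows or walk a in a_pkey order, so the plan should be checked again after adding it.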

【Discussion】:

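Does adding "exists (select 1 from c where name = ?)" or the CTE really force postgres to evaluate that expression first, or is it more like a hint? It did improve performance. I will do some more testing.

Technically, it does not force Postgres to evaluate it first, but the query is a completely independent prerequisite for finding any rows in the final set. Postgres will recognize it as such and evaluate it first accordingly.

Accepted as the answer, because the "with queries" seem to work well for "forcing" postgres to do things in a certain (reasonable) order, which makes it produce more stable query plans. Still, I am worried that postgres might decide to choose a really bad query plan again some day in production.

【Answer 2】: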

Maybe try rewriting the query with a CTE?

with BC as (
    select distinct B.AId from B where
    B.Timestamp >= '2013-01-01' and
    B.Timestamp <= '2013-01-12' and
    B.CId in (select C.Id from C where C.Name = '00000015')
    limit 200
)

select A.SNr from A where A.Id in (select AId from BC)

If I understand correctly, the limit can easily be put inside the BC query to avoid scanning table A.

【Discussion】:

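Hi alexius, putting the limit inside yields different/incorrect results.

Why? If you compare the given query with mine, the results should be the same (assuming A.id is the primary key). However, if the original query is more complex and uses references to table A, that may be true.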
