Why doesn't this query use an index-only scan in PostgreSQL?

Posted 2023-03-24

Asked: 2015-08-19 07:34:38

Question:

I have a table with 28 columns and 7M records, and no primary key.

CREATE TABLE records (
  direction smallint,
  exporters_id integer,
  time_stamp integer
  ...
)

I created an index on this table and vacuumed the table (autovacuum is on):

CREATE INDEX exporter_dir_time_only_index ON sacopre_records
USING btree (exporters_id, direction, time_stamp);

I want to run this query:

SELECT count(exporters_id) FROM records WHERE exporters_id = 50

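The table has 6982224 records with exporters_id = 50. I expected this query to use an index-only scan to get the result, but it used a sequential scan. This is the EXPLAIN ANALYZE output: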

Aggregate  (cost=204562.25..204562.26 rows=1 width=4) (actual time=1521.862..1521.862 rows=1 loops=1)
->  Seq Scan on sacopre_records (cost=0.00..187106.88 rows=6982149 width=4) (actual time=0.885..1216.211 rows=6982224 loops=1)
    Filter: (exporters_id = 50)
    Rows Removed by Filter: 2663
Total runtime: 1521.886 ms

But when I change exporters_id to another id, the query uses an index-only scan:

Aggregate  (cost=46.05..46.06 rows=1 width=4) (actual time=0.321..0.321 rows=1 loops=1)
->  Index Only Scan using exporter_dir_time_only_index on sacopre_records  (cost=0.43..42.85 rows=1281 width=4) (actual time=0.313..0.315 rows=4 loops=1)
    Index Cond: (exporters_id = 47)
    Heap Fetches: 0
Total runtime: 0.358 ms

Where is the problem?

Comments on the question:

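Have you tried SELECT COUNT(exporters_id=50) FROM records?

@Tordek, I just tested it and got the same result: it uses a seq scan.

Maybe the new index has not been analyzed yet and so is not visible to the planner? Try vacuum analyze records.

@VaoTsun, as I said above, I ran "vacuum analyze" and autovacuum is on.

Answer 1: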

EXPLAIN tells you why. Look at it closely.

Aggregate  (cost=204562.25..204562.26 rows=1 width=4) (actual time=1521.862..1521.862 rows=1 loops=1)
->  Seq Scan on sacopre_records (cost=0.00..187106.88 rows=6982149 width=4) (actual time=0.885..1216.211 rows=6982224 loops=1)
    Filter: (exporters_id = 50)
    Rows Removed by Filter: 2663
Total runtime: 1521.886 ms

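Your filter removed only 2663 of the table's 6984887 rows (6982224 matched plus 2663 discarded), so a sequential scan really should be faster than using the index, because almost every record has to be read anyway. The disk head reads the whole table in order and discards, on the way, the small fraction (about 0.04%) that does not match your condition. With an index scan it would have to jump from the index file to the data file and back about 6982224 times, which would certainly be slower than the 1.5 seconds you are getting now!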
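The arithmetic behind that estimate can be reproduced directly in SQL. This is only a sketch: the row counts are copied from the plan above, and the table name sacopre_records is taken from the plan output (the question's query calls it records), so adjust it to your schema.

```sql
-- Share of rows the filter discards, using the actual row counts from the plan.
SELECT 6982224 + 2663 AS total_rows,
       round(100.0 * 2663 / (6982224 + 2663), 3) AS pct_removed;

-- What the planner knows about the size of the table and the index.
-- relpages counts disk pages; relallvisible counts pages marked all-visible,
-- which is what lets an index-only scan skip heap fetches.
SELECT relname, relpages, reltuples, relallvisible
FROM pg_class
WHERE relname IN ('sacopre_records', 'exporter_dir_time_only_index');
```

If the index is not much smaller than the table, reading nearly all of it may not look cheaper to the planner than reading the table itself.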

Comments on the answer:

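"and back to the data file"... but they are doing a COUNT on an indexed field; surely the engine can walk the index and ignore the data?

I agree with @Tordek, there is no need to go back to the data file! The selected value is 50. The index type is btree, so I think getting the result from the index is faster than searching the data file.

@KouberSaparev On most common values: that is wrong. It may be the case in some DBMSs, but in PostgreSQL a b-tree index does contain all values, common or otherwise, unless you exclude them with a WHERE clause in the index definition (a partial index). Indexing them is also useful, for index-only scans or for efficiently returning results in index order. Your second point is correct, though: most likely the index is not used here because it is not selective enough. Testing with enable_seqscan = off helps to see the relative cost estimates.

@Arshen Your random_page_cost and seq_page_cost most likely do not accurately reflect the actual performance of your system. Or the planner is not estimating well. It does not make much of a difference, though.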
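The enable_seqscan test suggested in the comments could look like the sketch below. It assumes the table is named sacopre_records, as in the plan output; SET LOCAL keeps the setting change inside the transaction so the rest of the session is not affected.

```sql
BEGIN;
-- Discourage sequential scans for this transaction only, so the planner
-- shows the plan and cost it would assign to the index-based alternative.
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(exporters_id) FROM sacopre_records WHERE exporters_id = 50;
ROLLBACK;

-- Cost parameters the planner uses when comparing the two plans.
SHOW seq_page_cost;
SHOW random_page_cost;
```

Comparing the estimated cost and actual time of this plan with the sequential-scan plan above shows how far apart the two alternatives are on your hardware.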
