Postgres index not using the correct plan



Posted: 2021-02-07 22:08:26

Question:

My PostgreSQL version is 10.6. I created an index, but it is not used for all of the WHERE-clause conditions. Here are more details:

Create index concurrently ticket_created_at_portal_id_created_by_id_assigned_group_id_idx on ticket(created_at, portal_id, created_by_id, assigned_group);
EXPLAIN (analyze true, verbose true, costs true, buffers true, timing true ) select * from ticket where status is not null
and (assigned_group in ('447') or created_by_id in ('39731566'))
and portal_id=8
and created_at>='2020-12-07T03:00:10.973'
and created_at<='2021-02-05T03:00:10.973'
order by updated_at DESC limit 10;


                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=18975.23..18975.25 rows=10 width=638) (actual time=278.340..278.345 rows=10 loops=1)
   Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
   Buffers: shared hit=2280 read=3105
   ->  Sort  (cost=18975.23..18975.45 rows=87 width=638) (actual time=278.338..278.339 rows=10 loops=1)
         Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
         Sort Key: ticket.updated_at DESC
         Sort Method: top-N heapsort  Memory: 33kB
         Buffers: shared hit=2280 read=3105
         ->  Bitmap Heap Scan on public.ticket  (cost=17855.76..18973.35 rows=87 width=638) (actual time=111.871..275.835 rows=1256 loops=1)
               Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
               Recheck Cond: (((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8)) OR ((ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone) AND (ticket.portal_id = 8) AND (ticket.created_by_id = '39731566'::bigint)))
               Filter: ((ticket.status IS NOT NULL) AND (ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone))
               Rows Removed by Filter: 1517
               Heap Blocks: exact=2638
               Buffers: shared hit=2277 read=3105
               ->  BitmapOr  (cost=17855.76..17855.76 rows=291 width=0) (actual time=106.215..106.216 rows=0 loops=1)
                     Buffers: shared hit=336 read=2408
                     ->  Bitmap Index Scan on ticket_assigned_group_portal_id_assigned_agent_idx  (cost=0.00..11.25 rows=282 width=0) (actual time=10.661..10.661 rows=2776 loops=1)
                           Index Cond: ((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8))
                           Buffers: shared hit=4 read=15
                     ->  Bitmap Index Scan on ticket_created_at_portal_id_created_by_id_assigned_group_id_idx  (cost=0.00..17844.47 rows=9 width=0) (actual time=95.551..95.551 rows=2 loops=1)
                           Index Cond: ((ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone) AND (ticket.portal_id = 8) AND (ticket.created_by_id = '39731566'::bigint))
                           Buffers: shared hit=332 read=2393
 Planning time: 43.083 ms
 Execution time: 278.556 ms
(25 rows)

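The index ticket_created_at_portal_id_created_by_id_assigned_group_id_idx contains every column in the WHERE clause except status (the IS NOT NULL check), yet the query still uses a separate index for the condition ((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8)), even though both of those columns are already in the second index, ticket_created_at_portal_id_created_by_id_assigned_group_id_idx.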

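Why is this happening? Even when I also include the status column in the index, the query still uses 2 indexes and then does a lot of filtering in the bitmap heap scan.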

How can we optimize it?

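I have also gone through the table's indexes, but I still can't remove any of them. The same columns seem to be repeated across several indexes created for different queries; please help if the number of indexes can be reduced. All the indexes on the table are: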

"ticket_pkey" PRIMARY KEY, btree (id)
"ticket_ticket_id_idx" UNIQUE, btree (ticket_id)
"uk2uors84i0m8sjxc6oaocuy6oj" UNIQUE CONSTRAINT, btree (ticket_id)
"idx_resource_id" btree (resource_id)
"idx_ticket_created_at" btree (created_at)
"ticket_assigned_agent_idx" btree (assigned_agent)
"ticket_assigned_group_idx" btree (assigned_group)
"ticket_assigned_group_portal_id_assigned_agent_idx" btree (assigned_group, portal_id, assigned_agent)
"ticket_created_at_portal_id_created_by_id_assigned_group_id_idx" btree (created_at, portal_id, created_by_id, assigned_group)
"ticket_created_at_portal_id_status_idx" btree (created_at, portal_id, status)
"ticket_id_resolved_at_assigned_group_status_idx" btree (id, resolved_at, assigned_group, status)

Comments:

@Laurenz Albe: please help here.

Answer 1:

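How easy is it to use a phone book (sorted by last name, then first name) to find everyone whose first name is "Francis" and whose last name starts with a letter between K and T? Not very easy, because within that range the book isn't sorted by first name. You would have to read through the entire middle section of the phone book, checking everyone's first name.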

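The same applies here. When the first column of an index is used for a range/inequality test rather than equality, every column after it becomes much less effective. You probably want to put first the columns that are tested for equality and are not inside the OR. Unfortunately, that is only portal_id. The next best choice depends on the selectivity of each of the other conditions, which we can't guess from the information provided.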
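A toy Python model of the phone-book argument (not Postgres itself): a B-tree index is represented as a sorted list of key tuples, and we count how many index entries a scan has to visit for portal_id = 8 AND created_by_id = 7 AND a created-day range. The row layout, integer day numbers, and the helpers make_rows, range_first_scan and equality_first_scan are all hypothetical. It mirrors what the plan above shows for ticket_created_at_portal_id_created_by_id_assigned_group_id_idx, whose bitmap index scan touched about 2,700 buffers (hit=332 read=2393) to return 2 rows.

```python
import bisect
import random


def make_rows(n=20_000, days=400, seed=42):
    """Hypothetical ticket rows: (row_id, created_day, portal_id, created_by_id)."""
    rng = random.Random(seed)
    return [
        (row_id, rng.randrange(days), rng.randrange(1, 11), rng.randrange(1, 51))
        for row_id in range(n)
    ]


def range_first_scan(rows, portal, creator, lo_day, hi_day):
    """Index on (created_day, portal_id, created_by_id): the range is on the
    leading column, so every entry in [lo_day, hi_day] has to be visited."""
    keys = sorted((day, p, c, rid) for rid, day, p, c in rows)
    start = bisect.bisect_left(keys, (lo_day,))
    end = bisect.bisect_left(keys, (hi_day + 1,))
    visited = keys[start:end]
    hits = {rid for day, p, c, rid in visited if p == portal and c == creator}
    return hits, len(visited)


def equality_first_scan(rows, portal, creator, lo_day, hi_day):
    """Index on (portal_id, created_by_id, created_day): the equality columns
    come first, so the matching entries form one contiguous slice."""
    keys = sorted((p, c, day, rid) for rid, day, p, c in rows)
    start = bisect.bisect_left(keys, (portal, creator, lo_day))
    end = bisect.bisect_left(keys, (portal, creator, hi_day + 1))
    visited = keys[start:end]
    return {rid for _, _, _, rid in visited}, len(visited)


rows = make_rows()
hits_a, visited_a = range_first_scan(rows, 8, 7, lo_day=300, hi_day=359)
hits_b, visited_b = equality_first_scan(rows, 8, 7, lo_day=300, hi_day=359)
print(hits_a == hits_b, visited_a, visited_b)
```

Both column orders find the same rows, but the created_at-first order visits every entry in the date window, while the equality-first order visits only the entries that actually match.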

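For this decision, status IS NULL counts as an equality test, but status IS NOT NULL does not: the column could hold any number of different values and still be not null, so it is effectively an inequality. If this condition is highly selective, the best way to use it is to put it in the WHERE clause of a partial index.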
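A minimal sketch of why IS NOT NULL behaves like an inequality, assuming a hypothetical index on (status, created_day) and made-up status values. NULLs are sorted last, as in a default ascending Postgres B-tree, so status IS NULL selects a single run ordered by day, while status IS NOT NULL spans one run per distinct status and the day range is split into several separate slices.

```python
from itertools import groupby


def sort_key(status, day):
    """Key for a hypothetical index on (status, created_day). The leading
    is-null flag makes NULL sort after every value, like NULLS LAST."""
    return (status is None, status or "", day)


def runs_to_read(entries, status_test, lo_day, hi_day):
    """Count the separate contiguous runs of index entries that satisfy
    status_test(status) AND lo_day <= day <= hi_day."""
    keys = sorted(sort_key(status, day) for status, day in entries)
    flags = [
        status_test(None if is_null else value) and lo_day <= day <= hi_day
        for is_null, value, day in keys
    ]
    # Each group of consecutive True flags is one slice the scan can read.
    return sum(1 for matched, _ in groupby(flags) if matched)


# Made-up data: three non-null statuses plus NULL, ten days each.
entries = [(s, d) for s in ("closed", "open", "pending", None) for d in range(10)]

is_null_runs = runs_to_read(entries, lambda s: s is None, 3, 5)
not_null_runs = runs_to_read(entries, lambda s: s is not None, 3, 5)
print(is_null_runs, not_null_runs)  # → 1 3
```

With a partial index ... WHERE status IS NOT NULL, the null rows are simply absent from the index, so status no longer has to be a key column at all.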

Because of the OR, you are probably still best off with 2 indexes that can be combined in a BitmapOr:

...(portal_id, assigned_group, created_at) WHERE status IS NOT NULL;
...(portal_id, created_by_id, created_at) WHERE status IS NOT NULL;
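The two partial indexes and the BitmapOr that combines them can be sketched in Python under simplifying assumptions: each index is a sorted list of (key columns..., row id) built only from rows whose status is not None, the row data is randomly generated, created_at is reduced to an integer day, and a Python set union stands in for Postgres' page/tuple bitmaps. The helpers build_index and lookup are hypothetical.

```python
import bisect
import random


def build_index(rows, cols):
    """Sketch of a partial B-tree index ... WHERE status IS NOT NULL:
    sorted (key columns..., row id) entries for non-null-status rows only."""
    return sorted(
        tuple(r[c] for c in cols) + (r["id"],)
        for r in rows
        if r["status"] is not None
    )


def lookup(index, prefix, lo_day, hi_day):
    """Equality on the leading `prefix` columns, then a range on the last
    key column (created_day); returns the matching row ids."""
    start = bisect.bisect_left(index, prefix + (lo_day,))
    end = bisect.bisect_left(index, prefix + (hi_day + 1,))
    return {entry[-1] for entry in index[start:end]}


rng = random.Random(1)
rows = [
    {
        "id": i,
        "portal_id": rng.choice([7, 8, 9]),
        "assigned_group": rng.choice([446, 447, 448]),
        "created_by_id": rng.choice([39731565, 39731566]),
        "created_day": rng.randrange(100),
        "status": rng.choice(["open", "closed", None]),
    }
    for i in range(5000)
]

by_group = build_index(rows, ("portal_id", "assigned_group", "created_day"))
by_creator = build_index(rows, ("portal_id", "created_by_id", "created_day"))

# BitmapOr stand-in: union of the row ids found through each index.
ids = lookup(by_group, (8, 447), 40, 99) | lookup(by_creator, (8, 39731566), 40, 99)

# Brute-force check against the query's full WHERE clause.
expected = {
    r["id"]
    for r in rows
    if r["status"] is not None
    and r["portal_id"] == 8
    and (r["assigned_group"] == 447 or r["created_by_id"] == 39731566)
    and 40 <= r["created_day"] <= 99
}
print(ids == expected, len(ids))
```

The union of the two index lookups returns exactly the rows the full WHERE clause selects, with no recheck filtering needed; the ORDER BY updated_at DESC LIMIT 10 still has to sort them afterwards.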

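Another approach is to avoid fetching and sorting all the matching rows by using an index to walk the rows in updated_at order and stopping once 10 of them have been found. An index can be used to walk a column in order as long as only columns tested for equality (and not inside an OR) come before the ORDER BY column, so: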

...(portal_id, updated_at) WHERE status IS NOT NULL;
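A sketch of the early-stop idea, again as a Python model with random data: the hypothetical partial index on (portal_id, updated_at) is walked backwards for portal_id = 8, the remaining conditions are checked on each row, and the walk stops at the 10th match. The helpers top_n_by_index, top_n_by_sort and other_conditions are made up for illustration. Whether the real planner prefers this path depends on how many rows it expects to skip before finding 10 matches.

```python
import bisect
import random

rng = random.Random(7)
updated = rng.sample(range(1_000_000), 5000)  # unique updated_at values
rows = {
    i: {
        "id": i,
        "portal_id": rng.choice([7, 8, 9]),
        "assigned_group": rng.choice([446, 447, 448]),
        "created_by_id": rng.choice([39731565, 39731566]),
        "created_day": rng.randrange(100),
        "updated_at": updated[i],
        "status": rng.choice(["open", "closed", None]),
    }
    for i in range(5000)
}


def other_conditions(r):
    """The parts of the WHERE clause the (portal_id, updated_at) index can't use."""
    in_or = r["assigned_group"] == 447 or r["created_by_id"] == 39731566
    return in_or and 40 <= r["created_day"] <= 99


# Hypothetical partial index on (portal_id, updated_at) WHERE status IS NOT NULL.
index = sorted(
    (r["portal_id"], r["updated_at"], r["id"])
    for r in rows.values()
    if r["status"] is not None
)


def top_n_by_index(portal, n=10):
    """Walk one portal's entries from the highest updated_at down (a backward
    index scan) and stop as soon as n rows pass the remaining filter."""
    start = bisect.bisect_left(index, (portal,))
    end = bisect.bisect_left(index, (portal + 1,))
    found, visited = [], 0
    for _, _, rid in reversed(index[start:end]):
        visited += 1
        if other_conditions(rows[rid]):
            found.append(rid)
            if len(found) == n:
                break
    return found, visited, end - start


def top_n_by_sort(portal, n=10):
    """Fetch every matching row, sort by updated_at DESC, keep the first n."""
    matches = [
        r for r in rows.values()
        if r["status"] is not None and r["portal_id"] == portal and other_conditions(r)
    ]
    matches.sort(key=lambda r: r["updated_at"], reverse=True)
    return [r["id"] for r in matches[:n]]


by_index, visited, portal_entries = top_n_by_index(8)
print(by_index == top_n_by_sort(8), visited, portal_entries)
```

Both approaches return the same 10 ids, but the index walk stops long before reaching the end of portal 8's entries, instead of fetching and sorting every match.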
