How can I make this query against a table with over 3 million records faster than 15+ seconds?

Posted 2023-04-14

Asked: 2019-11-13 00:20:21

[Question]:

I have the following query:

EXPLAIN ANALYZE
SELECT
    customer_id
FROM
    orders 
WHERE
    "status" IN ( 'authorized', 'paid', 'partially_paid', 'pending')    
GROUP BY 
    customer_id
HAVING 
    COUNT(customer_id) >= 2

This produces the following query plan:

Finalize GroupAggregate  (cost=440054.50..516225.43 rows=252557 width=33) (actual time=12206.961..17389.057 rows=457301 loops=1)
  Group Key: customer_id
  Filter: (count(customer_id) >= 2)
  Rows Removed by Filter: 592730
  ->  Gather Merge  (cost=440054.50..511174.29 rows=505114 width=41) (actual time=12206.945..16674.249 rows=1615901 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial GroupAggregate  (cost=439054.47..451871.57 rows=252557 width=41) (actual time=12101.661..14862.466 rows=538634 loops=3)
              Group Key: customer_id
              ->  Sort  (cost=439054.47..442484.98 rows=1372204 width=33) (actual time=12101.648..14344.507 rows=1097122 loops=3)
                    Sort Key: customer_id
                    Sort Method: external merge  Disk: 45448kB
                    ->  Parallel Seq Scan on orders  (cost=0.00..224124.56 rows=1372204 width=33) (actual time=0.014..1205.188 rows=1097122 loops=3)
                          Filter: ((status)::text = ANY ('{authorized,paid,partially_paid,pending}'::text[]))
                          Rows Removed by Filter: 24092
Planning time: 0.101 ms
Execution time: 17434.175 ms

The table itself has a little over 3 million records.

Ultimately, I am trying to find every customer who has placed 2 or more orders, and I want this query to respond quickly, ideally within a few seconds.

I have tried several approaches, but I can't seem to bring the execution time down.

Any ideas on how to improve this?

[Comments]:

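What indexes do you have on the table? I would expect an index on customer_id to help a lot.

There are already indexes on customer_id, and on status as well.

Why are you filtering on status?

[Answer 1]: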

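Even though you are doing a sequential scan, the scan itself is actually quite fast (actual time=0.014..1205.188 ms). What is really hurting you is the sort, which takes actual time=12101.648..14344.507. The sort is spilling to disk, as shown by Sort Method: external merge Disk: 45448kB.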

Try increasing your work_mem to a value above 48MB and see if that helps.
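A minimal sketch of how that suggestion could be tried for a single session, without touching the server configuration. The value 128MB is an assumption chosen only for illustration: an in-memory sort generally needs more memory than the on-disk size reported in the plan, and work_mem applies to each sort operation in each parallel worker, so the right value depends on available RAM and concurrency.

```sql
-- Session-level change only; 128MB is an assumed starting value to tune,
-- not a recommendation.
SET work_mem = '128MB';

-- Re-run the original query and compare the plan.
EXPLAIN ANALYZE
SELECT customer_id
FROM orders
WHERE "status" IN ('authorized', 'paid', 'partially_paid', 'pending')
GROUP BY customer_id
HAVING COUNT(customer_id) >= 2;

-- Return to the server default for this session.
RESET work_mem;
```

If the setting is large enough, the Sort node should no longer report external merge with a Disk figure; it may report an in-memory method such as quicksort, or the planner may switch to a HashAggregate and skip the sort entirely.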

[Comments]:

This improved performance significantly. It brought the time down to 3 seconds. Thanks!

[Answer 2]:

I wonder if a filtered index would speed up the query:

create index idf_orders_customer_id_status on orders (customer_id)
    where "status" IN ( 'authorized', 'paid', 'partially_paid', 'pending') ;

SELECT customer_id
FROM orders 
WHERE "status" IN ( 'authorized', 'paid', 'partially_paid', 'pending')    
GROUP BY customer_id
HAVING COUNT(*) >= 2;

[Comments]:

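Unfortunately, this did not help the execution time. I know that using IN with multiple values can slow a query down, but it is something we have to filter on.

@john . . . I assume it did not get rid of the sorting for the order by.

Actually, I think it did, but I think it is still using a Seq Scan because the query optimizer still considers that more optimal for the number of records.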
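The plan in the question is consistent with that last comment: each worker keeps about 1,097,122 rows and the filter removes only about 24,092, so nearly every row matches the status list and the index offers little selectivity. One way to check whether the planner is able to use the filtered index at all is to discourage sequential scans for a single session and compare plans. This is a diagnostic sketch only, assuming the index from the answer above has been created; enable_seqscan is not meant to stay off in normal operation.

```sql
-- Diagnostic only: make sequential scans look very expensive to the planner
-- for this session, so an index-based plan is chosen if one is possible.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT customer_id
FROM orders
WHERE "status" IN ('authorized', 'paid', 'partially_paid', 'pending')
GROUP BY customer_id
HAVING COUNT(*) >= 2;

-- Restore normal planner behavior.
RESET enable_seqscan;
```

If the forced plan reads the index in customer_id order, the Sort node should disappear. Whether that plan is actually faster than the sequential scan is something only the measured execution times can show.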
