PostgreSQL 查询性能和可能的优化

Posted

技术标签:

【中文标题】PostgreSQL 查询性能和可能的优化【英文标题】:PostgreSQL query performance and possible optimisation 【发布时间】:2016-11-09 11:34:59 【问题描述】:

我设法编写了查询以获取正确的数据,但对我来说它看起来很糟糕,因为我不得不在查询中使用查询 3 次,甚至 doe 查询性能现在大约 700 毫秒我担心它将来会变慢什么时候会有更多的数据需要处理。任何有关这有多糟糕以及如何优化它的信息将不胜感激。

编辑:

我忘了提到表 s3 和 s14 有多个具有相同 parcelno 的行,我总是需要两个表中的最新行(由 sdate 和 stime 确定)。如果 s14 的最新行比 s3 的最新行更新,或者 s3 列 emadr2 的最新行与表 d 中的列 parcelshop_id 具有相同的值,则不显示数据。请记住,这些表不是我创建的,我只是从中读取数据。

SELECT 
    q1.ddepot, 
    q1.parcelno, 
    q1.sdate, 
    q1.stime, 
    q1.dpostal, 
    q1.service, 
    q1.lorry,
    q1.zc5x3,
    q1.parcelshop_id,
    q1.country,
    q1.dname1
FROM(
    SELECT DISTINCT ON (q.parcelno) q.* FROM(
        SELECT 
            d.ddepot, 
            d.parcelno, 
            s3.sdate, 
            s3.stime, 
            d.dpostal, 
            d.service, 
            s3.lorry,
            s3.zc5x3,
            d.parcelshop_id,
            s3.country,
            d.dname1,
            s3.emadr1,
            s3.emadr2
        FROM dispatcher.detour_avis d
        LEFT JOIN scans.scandata03 s3 ON d.parcelno = s3.parcelno
        LEFT JOIN scans.scandata14 s14 ON d.parcelno = s14.parcelno 
        WHERE   
            d.ddate > (NOW() - interval '5 day') 
            AND d.parcelshop_id IS NOT NULL 
            AND s3.parcelno IS NOT NULL 
            AND (s14.parcelno IS NULL OR (s14.sdate + s14.stime)::timestamp without time zone < (s3.sdate + s3.stime)::timestamp without time zone)
        ORDER BY s3.sdate, s3.stime DESC
    )q 
    ORDER BY q.parcelno
) q1
WHERE q1.parcelshop_id != q1.emadr2

解释(分析,详细):

Subquery Scan on q1  (cost=68552.93..68554.90 rows=84 width=68) (actual time=701.318..701.324 rows=4 loops=1)
  Output: q1.ddepot, q1.parcelno, q1.sdate, q1.stime, q1.dpostal, q1.service, q1.lorry, q1.zc5x3, q1.parcelshop_id, q1.country, q1.dname1
  Filter: ((q1.parcelshop_id)::text <> (q1.emadr2)::text)
  Rows Removed by Filter: 2
  ->  Unique  (cost=68552.93..68553.85 rows=84 width=87) (actual time=701.310..701.314 rows=6 loops=1)
        Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
        ->  Sort  (cost=68552.93..68553.39 rows=184 width=87) (actual time=701.309..701.311 rows=15 loops=1)
              Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
              Sort Key: d.parcelno
              Sort Method: quicksort  Memory: 27kB
              ->  Sort  (cost=68543.71..68544.17 rows=184 width=87) (actual time=701.269..701.269 rows=15 loops=1)
                    Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
                    Sort Key: s3.sdate, s3.stime
                    Sort Method: quicksort  Memory: 27kB
                    ->  Nested Loop  (cost=0.00..68536.79 rows=184 width=87) (actual time=689.775..701.238 rows=15 loops=1)
                          Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
                          Join Filter: ((s14.parcelno IS NULL) OR ((s14.sdate + s14.stime) < (s3.sdate + s3.stime)))
                          Rows Removed by Join Filter: 16
                          ->  Nested Loop Left Join  (cost=0.00..57423.07 rows=455 width=74) (actual time=689.615..700.578 rows=14 loops=1)
                                Output: d.ddepot, d.parcelno, d.dpostal, d.service, d.parcelshop_id, d.dname1, s14.parcelno, s14.sdate, s14.stime
                                ->  Seq Scan on dispatcher.detour_avis d  (cost=0.00..49247.17 rows=455 width=47) (actual time=689.535..700.162 rows=11 loops=1)
                                      Output: d.id, d.parcelno, d.service, d.detour_type, d.ddepot, d.dname1, d.dname2, d.dstreet, d.dhouseno, d.dcountryn, d.dstate, d.dpostal, d.dcity, d.dphone, d.odepot, d.oname1, d.oname2, d.ostreet, d.ohouseno, d.ocoun (...)
                                      Filter: ((d.parcelshop_id IS NOT NULL) AND (d.ddate > (now() - '5 days'::interval)))
                                      Rows Removed by Filter: 985930
                                ->  Append  (cost=0.00..17.92 rows=5 width=33) (actual time=0.036..0.036 rows=1 loops=11)
                                      ->  Seq Scan on scans.scandata14 s14  (cost=0.00..0.00 rows=1 width=58) (actual time=0.000..0.000 rows=0 loops=11)
                                            Output: s14.parcelno, s14.sdate, s14.stime
                                            Filter: ((d.parcelno)::text = (s14.parcelno)::text)
                                      ->  Index Scan using scandata14_2013_pl_indx on scans.scandata14_2013 s14_1  (cost=0.14..0.25 rows=1 width=27) (actual time=0.001..0.001 rows=0 loops=11)
                                            Output: s14_1.parcelno, s14_1.sdate, s14_1.stime
                                            Index Cond: ((d.parcelno)::text = (s14_1.parcelno)::text)
                                      ->  Index Scan using scandata14_2014_pl_indx on scans.scandata14_2014 s14_2  (cost=0.29..4.29 rows=1 width=27) (actual time=0.007..0.007 rows=0 loops=11)
                                            Output: s14_2.parcelno, s14_2.sdate, s14_2.stime
                                            Index Cond: ((d.parcelno)::text = (s14_2.parcelno)::text)
                                      ->  Index Scan using scandata14_2015_pl_indx on scans.scandata14_2015 s14_3  (cost=0.42..6.47 rows=1 width=27) (actual time=0.010..0.010 rows=0 loops=11)
                                            Output: s14_3.parcelno, s14_3.sdate, s14_3.stime
                                            Index Cond: ((d.parcelno)::text = (s14_3.parcelno)::text)
                                      ->  Index Scan using scandata14_2016_pl_indx on scans.scandata14_2016 s14_4  (cost=0.42..6.91 rows=1 width=27) (actual time=0.014..0.015 rows=1 loops=11)
                                            Output: s14_4.parcelno, s14_4.sdate, s14_4.stime
                                            Index Cond: ((d.parcelno)::text = (s14_4.parcelno)::text)
                          ->  Append  (cost=0.00..24.34 rows=5 width=80) (actual time=0.044..0.045 rows=2 loops=14)
                                ->  Seq Scan on scans.scandata03 s3  (cost=0.00..0.00 rows=1 width=186) (actual time=0.000..0.000 rows=0 loops=14)
                                      Output: s3.sdate, s3.stime, s3.lorry, s3.zc5x3, s3.country, s3.emadr1, s3.emadr2, s3.parcelno
                                      Filter: ((s3.parcelno IS NOT NULL) AND ((d.parcelno)::text = (s3.parcelno)::text))
                                ->  Index Scan using scandata03_2013_pl_indx on scans.scandata03_2013 s3_1  (cost=0.14..0.26 rows=1 width=51) (actual time=0.001..0.001 rows=0 loops=14)
                                      Output: s3_1.sdate, s3_1.stime, s3_1.lorry, s3_1.zc5x3, s3_1.country, s3_1.emadr1, s3_1.emadr2, s3_1.parcelno
                                      Index Cond: (((s3_1.parcelno)::text = (d.parcelno)::text) AND (s3_1.parcelno IS NOT NULL))
                                ->  Index Scan using scandata03_2014_pl_indx on scans.scandata03_2014 s3_2  (cost=0.42..7.55 rows=1 width=53) (actual time=0.009..0.009 rows=0 loops=14)
                                      Output: s3_2.sdate, s3_2.stime, s3_2.lorry, s3_2.zc5x3, s3_2.country, s3_2.emadr1, s3_2.emadr2, s3_2.parcelno
                                      Index Cond: (((s3_2.parcelno)::text = (d.parcelno)::text) AND (s3_2.parcelno IS NOT NULL))
                                ->  Index Scan using scandata03_2015_pl_indx on scans.scandata03_2015 s3_3  (cost=0.42..8.21 rows=1 width=54) (actual time=0.013..0.013 rows=0 loops=14)
                                      Output: s3_3.sdate, s3_3.stime, s3_3.lorry, s3_3.zc5x3, s3_3.country, s3_3.emadr1, s3_3.emadr2, s3_3.parcelno
                                      Index Cond: (((s3_3.parcelno)::text = (d.parcelno)::text) AND (s3_3.parcelno IS NOT NULL))
                                ->  Index Scan using scandata03_2016_pl_indx on scans.scandata03_2016 s3_4  (cost=0.43..8.31 rows=1 width=55) (actual time=0.019..0.020 rows=2 loops=14)
                                      Output: s3_4.sdate, s3_4.stime, s3_4.lorry, s3_4.zc5x3, s3_4.country, s3_4.emadr1, s3_4.emadr2, s3_4.parcelno
                                      Index Cond: (((s3_4.parcelno)::text = (d.parcelno)::text) AND (s3_4.parcelno IS NOT NULL))
Planning time: 4.670 ms
Execution time: 701.550 ms

【问题讨论】:

解释分析可能有用。 请edit您的问题添加create table为有问题的表(包括所有索引)和使用explain (analyze, verbose)生成的执行计划的语句。 Formatted 文本 请no screen shots 注意:AND s3.parcelno IS NOT NULL 会将左连接变成普通连接。 注 2:ORDER BY s3.sdate, s3.stime DESC 看起来不对。为什么不将日期+时间组合成时间戳?而且,既然您似乎对最近的 s3 记录感兴趣,为什么不选择最近的 s3 记录,而不是按 + distinct 的(错误)顺序? IMO ORDER BY 用于指示来自 s3 的哪个详细记录将显示在 DISTINCT ON ... 我的猜测是 d 和 s3 之间存在 1:N 关系。 【参考方案1】:

在我看来有很多不必要的嵌套。检查这是否在功能上等效

select distinct on (d.parcelno) d.*
from
    dispatcher.detour_avis d
    inner join
    scans.scandata03 s3 on d.parcelno = s3.parcelno
    left join
    scans.scandata14 s14 on d.parcelno = s14.parcelno
where
    d.ddate > now() - interval '5 day'
    and d.parcelshop_id is not null and parcelshop_id != emadr2
    and (
        s14.parcelno is null or
        (s14.sdate + s14.stime)::timestamp < (s3.sdate + s3.stime)::timestamp
    )
order by d.parcelno

当您执行left join 并在where 子句中放入包含正确表列连接条件的s3.parcelno is not null 条件时,您实际上是在执行inner join。所以我只是从where 子句中删除了它,并将left 变成了inner join

【讨论】:

注意:IMO 你可以通过将LEFT JOIN s14IS NULL OR ... 放入NOT EXISTS(... s14 ...) 构造来摆脱它。 很遗憾没有,我在原帖中添加了更好的解释。

以上是关于PostgreSQL 查询性能和可能的优化的主要内容,如果未能解决你的问题,请参考以下文章

PostgreSQL——查询编译部分小结

PostgreSQL——查询编译部分小结

PostgreSQL——查询编译部分小结

PostgreSQL 性能优化 短查询 覆盖索引,前缀索引,索引和排序

Postgresql SELECT 性能和优化

数据库查询优化器的艺术:原理解析与SQL性能优化