PostgreSQL query performance and possible optimisation
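Asked: 2016-11-09 11:34:59
Question: I managed to write a query that returns the correct data, but it looks bad to me because I had to nest a query inside a query three levels deep. Even though it currently runs in about 700 ms, I am worried it will get slower as the amount of data grows. Any information on how bad this is and how to optimise it would be much appreciated.
Edit:
I forgot to mention that tables s3 and s14 have multiple rows with the same parcelno, and I always need the latest row from each table (determined by sdate and stime). If the latest s14 row is newer than the latest s3 row, or the emadr2 column of the latest s3 row has the same value as the parcelshop_id column in table d, the data must not be shown. Keep in mind that I did not create these tables; I only read data from them.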
SELECT
    q1.ddepot,
    q1.parcelno,
    q1.sdate,
    q1.stime,
    q1.dpostal,
    q1.service,
    q1.lorry,
    q1.zc5x3,
    q1.parcelshop_id,
    q1.country,
    q1.dname1
FROM (
    SELECT DISTINCT ON (q.parcelno) q.* FROM (
        SELECT
            d.ddepot,
            d.parcelno,
            s3.sdate,
            s3.stime,
            d.dpostal,
            d.service,
            s3.lorry,
            s3.zc5x3,
            d.parcelshop_id,
            s3.country,
            d.dname1,
            s3.emadr1,
            s3.emadr2
        FROM dispatcher.detour_avis d
        LEFT JOIN scans.scandata03 s3 ON d.parcelno = s3.parcelno
        LEFT JOIN scans.scandata14 s14 ON d.parcelno = s14.parcelno
        WHERE
            d.ddate > (NOW() - interval '5 day')
            AND d.parcelshop_id IS NOT NULL
            AND s3.parcelno IS NOT NULL
            AND (s14.parcelno IS NULL OR (s14.sdate + s14.stime)::timestamp without time zone < (s3.sdate + s3.stime)::timestamp without time zone)
        ORDER BY s3.sdate, s3.stime DESC
    ) q
    ORDER BY q.parcelno
) q1
WHERE q1.parcelshop_id != q1.emadr2
EXPLAIN (ANALYZE, VERBOSE):
Subquery Scan on q1 (cost=68552.93..68554.90 rows=84 width=68) (actual time=701.318..701.324 rows=4 loops=1)
  Output: q1.ddepot, q1.parcelno, q1.sdate, q1.stime, q1.dpostal, q1.service, q1.lorry, q1.zc5x3, q1.parcelshop_id, q1.country, q1.dname1
  Filter: ((q1.parcelshop_id)::text <> (q1.emadr2)::text)
  Rows Removed by Filter: 2
  -> Unique (cost=68552.93..68553.85 rows=84 width=87) (actual time=701.310..701.314 rows=6 loops=1)
        Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
        -> Sort (cost=68552.93..68553.39 rows=184 width=87) (actual time=701.309..701.311 rows=15 loops=1)
              Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
              Sort Key: d.parcelno
              Sort Method: quicksort Memory: 27kB
              -> Sort (cost=68543.71..68544.17 rows=184 width=87) (actual time=701.269..701.269 rows=15 loops=1)
                    Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
                    Sort Key: s3.sdate, s3.stime
                    Sort Method: quicksort Memory: 27kB
                    -> Nested Loop (cost=0.00..68536.79 rows=184 width=87) (actual time=689.775..701.238 rows=15 loops=1)
                          Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
                          Join Filter: ((s14.parcelno IS NULL) OR ((s14.sdate + s14.stime) < (s3.sdate + s3.stime)))
                          Rows Removed by Join Filter: 16
                          -> Nested Loop Left Join (cost=0.00..57423.07 rows=455 width=74) (actual time=689.615..700.578 rows=14 loops=1)
                                Output: d.ddepot, d.parcelno, d.dpostal, d.service, d.parcelshop_id, d.dname1, s14.parcelno, s14.sdate, s14.stime
                                -> Seq Scan on dispatcher.detour_avis d (cost=0.00..49247.17 rows=455 width=47) (actual time=689.535..700.162 rows=11 loops=1)
                                      Output: d.id, d.parcelno, d.service, d.detour_type, d.ddepot, d.dname1, d.dname2, d.dstreet, d.dhouseno, d.dcountryn, d.dstate, d.dpostal, d.dcity, d.dphone, d.odepot, d.oname1, d.oname2, d.ostreet, d.ohouseno, d.ocoun (...)
                                      Filter: ((d.parcelshop_id IS NOT NULL) AND (d.ddate > (now() - '5 days'::interval)))
                                      Rows Removed by Filter: 985930
                                -> Append (cost=0.00..17.92 rows=5 width=33) (actual time=0.036..0.036 rows=1 loops=11)
                                      -> Seq Scan on scans.scandata14 s14 (cost=0.00..0.00 rows=1 width=58) (actual time=0.000..0.000 rows=0 loops=11)
                                            Output: s14.parcelno, s14.sdate, s14.stime
                                            Filter: ((d.parcelno)::text = (s14.parcelno)::text)
                                      -> Index Scan using scandata14_2013_pl_indx on scans.scandata14_2013 s14_1 (cost=0.14..0.25 rows=1 width=27) (actual time=0.001..0.001 rows=0 loops=11)
                                            Output: s14_1.parcelno, s14_1.sdate, s14_1.stime
                                            Index Cond: ((d.parcelno)::text = (s14_1.parcelno)::text)
                                      -> Index Scan using scandata14_2014_pl_indx on scans.scandata14_2014 s14_2 (cost=0.29..4.29 rows=1 width=27) (actual time=0.007..0.007 rows=0 loops=11)
                                            Output: s14_2.parcelno, s14_2.sdate, s14_2.stime
                                            Index Cond: ((d.parcelno)::text = (s14_2.parcelno)::text)
                                      -> Index Scan using scandata14_2015_pl_indx on scans.scandata14_2015 s14_3 (cost=0.42..6.47 rows=1 width=27) (actual time=0.010..0.010 rows=0 loops=11)
                                            Output: s14_3.parcelno, s14_3.sdate, s14_3.stime
                                            Index Cond: ((d.parcelno)::text = (s14_3.parcelno)::text)
                                      -> Index Scan using scandata14_2016_pl_indx on scans.scandata14_2016 s14_4 (cost=0.42..6.91 rows=1 width=27) (actual time=0.014..0.015 rows=1 loops=11)
                                            Output: s14_4.parcelno, s14_4.sdate, s14_4.stime
                                            Index Cond: ((d.parcelno)::text = (s14_4.parcelno)::text)
                          -> Append (cost=0.00..24.34 rows=5 width=80) (actual time=0.044..0.045 rows=2 loops=14)
                                -> Seq Scan on scans.scandata03 s3 (cost=0.00..0.00 rows=1 width=186) (actual time=0.000..0.000 rows=0 loops=14)
                                      Output: s3.sdate, s3.stime, s3.lorry, s3.zc5x3, s3.country, s3.emadr1, s3.emadr2, s3.parcelno
                                      Filter: ((s3.parcelno IS NOT NULL) AND ((d.parcelno)::text = (s3.parcelno)::text))
                                -> Index Scan using scandata03_2013_pl_indx on scans.scandata03_2013 s3_1 (cost=0.14..0.26 rows=1 width=51) (actual time=0.001..0.001 rows=0 loops=14)
                                      Output: s3_1.sdate, s3_1.stime, s3_1.lorry, s3_1.zc5x3, s3_1.country, s3_1.emadr1, s3_1.emadr2, s3_1.parcelno
                                      Index Cond: (((s3_1.parcelno)::text = (d.parcelno)::text) AND (s3_1.parcelno IS NOT NULL))
                                -> Index Scan using scandata03_2014_pl_indx on scans.scandata03_2014 s3_2 (cost=0.42..7.55 rows=1 width=53) (actual time=0.009..0.009 rows=0 loops=14)
                                      Output: s3_2.sdate, s3_2.stime, s3_2.lorry, s3_2.zc5x3, s3_2.country, s3_2.emadr1, s3_2.emadr2, s3_2.parcelno
                                      Index Cond: (((s3_2.parcelno)::text = (d.parcelno)::text) AND (s3_2.parcelno IS NOT NULL))
                                -> Index Scan using scandata03_2015_pl_indx on scans.scandata03_2015 s3_3 (cost=0.42..8.21 rows=1 width=54) (actual time=0.013..0.013 rows=0 loops=14)
                                      Output: s3_3.sdate, s3_3.stime, s3_3.lorry, s3_3.zc5x3, s3_3.country, s3_3.emadr1, s3_3.emadr2, s3_3.parcelno
                                      Index Cond: (((s3_3.parcelno)::text = (d.parcelno)::text) AND (s3_3.parcelno IS NOT NULL))
                                -> Index Scan using scandata03_2016_pl_indx on scans.scandata03_2016 s3_4 (cost=0.43..8.31 rows=1 width=55) (actual time=0.019..0.020 rows=2 loops=14)
                                      Output: s3_4.sdate, s3_4.stime, s3_4.lorry, s3_4.zc5x3, s3_4.country, s3_4.emadr1, s3_4.emadr2, s3_4.parcelno
                                      Index Cond: (((s3_4.parcelno)::text = (d.parcelno)::text) AND (s3_4.parcelno IS NOT NULL))
Planning time: 4.670 ms
Execution time: 701.550 ms
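Reading the plan: almost all of the 701.550 ms is spent in the Seq Scan on dispatcher.detour_avis (actual time=689.535..700.162), which discards 985930 rows to keep 11. The joins against the scan tables already use index scans and cost well under a millisecond per loop. If adding an index to that table is an option (the question says the tables are read-only for the asker, so this may need whoever owns the schema), a partial index on ddate is one thing worth trying. The index name below is hypothetical, and whether the planner actually picks it depends on the data distribution.

```sql
-- Sketch only: the index name is made up for illustration.
-- The WHERE clause mirrors the query's "parcelshop_id IS NOT NULL" filter,
-- so the index stays small and still matches the query's predicate.
CREATE INDEX detour_avis_ddate_parcelshop_idx
    ON dispatcher.detour_avis (ddate)
    WHERE parcelshop_id IS NOT NULL;

-- Re-check the plan afterwards; the Seq Scan on detour_avis should be
-- replaced by an index or bitmap scan if the index is considered useful.
EXPLAIN (ANALYZE, VERBOSE)
SELECT d.parcelno
FROM   dispatcher.detour_avis d
WHERE  d.ddate > now() - interval '5 day'
AND    d.parcelshop_id IS NOT NULL;
```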
Comments:
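An EXPLAIN ANALYZE might be useful. Please edit your question and add the create table statements for the tables in question (including all indexes) and the execution plan generated using explain (analyze, verbose). Formatted text please, no screen shots.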
Note: AND s3.parcelno IS NOT NULL turns the left join into a plain (inner) join.
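Note 2: ORDER BY s3.sdate, s3.stime DESC looks wrong. Why not combine date + time into a timestamp? And, since you appear to be interested in the most recent s3 record, why not select the most recent s3 record directly, instead of relying on the (wrong) order by + distinct?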
IMO the ORDER BY is meant to indicate which detail record from s3 will be shown by the DISTINCT ON ... My guess is that there is a 1:N relationship between d and s3.
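To make the ORDER BY remarks concrete: in ORDER BY s3.sdate, s3.stime DESC the DESC applies only to stime, so rows are sorted by ascending date. In addition, DISTINCT ON keeps the first row of each group according to the ORDER BY of the same query level, so a sort done in an inner subquery is not guaranteed to decide which row survives. A minimal sketch of picking the newest s3 row per parcel, assuming only the column names shown in the question:

```sql
-- Sketch: newest scandata03 row per parcelno.
-- The ORDER BY must start with the DISTINCT ON expression; the remaining
-- sort keys decide which row of each group is kept (here: the latest one).
SELECT DISTINCT ON (s3.parcelno)
       s3.parcelno,
       s3.sdate,
       s3.stime,
       s3.lorry,
       s3.zc5x3,
       s3.country,
       s3.emadr2
FROM   scans.scandata03 s3
ORDER  BY s3.parcelno,
          s3.sdate DESC,
          s3.stime DESC;
```

Sorting on the two columns separately avoids any assumption about their types; if sdate + stime yields a timestamp, as the casts in the question suggest, ordering by that sum descending would be equivalent.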
Answer 1:
It looks to me like there is a lot of unnecessary nesting. Check whether this is functionally equivalent:
select distinct on (d.parcelno) d.*
from
    dispatcher.detour_avis d
    inner join
    scans.scandata03 s3 on d.parcelno = s3.parcelno
    left join
    scans.scandata14 s14 on d.parcelno = s14.parcelno
where
    d.ddate > now() - interval '5 day'
    and d.parcelshop_id is not null and parcelshop_id != emadr2
    and (
        s14.parcelno is null or
        (s14.sdate + s14.stime)::timestamp < (s3.sdate + s3.stime)::timestamp
    )
order by d.parcelno
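When you do a left join and then put a condition such as s3.parcelno is not null on the right-hand table's join column in the where clause, you are effectively doing an inner join. So I simply removed that condition from the where clause and turned the left join into an inner join.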
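The equivalence can be checked on throwaway data. The sketch below uses inline VALUES lists named d and s3 as stand-ins; they are illustrative only and are not the real tables.

```sql
-- Two parcels in d, but only parcel 'A' has a matching s3 row.
WITH d(parcelno)         AS (VALUES ('A'), ('B')),
     s3(parcelno, lorry) AS (VALUES ('A', 'L1'))
SELECT d.parcelno, s3.lorry
FROM   d
LEFT   JOIN s3 ON d.parcelno = s3.parcelno
WHERE  s3.parcelno IS NOT NULL;   -- discards the NULL-extended row for 'B'
-- returns one row: ('A', 'L1')

-- The same result written directly as an inner join.
WITH d(parcelno)         AS (VALUES ('A'), ('B')),
     s3(parcelno, lorry) AS (VALUES ('A', 'L1'))
SELECT d.parcelno, s3.lorry
FROM   d
JOIN   s3 ON d.parcelno = s3.parcelno;
-- returns one row: ('A', 'L1')
```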
Comments:
Note: IMO you can get rid of the LEFT JOIN s14 and the IS NULL OR ... condition by moving them into a NOT EXISTS(... s14 ...) construct.
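One way the NOT EXISTS suggestion could look, combined with the rules from the question's edit (latest s3 row per parcel, no s14 row at or after it, emadr2 different from parcelshop_id). This is a sketch under assumptions, not a verified drop-in replacement: it assumes detour_avis has at most one qualifying row per parcelno, and it relies on LATERAL, which needs PostgreSQL 9.3 or later.

```sql
SELECT d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service,
       s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1
FROM   dispatcher.detour_avis d
JOIN   LATERAL (
         -- newest scandata03 row for this parcel
         SELECT s.*
         FROM   scans.scandata03 s
         WHERE  s.parcelno = d.parcelno
         ORDER  BY s.sdate DESC, s.stime DESC
         LIMIT  1
       ) s3 ON true
WHERE  d.ddate > now() - interval '5 day'
AND    d.parcelshop_id IS NOT NULL
AND    d.parcelshop_id <> s3.emadr2
AND    NOT EXISTS (
         -- any scandata14 row at or after the newest s3 scan hides the parcel;
         -- ">=" mirrors the strict "<" of the original query, use ">" if
         -- scans with identical date and time should still be shown
         SELECT 1
         FROM   scans.scandata14 s14
         WHERE  s14.parcelno = d.parcelno
         AND    (s14.sdate, s14.stime) >= (s3.sdate, s3.stime)
       );
```

If detour_avis can hold several rows per parcelno, a DISTINCT ON (d.parcelno) with a matching ORDER BY would still be needed on top. Comparing the result against the original query on real data is the only reliable equivalence check.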
Unfortunately not. I have added a better explanation to the original post.