Understand/Optimize SQL query in Postgresql
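【Posted】: 2019-12-04 19:09:43
【Question】:
So I have a query that, in the worst-case scenario, takes 10-12 minutes to run. If I remove the time check from the WHERE clause, it drops to 20-30 seconds, so I was wondering how to optimize it.
I tried adding an index on the timestamp-to-time cast, but it didn't really help... The rs table (register_status) has over 70 million rows, register_date has around 280k, and cp has fewer than 1k.
The idea of the query is to get all the results for the CPs, grouped by status, within a date range and restricted to a time-of-day window. This is the worst-case scenario: the date range starts at the first date in the database, and the user has picked the entire day as the time window. The query is as follows: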
explain analyze SELECT
COUNT(rs.status) filter (where rs.status = 'Occ') as total_occ,
COUNT(rs.status) filter (where rs.status = 'Part') as total_part,
COUNT(rs.status) filter (where rs.status = 'OOS') as total_oos,
COUNT(rs.status) filter (where rs.status = 'OOC') as total_ooc,
cp.id as charge_point_id,
cp.address,
cp.type as charge_point_type,
cp.latitude,
cp.longitude
FROM register_date rd
inner join register_status rs on rs.fk_register_date = rd.id
inner join charge_point cp on cp.id = rs.fk_charge_point
WHERE
rd.date::date >= '2016-11-01' and rd.date::date <= '2019-08-01'
AND
rd.date::time >= time '00:00' AND rd.date::time <= time '23:59'
group by cp.id
And the result of EXPLAIN ANALYZE is the following, where I can see a lot of disk space being used...
"Finalize GroupAggregate (cost=34412.78..34536.10 rows=780 width=124) (actual time=689440.380..699740.172 rows=813 loops=1)"
" Group Key: cp.id"
" -> Gather Merge (cost=34412.78..34519.27 rows=722 width=124) (actual time=689421.445..699736.996 rows=1579 loops=1)"
" Workers Planned: 1"
" Workers Launched: 1"
" -> Partial GroupAggregate (cost=33412.77..33438.04 rows=722 width=124) (actual time=649515.576..659674.461 rows=790 loops=2)"
" Group Key: cp.id"
" -> Sort (cost=33412.77..33414.57 rows=722 width=96) (actual time=649496.720..654001.697 rows=24509314 loops=2)"
" Sort Key: cp.id"
" Sort Method: external merge Disk: 2649104kB"
" Worker 0: Sort Method: external merge Disk: 2652840kB"
" -> Nested Loop (cost=0.56..33378.49 rows=722 width=96) (actual time=1.343..504948.423 rows=24509314 loops=2)"
" -> Parallel Seq Scan on register_date rd (cost=0.00..6443.69 rows=4 width=4) (actual time=0.021..294.724 rows=139760 loops=2)"
" Filter: (((date)::date >= '2016-11-01'::date) AND ((date)::date <= '2019-08-01'::date) AND ((date)::time without time zone >= '00:00:00'::time without time zone) AND ((date)::time without time zone <= '23:59:00'::time without time zone))"
" -> Nested Loop (cost=0.56..6725.90 rows=780 width=100) (actual time=0.077..3.574 rows=175 loops=279519)"
" -> Seq Scan on charge_point cp (cost=0.00..21.80 rows=780 width=92) (actual time=0.002..0.077 rows=813 loops=279519)"
" -> Index Only Scan using register_status_fk_charge_point_fk_register_date_status_key on register_status rs (cost=0.56..8.58 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=227248947)"
" Index Cond: ((fk_charge_point = cp.id) AND (fk_register_date = rd.id))"
" Heap Fetches: 49018627"
"Planning Time: 0.506 ms"
"Execution Time: 700065.010 ms"
【Comments】:
So you have a timestamp column named date.
【Answer 1】:
A lateral join might be faster:
SELECT cp.*, rd.*
FROM charge_point cp CROSS JOIN LATERAL
(SELECT COUNT(*) filter (where rs.status = 'Occ') as total_occ,
COUNT(*) filter (where rs.status = 'Part') as total_part,
COUNT(*) filter (where rs.status = 'OOS') as total_oos,
COUNT(*) filter (where rs.status = 'OOC') as total_ooc
FROM register_date rd JOIN
register_status rs
ON rs.fk_register_date = rd.id
WHERE cp.id = rs.fk_charge_point AND
rd.date >= '2016-11-01' and
rd.date < date '2019-08-01' + interval '1 day'
) rd;
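This suggests indexes on register_date(fk_charge_point, date) and register_status(id, status).
Note that I changed the date comparisons so they are more index-friendly. I see no reason to filter on time, so I removed those conditions.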
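As a rough sketch of what "index-friendly" means here: a plain b-tree index on the timestamp column can serve a half-open range on the bare column, whereas a predicate on rd.date::date cannot use that index. The column names are taken from the question; the index name register_date_date_idx is hypothetical. On the register_status side, the plan in the question already shows a unique index covering (fk_charge_point, fk_register_date, status), so this sketch assumes no extra index is needed there.

```sql
-- Hypothetical index name; a plain b-tree index on the timestamp column.
CREATE INDEX IF NOT EXISTS register_date_date_idx
    ON register_date (date);

-- Half-open range on the bare column, so the planner can consider the index above.
-- The upper bound covers all of 2019-08-01 without casting rd.date.
SELECT rd.id
FROM register_date rd
WHERE rd.date >= date '2016-11-01'
  AND rd.date <  date '2019-08-01' + interval '1 day';
```

Whether the planner actually picks the index depends on how selective the range is; for a range that covers most of the table, a sequential scan may still be the cheaper choice.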
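【Discussion】:
Note: ) cp;
[Plus: I think you need LATERAL here; simply not splitting the timestamp would be enough, IMO]
I do need the time, because the query may only need results whose datetime falls between 5 am and 6 pm. As I said, I'm running the query in the worst-case scenario, where the user picks 00:00 to 23:59, which doesn't make much sense but is still possible.
@user3107720 . . . Then add the conditions back. The lateral join should still use the indexes on charge point and date.
【Answer 2】:
Using Gordon's approach, I developed a new query that turned out to be much faster, going from 10-12 minutes down to 20-40 seconds: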
SELECT cp.*, rd.* from charge_point cp cross join lateral
(select
COUNT(rs.status) filter (where rs.status = 'Occ') as total_occ,
COUNT(rs.status) filter (where rs.status = 'Part') as total_part,
COUNT(rs.status) filter (where rs.status = 'OOS') as total_oos,
COUNT(rs.status) filter (where rs.status = 'OOC') as total_ooc,
rs.fk_charge_point as cpid
FROM register_date rd
inner join register_status rs on rs.fk_register_date = rd.id
WHERE
rd.date::date >= '2019-02-01' and rd.date::date <= '2019-08-01'
AND
rd.date::time >= time '00:00' AND rd.date::time <= time '23:59'
group by rs.fk_charge_point) rd
where cp.id = rd.cpid
I still need to check whether adding any indexes would make it faster, but for now it looks good.
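One way to run that check, sketched under the assumption that the schema matches the question: wrap the query in EXPLAIN (ANALYZE, BUFFERS) and compare the plans before and after creating an index. The variant below keeps the date range on the bare timestamp column and applies the time-of-day window as an additional filter. The 05:00 to 18:00 window is only an illustrative value based on the comments above, and the alias agg is hypothetical.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT cp.*, agg.*
FROM charge_point cp
CROSS JOIN LATERAL (
    SELECT
        COUNT(*) FILTER (WHERE rs.status = 'Occ')  AS total_occ,
        COUNT(*) FILTER (WHERE rs.status = 'Part') AS total_part,
        COUNT(*) FILTER (WHERE rs.status = 'OOS')  AS total_oos,
        COUNT(*) FILTER (WHERE rs.status = 'OOC')  AS total_ooc
    FROM register_date rd
    JOIN register_status rs ON rs.fk_register_date = rd.id
    WHERE rs.fk_charge_point = cp.id
      -- Date range on the bare column (no cast).
      AND rd.date >= date '2019-02-01'
      AND rd.date <  date '2019-08-01' + interval '1 day'
      -- Time-of-day window; still needs a cast, applied as a filter.
      AND rd.date::time >= time '05:00'
      AND rd.date::time <= time '18:00'
) agg;
```

Because the subquery aggregates without GROUP BY, it returns exactly one row per charge point, with zero counts where nothing matches; the query above instead drops charge points that have no rows in the range.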
【Discussion】: