How to improve a view containing multiple self joins used by a stored procedure
【Posted】: 2020-10-30 00:13:56
【Question】:
I have a very slow stored procedure (it takes 5-6 minutes to return results) that involves a few tables and a view. I believe the multiple self joins in the view are what make the stored procedure slow. Here table A has 700,000 rows and table B has 20 rows.
Table A
id | status_key | status_date | seq
10035 | 2 | 2020-10-01 | 1
10035 | 3 | 2020-10-03 | 2
10049 | 2 | 2020-06-10 | 1
10049 | 3 | 2020-06-13 | 2
10049 | 4 | 2020-06-17 | 3
10049 | 5 | 2020-07-03 | 4
Table B
status_key | status_name
2 | accepted
3 | conditionally accepted
4 | decided
5 | declined
View
SELECT a1.status_key as current_status_key,
       b1.status_name as current_status_name,
       a1.status_date as current_status_date,
       a2.status_key as previous_status_key,
       b2.status_name as previous_status_name,
       a2.status_date as previous_status_date,
       a3.status_key as next_status_key,
       b3.status_name as next_status_name,
       a3.status_date as next_status_date,
       a4.status_key as next_2_status_key,
       b4.status_name as next_2_status_name,
       a4.status_date as next_2_status_date
FROM A a1
INNER JOIN B b1 ON a1.status_key = b1.status_key
LEFT JOIN A a2 ON a1.id = a2.id AND a1.seq = a2.seq + 1
LEFT JOIN B b2 ON a2.status_key = b2.status_key
LEFT JOIN A a3 ON a1.id = a3.id AND a1.seq = a3.seq - 1
LEFT JOIN B b3 ON a3.status_key = b3.status_key
LEFT JOIN A a4 ON a1.id = a4.id AND a1.seq = a4.seq - 2
LEFT JOIN B b4 ON a4.status_key = b4.status_key
Desired result
id | current_status_key | current_status_name | current_status_date | previous_status_key | previous_status_name | previous_status_date | next_status_key | next_status_name | next_status_date | next_2_status_key | next_2_status_name | next_2_status_date
10035 | 2 | accepted | 2020-10-01 | NULL | NULL | NULL | 3 | conditionally accepted | 2020-10-03 | NULL | NULL | NULL
10035 | 3 | conditionally accepted | 2020-10-03 | 2 | accepted | 2020-10-01 | NULL | NULL | NULL | NULL | NULL | NULL
How can I improve my view by rewriting this part? I am thinking of using CTEs to separate the parts above. Any ideas?
【Comments】:
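Please provide sample data, the desired results, and an explanation of what you are trying to achieve. Procedural code is highly vendor-specific, so please add a tag to specify whether you are using mysql, postgresql, sql-server, oracle, or db2, or something else entirely.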
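【Answer 1】:
Instead of the self joins on A, you can use LAG and LEAD to work out the relevant status keys. This hopefully means the rows only need to be read from A once, but it needs to be tested against your own database.
Here is the SQL for the above. Note: the SQL was revised after the question was updated with sample data.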
WITH a1 AS
    (SELECT A.ID,
            A.status_key AS a1_status_key,
            A.status_date AS a1_status_date,
            LAG(A.status_key, 1) OVER (PARTITION BY A.id ORDER BY A.seq) AS a2_status_key,
            LAG(A.status_date, 1) OVER (PARTITION BY A.id ORDER BY A.seq) AS a2_status_date,
            LEAD(A.status_key, 1) OVER (PARTITION BY A.id ORDER BY A.seq) AS a3_status_key,
            LEAD(A.status_date, 1) OVER (PARTITION BY A.id ORDER BY A.seq) AS a3_status_date,
            LEAD(A.status_key, 2) OVER (PARTITION BY A.id ORDER BY A.seq) AS a4_status_key,
            LEAD(A.status_date, 2) OVER (PARTITION BY A.id ORDER BY A.seq) AS a4_status_date
     FROM A
    )
SELECT a1.id,
       a1.a1_status_key as current_status_key,
       b1.status_name as current_status_name,
       a1.a1_status_date as current_status_date,
       a1.a2_status_key as previous_status_key,
       b2.status_name as previous_status_name,
       a1.a2_status_date as previous_status_date,
       a1.a3_status_key as next_status_key,
       b3.status_name as next_status_name,
       a1.a3_status_date as next_status_date,
       a1.a4_status_key as next_2_status_key,
       b4.status_name as next_2_status_name,
       a1.a4_status_date as next_2_status_date
FROM a1
LEFT JOIN B b1 ON a1.a1_status_key = b1.status_key
LEFT JOIN B b2 ON a1.a2_status_key = b2.status_key
LEFT JOIN B b3 ON a1.a3_status_key = b3.status_key
LEFT JOIN B b4 ON a1.a4_status_key = b4.status_key;
Here is a db<>fiddle using temporary tables.
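As a quick sanity check, the sketch below loads the question's sample rows into an in-memory SQLite database and compares the self-join form with the LAG/LEAD form. SQLite is only a stand-in here, since the thread never states which DBMS is in use, and the query is reduced to the status keys for brevity. The two forms agree only if seq is unique and gap-free within each id, which holds for the sample data but is an assumption about the real table.

```python
import sqlite3

# Sample rows copied from table A in the question: (id, status_key, status_date, seq).
ROWS_A = [
    (10035, 2, "2020-10-01", 1),
    (10035, 3, "2020-10-03", 2),
    (10049, 2, "2020-06-10", 1),
    (10049, 3, "2020-06-13", 2),
    (10049, 4, "2020-06-17", 3),
    (10049, 5, "2020-07-03", 4),
]

# The view's self joins, reduced to the status keys for brevity.
SELF_JOIN_SQL = """
SELECT a1.id, a1.seq,
       a2.status_key AS previous_status_key,
       a3.status_key AS next_status_key,
       a4.status_key AS next_2_status_key
FROM A a1
LEFT JOIN A a2 ON a1.id = a2.id AND a1.seq = a2.seq + 1
LEFT JOIN A a3 ON a1.id = a3.id AND a1.seq = a3.seq - 1
LEFT JOIN A a4 ON a1.id = a4.id AND a1.seq = a4.seq - 2
ORDER BY a1.id, a1.seq
"""

# The same columns computed with window functions instead of joining A to itself.
WINDOW_SQL = """
SELECT id, seq,
       LAG(status_key, 1)  OVER (PARTITION BY id ORDER BY seq) AS previous_status_key,
       LEAD(status_key, 1) OVER (PARTITION BY id ORDER BY seq) AS next_status_key,
       LEAD(status_key, 2) OVER (PARTITION BY id ORDER BY seq) AS next_2_status_key
FROM A
ORDER BY id, seq
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A (id INTEGER, status_key INTEGER, status_date TEXT, seq INTEGER)"
)
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", ROWS_A)

joined = conn.execute(SELF_JOIN_SQL).fetchall()
windowed = conn.execute(WINDOW_SQL).fetchall()

for row in windowed:
    print(row)
print("same result:", joined == windowed)
```

If seq can have gaps, LAG and LEAD return the neighbouring row by position rather than the row at seq - 1 or seq + 1, so the two queries would then differ.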
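I think it would also help a lot if the clustered index were on id, seq. If table A is actually larger and has other columns, it may be better to use a nonclustered index on those two columns that also includes the other relevant columns, e.g. id, seq, status_date, status_key.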
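A minimal sketch of the index idea, again using in-memory SQLite as a stand-in for the unstated DBMS. The index name ix_a_id_seq is hypothetical. SQLite has no clustered/nonclustered distinction and no INCLUDE clause, so all four columns are listed as key columns; on SQL Server the equivalent would likely keep (id, seq) as the key and put the other two columns in INCLUDE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A (id INTEGER, status_key INTEGER, status_date TEXT, seq INTEGER)"
)
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?)",
    [
        (10035, 2, "2020-10-01", 1),
        (10035, 3, "2020-10-03", 2),
        (10049, 2, "2020-06-10", 1),
        (10049, 3, "2020-06-13", 2),
    ],
)

# Hypothetical index name. The leading columns (id, seq) match the
# PARTITION BY / ORDER BY of the window functions; the trailing columns
# let the query be answered from the index alone.
conn.execute(
    "CREATE INDEX ix_a_id_seq ON A (id, seq, status_date, status_key)"
)

# Ask the planner how it would read A in (id, seq) order.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, seq, status_key, status_date FROM A ORDER BY id, seq"
).fetchall()
for row in plan:
    print(row[-1])  # human-readable plan step

# List the indexes defined on A (the name is in column 1).
index_names = [r[1] for r in conn.execute("PRAGMA index_list('A')")]
print(index_names)
```

Whether the planner actually uses the index depends on the engine and the data, so the plan should be checked on the real 700,000-row table rather than assumed.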
Previous version
WITH a1 AS
(SELECT ...,
LEAD(A.status_key, 1) OVER (PARTITION BY A.id ORDER BY A.seq) AS a2_status_key,
LAG(A.status_key, 1) OVER (PARTITION BY A.id ORDER BY A.seq) AS a3_status_key,
LAG(A.status_key, 2) OVER (PARTITION BY A.id ORDER BY A.seq) AS a4_status_key
FROM A
)
SELECT a1.*, ...
FROM a1
LEFT JOIN B b1 ON a1.status_key = b1.status_key
LEFT JOIN B b2 ON a1.a2_status_key = b2.status_key
LEFT JOIN B b3 ON a1.a3_status_key = b3.status_key
LEFT JOIN B b4 ON a1.a4_status_key = b4.status_key;
【Comments】:
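If those dots stand for many columns taken from the rows at other seq values, there will be a lot of LAGs, and I would hope the DBMS performs the offset only once per offset value. Another option is a materialized view instead of a plain view, which would certainly improve performance if the view is accessed many times, at the cost of extra space. In any case, for this kind of sequential-item scenario your solution is very good.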
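You're right. I used the dots because the OP's (pre-edit) question used SELECT ..., so I just did the same; but if they select many columns from a2, a3 and a4 then yes, that would be a lot of LEADs/LAGs. From their update, it looks like they use two from each (the date and the status key). I think it is worth trying for performance, though the indexes would also help.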
Thanks Sean. I used your approach and tested it with data. It really worked! The stored procedure now takes only 20 seconds to return the first 50 rows!