How to Convert Sub Query to Joins for Fast Result?
[Posted]: 2019-05-28 22:37:54
[Question]: I want to convert my subqueries into joins to improve performance.
The query below, which uses three correlated subqueries, takes a long time to load.
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
(SELECT TOP 1 b.Level
from Microframe.dbo.TrackMessages b
where b.IMEI = a.IMEI
AND b.Timestamp >= @Start
order by b.Timestamp ) AS Level,
(select top 1 b.Timestamp
from Microframe.dbo.TrackMessages b
where b.IMEI = a.IMEI
AND b.Timestamp >= @Start
order by b.Timestamp ) AS TimeStamp,
(SELECT top 1 b.Temp
from Microframe.dbo.TrackMessages b
where b.IMEI = a.IMEI
AND b.Timestamp >= @Start
order by b.Timestamp ) AS Temp
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)
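[Comments]:
You are executing the same correlated subquery 3 separate times. Use a CTE (or CROSS APPLY) to retrieve all 3 columns by IMEI. Also check the indexes and fragmentation on the Microframe.dbo.TrackMessages table, and make sure there is at least one index on IMEI, Timestamp, in that order.
You are trying to retrieve the same TrackMessages record multiple times. Add a single subquery in the FROM clause that returns the Level, Timestamp and Temp fields. If Timestamp is covered by an index, you may not need to filter on it, since you are returning the record with the latest Timestamp. You will have to check the actual execution plan.
Since you seem to be working with IoT data, you should also look at ranking functions such as ROW_NUMBER, windowing functions, and analytic functions such as LAST_VALUE. In this case they may not be faster than TOP 1 ORDER BY, but they can be used in more scenarios.
@PanagiotisKanavos Could you describe that in more detail here, or write the query your way so that it produces my result?
@EzLo How do I retrieve this with a CTE? Please guide me.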
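[Answer 1]:
You can move the subquery into the FROM clause and use CROSS APPLY. Since you seem to be working with IoT data, you should look into T-SQL's ranking, windowing and analytic functions. Performance depends heavily on the tables' indexes.
Given these tables: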
create table #TrackMessages (
Message_ID bigint primary key,
imei nvarchar(50) ,
[timestamp] datetime2,
Level int,
temp numeric(5,2)
);
create table #device (
imei nvarchar(50) primary key,
owner_id int
);
create table #tbl_static_tank_info (
tank_id int not null primary key,
tank_name nvarchar(20),
fuel_type nvarchar(20),
capacity numeric(9,2),
owner_id int,
client_id int
)
and these indexes:
create nonclustered index IX_MSG_IMEI_Time on #TrackMessages (imei,timestamp) include(level,temp) ;
create INDEX IX_Device_OwnerID on #device (Owner_ID)
create INDEX IX_Tank_Client on #tbl_static_tank_info (Client_ID);
create INDEX IX_Tank_Owner on #tbl_static_tank_info (Owner_ID);
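To try the queries in this answer, the temp tables need a few rows and a value for the start parameter. The following is only a sketch: every IMEI, tank name, reading and the @start value are invented for illustration, and @start is assumed to be a datetime2 to match the [timestamp] column. Note that a T-SQL variable only lives for one batch, so the DECLARE has to be run together with the query that uses it.

```sql
-- Hypothetical sample data, invented purely for illustration.
declare @start datetime2 = '2019-05-28T00:00:00';

insert into #tbl_static_tank_info (tank_id, tank_name, fuel_type, capacity, owner_id, client_id)
values (1, N'Tank A', N'Diesel', 1000.00, 10, 65),
       (5, N'Tank B', N'Petrol', 2500.00, 10, 65);

-- #device.owner_id holds the tank_id, because the queries join on c.tank_id = a.owner_id
insert into #device (imei, owner_id)
values (N'IMEI-0001', 1),
       (N'IMEI-0002', 5);

insert into #TrackMessages (Message_ID, imei, [timestamp], Level, temp)
values (1, N'IMEI-0001', '2019-05-27T23:00:00', 80, 21.50),  -- earlier than @start, so filtered out
       (2, N'IMEI-0001', '2019-05-28T01:00:00', 78, 22.00),  -- first message at or after @start
       (3, N'IMEI-0001', '2019-05-28T02:00:00', 75, 22.75),
       (4, N'IMEI-0002', '2019-05-28T03:00:00', 40, 19.25);  -- only message for the second device
```

With these rows, a query that picks the first message per device at or after @start should return message 2 for IMEI-0001 and message 4 for IMEI-0002.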
the TOP 1 query looks like this:
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
Level,
TimeStamp,
Temp
FROM #device as a
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
cross apply (SELECT top 1 imei,Temp,Level,timestamp
from #TrackMessages b
where b.IMEI = a.imei
AND b.Timestamp >= @start
order by b.Timestamp ) msg
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
If there is a 1-M relationship between tanks, devices and messages, you can use the FIRST_VALUE analytic function to return the first record per device without a subquery:
SELECT DISTINCT c.tank_name, c.fuel_type, c.capacity, c.tank_id, -- DISTINCT collapses the per-message rows to one row per device
first_value(Temp) over (partition by b.imei order by timestamp) as temp,
first_value(Level) over (partition by b.imei order by timestamp) as level,
min(timestamp) over (partition by b.imei) as timestamp
from #TrackMessages b
inner join #device as a on b.IMEI = a.imei
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
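Performance depends heavily on the indexes, the table statistics, and whether the index order matches the OVER order.
This query can be modified with LAST_VALUE to return both the first and the last values for each device: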
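SELECT DISTINCT c.tank_name, c.fuel_type, c.capacity, c.tank_id, -- DISTINCT collapses the per-message rows to one row per device
first_value(Temp) over (partition by b.imei order by timestamp) as StartTemp,
first_value(Level) over (partition by b.imei order by timestamp) as StartLevel,
min(timestamp) over (partition by b.imei) as StartTime,
-- the default frame ends at the current row, so last_value needs an explicit full-partition frame
last_value(Temp) over (partition by b.imei order by timestamp rows between unbounded preceding and unbounded following) as EndTemp,
last_value(Level) over (partition by b.imei order by timestamp rows between unbounded preceding and unbounded following) as EndLevel,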
max(timestamp) over (partition by b.imei) as EndTime
from #TrackMessages b
inner join #device as a on b.IMEI = a.imei
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
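The server has to sort the measurements by timestamp both in ascending order (which the IX_MSG_IMEI_Time index already provides) and in descending order.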
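Because the cost of these variants depends on indexes and sort order, it helps to measure rather than guess. A minimal sketch, assuming the #TrackMessages temp table from above exists in the current session and using an invented @start value: SET STATISTICS IO and SET STATISTICS TIME make SQL Server report reads and CPU/elapsed time for each statement, so the TOP 1 and window-function versions can be compared side by side.

```sql
declare @start datetime2 = '2019-05-28T00:00:00';  -- hypothetical value

set statistics io on;    -- report logical and physical reads per table
set statistics time on;  -- report CPU time and elapsed time per statement

-- Replace this statement with each candidate query in turn and compare the messages output.
select b.imei,
       min(b.[timestamp]) as first_timestamp
from #TrackMessages as b
where b.[timestamp] >= @start
group by b.imei;

set statistics io off;
set statistics time off;
```

The numbers are only meaningful against realistic data volumes. The actual execution plan additionally shows whether a Sort operator was needed or the index order was used directly.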
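[Answer 2]: Here is a solution with CROSS APPLY, which works like a function that you declare on the fly and use as a join clause. If the applied set might be empty, in this case if there might be no record in TrackMessages for a particular IMEI, you can change CROSS APPLY to OUTER APPLY (the applied columns are then returned as NULL).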
SELECT
c.tank_name,
c.fuel_type,
c.capacity,
c.tank_id,
T.Level,
T.Timestamp,
T.Temp
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
CROSS APPLY (
SELECT TOP 1 -- Retrieve only the first record
-- And return as many columns as you need
b.Level,
b.Timestamp,
b.Temp
FROM
Microframe.dbo.TrackMessages AS b
WHERE
a.IMEI = b.IMEI AND -- With matching IMEI
b.Timestamp >= @Start
ORDER BY
b.Timestamp) T -- Ordered by Timestamp
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)
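For comparison, here is a sketch of the OUTER APPLY variant described above. It is the same query with only the APPLY operator changed. The @Start value and its datetime2 type are assumptions for illustration, and (Tanks) is still the placeholder from the question that has to be replaced with the real list of tank ids.

```sql
DECLARE @Start datetime2 = '2019-05-28T00:00:00';  -- hypothetical value and type

SELECT
    c.tank_name,
    c.fuel_type,
    c.capacity,
    c.tank_id,
    T.Level,       -- NULL when the device has no message at or after @Start
    T.Timestamp,
    T.Temp
FROM GatexServerDB.dbo.device AS a
JOIN GatexReportsDB.dbo.tbl_static_tank_info AS c ON c.tank_id = a.owner_id
OUTER APPLY (      -- keeps the device row even when the subquery returns nothing
    SELECT TOP 1
        b.Level,
        b.Timestamp,
        b.Temp
    FROM Microframe.dbo.TrackMessages AS b
    WHERE a.IMEI = b.IMEI
      AND b.Timestamp >= @Start
    ORDER BY b.Timestamp
) AS T
WHERE c.client_id = 65
  AND a.IMEI IS NOT NULL
  AND c.tank_id IN (Tanks);  -- placeholder from the question, substitute the tank ids
```

With CROSS APPLY, tanks whose device has sent nothing since @Start disappear from the result. With OUTER APPLY they remain, with NULL readings.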
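However, I believe the key point here is the indexes on your tables. If you have already determined that the problem is the subquery, make sure TrackMessages has the following index: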
CREATE NONCLUSTERED INDEX NCI_TrackMessages_IMEI_TimeStamp ON Microframe.dbo.TrackMessages (IMEI, Timestamp)
Indexes have pros and cons, so be sure to review the existing ones before creating or dropping any.
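One way to review what already exists is to list the table's indexes together with their fragmentation. This is a sketch using the sys.indexes catalog view and the sys.dm_db_index_physical_stats function. It assumes you can run it in the Microframe database named in the question and have permission to read these views.

```sql
USE Microframe;  -- database name taken from the question

SELECT i.name                          AS index_name,
       i.type_desc                     AS index_type,
       ps.avg_fragmentation_in_percent AS fragmentation_pct,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                          -- current database
         OBJECT_ID(N'dbo.TrackMessages'),  -- table to inspect
         NULL, NULL,                       -- all indexes, all partitions
         'LIMITED') AS ps                  -- cheapest scan mode
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

If an index that already leads with (IMEI, Timestamp) shows up here, creating another one would mostly add write overhead.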
[Answer 3]: Without knowing the table structures, my solution would be:
WITH CTE AS
(SELECT B.IMEI,
b.Level,
b.Timestamp,
b.Temp,
ROW_NUMBER() OVER (PARTITION BY b.IMEI ORDER BY Timestamp) AS Row
FROM Microframe.dbo.TrackMessages b
WHERE b.Timestamp >= @Start
)
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
CTE.Level, CTE.Timestamp, CTE.Temp
FROM GatexServerDB.dbo.device as a
INNER JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
INNER JOIN CTE ON CTE.IMEI = a.IMEI
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)
AND CTE.Row = 1;
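I can't test it, but it should be very close to a working solution. Please confirm whether it works.
[Comments]:
You should add a filter on CTE.Row = 1 in the WHERE clause to make sure you get only 1 row per device rather than all the track messages. You also need to remove the ORDER BY from the CTE.
[Answer 4]:
You can compare the following two solutions and use either one.
The JOIN approach, with the ordering done by the ROW_NUMBER window function: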
SELECT * FROM
(
SELECT
c.tank_name,
c.fuel_type,
c.capacity,
c.tank_id,
Level=b.Level,
TimeStamp=b.Timestamp,
Temp=b.Temp,
r=Row_number() over (partition by a.IMEI order by b.timestamp) -- number rows per device so that r=1 keeps one row per IMEI
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c
ON c.tank_id = a.owner_id
JOIN Microframe.dbo.TrackMessages as b
ON b.IMEI = a.IMEI AND b.Timestamp >= @Start
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)
)T
where r=1
Or the CROSS APPLY approach, as shown below:
SELECT * FROM
(
SELECT
c.tank_name, c.fuel_type, c.capacity, c.tank_id,
a.IMEI -- must be exposed by the derived table so the CROSS APPLY can reference it
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c
ON c.tank_id = a.owner_id
AND c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)
) A
CROSS APPLY
(
SELECT
TOP 1
b.Level, b.Timestamp,b.Temp
FROM Microframe.dbo.TrackMessages b
WHERE b.IMEI = a.IMEI
AND b.Timestamp >= @Start
ORDER BY b.Timestamp
)D