How to Convert Sub Query to Joins for Fast Result?

Posted: 2019-05-28 22:37:54

Question:

I want to convert the subqueries into joins to improve performance.

The following query with correlated subqueries takes a long time to load.

SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
     (SELECT TOP 1 b.Level 
      from Microframe.dbo.TrackMessages b 
      where b.IMEI = a.IMEI 
            AND b.Timestamp >= @Start 
      order by b.Timestamp ) AS Level,
    (select top 1 b.Timestamp 
     from Microframe.dbo.TrackMessages b 
     where b.IMEI = a.IMEI 
           AND b.Timestamp >= @Start 
     order by b.Timestamp ) AS TimeStamp,
    (SELECT top 1 b.Temp 
     from Microframe.dbo.TrackMessages b 
     where b.IMEI = a.IMEI 
           AND b.Timestamp >= @Start 
     order by b.Timestamp ) AS Temp 
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)

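Comments:

You are running the same correlated subquery three separate times. Use a CTE (or CROSS APPLY) to retrieve all 3 columns per IMEI. Also check the indexes and fragmentation on the Microframe.dbo.TrackMessages table, and make sure there is at least one index on IMEI, Timestamp, in that order.

You are trying to retrieve the same TrackMessages record multiple times. Add a single subquery in the FROM clause that returns the Level, Timestamp and Temp fields. If Timestamp is covered by an index, you may not need to filter on it, since you are returning the latest record. You have to check the actual execution plan.

Since you seem to be dealing with IoT data, you should also look into ranking functions such as ROW_NUMBER, window functions, and analytic functions such as LAST_VALUE. In this case they may not be faster than TOP 1 ... ORDER BY, but they can be used in more situations.

@PanagiotisKanavos Could you describe this in more detail here, or write the query your way so I can see the result?

@EzLo Could you guide me on retrieving the columns with a CTE?

Answer 1: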

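You can move the subquery into the FROM clause and use CROSS APPLY. Since you seem to be working with IoT data, you should also look into T-SQL's ranking, window and analytic functions. Performance depends heavily on the tables' indexes.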

Given these tables:

create table #TrackMessages (
    Message_ID bigint primary key,
    imei nvarchar(50) ,
    [timestamp] datetime2,
    Level int,
    temp numeric(5,2)
);

create table #device (
    imei nvarchar(50) primary key,
    owner_id int        
);


create table #tbl_static_tank_info (
    tank_id int not null primary key,
    tank_name nvarchar(20),
    fuel_type nvarchar(20),
    capacity numeric(9,2),
    owner_id int,
    client_id int
);

and these indexes:

create nonclustered index IX_MSG_IMEI_Time on #TrackMessages (imei, timestamp) include (level, temp);
create index IX_Device_OwnerID on #device (Owner_ID);
create index IX_Tank_Client on #tbl_static_tank_info (Client_ID);
create index IX_Tank_Owner on #tbl_static_tank_info (Owner_ID);

the TOP 1 query looks like this:

SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
       msg.Level,
       msg.Timestamp,
       msg.Temp
FROM #device as a
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
cross apply (SELECT top 1 imei, Temp, Level, timestamp
             from #TrackMessages b
             where b.IMEI = a.imei
               AND b.Timestamp >= @start
             order by b.Timestamp) msg
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
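SQLite has neither CROSS APPLY nor TOP, but the core idea of the query above (one ordered lookup per device that returns Level, Timestamp and Temp together, instead of three separate lookups) can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the table names (device, tank, TrackMessages), the tiny dataset and the make_db helper are hypothetical, and a correlated subquery with ORDER BY ... LIMIT 1 that picks the message key stands in for CROSS APPLY (SELECT TOP 1 ...).

```python
import sqlite3

# Hypothetical, trimmed-down versions of the three tables (SQLite syntax).
SCHEMA = """
CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, imei TEXT,
                            timestamp TEXT, level INTEGER, temp REAL);
CREATE TABLE device (imei TEXT PRIMARY KEY, owner_id INTEGER);
CREATE TABLE tank (tank_id INTEGER PRIMARY KEY, tank_name TEXT, client_id INTEGER);
CREATE INDEX IX_MSG_IMEI_Time ON TrackMessages (imei, timestamp);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tank VALUES (?, ?, ?)",
                     [(1, "T1", 65), (5, "T5", 65), (7, "T7", 65)])
    conn.executemany("INSERT INTO device VALUES (?, ?)",
                     [("A", 1), ("B", 5), ("C", 7)])  # device C has no messages
    conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?, ?, ?)", [
        (1, "A", "2019-05-01 10:00", 50, 20.0),  # earlier than the start time
        (2, "A", "2019-05-02 09:00", 48, 21.0),
        (3, "A", "2019-05-02 12:00", 45, 22.5),
        (4, "B", "2019-05-02 08:00", 80, 18.0),
        (5, "B", "2019-05-03 08:00", 70, 19.0),
    ])
    return conn


# The correlated subquery finds the key of the earliest message at or after
# :start for each device; the join then reads level, timestamp and temp from
# that single row. Devices without a matching message drop out (like CROSS APPLY).
QUERY = """
SELECT c.tank_name, b.level, b.timestamp, b.temp
FROM device AS a
JOIN tank AS c ON c.tank_id = a.owner_id
JOIN TrackMessages AS b
  ON b.Message_ID = (SELECT m.Message_ID FROM TrackMessages AS m
                     WHERE m.imei = a.imei AND m.timestamp >= :start
                     ORDER BY m.timestamp LIMIT 1)
WHERE c.client_id = 65 AND a.imei IS NOT NULL AND c.tank_id IN (1, 5, 7)
ORDER BY c.tank_id
"""

rows = make_db().execute(QUERY, {"start": "2019-05-02 00:00"}).fetchall()
print(rows)
```

Tank T7 does not appear in the output because device C has no messages, which mirrors how CROSS APPLY drops a row when the applied query returns nothing.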

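If there is a 1-M relationship between tanks, devices and messages, you can use the FIRST_VALUE analytic function to return the first record per device without using a subquery: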

-- Window functions keep one row per message; DISTINCT collapses them to one row per device
SELECT DISTINCT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
        first_value(Temp) over (partition by b.imei order by timestamp) as temp,
        first_value(Level) over (partition by b.imei order by timestamp) as level,
        min(timestamp)  over (partition by b.imei) as timestamp
from #TrackMessages b 
    inner join #device as a on b.IMEI = a.imei
    inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND b.Timestamp >= @start
AND c.tank_id IN (1,5,7)

Performance depends heavily on the indexes, the table statistics, and whether the index order matches the ordering in the OVER clause.
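A point worth checking in the FIRST_VALUE version: window functions do not reduce the number of rows, so the join still produces one row per message, and it takes a DISTINCT to get one row per device. The sketch below shows this with Python's sqlite3 module; it assumes SQLite 3.25 or later (for window functions), and the tables, data and make_db helper are hypothetical stand-ins for the real ones.

```python
import sqlite3


def make_db():
    # Hypothetical, trimmed-down tables and sample data.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, imei TEXT,
                                timestamp TEXT, level INTEGER, temp REAL);
    CREATE TABLE device (imei TEXT PRIMARY KEY, owner_id INTEGER);
    CREATE TABLE tank (tank_id INTEGER PRIMARY KEY, tank_name TEXT, client_id INTEGER);
    """)
    conn.executemany("INSERT INTO tank VALUES (?, ?, ?)",
                     [(1, "T1", 65), (5, "T5", 65), (7, "T7", 65)])
    conn.executemany("INSERT INTO device VALUES (?, ?)",
                     [("A", 1), ("B", 5), ("C", 7)])
    conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?, ?, ?)", [
        (1, "A", "2019-05-01 10:00", 50, 20.0),
        (2, "A", "2019-05-02 09:00", 48, 21.0),
        (3, "A", "2019-05-02 12:00", 45, 22.5),
        (4, "B", "2019-05-02 08:00", 80, 18.0),
        (5, "B", "2019-05-03 08:00", 70, 19.0),
    ])
    return conn


# {distinct} is filled in with "" or "DISTINCT" to compare both variants.
QUERY = """
SELECT {distinct} c.tank_name,
       FIRST_VALUE(b.temp)  OVER (PARTITION BY b.imei ORDER BY b.timestamp) AS start_temp,
       FIRST_VALUE(b.level) OVER (PARTITION BY b.imei ORDER BY b.timestamp) AS start_level,
       MIN(b.timestamp)     OVER (PARTITION BY b.imei) AS start_time
FROM TrackMessages AS b
JOIN device AS a ON b.imei = a.imei
JOIN tank AS c ON c.tank_id = a.owner_id
WHERE c.client_id = 65 AND c.tank_id IN (1, 5, 7) AND b.timestamp >= :start
ORDER BY 1
"""

conn = make_db()
params = {"start": "2019-05-02 00:00"}
per_message = conn.execute(QUERY.format(distinct=""), params).fetchall()
per_device = conn.execute(QUERY.format(distinct="DISTINCT"), params).fetchall()
print(len(per_message))  # one row per matching message
print(per_device)        # one row per device
```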

This query can be extended with LAST_VALUE to return both the first and the last values for each device:

SELECT DISTINCT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
        first_value(Temp) over (partition by b.imei order by timestamp) as StartTemp,
        first_value(Level) over (partition by b.imei order by timestamp) as StartLevel,
        min(timestamp)  over (partition by b.imei) as StartTime,
        -- LAST_VALUE needs an explicit frame: the default frame ends at the current row
        last_value(Temp) over (partition by b.imei order by timestamp
                               rows between unbounded preceding and unbounded following) as EndTemp,
        last_value(Level) over (partition by b.imei order by timestamp
                                rows between unbounded preceding and unbounded following) as EndLevel,
        max(timestamp)  over (partition by b.imei) as EndTime
from #TrackMessages b 
    inner join #device as a on b.IMEI = a.imei
    inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND b.Timestamp >= @start
AND c.tank_id IN (1,5,7)

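The server has to sort the measurements by timestamp in both ascending order (which the IX_MSG_IMEI_Time index already provides) and descending order.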
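The frame clause matters for LAST_VALUE: with an ORDER BY and no explicit frame, the window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (in SQL Server as well as SQLite), so LAST_VALUE simply returns the current row's value. This minimal sketch uses Python's sqlite3 module (SQLite 3.25 or later) on a hypothetical four-row table to compare the default frame with a frame that spans the whole partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TrackMessages (imei TEXT, timestamp TEXT, level INTEGER)")
conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?)", [
    ("A", "2019-05-02 09:00", 48), ("A", "2019-05-02 12:00", 45),
    ("B", "2019-05-02 08:00", 80), ("B", "2019-05-03 08:00", 70),
])

rows = conn.execute("""
SELECT imei, timestamp,
       -- default frame: from the partition start up to the current row
       LAST_VALUE(level) OVER (PARTITION BY imei ORDER BY timestamp) AS default_frame,
       -- explicit frame covering the whole partition
       LAST_VALUE(level) OVER (PARTITION BY imei ORDER BY timestamp
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS whole_partition
FROM TrackMessages
ORDER BY imei, timestamp
""").fetchall()

for row in rows:
    print(row)
```

Only the whole_partition column reports the last level for every row of a device; default_frame changes from row to row.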


Answer 2:

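Here is a solution using CROSS APPLY. It works like a function that you can declare inline and use as a join clause. If the applied set may be empty, you can change CROSS APPLY to OUTER APPLY; then, for any IMEI that has no records in TrackMessages, the row is still returned, with NULL values.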

SELECT 
    c.tank_name, 
    c.fuel_type, 
    c.capacity, 
    c.tank_id,

    T.Level,
    T.Timestamp,
    T.Temp

FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
CROSS APPLY (

    SELECT TOP 1 -- Retrieve only the first record

        -- And return as many columns as you need
        b.Level,
        b.Timestamp,
        b.Temp
    FROM
        Microframe.dbo.TrackMessages AS b
    WHERE
        a.IMEI = b.IMEI AND -- With matching IMEI
        b.Timestamp >= @Start
    ORDER BY
        b.Timestamp) T -- Ordered by Timestamp

WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (Tanks)
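The CROSS APPLY / OUTER APPLY difference can be sketched in SQLite, which has no APPLY operator: an inner join against the single earliest message behaves like CROSS APPLY, and a LEFT JOIN behaves like OUTER APPLY. In this sketch the tables, data and make_db helper are hypothetical, and a correlated ORDER BY ... LIMIT 1 subquery stands in for the TOP 1 query inside the APPLY.

```python
import sqlite3


def make_db():
    # Hypothetical tables; device C (tank T7) has no messages at all.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, imei TEXT,
                                timestamp TEXT, level INTEGER, temp REAL);
    CREATE TABLE device (imei TEXT PRIMARY KEY, owner_id INTEGER);
    CREATE TABLE tank (tank_id INTEGER PRIMARY KEY, tank_name TEXT, client_id INTEGER);
    """)
    conn.executemany("INSERT INTO tank VALUES (?, ?, ?)",
                     [(1, "T1", 65), (5, "T5", 65), (7, "T7", 65)])
    conn.executemany("INSERT INTO device VALUES (?, ?)",
                     [("A", 1), ("B", 5), ("C", 7)])
    conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?, ?, ?)", [
        (1, "A", "2019-05-01 10:00", 50, 20.0),
        (2, "A", "2019-05-02 09:00", 48, 21.0),
        (3, "A", "2019-05-02 12:00", 45, 22.5),
        (4, "B", "2019-05-02 08:00", 80, 18.0),
        (5, "B", "2019-05-03 08:00", 70, 19.0),
    ])
    return conn


# LEFT JOIN keeps tanks whose device has no message (OUTER APPLY semantics);
# the message columns come back as NULL (None in Python) for them.
QUERY = """
SELECT c.tank_name, b.level, b.timestamp, b.temp
FROM device AS a
JOIN tank AS c ON c.tank_id = a.owner_id
LEFT JOIN TrackMessages AS b
  ON b.Message_ID = (SELECT m.Message_ID FROM TrackMessages AS m
                     WHERE m.imei = a.imei AND m.timestamp >= :start
                     ORDER BY m.timestamp LIMIT 1)
WHERE c.client_id = 65 AND a.imei IS NOT NULL AND c.tank_id IN (1, 5, 7)
ORDER BY c.tank_id
"""

rows = make_db().execute(QUERY, {"start": "2019-05-02 00:00"}).fetchall()
print(rows)
```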

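However, I believe the key point here is the indexing of your tables. If you have already determined that the subquery is the problem, make sure TrackMessages has the following index: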

CREATE NONCLUSTERED INDEX NCI_TrackMessages_IMEI_TimeStamp ON Microframe.dbo.TrackMessages (IMEI, Timestamp)

Indexes have trade-offs, so review them carefully before creating or dropping any.
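One way to confirm that an (IMEI, Timestamp) index actually serves the TOP 1 lookup is to inspect the query plan. In SQL Server that means the actual execution plan; the sketch below does the analogous check in SQLite with EXPLAIN QUERY PLAN, using Python's sqlite3 module and a hypothetical empty table (plan text differs between engines and between SQLite versions). With an equality on IMEI and a range on Timestamp, the index should serve both the filter and the ORDER BY, so no separate sort step is expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, IMEI TEXT,
                            Timestamp TEXT, Level INTEGER, Temp REAL);
CREATE INDEX NCI_TrackMessages_IMEI_TimeStamp ON TrackMessages (IMEI, Timestamp);
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT Level, Timestamp, Temp
FROM TrackMessages
WHERE IMEI = ? AND Timestamp >= ?
ORDER BY Timestamp
LIMIT 1
""", ("A", "2019-05-02 00:00")).fetchall()

# The last column of each plan row is the human-readable detail text.
details = [row[-1] for row in plan]
print(details)
```

If the output mentioned a temporary B-tree for the ORDER BY, the index would not be providing the ordering.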


Answer 3:

Without the table structures, my solution would be:

WITH CTE AS 
    (SELECT B.IMEI,
            b.Level, 
            b.Timestamp,
            b.Temp,
            ROW_NUMBER() OVER (PARTITION BY b.IMEI ORDER BY Timestamp) AS Row
      FROM Microframe.dbo.TrackMessages b 
      WHERE b.Timestamp >= @Start 
    )
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
    CTE.Level, CTE.Timestamp, CTE.Temp
FROM GatexServerDB.dbo.device as a
INNER JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
INNER JOIN CTE ON  CTE.IMEI = a.IMEI 
WHERE c.client_id = 65
  AND a.IMEI IS NOT NULL
  AND c.tank_id IN (Tanks)
  AND CTE.Row = 1;

I can't test it, but it should be very close to a working solution. Please confirm whether it works.
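The CTE approach can be checked on a small dataset. This sketch uses Python's sqlite3 module (SQLite 3.25 or later for window functions) with hypothetical tables and data; PARTITION BY restarts the numbering for each IMEI, so RowNum = 1 keeps only the earliest message at or after the start time for each device.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, imei TEXT,
                            timestamp TEXT, level INTEGER, temp REAL);
CREATE TABLE device (imei TEXT PRIMARY KEY, owner_id INTEGER);
CREATE TABLE tank (tank_id INTEGER PRIMARY KEY, tank_name TEXT, client_id INTEGER);
""")
conn.executemany("INSERT INTO tank VALUES (?, ?, ?)",
                 [(1, "T1", 65), (5, "T5", 65), (7, "T7", 65)])
conn.executemany("INSERT INTO device VALUES (?, ?)",
                 [("A", 1), ("B", 5), ("C", 7)])
conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "2019-05-01 10:00", 50, 20.0),
    (2, "A", "2019-05-02 09:00", 48, 21.0),
    (3, "A", "2019-05-02 12:00", 45, 22.5),
    (4, "B", "2019-05-02 08:00", 80, 18.0),
    (5, "B", "2019-05-03 08:00", 70, 19.0),
])

QUERY = """
WITH cte AS (
  SELECT b.imei, b.level, b.timestamp, b.temp,
         -- numbering restarts for every IMEI, earliest message first
         ROW_NUMBER() OVER (PARTITION BY b.imei ORDER BY b.timestamp) AS RowNum
  FROM TrackMessages AS b
  WHERE b.timestamp >= :start
)
SELECT c.tank_name, cte.level, cte.timestamp, cte.temp
FROM device AS a
JOIN tank AS c ON c.tank_id = a.owner_id
JOIN cte ON cte.imei = a.imei
WHERE c.client_id = 65 AND a.imei IS NOT NULL
  AND c.tank_id IN (1, 5, 7) AND cte.RowNum = 1
ORDER BY c.tank_id
"""

rows = conn.execute(QUERY, {"start": "2019-05-02 00:00"}).fetchall()
print(rows)
```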

Comments:

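You should add a filter on CTE.Row = 1 in the WHERE clause, so that you only get one row instead of every track message. You also need to remove the ORDER BY from the CTE.

Answer 4: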

You can compare the following solutions and use either one.

The JOIN approach, with the ordering done by the ROW_NUMBER window function:

SELECT * FROM 
(
    SELECT 
        c.tank_name, 
        c.fuel_type, 
        c.capacity, 
        c.tank_id,
        Level=b.Level,
        TimeStamp=b.Timestamp,
        Temp=b.Temp,
        r=Row_number() over (partition by a.IMEI order by b.timestamp) -- restart the numbering for each device
    FROM GatexServerDB.dbo.device as a
        JOIN GatexReportsDB.dbo.tbl_static_tank_info as c 
            ON c.tank_id = a.owner_id
        JOIN Microframe.dbo.TrackMessages as b 
            ON b.IMEI = a.IMEI AND b.Timestamp >= @Start 
    WHERE c.client_id = 65
    AND a.IMEI IS NOT NULL
    AND c.tank_id IN (Tanks)
)T
where r=1
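The PARTITION BY clause is what makes r = 1 mean "first message per device". Without it, ROW_NUMBER numbers the whole joined result, and r = 1 keeps a single row for all tanks combined. The sketch below (Python's sqlite3, SQLite 3.25 or later, hypothetical tables and data) runs the same derived-table query with and without the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, imei TEXT,
                            timestamp TEXT, level INTEGER, temp REAL);
CREATE TABLE device (imei TEXT PRIMARY KEY, owner_id INTEGER);
CREATE TABLE tank (tank_id INTEGER PRIMARY KEY, tank_name TEXT, client_id INTEGER);
""")
conn.executemany("INSERT INTO tank VALUES (?, ?, ?)",
                 [(1, "T1", 65), (5, "T5", 65), (7, "T7", 65)])
conn.executemany("INSERT INTO device VALUES (?, ?)",
                 [("A", 1), ("B", 5), ("C", 7)])
conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "2019-05-01 10:00", 50, 20.0),
    (2, "A", "2019-05-02 09:00", 48, 21.0),
    (3, "A", "2019-05-02 12:00", 45, 22.5),
    (4, "B", "2019-05-02 08:00", 80, 18.0),
    (5, "B", "2019-05-03 08:00", 70, 19.0),
])

# {partition} is filled in with "" or "PARTITION BY a.imei".
QUERY = """
SELECT t.tank_name, t.level, t.timestamp, t.temp FROM (
  SELECT c.tank_name, b.level, b.timestamp, b.temp,
         ROW_NUMBER() OVER ({partition} ORDER BY b.timestamp) AS r
  FROM device AS a
  JOIN tank AS c ON c.tank_id = a.owner_id
  JOIN TrackMessages AS b ON b.imei = a.imei AND b.timestamp >= :start
  WHERE c.client_id = 65 AND c.tank_id IN (1, 5, 7)
) AS t
WHERE t.r = 1
ORDER BY t.tank_name
"""

params = {"start": "2019-05-02 00:00"}
no_partition = conn.execute(QUERY.format(partition=""), params).fetchall()
per_device = conn.execute(QUERY.format(partition="PARTITION BY a.imei"), params).fetchall()
print(no_partition)  # a single row overall
print(per_device)    # one row per device
```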

Or the CROSS APPLY approach, as below:

SELECT * FROM 
(
    SELECT 
        c.tank_name, c.fuel_type, c.capacity, c.tank_id, a.IMEI -- exposed so the CROSS APPLY below can correlate on it
    FROM GatexServerDB.dbo.device as a
        JOIN GatexReportsDB.dbo.tbl_static_tank_info as c 
            ON c.tank_id = a.owner_id
            AND c.client_id = 65
            AND a.IMEI IS NOT NULL
            AND c.tank_id IN (Tanks)
) A
CROSS APPLY 
(
    SELECT 
        TOP 1 
        b.Level, b.Timestamp,b.Temp 
    FROM Microframe.dbo.TrackMessages b
    WHERE b.IMEI = a.IMEI 
        AND b.Timestamp >= @Start 
    ORDER BY b.Timestamp 
)D
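Since the options are meant to be compared, one simple check is to run the original query and a rewrite against the same data and assert that the results match. Note that the original scalar subqueries return NULL for a device with no messages, so its tank stays in the result; a faithful rewrite therefore needs OUTER APPLY or a LEFT JOIN, while CROSS APPLY and inner joins drop that tank. If a device can have duplicate timestamps, the three separate subqueries could also take their values from different rows, so a unique tie-breaker such as Message_ID in the ORDER BY is worth considering. The sketch uses Python's sqlite3 (SQLite 3.25 or later), with LIMIT 1 in place of TOP 1 and hypothetical tables and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrackMessages (Message_ID INTEGER PRIMARY KEY, imei TEXT,
                            timestamp TEXT, level INTEGER, temp REAL);
CREATE TABLE device (imei TEXT PRIMARY KEY, owner_id INTEGER);
CREATE TABLE tank (tank_id INTEGER PRIMARY KEY, tank_name TEXT, client_id INTEGER);
""")
conn.executemany("INSERT INTO tank VALUES (?, ?, ?)",
                 [(1, "T1", 65), (5, "T5", 65), (7, "T7", 65)])
conn.executemany("INSERT INTO device VALUES (?, ?)",
                 [("A", 1), ("B", 5), ("C", 7)])  # device C has no messages
conn.executemany("INSERT INTO TrackMessages VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "2019-05-01 10:00", 50, 20.0),
    (2, "A", "2019-05-02 09:00", 48, 21.0),
    (3, "A", "2019-05-02 12:00", 45, 22.5),
    (4, "B", "2019-05-02 08:00", 80, 18.0),
    (5, "B", "2019-05-03 08:00", 70, 19.0),
])

# The question's shape: three scalar subqueries repeating the same lookup.
ORIGINAL = """
SELECT c.tank_name,
       (SELECT b.level FROM TrackMessages AS b WHERE b.imei = a.imei
          AND b.timestamp >= :start ORDER BY b.timestamp LIMIT 1),
       (SELECT b.timestamp FROM TrackMessages AS b WHERE b.imei = a.imei
          AND b.timestamp >= :start ORDER BY b.timestamp LIMIT 1),
       (SELECT b.temp FROM TrackMessages AS b WHERE b.imei = a.imei
          AND b.timestamp >= :start ORDER BY b.timestamp LIMIT 1)
FROM device AS a
JOIN tank AS c ON c.tank_id = a.owner_id
WHERE c.client_id = 65 AND a.imei IS NOT NULL AND c.tank_id IN (1, 5, 7)
ORDER BY c.tank_id
"""

# Rewrite: number the messages once, then LEFT JOIN the first one per IMEI.
# The RowNum filter sits in the ON clause so tanks without messages survive.
REWRITE = """
WITH first_msg AS (
  SELECT b.imei, b.level, b.timestamp, b.temp,
         ROW_NUMBER() OVER (PARTITION BY b.imei ORDER BY b.timestamp) AS RowNum
  FROM TrackMessages AS b
  WHERE b.timestamp >= :start
)
SELECT c.tank_name, f.level, f.timestamp, f.temp
FROM device AS a
JOIN tank AS c ON c.tank_id = a.owner_id
LEFT JOIN first_msg AS f ON f.imei = a.imei AND f.RowNum = 1
WHERE c.client_id = 65 AND a.imei IS NOT NULL AND c.tank_id IN (1, 5, 7)
ORDER BY c.tank_id
"""

params = {"start": "2019-05-02 00:00"}
original = conn.execute(ORIGINAL, params).fetchall()
rewrite = conn.execute(REWRITE, params).fetchall()
assert original == rewrite, (original, rewrite)
print(rewrite)
```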
