A SQL function in the SELECT list makes the query very slow


【Original title】: SQL a Function under Select make it very Slow 【Posted】: 2021-10-27 13:46:24 【Question】:

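I am running a SQL SELECT query. The [E] column calls a stored scalar function, [defaultDB].[dbo].HighFirst, which determines whether the "high" or the "low" occurred first.

Unfortunately, this function makes the query extremely slow. If I skip the function (replacing the call with 1), the query finishes in just 2 seconds. With the function, it takes more than 30 minutes. Is it possible to embed the logic directly in the query instead of calling the stored function?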

USE defaultDB; 
IF OBJECT_ID (N'[defaultDB].[dbo].[HighFirst]', N'FN') IS NOT NULL  
    DROP FUNCTION HighFirst;  
GO  

CREATE FUNCTION HighFirst(@Date INT, @Stime INT, @ETime INT, @High Float, @Low Float)  
RETURNS BIT
AS   
BEGIN  
    Declare @result Bit
        IF          
            (SELECT min([time]) FROM [defaultDB].[dbo].[Table2]    WHERE [date] = @Date AND [Start] >= @Stime AND [Time]<= @etime and [H] > @high) <=   (SELECT min([time]) FROM [defaultDB].[dbo].[Table2]    WHERE [date] = @Date AND [Start] >= @Stime AND [Time]<= @etime and [L] < @Low)
            Set @result = 1
        ELSE 
            Set @result = 0; 
        Return @result  
END; 
GO



Select 
 [Stime], 
 [Etime],
 [target],
 [CL],
 [WinRate] = count(CASE WHEN [Max] > [target]  THEN 1 END)  / cast( count(*) as float),
  [E] = AVG( Case 
    WHEN [Max] >= [target] AND [Min]< [CL] THEN IIF ([defaultDB].[dbo].HighFirst([date], [Stime], [Etime], [Target], [CL]) =1 , [Target] ,[CL])         
    WHEN [Max] >= [target] AND [Min] > [CL] THEN [target]  /*HIT*/
    WHEN [Max] < [target] AND [Min] < [CL] THEN [CL] /*CL*/
    WHEN [Max] < [target] AND [Min] > [CL] THEN [EndReturn] /*both not */
 END ) , 
 [count] = count(*)
 FROM [defaultDB].[dbo].[Table1] M

 JOIN (values (0.003), (0.0035), (0.004), (0.0045), (0.005), (0.0055), (0.006), (0.0065), (0.007), (0.0075), (0.008) ) as T([target])  
 on 1 =1 

 JOIN (values (-0.003), (-0.0035), (-0.004), (-0.0045), (-0.005), (-0.0055), (-0.006), (-0.0065), (-0.007), (-0.0075), (-0.008), (-0.0085), (-0.009), (-0.0095), (-0.01), (-0.0105), (-0.011), (-0.0115)  ) as C([CL]) 
  
  on  [M].[date] in ( 20120307,20120601,20121109,20130826,20131002,20140117,20140122,20140311,20140529,20140718,20150619,20151014,20151022,20160411,20160516,20160721,20160818,20160909,20170127,20170213,20170921,20171025,20171229,20180116,20180315,20180926,20181022,20181128,20181211,20190104,20190329,20190502,20190521,20190528,20190611,20190627,20190823,20190930,20191104,20191211,20200214,20200318,20200529,20200706,20200828,20201230,20210112,20210305,20210318,20210408,20210525,20210617,20210625)
 AND [Stime] >= 133000  
  
  Group By [Stime], [Etime], [Target], [CL]

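【Comments】:

If you turn your function into an inline table-valued function and reference it in the FROM clause (via APPLY), I suspect performance would be much better. Side note: JOIN ... ON 1=1 is silly, just write CROSS JOIN. I also hope Table1.date is not an int column; that would be odd.

【Answer 1】: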

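I see two ways to optimize this.

The first, which needs only small changes, is to convert Table2 into a memory-optimized table and then turn HighFirst into a natively compiled function.

The other is to use OUTER APPLY instead of the function: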

SELECT IIF (TargetTime.MinTime <= CLTime.MinTime , [Target] ,[CL])  
FROM [defaultDB].[dbo].[Table1] AS M
JOIN(VALUES(0.003), (0.0035), (0.004), (0.0045), (0.005), (0.0055), (0.006), (0.0065), (0.007), (0.0075), (0.008)) AS T([target])
    ON 1 = 1
JOIN(VALUES(-0.003), (-0.0035), (-0.004), (-0.0045), (-0.005), (-0.0055), (-0.006), (-0.0065), (-0.007), (-0.0075), (-0.008), (-0.0085), (-0.009), (-0.0095), (-0.01), (-0.0105), (-0.011), (-0.0115)) AS C([CL])
    ON 1 = 1
OUTER APPLY
            (
             SELECT MIN([time]) AS MinTime
             FROM [defaultDB].[dbo].[Table2]
             WHERE [date] = M.[DATE]
                   AND [Start] >= M.Stime
                   AND [Time] <= M.etime
                   AND [H] > T.target
            ) as TargetTime
OUTER APPLY
            (
             SELECT MIN([time]) AS MinTime
             FROM [defaultDB].[dbo].[Table2]
             WHERE [date] = M.[DATE]
                   AND [Start] >= M.Stime
                   AND [Time] <= M.etime
                   AND L < C.CL
            ) as CLTime
WHERE [M].[date] IN(20120307, 20120601, 20121109, 20130826, 20131002, 20140117, 20140122, 20140311, 20140529, 20140718, 20150619, 20151014, 20151022, 20160411, 20160516, 20160721, 20160818, 20160909, 20170127, 20170213, 20170921, 20171025, 20171229, 20180116, 20180315, 20180926, 20181022, 20181128, 20181211, 20190104, 20190329, 20190502, 20190521, 20190528, 20190611, 20190627, 20190823, 20190930, 20191104, 20191211, 20200214, 20200318, 20200529, 20200706, 20200828, 20201230, 20210112, 20210305, 20210318, 20210408, 20210525, 20210617, 20210625)
  AND [Stime] >= 133000; 

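【Comments】:

Thank you!!! Amazing!!! It used to take more than an hour and a half. Now it takes 1 minute 28 seconds!!! Is there any way to speed it up further?

After applying your suggestion, the SQL query became about 10 times faster. However, it is still not fast enough, so I tried using a GPU with BlazingSQL. Unfortunately, it does not support the APPLY operator. How should I modify the SQL? Should I use a join?

【Answer 2】: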

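Given how the function is used, you could redefine it to return the value itself rather than a flag, so the IIF check is no longer needed. You can do this with conditional aggregation, so that [defaultDB].[dbo].[Table2] is scanned only once per call instead of twice: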

CREATE FUNCTION HighFirstVal(@Date INT, @Stime INT, @ETime INT, @High Float, @Low Float)  
RETURNS float
AS
BEGIN   
    RETURN (SELECT case when 
                min(case when [H] > @high then [time] end) <= min(case when [L] < @Low then [time] end) 
          then @high else @low end 
    FROM [defaultDB].[dbo].[Table2]    
    WHERE [date] = @Date AND [Start] >= @Stime AND [Time]<= @etime)
END  

Alternatively, you can use the subquery from the RETURN directly in the query, after replacing the parameters with columns.
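As a rough, untested sketch of inlining the logic with no function and no APPLY: pasting the subquery straight into AVG(...) can run into restrictions (SQL Server does not allow an aggregate over an expression that contains a subquery), so the version below joins Table2 directly and computes the two earliest times with conditional aggregation in a first step. The CTE name P and the HighTime/LowTime aliases are made up for illustration. It also assumes that no two Table1 rows share the same [date], [Stime], [Etime], [Max], [Min] and [EndReturn], which the question does not state.

```sql
-- Sketch only, not tested against the real tables.
-- ASSUMPTION: ([date], [Stime], [Etime], [Max], [Min], [EndReturn]) identifies a
-- Table1 row; otherwise the inner GROUP BY merges duplicates and changes AVG/COUNT.
WITH P AS (
    SELECT
        M.[date], M.[Stime], M.[Etime], M.[Max], M.[Min], M.[EndReturn],
        T.[target], C.[CL],
        -- earliest time [H] exceeds the target (NULL if it never does)
        HighTime = MIN(CASE WHEN t2.[H] > T.[target] THEN t2.[time] END),
        -- earliest time [L] falls below [CL] (NULL if it never does)
        LowTime  = MIN(CASE WHEN t2.[L] < C.[CL] THEN t2.[time] END)
    FROM [defaultDB].[dbo].[Table1] AS M
    CROSS JOIN (VALUES (0.003), (0.0035), (0.004), (0.0045), (0.005), (0.0055), (0.006), (0.0065), (0.007), (0.0075), (0.008)) AS T([target])
    CROSS JOIN (VALUES (-0.003), (-0.0035), (-0.004), (-0.0045), (-0.005), (-0.0055), (-0.006), (-0.0065), (-0.007), (-0.0075), (-0.008), (-0.0085), (-0.009), (-0.0095), (-0.01), (-0.0105), (-0.011), (-0.0115)) AS C([CL])
    -- LEFT JOIN keeps Table1 rows that have no matching Table2 rows
    LEFT JOIN [defaultDB].[dbo].[Table2] AS t2
        ON  t2.[date]  =  M.[date]
        AND t2.[Start] >= M.[Stime]
        AND t2.[Time]  <= M.[Etime]
    WHERE M.[date] IN (20120307, 20120601, 20121109, 20130826, 20131002, 20140117, 20140122, 20140311, 20140529, 20140718, 20150619, 20151014, 20151022, 20160411, 20160516, 20160721, 20160818, 20160909, 20170127, 20170213, 20170921, 20171025, 20171229, 20180116, 20180315, 20180926, 20181022, 20181128, 20181211, 20190104, 20190329, 20190502, 20190521, 20190528, 20190611, 20190627, 20190823, 20190930, 20191104, 20191211, 20200214, 20200318, 20200529, 20200706, 20200828, 20201230, 20210112, 20210305, 20210318, 20210408, 20210525, 20210617, 20210625)
      AND M.[Stime] >= 133000
    GROUP BY M.[date], M.[Stime], M.[Etime], M.[Max], M.[Min], M.[EndReturn],
             T.[target], C.[CL]
)
SELECT
    [Stime], [Etime], [target], [CL],
    [WinRate] = COUNT(CASE WHEN [Max] > [target] THEN 1 END) / CAST(COUNT(*) AS float),
    [E] = AVG(CASE
        -- a NULL on either side makes the comparison unknown, so IIF returns [CL],
        -- matching the ELSE branch of the original scalar function
        WHEN [Max] >= [target] AND [Min] <  [CL] THEN IIF(HighTime <= LowTime, [target], [CL])
        WHEN [Max] >= [target] AND [Min] >  [CL] THEN [target]     /* HIT */
        WHEN [Max] <  [target] AND [Min] <  [CL] THEN [CL]         /* CL */
        WHEN [Max] <  [target] AND [Min] >  [CL] THEN [EndReturn]  /* both not */
    END),
    [count] = COUNT(*)
FROM P
GROUP BY [Stime], [Etime], [target], [CL];
```

Whether this beats the APPLY version depends on the indexes on Table2 and on how large the joined intermediate result gets, so it is worth comparing the two on real data before settling on either.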

【Comments】:

【Answer 3】:

You can convert this function into an inline table-valued function, which will probably perform better.

CREATE FUNCTION dbo.HighFirst
  (@Date INT, @Stime INT, @ETime INT, @High Float, @Low Float)  
RETURNS TABLE
AS RETURN

SELECT Result =
  /* high came first when its earliest time is <= the low's, as in the original scalar function */
  CASE WHEN MIN(CASE WHEN t2.H > @high THEN t2.[time] END) <=
            MIN(CASE WHEN t2.L < @Low  THEN t2.[time] END)
      THEN @High ELSE @Low END
FROM dbo.Table2 t2
WHERE t2.[date] = @Date
  AND t2.Start >= @Stime
  AND t2.[Time] <= @etime;

GO

You can then use it with APPLY like this:

Select 
 [Stime], 
 [Etime],
 [target],
 [CL],
 [WinRate] = count(CASE WHEN [Max] > [target]  THEN 1 END)  / cast( count(*) as float),
  [E] = AVG( Case 
    WHEN [Max] >= [target] AND [Min]< [CL] THEN h.Result
    WHEN [Max] >= [target] AND [Min] > [CL] THEN [target]  /*HIT*/
    WHEN [Max] < [target] AND [Min] < [CL] THEN [CL] /*CL*/
    WHEN [Max] < [target] AND [Min] > [CL] THEN [EndReturn] /*both not */
 END ) , 
 [count] = count(*)
 FROM [defaultDB].[dbo].[Table1] M

 CROSS JOIN (values (0.003), (0.0035), (0.004), (0.0045), (0.005), (0.0055), (0.006), (0.0065), (0.007), (0.0075), (0.008) ) as T([target])  

 CROSS JOIN (values (-0.003), (-0.0035), (-0.004), (-0.0045), (-0.005), (-0.0055), (-0.006), (-0.0065), (-0.007), (-0.0075), (-0.008), (-0.0085), (-0.009), (-0.0095), (-0.01), (-0.0105), (-0.011), (-0.0115)  ) as C([CL]) 

 /* APPLY has to come after T and C so that [target] and [CL] are in scope */
 CROSS APPLY [defaultDB].[dbo].HighFirst(M.[date], M.[Stime], M.[Etime], T.[target], C.[CL]) h

 WHERE [M].[date] in ( 20120307,20120601,20121109,20130826,20131002,20140117,20140122,20140311,20140529,20140718,20150619,20151014,20151022,20160411,20160516,20160721,20160818,20160909,20170127,20170213,20170921,20171025,20171229,20180116,20180315,20180926,20181022,20181128,20181211,20190104,20190329,20190502,20190521,20190528,20190611,20190627,20190823,20190930,20191104,20191211,20200214,20200318,20200529,20200706,20200828,20201230,20210112,20210305,20210318,20210408,20210525,20210617,20210625)
 AND [Stime] >= 133000  
  
  Group By [Stime], [Etime], [Target], [CL]

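【Comments】:

I believe the bottleneck is the function itself. I tried running the same SQL both directly in the query and wrapped in a function, and the direct version is much faster.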
