Query optimization. Duplicate subqueries

【Title】: Query optimization. Duplicate subqueries 【Posted】: 2016-08-11 15:58:45 【Question】:

We found a slow query in a legacy system. What stands out to me is that it contains a duplicated fragment. Here is the full query:

DECLARE @SellerId INT;
DECLARE @DateFrom DATETIME;
DECLARE @DateTo DATETIME;

SET @SellerId = 5396884;
SET @DateFrom = '2016-01-05';
SET @DateTo = '2016-10-08';

DECLARE @CurrentDate DATETIME;
SET @CurrentDate = GETDATE();



CREATE TABLE #ReportDate (codes INT, dates DATETIME);
DECLARE @dif as INT;
DECLARE @cont as INT;
DECLARE @currdate as DATETIME;
SET @dif = DATEDIFF(day, @DateFrom, @DateTo);
SET @cont = 1;
SET @currdate = @DateFrom - 1;
WHILE (@cont <= @dif + 1)
BEGIN
    SET @currdate = DATEADD(DAY, 1, @currdate);
    INSERT INTO #ReportDate VALUES (@cont, @currdate);
    SET @cont = @cont + 1;
END


/* HOW TO OPTIMIZE THIS ONE? */
SELECT
        #ReportDate.dates as valid_date,
        (
          SELECT 

          COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives

          FROM      
                (

                    SELECT *

                    FROM salesView

                    WHERE

                        salesView.sell_id NOT IN 
                            (
                              SELECT sell_id

                              FROM salesStates

                              WHERE
                                  salesStates.aborted = 1
                            ) 

                  ) nonCancelledSales

          WHERE
                nonCancelledSales.seller_id = @SellerId AND
                nonCancelledSales.cancelled = 0 AND
                nonCancelledSales.void = 0 AND
                nonCancelledSales.hasDiscount = 0 AND
                nonCancelledSales.dateOfSale <=  #ReportDate.dates AND
                nonCancelledSales.currentState =  (SELECT   MAX(hveest.date)

                                              FROM  salesStates hveest

                                              WHERE 
                                                    hveest.sell_id = nonCancelledSales.sell_id AND
                                                    hveest.date <= #ReportDate.dates) AND
                nonCancelledSales.lastProductDate = (SELECT     MAX(hvepro.date)

                                              FROM  productHistory hvepro

                                              WHERE 
                                                    hvepro.sell_id = nonCancelledSales.sell_id AND
                                                    hvepro.date <= #ReportDate.dates) 

        ) total_actives,

        (
          SELECT 

          ISNULL(SUM(nonCancelledSales.paymentValue),0) as active

          FROM      
                (

                    SELECT *

                    FROM salesView

                    WHERE

                        salesView.sell_id NOT IN 
                            (
                              SELECT sell_id

                              FROM salesStates

                              WHERE
                                  salesStates.aborted = 1
                            ) 

                  ) nonCancelledSales

          WHERE
                nonCancelledSales.seller_id = @SellerId AND
                nonCancelledSales.cancelled = 0 AND
                nonCancelledSales.void = 0 AND
                nonCancelledSales.hasDiscount = 0 AND
                nonCancelledSales.dateOfSale <=  #ReportDate.dates AND
                nonCancelledSales.currentState =  (SELECT   MAX(hveest.date)

                                              FROM  salesStates hveest

                                              WHERE 
                                                    hveest.sell_id = nonCancelledSales.sell_id AND
                                                    hveest.date <= #ReportDate.dates) AND
                nonCancelledSales.lastProductDate = (SELECT     MAX(hvepro.date)

                                              FROM  productHistory hvepro

                                              WHERE 
                                                    hvepro.sell_id = nonCancelledSales.sell_id AND
                                                    hvepro.date <= #ReportDate.dates)             
        ) active
FROM 
        #ReportDate
GROUP BY
        #ReportDate.dates



DROP TABLE #ReportDate

These are the two duplicated fragments I see:

(
          SELECT 

          COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives

          FROM      
                (

                    SELECT *

                    FROM salesView

                    WHERE

                        salesView.sell_id NOT IN 
                            (
                              SELECT sell_id

                              FROM salesStates

                              WHERE
                                  salesStates.aborted = 1
                            ) 

                  ) nonCancelledSales

          WHERE
                nonCancelledSales.seller_id = @SellerId AND
                nonCancelledSales.cancelled = 0 AND
                nonCancelledSales.void = 0 AND
                nonCancelledSales.hasDiscount = 0 AND
                nonCancelledSales.dateOfSale <=  #ReportDate.dates AND
                nonCancelledSales.currentState =  (SELECT   MAX(hveest.date)

                                              FROM  salesStates hveest

                                              WHERE 
                                                    hveest.sell_id = nonCancelledSales.sell_id AND
                                                    hveest.date <= #ReportDate.dates) AND
                nonCancelledSales.lastProductDate = (SELECT     MAX(hvepro.date)

                                              FROM  productHistory hvepro

                                              WHERE 
                                                    hvepro.sell_id = nonCancelledSales.sell_id AND
                                                    hvepro.date <= #ReportDate.dates) 

        ) total_actives,

        (
          SELECT 

          ISNULL(SUM(nonCancelledSales.paymentValue),0) as active

          FROM      
                (

                    SELECT *

                    FROM salesView

                    WHERE

                        salesView.sell_id NOT IN 
                            (
                              SELECT sell_id

                              FROM salesStates

                              WHERE
                                  salesStates.aborted = 1
                            ) 

                  ) nonCancelledSales

          WHERE
                nonCancelledSales.seller_id = @SellerId AND
                nonCancelledSales.cancelled = 0 AND
                nonCancelledSales.void = 0 AND
                nonCancelledSales.hasDiscount = 0 AND
                nonCancelledSales.dateOfSale <=  #ReportDate.dates AND
                nonCancelledSales.currentState =  (SELECT   MAX(hveest.date)

                                              FROM  salesStates hveest

                                              WHERE 
                                                    hveest.sell_id = nonCancelledSales.sell_id AND
                                                    hveest.date <= #ReportDate.dates) AND
                nonCancelledSales.lastProductDate = (SELECT     MAX(hvepro.date)

                                              FROM  productHistory hvepro

                                              WHERE 
                                                    hvepro.sell_id = nonCancelledSales.sell_id AND
                                                    hvepro.date <= #ReportDate.dates)             
        ) active

Is it really necessary to duplicate the query? The first copy computes:

 COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives

And the second copy computes:

  ISNULL(SUM(nonCancelledSales.paymentValue),0) as active

I assume there must be some way to rewrite the query without the duplication, but I'm not sure how.

【Comments】:

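Looks like this was all a single query at one point, which would explain the GROUP BY #ReportDate.dates.

You could also speed this up by removing that loop and using a tally table to populate your date list. It is probably not the worst part performance-wise, but making this set-based instead of looping is very easy. Here is a great article explaining tally tables and how they replace loops: sqlservercentral.com/articles/T-SQL/62867

@JamieD77 Are you saying the GROUP BY is redundant?

Is there no way for you to join the tables? It looks like you could. For example, the select you run against the salesStates table is executed for every single row returned by the salesView query. Running a query like that directly against a column is usually a bad idea.

@JuanCarlosEduardoRomainaAc, what do you mean by joining the tables? Do you mean creating a new table?

【Answer 1】: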

If you use OUTER APPLY, you can combine the two subqueries.

The idea is:

SELECT . . ., x.actives, x.active
FROM #ReportDate OUTER APPLY
     (SELECT COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives, 
             COALESCE(SUM(nonCancelledSales.paymentValue), 0) as active
      . . . -- rest of query here
     ) x;

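In this case, OUTER APPLY behaves much like a correlated subquery placed in the FROM clause, except that it can return multiple columns (and multiple rows).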
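To make the outline above concrete, here is a sketch of how the elided parts might be filled in from the question's own query. It is not the answerer's code and has not been run against the real schema. The aliases rd, sv, ss and x are introduced here for readability, and the statement is meant to replace the SELECT in the question's script, so it relies on the #ReportDate table and the @SellerId variable defined there.

```sql
-- Sketch only: merges the two duplicated scalar subqueries into one OUTER APPLY.
-- Assumes #ReportDate and @SellerId exist as in the question's script.
-- Aliases rd, sv, ss and x are illustrative, not from the original.
SELECT
    rd.dates  AS valid_date,
    x.actives AS total_actives,
    x.active  AS active
FROM #ReportDate AS rd
OUTER APPLY
(
    -- Both aggregates are computed in a single pass over the filtered sales.
    SELECT
        COUNT(DISTINCT sv.num_remito)     AS actives,
        COALESCE(SUM(sv.paymentValue), 0) AS active
    FROM salesView AS sv
    WHERE sv.sell_id NOT IN (SELECT ss.sell_id
                             FROM salesStates AS ss
                             WHERE ss.aborted = 1)
      AND sv.seller_id   = @SellerId
      AND sv.cancelled   = 0
      AND sv.void        = 0
      AND sv.hasDiscount = 0
      AND sv.dateOfSale <= rd.dates
      AND sv.currentState = (SELECT MAX(hveest.date)
                             FROM salesStates AS hveest
                             WHERE hveest.sell_id = sv.sell_id
                               AND hveest.date <= rd.dates)
      AND sv.lastProductDate = (SELECT MAX(hvepro.date)
                                FROM productHistory AS hvepro
                                WHERE hvepro.sell_id = sv.sell_id
                                  AND hvepro.date <= rd.dates)
) AS x;
```

The GROUP BY from the original is dropped here on the assumption that #ReportDate holds exactly one row per date, which the WHILE loop in the question guarantees. Because the applied subquery contains aggregates and no GROUP BY, it returns exactly one row for every date, so CROSS APPLY would give the same result in this particular case.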

【Comments】:

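So is OUTER APPLY the only, and the best, way to rewrite this query? I thought OUTER APPLY was meant for table-valued functions.

@StephenH.Anderson . . . It is the most obvious way to write the logic. A scalar subquery can only return one value, and APPLY removes that limitation. I don't fully understand the logic, so there may be a simpler way to write it. But APPLY implements what is called a "lateral join"; table-valued functions are just one use of it.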
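The tally-table suggestion from the question's comments can be sketched as follows. This is one possible set-based replacement for the WHILE loop, not code from the thread. It assumes SQL Server 2008 or later (for the VALUES row constructor and initialized DECLARE), and the CTE names digits and tally are made up for the example. Inside the question's script the two DECLARE lines would be omitted, since @DateFrom and @DateTo already exist there.

```sql
-- Sketch only: populate #ReportDate in one set-based INSERT instead of a loop.
-- Assumes SQL Server 2008+; digits and tally are illustrative names.
DECLARE @DateFrom DATETIME = '2016-01-05';
DECLARE @DateTo   DATETIME = '2016-10-08';

CREATE TABLE #ReportDate (codes INT, dates DATETIME);

WITH digits AS (
    SELECT d
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
tally AS (
    -- Numbers 0 through 9999, enough for a range of about 27 years of days.
    SELECT a.d + 10 * b.d + 100 * c.d + 1000 * e.d AS n
    FROM digits AS a
    CROSS JOIN digits AS b
    CROSS JOIN digits AS c
    CROSS JOIN digits AS e
)
INSERT INTO #ReportDate (codes, dates)
SELECT n + 1,                       -- same 1-based counter as @cont in the loop
       DATEADD(DAY, n, @DateFrom)   -- @DateFrom, @DateFrom + 1 day, ...
FROM tally
WHERE n <= DATEDIFF(DAY, @DateFrom, @DateTo);
```

As with the loop, this inserts one row per day from @DateFrom through @DateTo inclusive. As the commenter notes, this is unlikely to be the main cost of the query, so it is a tidy-up and not the core fix.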
