Query optimization. Duplicate subqueries
[Posted]: 2016-08-11 15:58:45
[Question]:
We found a slow query in a legacy system. What stands out to me in the query is a duplicated fragment. Here is the complete query:
DECLARE @SellerId INT;
DECLARE @DateFrom DATETIME;
DECLARE @DateTo DATETIME;
SET @SellerId = 5396884;
SET @DateFrom = '2016-01-05';
SET @DateTo = '2016-10-08';
DECLARE @CurrentDate DATETIME;
SET @CurrentDate = GETDATE();
CREATE TABLE #ReportDate (codes INT, dates DATETIME);
DECLARE @dif as INT;
DECLARE @cont as INT;
DECLARE @currdate as DATETIME;
SET @dif = DATEDIFF(day, @DateFrom, @DateTo);
SET @cont = 1;
SET @currdate = @DateFrom - 1;
WHILE (@cont <= @dif + 1)
BEGIN
    SET @currdate = DATEADD(DAY, 1, @currdate);
    INSERT INTO #ReportDate VALUES (@cont, @currdate);
    SET @cont = @cont + 1;
END
/* HOW TO OPTIMIZE THIS ONE? */
SELECT
    #ReportDate.dates as valid_date,
    (
        SELECT
            COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives
        FROM
            (
                SELECT *
                FROM salesView
                WHERE
                    salesView.sell_id NOT IN
                    (
                        SELECT sell_id
                        FROM salesStates
                        WHERE
                            salesStates.aborted = 1
                    )
            ) nonCancelledSales
        WHERE
            nonCancelledSales.seller_id = @SellerId AND
            nonCancelledSales.cancelled = 0 AND
            nonCancelledSales.void = 0 AND
            nonCancelledSales.hasDiscount = 0 AND
            nonCancelledSales.dateOfSale <= #ReportDate.dates AND
            nonCancelledSales.currentState = (SELECT MAX(hveest.date)
                                              FROM salesStates hveest
                                              WHERE
                                                  hveest.sell_id = nonCancelledSales.sell_id AND
                                                  hveest.date <= #ReportDate.dates) AND
            nonCancelledSales.lastProductDate = (SELECT MAX(hvepro.date)
                                                 FROM productHistory hvepro
                                                 WHERE
                                                     hvepro.sell_id = nonCancelledSales.sell_id AND
                                                     hvepro.date <= #ReportDate.dates)
    ) total_actives,
    (
        SELECT
            ISNULL(SUM(nonCancelledSales.paymentValue),0) as active
        FROM
            (
                SELECT *
                FROM salesView
                WHERE
                    salesView.sell_id NOT IN
                    (
                        SELECT sell_id
                        FROM salesStates
                        WHERE
                            salesStates.aborted = 1
                    )
            ) nonCancelledSales
        WHERE
            nonCancelledSales.seller_id = @SellerId AND
            nonCancelledSales.cancelled = 0 AND
            nonCancelledSales.void = 0 AND
            nonCancelledSales.hasDiscount = 0 AND
            nonCancelledSales.dateOfSale <= #ReportDate.dates AND
            nonCancelledSales.currentState = (SELECT MAX(hveest.date)
                                              FROM salesStates hveest
                                              WHERE
                                                  hveest.sell_id = nonCancelledSales.sell_id AND
                                                  hveest.date <= #ReportDate.dates) AND
            nonCancelledSales.lastProductDate = (SELECT MAX(hvepro.date)
                                                 FROM productHistory hvepro
                                                 WHERE
                                                     hvepro.sell_id = nonCancelledSales.sell_id AND
                                                     hvepro.date <= #ReportDate.dates)
    ) active
FROM
    #ReportDate
GROUP BY
    #ReportDate.dates
DROP TABLE #ReportDate
These are the two duplicated fragments I see:
(
    SELECT
        COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives
    FROM
        (
            SELECT *
            FROM salesView
            WHERE
                salesView.sell_id NOT IN
                (
                    SELECT sell_id
                    FROM salesStates
                    WHERE
                        salesStates.aborted = 1
                )
        ) nonCancelledSales
    WHERE
        nonCancelledSales.seller_id = @SellerId AND
        nonCancelledSales.cancelled = 0 AND
        nonCancelledSales.void = 0 AND
        nonCancelledSales.hasDiscount = 0 AND
        nonCancelledSales.dateOfSale <= #ReportDate.dates AND
        nonCancelledSales.currentState = (SELECT MAX(hveest.date)
                                          FROM salesStates hveest
                                          WHERE
                                              hveest.sell_id = nonCancelledSales.sell_id AND
                                              hveest.date <= #ReportDate.dates) AND
        nonCancelledSales.lastProductDate = (SELECT MAX(hvepro.date)
                                             FROM productHistory hvepro
                                             WHERE
                                                 hvepro.sell_id = nonCancelledSales.sell_id AND
                                                 hvepro.date <= #ReportDate.dates)
) total_actives,
(
    SELECT
        ISNULL(SUM(nonCancelledSales.paymentValue),0) as active
    FROM
        (
            SELECT *
            FROM salesView
            WHERE
                salesView.sell_id NOT IN
                (
                    SELECT sell_id
                    FROM salesStates
                    WHERE
                        salesStates.aborted = 1
                )
        ) nonCancelledSales
    WHERE
        nonCancelledSales.seller_id = @SellerId AND
        nonCancelledSales.cancelled = 0 AND
        nonCancelledSales.void = 0 AND
        nonCancelledSales.hasDiscount = 0 AND
        nonCancelledSales.dateOfSale <= #ReportDate.dates AND
        nonCancelledSales.currentState = (SELECT MAX(hveest.date)
                                          FROM salesStates hveest
                                          WHERE
                                              hveest.sell_id = nonCancelledSales.sell_id AND
                                              hveest.date <= #ReportDate.dates) AND
        nonCancelledSales.lastProductDate = (SELECT MAX(hvepro.date)
                                             FROM productHistory hvepro
                                             WHERE
                                                 hvepro.sell_id = nonCancelledSales.sell_id AND
                                                 hvepro.date <= #ReportDate.dates)
) active
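Is it really necessary to duplicate the query? The first copy computes:
COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives
And the second one:
ISNULL(SUM(nonCancelledSales.paymentValue),0) as active
I assume there must be some way to rewrite the query, but I am not sure how.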
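[Comments]:
It looks like this was all one query at some point, which would explain the GROUP BY #ReportDate.dates.
You could also speed this up by removing that loop and using a tally table to populate your list of dates. That is probably not the worst part performance-wise, but making this step set-based instead of looping is very easy. Here is a great article that explains tally tables and how they replace loops: sqlservercentral.com/articles/T-SQL/62867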
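The tally-table suggestion in the comment above could look roughly like the following. This is a minimal sketch, not the commenter's code: it assumes SQL Server 2008 or later (for the VALUES row constructor), reuses @DateFrom, @DateTo and #ReportDate from the question, and the CTE names E1, E2, E4 and Tally are made up for illustration. The inline tally tops out at 10,000 rows, so it only covers ranges of up to 10,000 days.

```sql
-- Assumes @DateFrom / @DateTo are declared and set as in the question.
CREATE TABLE #ReportDate (codes INT, dates DATETIME);

-- Build a small numbers ("tally") list inline instead of looping.
WITH E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)), -- 10 rows
     E2(n) AS (SELECT 1 FROM E1 AS a CROSS JOIN E1 AS b),                                -- 100 rows
     E4(n) AS (SELECT 1 FROM E2 AS a CROSS JOIN E2 AS b),                                -- 10,000 rows
     Tally(n) AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) FROM E4)
INSERT INTO #ReportDate (codes, dates)
SELECT n,
       DATEADD(DAY, n - 1, @DateFrom)   -- n = 1 maps to @DateFrom itself
FROM Tally
WHERE n <= DATEDIFF(DAY, @DateFrom, @DateTo) + 1;   -- inclusive of @DateTo
```

Like the WHILE loop in the question, this yields one row per day from @DateFrom through @DateTo, with codes running from 1 upward, but it does so in a single INSERT.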
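@JamieD77 Are you saying the GROUP BY is redundant?
Is there no way for you to join the tables? It looks like there is. For example, the select you run against the salesStates table is executed for every single row returned by the query on salesView. Running a query like that directly against a column is usually a bad idea.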
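One way to read the comment above is to express the "not aborted" filter as a correlated anti-join rather than a NOT IN list. The sketch below is an illustration only, using the table and column names from the question; the aliases sv and ss are made up. Note that NOT IN and NOT EXISTS are not equivalent if salesStates.sell_id can be NULL: with NOT IN, a single NULL in the list filters out every row, while NOT EXISTS ignores it. Whether this changes the execution plan depends on the optimizer and the indexes, so it would need to be measured.

```sql
-- Sales that have no aborted state row, written as an anti-join.
SELECT sv.*
FROM salesView AS sv
WHERE NOT EXISTS
(
    SELECT 1
    FROM salesStates AS ss
    WHERE ss.sell_id = sv.sell_id
      AND ss.aborted = 1
);
```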
@JuanCarlosEduardoRomainaAc, what do you mean by joining the tables? Do you mean creating a new table?
[Answer 1]:
If you use OUTER APPLY, you can combine these.
The idea is:
SELECT . . ., x.actives, x.active
FROM #ReportDate OUTER APPLY
(SELECT COUNT(DISTINCT(nonCancelledSales.num_remito)) as actives,
COALESCE(SUM(nonCancelledSales.paymentValue), 0) as active
. . . -- rest of query here
) x;
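In this case, OUTER APPLY is a lot like a correlated subquery in the FROM clause, one that can return multiple rows (and multiple columns, which is what matters here).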
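Filling the elided parts of the sketch above with the question's own query might give something like the following. This is a hedged sketch, not the answerer's exact code and not run against the real schema: it assumes @SellerId and #ReportDate are set up as in the question, the alias rd is made up, and the GROUP BY is dropped on the assumption that #ReportDate holds exactly one row per date (which is what the question's loop produces).

```sql
SELECT
    rd.dates  AS valid_date,
    x.actives AS total_actives,
    x.active  AS active
FROM #ReportDate AS rd
OUTER APPLY
(
    -- Both aggregates are computed in one pass over the same filtered rows.
    SELECT
        COUNT(DISTINCT nonCancelledSales.num_remito)     AS actives,
        COALESCE(SUM(nonCancelledSales.paymentValue), 0) AS active
    FROM
        (
            SELECT *
            FROM salesView
            WHERE salesView.sell_id NOT IN
                  (SELECT sell_id FROM salesStates WHERE salesStates.aborted = 1)
        ) nonCancelledSales
    WHERE
        nonCancelledSales.seller_id = @SellerId AND
        nonCancelledSales.cancelled = 0 AND
        nonCancelledSales.void = 0 AND
        nonCancelledSales.hasDiscount = 0 AND
        nonCancelledSales.dateOfSale <= rd.dates AND
        nonCancelledSales.currentState = (SELECT MAX(hveest.date)
                                          FROM salesStates hveest
                                          WHERE hveest.sell_id = nonCancelledSales.sell_id AND
                                                hveest.date <= rd.dates) AND
        nonCancelledSales.lastProductDate = (SELECT MAX(hvepro.date)
                                             FROM productHistory hvepro
                                             WHERE hvepro.sell_id = nonCancelledSales.sell_id AND
                                                   hvepro.date <= rd.dates)
) AS x;
```

Because the applied subquery is an aggregate without its own GROUP BY, it returns exactly one row per date, so each row of #ReportDate still produces a single output row.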
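[Comments]:
So is using OUTER APPLY the only, and the best, way to rewrite this query? I thought OUTER APPLY was meant for table-valued functions.
@StephenH.Anderson . . . It is the most obvious way to write the logic. A scalar subquery can only return one value, so this gets around that limitation. I do not follow the logic of the query, so there may be some way to write it more simply. But APPLY implements what is called a "lateral join"; table-valued functions are just one use of it.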