Compute aggregate on other table for each extracted record - performance
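【Title】: Compute aggregate on other table for each extracted record - performance
【Posted】: 2013-04-12 09:53:45
【Question】: I am building a query with many joins. One of the joined tables is in a one-to-many association, and I need to aggregate (SUM) over that table.
Each execution of the query returns roughly 500 to 1000 records and takes about 100 to 200 ms (it has a lot of joins). After adding the aggregation, the execution time rises to about 5-6 seconds!
I have tried two solutions (the problem is in the last two columns, SurchargesTo and SurchargesFrom), and both cause the same large slowdown:
First: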
SELECT
RR.Id,
Customer.Name AS Customer,
PrincipalsCustomer.Name AS PrincipalsCustomer,
EffectiveCarrier.Name AS EffectiveCarrier,
CAST(CASE RR.isImport WHEN 1 THEN RR.unloadingDateStart ELSE TR.loadingDateStart END AS DATE) AS LoadingUnloadingDate,
RR.containerNo AS ContainerNumber,
CASE RR.isImport WHEN 1 THEN PLACEUNLOADING.Name ELSE PlaceLoading.Name END AS LoadingUnloadingPlace,
Pol.Name AS POL,
Pod.Name AS POD,
Commodity.Name AS Commodity,
RR.Km AS Km,
RR.pricePerKm AS SalesPricePerKM,
RR.salesPrice AS SalesPrice,
RR.purchasePrice AS PurchasePrice,
(SELECT SUM(salesAmount*salesCost) FROM TruckingTobaccoSurcharges WHERE TruckingTobaccoSurcharges.REPORT = RR.Id) +
(SELECT SUM(incomeAmount*toSBCIncome) FROM TruckingTobaccoSurcharges2 WHERE TruckingTobaccoSurcharges2.REPORT = RR.Id) as SurchargesTo,
(SELECT SUM(costAmount*costCost) FROM TruckingTobaccoSurcharges WHERE TruckingTobaccoSurcharges.REPORT = RR.Id) +
(SELECT SUM(costAmount*fromCustomerCost) FROM TruckingTobaccoSurcharges2 WHERE TruckingTobaccoSurcharges2.REPORT = RR.Id) as SurchargesFrom
FROM Report RR
JOIN TruckingReport TR ON TR.REPORT = RR.ID
LEFT JOIN Customer ON RR.CUSTOMER = Customer.ID
LEFT JOIN PrincipalsCustomer ON RR.PRINCIPALSCUSTOMER = PrincipalsCustomer.ID
LEFT JOIN EffectiveCarrier ON RR.EFFECTIVECARRIER = EffectiveCarrier.ID
LEFT JOIN PlaceLoading ON TR.PLACELOADING = PlaceLoading.ID
LEFT JOIN PlaceUnloading ON RR.PLACEUNLOADING = PlaceUnloading.ID
LEFT JOIN Pol ON TR.POL = Pol.Id
LEFT JOIN Pod ON TR.POD = Pod.Id
LEFT JOIN Commodity ON RR.COMMODITY = Commodity.Id
Second:
SELECT
RR.Id,
Customer.Name AS Customer,
PrincipalsCustomer.Name AS PrincipalsCustomer,
EffectiveCarrier.Name AS EffectiveCarrier,
CAST(CASE RR.isImport WHEN 1 THEN RR.unloadingDateStart ELSE TR.loadingDateStart END AS DATE) AS LoadingUnloadingDate,
RR.containerNo AS ContainerNumber,
CASE RR.isImport WHEN 1 THEN PLACEUNLOADING.Name ELSE PlaceLoading.Name END AS LoadingUnloadingPlace,
Pol.Name AS POL,
Pod.Name AS POD,
Commodity.Name AS Commodity,
RR.Km AS Km,
RR.pricePerKm AS SalesPricePerKM,
RR.salesPrice AS SalesPrice,
RR.purchasePrice AS PurchasePrice,
SUBCH1.sales + SUBCH2.sales AS SurchargesTo,
SUBCH1.costs + SUBCH2.costs AS SurchargesFrom
FROM Report RR
JOIN TruckingReport TR ON TR.REPORT = RR.ID
LEFT JOIN Customer ON RR.CUSTOMER = Customer.ID
LEFT JOIN PrincipalsCustomer ON RR.PRINCIPALSCUSTOMER = PrincipalsCustomer.ID
LEFT JOIN EffectiveCarrier ON RR.EFFECTIVECARRIER = EffectiveCarrier.ID
LEFT JOIN PlaceLoading ON TR.PLACELOADING = PlaceLoading.ID
LEFT JOIN PlaceUnloading ON RR.PLACEUNLOADING = PlaceUnloading.ID
LEFT JOIN Pol ON TR.POL = Pol.Id
LEFT JOIN Pod ON TR.POD = Pod.Id
LEFT JOIN Commodity ON RR.COMMODITY = Commodity.Id
LEFT JOIN ( SELECT REPORT, SUM(salesAmount*salesCost) AS sales, SUM(costAmount*costCost) AS costs
FROM TruckingTobaccoSurcharges SR1 GROUP BY SR1.REPORT
)AS SUBCH1 ON SUBCH1.REPORT = RR.ID
LEFT JOIN ( SELECT REPORT, SUM(incomeAmount*toSBCIncome) AS sales, SUM(costAmount*fromCustomerCost) AS costs
FROM TruckingTobaccoSurcharges2 SR2 GROUP BY SR2.REPORT
)AS SUBCH2 ON SUBCH2.REPORT = RR.ID
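Is there a faster way to get the desired result? Or is it simply not possible to make it faster with this many joins?
Any help is appreciated =]
Edit:
Following Nikola Markovinović's suggestion, I added an index on the Report FK of the TruckingTobaccoSurcharges table, which made the query fast again (using solution 1)! I have not tried solution 2, though. I still wonder whether my query could be written better, since, as others have said, I am not joining but subquerying...
【Comments】: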
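Version 2 looks fine to me. Do both tables have an index on Report?
Hmm. I'm not that familiar with indexes yet, but after expanding "Indexes" on the Report table I see "PK_Report (Clustered)".
TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2 need an index on the Report column. Use sp_helpindex to check the indexes: sp_helpindex 'TruckingTobaccoSurcharges' will list every index on that table. If you do not see Report as the leading field under index_keys, you need to add an index: create index ix_TruckingTobaccoSurcharges_Report on TruckingTobaccoSurcharges (Report). The same goes for TruckingTobaccoSurcharges2. If you need speed, you have to get comfortable with indexes.
As a rule of thumb, every foreign key should be indexed, unless the referencing table has only a few rows.
It's really fast now! I can measure in milliseconds again. Thanks a lot =] I will definitely read up on indexes =]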
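The index advice in the comments above can be sketched as the following T-SQL. The first index name comes from the comment; the name for the second table is assumed by analogy, and the commented-out covering variant with INCLUDE is an optional assumption, not something the thread tested.

```sql
-- List the existing indexes on both surcharge tables.
-- Look for Report as the leading column under index_keys.
EXEC sp_helpindex 'TruckingTobaccoSurcharges';
EXEC sp_helpindex 'TruckingTobaccoSurcharges2';

-- Index suggested in the comment: lets each per-report SUM seek on Report
-- instead of scanning the whole table once per Report row.
CREATE INDEX ix_TruckingTobaccoSurcharges_Report
    ON TruckingTobaccoSurcharges (Report);

-- Same idea for the second table (index name assumed by analogy).
CREATE INDEX ix_TruckingTobaccoSurcharges2_Report
    ON TruckingTobaccoSurcharges2 (Report);

-- Optional covering variant (an assumption, not from the thread): including
-- the summed columns may let the query avoid lookups into the base table.
-- CREATE INDEX ix_TruckingTobaccoSurcharges_Report_Covering
--     ON TruckingTobaccoSurcharges (Report)
--     INCLUDE (salesAmount, salesCost, costAmount, costCost);
```

Comparing the actual execution plan before and after creating the indexes is the simplest way to confirm that the scans on the surcharge tables have turned into seeks.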
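【Answer 1】:
You have to realize that you are not joining! Yes, you are joining, since there are LEFT JOINs everywhere, but that is not where your problem lies. You are subquerying to retrieve your sums, and that is bad, bad, bad. Subqueries are often slower than joins.
This is a subquery, not a join: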
LEFT JOIN
(
SELECT REPORT,
SUM(salesAmount*salesCost) AS sales,
SUM(costAmount*costCost) AS costs
FROM TruckingTobaccoSurcharges SR1 GROUP BY SR1.REPORT
)
Because you are subquerying, the index on REPORT is not used, so it is slow.
So, for your problem, here is what should work (note: real joins!):
SELECT
SUM(s.salesAmount * s.salesCost) + SUM(s2.incomeAmount * s2.toSBCIncome) AS SurchargesTo,
SUM(s.costAmount * s.costCost) + SUM(s2.costAmount * s2.fromCustomerCost) AS SurchargesFrom
FROM Report RR
LEFT JOIN TruckingTobaccoSurcharges s
ON s.REPORT = RR.ID
LEFT JOIN TruckingTobaccoSurcharges2 s2
ON s2.REPORT = RR.ID
GROUP BY RR.ID
【Comments】:
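This version can produce a Cartesian product between TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2 for each report, unless one of the two tables has no matching records. The totals will be off.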
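One way to avoid that row multiplication is to aggregate each surcharge table down to one row per report before joining, which is essentially the asker's second version. The sketch below is a trimmed illustration under stated assumptions: the aliases S1 and S2 are hypothetical, only RR.Id is selected for brevity, and the COALESCE wrappers are an addition, included because SUM over no rows yields NULL and NULL plus a number is NULL.

```sql
SELECT
    RR.Id,
    -- Treat "no surcharge rows" as 0 so that a report with rows in only one
    -- of the two tables still gets a total instead of NULL.
    COALESCE(S1.sales, 0) + COALESCE(S2.sales, 0) AS SurchargesTo,
    COALESCE(S1.costs, 0) + COALESCE(S2.costs, 0) AS SurchargesFrom
FROM Report RR
-- Each derived table returns at most one row per report, so joining both
-- to Report cannot multiply rows.
LEFT JOIN (
    SELECT REPORT,
           SUM(salesAmount * salesCost) AS sales,
           SUM(costAmount * costCost)   AS costs
    FROM TruckingTobaccoSurcharges
    GROUP BY REPORT
) AS S1 ON S1.REPORT = RR.Id
LEFT JOIN (
    SELECT REPORT,
           SUM(incomeAmount * toSBCIncome)    AS sales,
           SUM(costAmount * fromCustomerCost) AS costs
    FROM TruckingTobaccoSurcharges2
    GROUP BY REPORT
) AS S2 ON S2.REPORT = RR.Id;
```

Because the grouping happens inside the derived tables, the outer query needs no GROUP BY, so the remaining Report columns and lookup joins from the question can be added to the outer SELECT unchanged.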
I can't really plug your solution into my query, because I can't have aggregates and plain table columns in the same result. Should I group by all the other columns?