Compute aggregate on other table for each extracted record - performance

Posted 2023-03-24

【Originally asked】: 2013-04-12 09:53:45

【Question】:

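I am running a query with many joins. One of the joined tables has a one-to-many association, and I need to aggregate (SUM) over it.

Each execution returns about 500 to 1000 records and takes about 100 to 200 ms (it has a lot of joins). After adding the aggregation, the execution time rises to about 5-6 seconds!

I tried 2 solutions (the problem lies in the last two columns, SurchargesTo and SurchargesFrom), and both show the same large performance drop:

First: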

SELECT
            RR.Id,
            Customer.Name AS Customer,
            PrincipalsCustomer.Name AS PrincipalsCustomer,
            EffectiveCarrier.Name AS EffectiveCarrier,
            CAST(CASE RR.isImport WHEN 1 THEN RR.unloadingDateStart ELSE TR.loadingDateStart END AS DATE) AS LoadingUnloadingDate,
            RR.containerNo AS ContainerNumber,
            CASE RR.isImport WHEN 1 THEN PLACEUNLOADING.Name ELSE PlaceLoading.Name END AS LoadingUnloadingPlace,
            Pol.Name AS POL,
            Pod.Name AS POD,
            Commodity.Name AS Commodity,
            RR.Km AS Km,
            RR.pricePerKm AS SalesPricePerKM,
            RR.salesPrice AS SalesPrice,
            RR.purchasePrice AS PurchasePrice,
            (SELECT SUM(salesAmount*salesCost) FROM TruckingTobaccoSurcharges WHERE TruckingTobaccoSurcharges.REPORT = RR.Id) +
            (SELECT SUM(incomeAmount*toSBCIncome) FROM TruckingTobaccoSurcharges2 WHERE TruckingTobaccoSurcharges2.REPORT = RR.Id) as SurchargesTo,
            (SELECT SUM(costAmount*costCost) FROM TruckingTobaccoSurcharges WHERE TruckingTobaccoSurcharges.REPORT = RR.Id) +
            (SELECT SUM(costAmount*fromCustomerCost) FROM TruckingTobaccoSurcharges2 WHERE TruckingTobaccoSurcharges2.REPORT = RR.Id) as SurchargesFrom
        FROM Report RR
            JOIN TruckingReport TR ON TR.REPORT = RR.ID
            LEFT JOIN Customer ON RR.CUSTOMER = Customer.ID
            LEFT JOIN PrincipalsCustomer ON RR.PRINCIPALSCUSTOMER = PrincipalsCustomer.ID
            LEFT JOIN EffectiveCarrier ON RR.EFFECTIVECARRIER = EffectiveCarrier.ID
            LEFT JOIN PlaceLoading ON TR.PLACELOADING = PlaceLoading.ID
            LEFT JOIN PlaceUnloading ON RR.PLACEUNLOADING = PlaceUnloading.ID
            LEFT JOIN Pol ON TR.POL = Pol.Id
            LEFT JOIN Pod ON TR.POD = Pod.Id
            LEFT JOIN Commodity ON RR.COMMODITY = Commodity.Id

Second:

SELECT
            RR.Id,
            Customer.Name AS Customer,
            PrincipalsCustomer.Name AS PrincipalsCustomer,
            EffectiveCarrier.Name AS EffectiveCarrier,
            CAST(CASE RR.isImport WHEN 1 THEN RR.unloadingDateStart ELSE TR.loadingDateStart END AS DATE) AS LoadingUnloadingDate,
            RR.containerNo AS ContainerNumber,
            CASE RR.isImport WHEN 1 THEN PLACEUNLOADING.Name ELSE PlaceLoading.Name END AS LoadingUnloadingPlace,
            Pol.Name AS POL,
            Pod.Name AS POD,
            Commodity.Name AS Commodity,
            RR.Km AS Km,
            RR.pricePerKm AS SalesPricePerKM,
            RR.salesPrice AS SalesPrice,
            RR.purchasePrice AS PurchasePrice,
            SUBCH1.sales + SUBCH2.sales AS SurchargesTo,
            SUBCH1.costs + SUBCH2.costs AS SurchargesFrom
        FROM Report RR
            JOIN TruckingReport TR ON TR.REPORT = RR.ID
            LEFT JOIN Customer ON RR.CUSTOMER = Customer.ID
            LEFT JOIN PrincipalsCustomer ON RR.PRINCIPALSCUSTOMER = PrincipalsCustomer.ID
            LEFT JOIN EffectiveCarrier ON RR.EFFECTIVECARRIER = EffectiveCarrier.ID
            LEFT JOIN PlaceLoading ON TR.PLACELOADING = PlaceLoading.ID
            LEFT JOIN PlaceUnloading ON RR.PLACEUNLOADING = PlaceUnloading.ID
            LEFT JOIN Pol ON TR.POL = Pol.Id
            LEFT JOIN Pod ON TR.POD = Pod.Id
            LEFT JOIN Commodity ON RR.COMMODITY = Commodity.Id
            LEFT JOIN ( SELECT REPORT, SUM(salesAmount*salesCost) AS sales, SUM(costAmount*costCost) AS costs
                        FROM TruckingTobaccoSurcharges SR1 GROUP BY SR1.REPORT
                        )AS SUBCH1 ON SUBCH1.REPORT = RR.ID
            LEFT JOIN ( SELECT REPORT, SUM(incomeAmount*toSBCIncome) AS sales, SUM(costAmount*fromCustomerCost) AS costs
                        FROM TruckingTobaccoSurcharges2 SR2 GROUP BY SR2.REPORT
                       )AS SUBCH2 ON SUBCH2.REPORT = RR.ID

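Is there a faster way to get the desired result? Or can it simply not be made faster with this many joins?

Any help appreciated =]

Edit:

As Nikola Markovinović suggested, I added an index on the Report FK of the TruckingTobaccoSurcharges table, which made it fast again (using solution 1)! I have not tried solution 2, though. I would still like to know whether my query could be better since, as someone else pointed out, I am not joining but subquerying...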
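The effect of the index can be reproduced in a small sketch. This is an illustration under stated assumptions, not the original setup: it uses Python's built-in sqlite3 module in place of SQL Server, a cut-down TruckingTobaccoSurcharges table, and SQLite's EXPLAIN QUERY PLAN instead of a SQL Server execution plan. It only shows that the per-report lookup from solution 1 stops scanning the whole table once an index on Report exists.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the table is reduced to the
# columns needed by one of the correlated subqueries from solution 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Report (Id INTEGER PRIMARY KEY);
    CREATE TABLE TruckingTobaccoSurcharges (
        Id INTEGER PRIMARY KEY,
        Report INTEGER REFERENCES Report(Id),
        salesAmount INTEGER,
        salesCost INTEGER
    );
""")

# The lookup that solution 1 repeats once per extracted Report row.
lookup = """
    SELECT SUM(salesAmount * salesCost)
    FROM TruckingTobaccoSurcharges
    WHERE Report = ?
"""

def plan(sql):
    """Return SQLite's query plan as text (the last column is the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(lookup)

# Same index definition as suggested in the comments.
conn.execute(
    "CREATE INDEX ix_TruckingTobaccoSurcharges_Report "
    "ON TruckingTobaccoSurcharges (Report)"
)
plan_after = plan(lookup)

print("before:", plan_before)  # a full scan of the table
print("after: ", plan_after)   # a search that names the new index
```

With 500 to 1000 extracted records, a full scan per record multiplies quickly, which is consistent with the slowdown described above. The exact plan wording differs between SQLite versions and, of course, from SQL Server.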

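【Comments】:

Version 2 seems fine to me. Do both tables have an index on Report?

Hmm. I am not that far into indexes yet, but when I expand "Indexes" on the Report table I see "PK_Report (Clustered)".

TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2 need an index on the Report column. Use sp_helpindex to check the indexes: sp_helpindex 'TruckingTobaccoSurcharges' lists all indexes on this table. If you do not see Report as the leading column under index_keys, you need to add an index: create index ix_TruckingTobaccoSurcharges_Report on TruckingTobaccoSurcharges (Report). The same goes for TruckingTobaccoSurcharges2. If you need speed, you have to master indexes. As a rule of thumb, all foreign keys should be indexed unless the foreign table has only a handful of rows.

It is really fast now! I can measure in milliseconds again. Thank you very much =] I definitely have to learn about indexes =]

【Answer 1】: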

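You have to realize that you are not joining! Well, yes, you are joining, since there are LEFT JOINs everywhere, but that is not your problem. You are subquerying to retrieve your sums, and that is bad, bad, bad. Subqueries are always slower than joins.

This is a subquery, not a join: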

LEFT JOIN 
( 
    SELECT REPORT, 
           SUM(salesAmount*salesCost) AS sales, 
           SUM(costAmount*costCost) AS costs
    FROM TruckingTobaccoSurcharges SR1 GROUP BY SR1.REPORT
)

Because you are subquerying, the index on REPORT is not used, which is why it is slow.

So, for your problem, here is what should work (note: a real join!!):

SELECT
    SUM(s.salesAmount * s.salesCost) + SUM(s2.incomeAmount * s2.toSBCIncome) AS SurchargesTo,
    SUM(s.costAmount * s.costCost) + SUM(s2.costAmount * s2.fromCustomerCost) AS SurchargesFrom
FROM Report RR
    LEFT JOIN TruckingTobaccoSurcharges s 
        ON s.REPORT = RR.ID
    LEFT JOIN TruckingTobaccoSurcharges2 s2
        ON s2.REPORT = RR.ID
GROUP BY s.REPORT

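【Comments】:

This version may produce a Cartesian product between TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2 unless one of the tables contains no matching records. The totals will be off.

I cannot really plug your solution into my query, because I cannot have aggregates and plain table columns in the same result. Should I group by all the other columns?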
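The row multiplication described in the first comment can be checked on a tiny data set. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the tables are cut down to a few columns, and the rows are made up. Treating a report without surcharges as zero via COALESCE is also an assumption, not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Report (Id INTEGER PRIMARY KEY, containerNo TEXT);
    CREATE TABLE TruckingTobaccoSurcharges
        (Report INTEGER, salesAmount INTEGER, salesCost INTEGER);
    CREATE TABLE TruckingTobaccoSurcharges2
        (Report INTEGER, incomeAmount INTEGER, toSBCIncome INTEGER);

    INSERT INTO Report VALUES (1, 'C-001'), (2, 'C-002');
    -- Report 1: 1*10 + 1*20 = 30 in the first table ...
    INSERT INTO TruckingTobaccoSurcharges VALUES (1, 1, 10), (1, 1, 20);
    -- ... and 1*1 + 1*2 + 1*3 = 6 in the second, so 36 is the true total.
    INSERT INTO TruckingTobaccoSurcharges2 VALUES (1, 1, 1), (1, 1, 2), (1, 1, 3);
    -- Report 2: a single surcharge (1*5) and nothing in the second table.
    INSERT INTO TruckingTobaccoSurcharges VALUES (2, 1, 5);
""")

# Joining both one-to-many tables directly: 2 x 3 = 6 rows for report 1,
# so every surcharge row is summed more than once.
fan_out = conn.execute("""
    SELECT RR.Id, RR.containerNo,
           SUM(s.salesAmount * s.salesCost)
             + SUM(s2.incomeAmount * s2.toSBCIncome) AS SurchargesTo
    FROM Report RR
        LEFT JOIN TruckingTobaccoSurcharges s ON s.Report = RR.Id
        LEFT JOIN TruckingTobaccoSurcharges2 s2 ON s2.Report = RR.Id
    GROUP BY RR.Id, RR.containerNo
    ORDER BY RR.Id
""").fetchall()

# Aggregating each table first (as in the second query of the question)
# leaves at most one row per report on each side of the join.
pre_aggregated = conn.execute("""
    SELECT RR.Id, RR.containerNo,
           COALESCE(SUBCH1.sales, 0) + COALESCE(SUBCH2.sales, 0) AS SurchargesTo
    FROM Report RR
        LEFT JOIN (SELECT Report, SUM(salesAmount * salesCost) AS sales
                   FROM TruckingTobaccoSurcharges GROUP BY Report) AS SUBCH1
            ON SUBCH1.Report = RR.Id
        LEFT JOIN (SELECT Report, SUM(incomeAmount * toSBCIncome) AS sales
                   FROM TruckingTobaccoSurcharges2 GROUP BY Report) AS SUBCH2
            ON SUBCH2.Report = RR.Id
    ORDER BY RR.Id
""").fetchall()

print(fan_out)         # [(1, 'C-001', 102), (2, 'C-002', None)]
print(pre_aggregated)  # [(1, 'C-001', 36), (2, 'C-002', 5)]
```

In this data set the direct join reports 102 instead of 36 for report 1, and NULL for report 2 because one of the two sums has no rows. The pre-aggregated form also sidesteps the question in the second comment: the outer query contains no aggregate, so the other columns do not need to appear in a GROUP BY.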
