Compute aggregate on other table for each extracted record - performance

【Title】: Compute aggregate on other table for each extracted record - performance 【Posted】: 2013-04-12 09:53:45 【Question】:

I am writing a query with many joins. One of the joined tables has a one-to-many association, and I need to aggregate (SUM) over that table.

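Each execution of the query returns roughly 500 to 1000 records and takes about 100 to 200 ms (it has a lot of joins). After adding the aggregation, the execution time rises to about 5-6 seconds!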
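The timings quoted above can be reproduced and broken down in SQL Server Management Studio. A minimal sketch, assuming a SQL Server session; the `COUNT(*)` query is only a placeholder for the query under test:

```sql
-- Print CPU/elapsed time and per-table logical reads in the Messages tab.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Placeholder statement: replace it with the query being measured.
SELECT COUNT(*) AS ReportCount FROM Report;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

A table that shows a very high logical read count compared with the others is usually the first place to look for a missing index.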

I tried two solutions (the problem is in the last two columns, SurchargesTo and SurchargesFrom), and both show the same large performance drop:

First:

SELECT
            RR.Id,
            Customer.Name AS Customer,
            PrincipalsCustomer.Name AS PrincipalsCustomer,
            EffectiveCarrier.Name AS EffectiveCarrier,
            CAST(CASE RR.isImport WHEN 1 THEN RR.unloadingDateStart ELSE TR.loadingDateStart END AS DATE) AS LoadingUnloadingDate,
            RR.containerNo AS ContainerNumber,
            CASE RR.isImport WHEN 1 THEN PLACEUNLOADING.Name ELSE PlaceLoading.Name END AS LoadingUnloadingPlace,
            Pol.Name AS POL,
            Pod.Name AS POD,
            Commodity.Name AS Commodity,
            RR.Km AS Km,
            RR.pricePerKm AS SalesPricePerKM,
            RR.salesPrice AS SalesPrice,
            RR.purchasePrice AS PurchasePrice,
            (SELECT SUM(salesAmount*salesCost) FROM TruckingTobaccoSurcharges WHERE TruckingTobaccoSurcharges.REPORT = RR.Id) +
            (SELECT SUM(incomeAmount*toSBCIncome) FROM TruckingTobaccoSurcharges2 WHERE TruckingTobaccoSurcharges2.REPORT = RR.Id) as SurchargesTo,
            (SELECT SUM(costAmount*costCost) FROM TruckingTobaccoSurcharges WHERE TruckingTobaccoSurcharges.REPORT = RR.Id) +
            (SELECT SUM(costAmount*fromCustomerCost) FROM TruckingTobaccoSurcharges2 WHERE TruckingTobaccoSurcharges2.REPORT = RR.Id) as SurchargesFrom
        FROM Report RR
            JOIN TruckingReport TR ON TR.REPORT = RR.ID
            LEFT JOIN Customer ON RR.CUSTOMER = Customer.ID
            LEFT JOIN PrincipalsCustomer ON RR.PRINCIPALSCUSTOMER = PrincipalsCustomer.ID
            LEFT JOIN EffectiveCarrier ON RR.EFFECTIVECARRIER = EffectiveCarrier.ID
            LEFT JOIN PlaceLoading ON TR.PLACELOADING = PlaceLoading.ID
            LEFT JOIN PlaceUnloading ON RR.PLACEUNLOADING = PlaceUnloading.ID
            LEFT JOIN Pol ON TR.POL = Pol.Id
            LEFT JOIN Pod ON TR.POD = Pod.Id
            LEFT JOIN Commodity ON RR.COMMODITY = Commodity.Id

Second:

SELECT
            RR.Id,
            Customer.Name AS Customer,
            PrincipalsCustomer.Name AS PrincipalsCustomer,
            EffectiveCarrier.Name AS EffectiveCarrier,
            CAST(CASE RR.isImport WHEN 1 THEN RR.unloadingDateStart ELSE TR.loadingDateStart END AS DATE) AS LoadingUnloadingDate,
            RR.containerNo AS ContainerNumber,
            CASE RR.isImport WHEN 1 THEN PLACEUNLOADING.Name ELSE PlaceLoading.Name END AS LoadingUnloadingPlace,
            Pol.Name AS POL,
            Pod.Name AS POD,
            Commodity.Name AS Commodity,
            RR.Km AS Km,
            RR.pricePerKm AS SalesPricePerKM,
            RR.salesPrice AS SalesPrice,
            RR.purchasePrice AS PurchasePrice,
            SUBCH1.sales + SUBCH2.sales AS SurchargesTo,
            SUBCH1.costs + SUBCH2.costs AS SurchargesFrom
        FROM Report RR
            JOIN TruckingReport TR ON TR.REPORT = RR.ID
            LEFT JOIN Customer ON RR.CUSTOMER = Customer.ID
            LEFT JOIN PrincipalsCustomer ON RR.PRINCIPALSCUSTOMER = PrincipalsCustomer.ID
            LEFT JOIN EffectiveCarrier ON RR.EFFECTIVECARRIER = EffectiveCarrier.ID
            LEFT JOIN PlaceLoading ON TR.PLACELOADING = PlaceLoading.ID
            LEFT JOIN PlaceUnloading ON RR.PLACEUNLOADING = PlaceUnloading.ID
            LEFT JOIN Pol ON TR.POL = Pol.Id
            LEFT JOIN Pod ON TR.POD = Pod.Id
            LEFT JOIN Commodity ON RR.COMMODITY = Commodity.Id
            LEFT JOIN ( SELECT REPORT, SUM(salesAmount*salesCost) AS sales, SUM(costAmount*costCost) AS costs
                        FROM TruckingTobaccoSurcharges SR1 GROUP BY SR1.REPORT
                        )AS SUBCH1 ON SUBCH1.REPORT = RR.ID
            LEFT JOIN ( SELECT REPORT, SUM(incomeAmount*toSBCIncome) AS sales, SUM(costAmount*fromCustomerCost) AS costs
                        FROM TruckingTobaccoSurcharges2 SR2 GROUP BY SR2.REPORT
                       )AS SUBCH2 ON SUBCH2.REPORT = RR.ID
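One detail applies to both versions: `SUM` over zero rows returns NULL, and NULL plus anything is NULL, so a report that has rows in only one of the two surcharge tables would get a NULL total. A minimal sketch of the effect and a `COALESCE` guard, using hypothetical table variables (`@Report`, `@S1`, `@S2`) and made-up values in place of the real tables, and assuming SQL Server 2008 or later for the multi-row `VALUES` syntax:

```sql
-- Hypothetical stand-ins for Report, TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2.
DECLARE @Report TABLE (Id INT PRIMARY KEY);
DECLARE @S1 TABLE (Report INT, salesAmount DECIMAL(10,2), salesCost DECIMAL(10,2));
DECLARE @S2 TABLE (Report INT, incomeAmount DECIMAL(10,2), toSBCIncome DECIMAL(10,2));

INSERT INTO @Report VALUES (1), (2);
INSERT INTO @S1 VALUES (1, 2, 10), (2, 1, 5);
INSERT INTO @S2 VALUES (1, 3, 4);              -- report 2 has no rows in this table

SELECT RR.Id,
       -- Unguarded: report 1 = 20 + 12 = 32, report 2 = 5 + NULL = NULL
       (SELECT SUM(s.salesAmount * s.salesCost) FROM @S1 s WHERE s.Report = RR.Id)
     + (SELECT SUM(s.incomeAmount * s.toSBCIncome) FROM @S2 s WHERE s.Report = RR.Id) AS RawSum,
       -- Guarded: report 1 = 32, report 2 = 5 + 0 = 5
       COALESCE((SELECT SUM(s.salesAmount * s.salesCost) FROM @S1 s WHERE s.Report = RR.Id), 0)
     + COALESCE((SELECT SUM(s.incomeAmount * s.toSBCIncome) FROM @S2 s WHERE s.Report = RR.Id), 0) AS SafeSum
FROM @Report RR;
```

The same guard works in the second version by wrapping `SUBCH1.sales`, `SUBCH2.sales`, `SUBCH1.costs` and `SUBCH2.costs` in `COALESCE(..., 0)`.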

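Is there a faster way to get the desired result? Or do this many joins simply make it impossible to go faster?

Any help is appreciated =]

Edit:

Following Nikola Markovinović's suggestion, I added an index on the Report FK of the TruckingTobaccoSurcharges table, which made it fast again (using solution 1)! I have not tried solution 2 yet. I still wonder whether my query could be better since, as someone else said, I am not joining but subquerying...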
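The index change described in this edit could look roughly like the sketch below. The `INCLUDE` lists are an assumption added here, not part of the original suggestion: they make each index cover the columns the `SUM` expressions read, so the aggregation can be answered from the index alone. The name of the second index is likewise only illustrative.

```sql
-- List existing indexes; look for Report as the leading column under index_keys.
EXEC sp_helpindex 'TruckingTobaccoSurcharges';
EXEC sp_helpindex 'TruckingTobaccoSurcharges2';

-- Index the foreign key used to correlate surcharges with a report.
-- INCLUDE (assumption) adds the summed columns so no lookup into the base table is needed.
CREATE INDEX ix_TruckingTobaccoSurcharges_Report
    ON TruckingTobaccoSurcharges (Report)
    INCLUDE (salesAmount, salesCost, costAmount, costCost);

CREATE INDEX ix_TruckingTobaccoSurcharges2_Report
    ON TruckingTobaccoSurcharges2 (Report)
    INCLUDE (incomeAmount, toSBCIncome, costAmount, fromCustomerCost);
```

A plain index on `(Report)` without `INCLUDE` is smaller and cheaper to maintain on writes; whether the covering variant is worth it depends on how often the surcharge tables are updated.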

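【Comments】:

Version 2 looks fine to me. Do both tables have an index on Report?

Hm. I am not that familiar with indexes yet, but after expanding "Indexes" on the Report table I see "PK_Report (Clustered)".

TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2 need indexes on the Report column. Use sp_helpindex to check the indexes: sp_helpindex 'TruckingTobaccoSurcharges' will show all indexes on this table. If you do not see Report as the leading column under index_keys, you need to add an index: create index ix_TruckingTobaccoSurcharges_Report on TruckingTobaccoSurcharges (Report). The same goes for TruckingTobaccoSurcharges2. If you need speed, you have to master indexes.

As a rule of thumb, all foreign keys should be indexed, unless the referenced table has only a few rows.

It is very fast now! I can measure it in milliseconds again. Thanks a lot =] I will be sure to learn about indexes =]

【Answer 1】: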

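You have to realize that you are not joining! Yes, you are joining, since there are LEFT JOINs everywhere, but no, that is not actually your problem. You are subquerying to retrieve your sums, and that is bad, bad, bad. Subqueries are always slower than joins.

This is a subquery, not a join: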

LEFT JOIN 
( 
    SELECT REPORT, 
           SUM(salesAmount*salesCost) AS sales, 
           SUM(costAmount*costCost) AS costs
    FROM TruckingTobaccoSurcharges SR1 GROUP BY SR1.REPORT
)

Because you are subquerying, the index on REPORT is not used, which is why it is slow.

So, for your problem, here is what should work (note: real joins!!):

SELECT
    RR.ID,
    SUM(s.salesAmount * s.salesCost) + SUM(s2.incomeAmount * s2.toSBCIncome) AS SurchargesTo,
    SUM(s.costAmount * s.costCost) + SUM(s2.costAmount * s2.fromCustomerCost) AS SurchargesFrom
FROM Report RR
    LEFT JOIN TruckingTobaccoSurcharges s 
        ON s.REPORT = RR.ID
    -- the alias must match the s2 references in the SELECT list
    LEFT JOIN TruckingTobaccoSurcharges2 s2
        ON s2.REPORT = RR.ID
-- group by the report key so reports without surcharges keep their own row
GROUP BY RR.ID

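【Discussion】:

This version may produce a Cartesian product between TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2, unless one of the tables contains no matching records. The totals will be off.

I cannot really plug your solution into my query, because I cannot have aggregates and plain table columns in the same result. Should I group by all the other columns?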
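The row multiplication raised in the first comment can be seen on a tiny data set. The sketch below uses hypothetical table variables (`@Report`, `@S1`, `@S2`) and made-up values in place of the real tables, and assumes SQL Server 2008 or later for the multi-row `VALUES` syntax. It contrasts joining both child tables directly with aggregating each one to a single row per report before joining:

```sql
-- Hypothetical stand-ins for Report, TruckingTobaccoSurcharges and TruckingTobaccoSurcharges2.
DECLARE @Report TABLE (Id INT PRIMARY KEY);
DECLARE @S1 TABLE (Report INT, salesAmount DECIMAL(10,2), salesCost DECIMAL(10,2));
DECLARE @S2 TABLE (Report INT, incomeAmount DECIMAL(10,2), toSBCIncome DECIMAL(10,2));

INSERT INTO @Report VALUES (1);
INSERT INTO @S1 VALUES (1, 2, 10), (1, 1, 5);   -- true total: 20 + 5 = 25
INSERT INTO @S2 VALUES (1, 3, 4), (1, 1, 2);    -- true total: 12 + 2 = 14

-- Direct joins: 2 x 2 = 4 joined rows for report 1, so each side is summed twice.
SELECT RR.Id,
       SUM(s1.salesAmount * s1.salesCost)
     + SUM(s2.incomeAmount * s2.toSBCIncome) AS SurchargesTo   -- 50 + 28 = 78 (wrong)
FROM @Report RR
    LEFT JOIN @S1 s1 ON s1.Report = RR.Id
    LEFT JOIN @S2 s2 ON s2.Report = RR.Id
GROUP BY RR.Id;

-- Pre-aggregation: each derived table has at most one row per report, so nothing multiplies.
SELECT RR.Id,
       COALESCE(a1.sales, 0) + COALESCE(a2.sales, 0) AS SurchargesTo   -- 25 + 14 = 39
FROM @Report RR
    LEFT JOIN (SELECT Report, SUM(salesAmount * salesCost) AS sales
               FROM @S1 GROUP BY Report) a1 ON a1.Report = RR.Id
    LEFT JOIN (SELECT Report, SUM(incomeAmount * toSBCIncome) AS sales
               FROM @S2 GROUP BY Report) a2 ON a2.Report = RR.Id;
```

Because the grouping happens inside the derived tables, the outer query needs no GROUP BY at all, so the other report columns can be selected freely. This is the same shape as the second query in the question.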
