How to Optimize a Query with Lots of Aggregates


Original title: How to Optimize Query with Lots of Aggregates
Posted: 2011-07-19 18:47:31

Question:

How can I optimize this query? Right now it runs too slowly, taking about 10 seconds. Full details are below:

SELECT ProjectName, 
       Actuals_YTD, 
       Rem_Forecast, 
       Total_Forecast, 
       Approved_Budget, 
       Variance, 
       Variance_Percentage, 
       ProjectComments, 
       VersionType, 
       ModifiedDate 
FROM (SELECT pd.ProjectId, 
             pd.ProjectName, 
             SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) <= '06/01/2011' THEN feb.USDactualamount ELSE 0.0 END) AS Actuals_YTD, 
             SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) > '06/01/2011' THEN feb.forecastusd ELSE 0.0 END) AS Rem_Forecast, 
             ((SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) <= '06/01/2011' THEN feb.USDactualamount ELSE 0.0 END)) + (SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) > '06/01/2011' then feb.forecastusd else 0.0 end))) AS Total_Forecast, 
             SUM(COALESCE((feb.REVISEDPLANUSD),0)) AS Approved_Budget, 
             ((SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) <= '06/01/2011' THEN feb.USDactualamount ELSE 0.0 END)) + (SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) > '06/01/2011' then feb.forecastusd else 0.0 end))) - ((SUM(COALESCE((feb.REVISEDPLANUSD),0)))) AS Variance, 
             CASE WHEN (SUM(COALESCE((feb.REVISEDPLANUSD),0))) = 0 THEN NULL ELSE ((((((SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) <= '06/01/2011' THEN feb.USDactualamount else 0.0 end)) + (SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(projectmonth) > '06/01/2011' then feb.forecastusd else 0.0 end)))) - (SUM(COALESCE((feb.REVISEDPLANUSD),0)))) / (SUM(COALESCE((feb.REVISEDPLANUSD),0)))) * 100) END AS Variance_Percentage, 
             pd.ProjectAux1, 
             pd.ProjectComments, 
             pd.VersionType, 
             MAX(base.ModifiedDate) AS ModifiedDate 
      FROM rpd.ProjectDetail pd  INNER JOIN rpd.FundSource fs ON pd.FundSourceId = fs.FundSourceId  
                                 INNER JOIN rpd.Baseline base ON pd.ProjectId = base.ProjectId  
                                 INNER JOIN rpd.FundEntityBaseline feb ON feb.BaselineId = base.BaselineId  
      GROUP BY pd.ProjectAux1, pd.ProjectId, pd.ProjectName, pd.ProjectComments, pd.VersionType)
WHERE VersionType Like '%Text%' WITH UR

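Here is the schema of the 3 tables (FundSource is left out because it has only about 200 rows, which I assume is negligible).

Schema:

Row counts:

- FundEntityBaseline: 354,603
- Baseline: 80,208
- ProjectDetail: 1,813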

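Indexes on ProjectDetail:

- 1 primary key index (ProjectId)
- 1 foreign key index (FundSourceId)
- 1 index covering the SELECT/GROUP BY columns (ProjectAux1, ProjectId, ProjectName, ProjectComments, VersionType)
- 1 index on (VersionType, ProjectName)

Indexes on Baseline:

- 1 primary key index (BaselineId)
- 1 foreign key index (ProjectId)
- 1 index on (ProjectTeamId, ProjectMonth)
- 1 index on ProjectMonth alone

Indexes on FundEntityBaseline:

- 1 primary key index (FundEntityBaselineId)
- 1 foreign key index (BaselineId)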

Latest access plan:

Comments:

Can you show the source of the PROJECTMONTH_TO_DATE function/procedure?

Answer 1:

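Move the WHERE clause (WHERE VersionType LIKE '%Text%') up so that it sits inside the inner SQL statement. As written, your query first performs every possible join and only then applies the WHERE clause to that full result set.

So your statement would look something like this: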

WHERE pd.VersionType Like '%Text%'
GROUP BY .....
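
Spelled out against the query in the question, the change might look like the sketch below. It is a shortened version that keeps only three of the aggregates; the remaining select-list expressions would carry over unchanged. Because VersionType is one of the GROUP BY columns, filtering before the grouping should return the same groups as filtering afterwards. Whether the optimizer already pushes this predicate down on its own is not known from the question, so compare the access plan before and after.

```sql
-- Sketch only: inner query from the question with the VersionType filter
-- moved ahead of the GROUP BY. Table and column names are taken from the
-- question; the omitted aggregates would be added back unchanged.
SELECT pd.ProjectId,
       pd.ProjectName,
       SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) <= '06/01/2011'
                THEN feb.USDactualamount ELSE 0.0 END) AS Actuals_YTD,
       SUM(CASE WHEN RPD.PROJECTMONTH_TO_DATE(base.ProjectMonth) > '06/01/2011'
                THEN feb.forecastusd ELSE 0.0 END) AS Rem_Forecast,
       SUM(COALESCE(feb.REVISEDPLANUSD, 0)) AS Approved_Budget,
       pd.ProjectAux1,
       pd.ProjectComments,
       pd.VersionType,
       MAX(base.ModifiedDate) AS ModifiedDate
FROM rpd.ProjectDetail pd
     INNER JOIN rpd.FundSource fs ON pd.FundSourceId = fs.FundSourceId
     INNER JOIN rpd.Baseline base ON pd.ProjectId = base.ProjectId
     INNER JOIN rpd.FundEntityBaseline feb ON feb.BaselineId = base.BaselineId
WHERE pd.VersionType LIKE '%Text%'      -- filter rows before they are grouped
GROUP BY pd.ProjectAux1, pd.ProjectId, pd.ProjectName,
         pd.ProjectComments, pd.VersionType
WITH UR
```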

Answer 2:

Put your indexes into (that is, recreate them in) a tablespace with a 32K page size, if that is not already how things are configured.
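
A minimal sketch of the setup this implies, assuming DB2 for Linux, UNIX and Windows with automatic storage enabled. The names BP32K and TS_IDX_32K and the buffer pool size are placeholders, not values from the question.

```sql
-- Hypothetical names and size; adjust to the actual system.
-- A 32K tablespace needs a buffer pool with the same page size.
CREATE BUFFERPOOL BP32K SIZE 1000 PAGESIZE 32K;

CREATE LARGE TABLESPACE TS_IDX_32K
  PAGESIZE 32K
  BUFFERPOOL BP32K;

-- Assumption: for a non-partitioned table on DB2 LUW, the index tablespace
-- is chosen with the INDEX IN clause of CREATE TABLE, so moving existing
-- indexes generally means recreating the table with INDEX IN TS_IDX_32K
-- and then recreating its indexes.
```

Check the current page size of the index tablespace first; if it is already 32K, this step changes nothing.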
