Azure SQL, Clustered Columnstore Index, "TOP" performance

Posted 2023-04-14


Originally posted: 2015-12-16 14:24:38

Question:

I have a question about using TOP on SQL Azure with tables that have a clustered columnstore index.

Both tables have a clustered columnstore index. HeaderTable has 300K rows and ValuesTable has 6.5 million rows.
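For anyone trying to reproduce the behaviour, a minimal schema might look like the sketch below. It is hypothetical: the table and column names are taken from the queries in the question, but the data types and the index names (cci_HeaderTable, cci_ValuesTable) are assumptions.

```sql
-- Hypothetical repro schema: column names come from the question's queries,
-- data types and index names are assumptions.
CREATE TABLE dbo.HeaderTable (
    idCol1 INT      NOT NULL,
    idcol2 INT      NOT NULL,
    [Date] DATETIME NOT NULL   -- DATETIME so that "@Date - 30" style arithmetic works
);
-- Store the whole table as a clustered columnstore index.
CREATE CLUSTERED COLUMNSTORE INDEX cci_HeaderTable ON dbo.HeaderTable;

CREATE TABLE dbo.ValuesTable (
    idcol2 INT            NOT NULL,   -- join key to HeaderTable
    Value1 DECIMAL(18, 4) NULL,
    Value2 NVARCHAR(10)   NULL        -- filtered on 'SZT' in the queries
);
CREATE CLUSTERED COLUMNSTORE INDEX cci_ValuesTable ON dbo.ValuesTable;
```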

-- without TOP
-- response after 2 sec
declare @Date datetime  = getdate()
select  zp.idCol1, Value1, zp.idcol2 from [HeaderTable] zp 
    inner join [dbo].[ValuesTable] zpp 
        on zp.idcol2 = zpp.idcol2
            where zp.Date > @Date-30 and zp.Date < @Date-10 and zp.idCol1>0 and zpp.Value2 = 'SZT'
                    order by idcol2
go 

-- with TOP 100
-- response after 27 sec
declare @Date datetime  = getdate()
select top 100 zp.idCol1, Value1, zp.idcol2 from [HeaderTable] zp
    inner join [dbo].[ValuesTable] zpp 
        on zp.idcol2 = zpp.idcol2
            where zp.Date > @Date-30 and zp.Date < @Date-10 and zp.idCol1>0 and zpp.Value2 = 'SZT'
                    order by idcol2

go 

-- result into a temporary table, then SELECT TOP 100 from the temporary table
-- response after 2 sec

declare @Date datetime  = getdate()
select  zp.idCol1, Value1, zp.idcol2 into #d  from [HeaderTable] zp 
    inner join [dbo].[ValuesTable] zpp
        on zp.idcol2 = zpp.idcol2
            where zp.Date > @Date-30 and zp.Date < @Date-10 and zp.idCol1>0 and zpp.Value2 = 'SZT'

select top 100 * from #d order by #d.idcol2
drop table #d
go

As you can see, the TOP in the second query is very slow. Does anyone have some hints about this problem?

Comments:

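You can't point at one specific element of the query plan and hold it responsible. You are practically claiming that TOP is always slow, which isn't true. Post the actual execution plans as XML somewhere.

The plans are here: link

Answer 1: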

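This has been optimized as part of the enhancements for databases on Azure at the new compatibility level 130 (at the time of writing, compatibility level 130 is supported only in preview and is not yet generally available). Running

ALTER DATABASE <dbname> SET COMPATIBILITY_LEVEL = 130

makes the difference.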
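To check which compatibility level a database is currently running at (for example before and after the ALTER DATABASE statement), the sys.databases catalog view exposes it. A minimal sketch for the database you are connected to:

```sql
-- Show the compatibility level of the current database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();
```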


Answer 2:

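The second execution plan is appalling. SQL Server throws away every advantage of the columnstore by buffering its data into a rowstore temporary table... This is a quality problem in the query optimizer, because this strategy never makes sense.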

Try to convince SQL Server that the TOP does nothing:

DECLARE @top BIGINT = 100;
SELECT TOP (@top) ...
OPTION (OPTIMIZE FOR (@top = 9223372036854775807)); -- largest value that fits in a BIGINT
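Applied to the query from the question, the hint might look like the sketch below (the asker's final version wasn't posted, so this is a reconstruction). At run time @top is still 100, but the plan is compiled as if @top were 9223372036854775807, the largest BIGINT, so the optimizer no longer has a small row goal to plan around.

```sql
DECLARE @Date datetime = GETDATE();
DECLARE @top  BIGINT   = 100;   -- rows actually returned at run time

SELECT TOP (@top) zp.idCol1, Value1, zp.idcol2
FROM [HeaderTable] zp
    INNER JOIN [dbo].[ValuesTable] zpp
        ON zp.idcol2 = zpp.idcol2
WHERE zp.Date > @Date - 30 AND zp.Date < @Date - 10
  AND zp.idCol1 > 0 AND zpp.Value2 = 'SZT'
ORDER BY zp.idcol2
-- Compile the plan as if TOP imposed no practical limit.
OPTION (OPTIMIZE FOR (@top = 9223372036854775807));
```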

Comments:

Thanks for the solution, it was very helpful. The query now runs in 2 seconds.
