用于查找冗余索引的 T-SQL
Posted
技术标签:
【中文标题】用于查找冗余索引的 T-SQL【英文标题】:T-SQL for finding Redundant Indexes 【发布时间】:2010-07-19 17:59:31 【问题描述】:有人知道可以检测整个数据库的冗余索引的 T-SQL 脚本吗?表中冗余索引的示例如下:
Index 1: 'ColumnA', 'ColumnB', 'ColumnC'
Index 2: 'ColumnA', 'ColumnB'
忽略其他考虑因素,例如列的宽度和覆盖索引,索引 2 将是多余的。
谢谢。
【问题讨论】:
我对 index2 的冗余提出异议。如果 columnC 很宽,则 index2 对于某些查询可能更有效。此外,覆盖索引不会使具有相同列顺序的所有非覆盖索引冗余。 查看此链接。 www.sql-server-performance.com. 【参考方案1】:有些情况下冗余不成立。例如,假设ColumnC
是一个巨大的字段,但有时您必须快速检索它。您的 index 1
不需要密钥查找:
select ColumnC from YourTable where ColumnnA = 12
另一方面,index 2
小得多,因此可以在内存中读取需要索引扫描的查询:
select * from YourTable where ColumnnA like '%hello%'
所以它们并不是真的多余。
如果您不相信我的上述论点,您可以找到“冗余”索引,例如:
;with ind as (
select a.object_id
, a.index_id
, cast(col_list.list as varchar(max)) as list
from (
select distinct object_id
, index_id
from sys.index_columns
) a
cross apply
(
select cast(column_id as varchar(16)) + ',' as [text()]
from sys.index_columns b
where a.object_id = b.object_id
and a.index_id = b.index_id
for xml path(''), type
) col_list (list)
)
select object_name(a.object_id) as TableName
, asi.name as FatherIndex
, bsi.name as RedundantIndex
from ind a
join sys.sysindexes asi
on asi.id = a.object_id
and asi.indid = a.index_id
join ind b
on a.object_id = b.object_id
and a.object_id = b.object_id
and len(a.list) > len(b.list)
and left(a.list, LEN(b.list)) = b.list
join sys.sysindexes bsi
on bsi.id = b.object_id
and bsi.indid = b.index_id
为您的用户带来蛋糕,以防性能“意外”下降:-)
【讨论】:
谢谢 - 但我不感兴趣,您认为索引 2 是否与索引 1 是多余的。我对 T-SQL 脚本感兴趣,它让我知道一个索引可能与另一个索引冗余。 问题不在于我是否相信你。我的问题不是关于什么构成冗余索引。如果更多的人只回答这个问题,就像发布的那样,而不是试图回答一个没有被问到的问题,那就太好了。我只是想要一个查询,让我知道哪些索引可能是多余的,然后我们将评估每个索引以决定采取什么行动。顺便说一句 - 很好的查询! 哇.. +1'd Andomar's answer for have to deal with this approach ...我当然感谢那些比我了解更多的人花时间指出的一个方面我的问题我可能没有考虑或意识到。谢谢@Andomar【参考方案2】:受Paul Nielsen的启发,我写了这个查询来查找/区分:
重复(忽略包含顺序) 冗余(不同的包含列) 重叠(不同的索引列)并记录他们的使用情况
(可能还想使用is_descending_key
,但我不需要它。)
WITH IndexColumns AS
(
SELECT I.object_id AS TableObjectId, OBJECT_SCHEMA_NAME(I.object_id) + '.' + OBJECT_NAME(I.object_id) AS TableName, I.index_id AS IndexId, I.name AS IndexName
, (IndexUsage.user_seeks + IndexUsage.user_scans + IndexUsage.user_lookups) AS IndexUsage
, IndexUsage.user_updates AS IndexUpdates
, (SELECT CASE is_included_column WHEN 1 THEN NULL ELSE column_id END AS [data()]
FROM sys.index_columns AS IndexColumns
WHERE IndexColumns.object_id = I.object_id
AND IndexColumns.index_id = I.index_id
ORDER BY index_column_id, column_id
FOR XML PATH('')
) AS ConcIndexColumnNrs
,(SELECT CASE is_included_column WHEN 1 THEN NULL ELSE COL_NAME(I.object_id, column_id) END AS [data()]
FROM sys.index_columns AS IndexColumns
WHERE IndexColumns.object_id = I.object_id
AND IndexColumns.index_id = I.index_id
ORDER BY index_column_id, column_id
FOR XML PATH('')
) AS ConcIndexColumnNames
,(SELECT CASE is_included_column WHEN 1 THEN column_id ELSE NULL END AS [data()]
FROM sys.index_columns AS IndexColumns
WHERE IndexColumns.object_id = I.object_id
AND IndexColumns.index_id = I.index_id
ORDER BY column_id
FOR XML PATH('')
) AS ConcIncludeColumnNrs
,(SELECT CASE is_included_column WHEN 1 THEN COL_NAME(I.object_id, column_id) ELSE NULL END AS [data()]
FROM sys.index_columns AS IndexColumns
WHERE IndexColumns.object_id = I.object_id
AND IndexColumns.index_id = I.index_id
ORDER BY column_id
FOR XML PATH('')
) AS ConcIncludeColumnNames
FROM sys.indexes AS I
LEFT OUTER JOIN sys.dm_db_index_usage_stats AS IndexUsage
ON IndexUsage.object_id = I.object_id
AND IndexUsage.index_id = I.index_id
AND IndexUsage.Database_id = db_id()
)
SELECT
C1.TableName
, C1.IndexName AS 'Index1'
, C2.IndexName AS 'Index2'
, CASE WHEN (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs) AND (C1.ConcIncludeColumnNrs = C2.ConcIncludeColumnNrs) THEN 'Exact duplicate'
WHEN (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs) THEN 'Different includes'
ELSE 'Overlapping columns' END
-- , C1.ConcIndexColumnNrs
-- , C2.ConcIndexColumnNrs
, C1.ConcIndexColumnNames
, C2.ConcIndexColumnNames
-- , C1.ConcIncludeColumnNrs
-- , C2.ConcIncludeColumnNrs
, C1.ConcIncludeColumnNames
, C2.ConcIncludeColumnNames
, C1.IndexUsage
, C2.IndexUsage
, C1.IndexUpdates
, C2.IndexUpdates
, 'DROP INDEX ' + C2.IndexName + ' ON ' + C2.TableName AS Drop2
, 'DROP INDEX ' + C1.IndexName + ' ON ' + C1.TableName AS Drop1
FROM IndexColumns AS C1
INNER JOIN IndexColumns AS C2
ON (C1.TableObjectId = C2.TableObjectId)
AND (
-- exact: show lower IndexId as 1
(C1.IndexId < C2.IndexId
AND C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs
AND C1.ConcIncludeColumnNrs = C2.ConcIncludeColumnNrs)
-- different includes: show longer include as 1
OR (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs
AND LEN(C1.ConcIncludeColumnNrs) > LEN(C2.ConcIncludeColumnNrs))
-- overlapping: show longer index as 1
OR (C1.IndexId <> C2.IndexId
AND C1.ConcIndexColumnNrs <> C2.ConcIndexColumnNrs
AND C1.ConcIndexColumnNrs like C2.ConcIndexColumnNrs + ' %')
)
ORDER BY C1.TableName, C1.ConcIndexColumnNrs
【讨论】:
先生,您应该获得奖牌! ?【参考方案3】:我创建了以下查询,它为我提供了很多很好的信息来识别重复和接近重复的索引。它还包括其他信息,例如索引占用多少内存页,这使我可以为更大的索引提供更高的优先级。它显示了对哪些列进行了索引以及包含了哪些列,因此我可以查看是否有两个索引几乎相同,而包含的列只有轻微的变化。
WITH IndexSummary AS
(
SELECT DISTINCT sys.objects.name AS [Table Name],
sys.indexes.name AS [Index Name],
SUBSTRING((SELECT ', ' + sys.columns.Name as [text()]
FROM sys.columns
INNER JOIN sys.index_columns
ON sys.index_columns.column_id = sys.columns.column_id
AND sys.index_columns.object_id = sys.columns.object_id
WHERE sys.index_columns.index_id = sys.indexes.index_id
AND sys.index_columns.object_id = sys.indexes.object_id
AND sys.index_columns.is_included_column = 0
ORDER BY sys.columns.name
FOR XML Path('')), 2, 10000) AS [Indexed Column Names],
ISNULL(SUBSTRING((SELECT ', ' + sys.columns.Name as [text()]
FROM sys.columns
INNER JOIN sys.index_columns
ON sys.index_columns.column_id = sys.columns.column_id
AND sys.index_columns.object_id = sys.columns.object_id
WHERE sys.index_columns.index_id = sys.indexes.index_id
AND sys.index_columns.object_id = sys.indexes.object_id
AND sys.index_columns.is_included_column = 1
ORDER BY sys.columns.name
FOR XML Path('')), 2, 10000), '') AS [Included Column Names],
sys.indexes.index_id, sys.indexes.object_id
FROM sys.indexes
INNER JOIN SYS.index_columns
ON sys.indexes.index_id = SYS.index_columns.index_id
AND sys.indexes.object_id = sys.index_columns.object_id
INNER JOIN sys.objects
ON sys.OBJECTS.object_id = SYS.indexES.object_id
WHERE sys.objects.type = 'U'
)
SELECT IndexSummary.[Table Name],
IndexSummary.[Index Name],
IndexSummary.[Indexed Column Names],
IndexSummary.[Included Column Names],
PhysicalStats.page_count as [Page Count],
CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],
CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]
FROM IndexSummary
INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
AS PhysicalStats
ON PhysicalStats.index_id = IndexSummary.index_id
AND PhysicalStats.object_id = IndexSummary.object_id
WHERE (SELECT COUNT(*) as Computed
FROM IndexSummary Summary2
WHERE Summary2.[Table Name] = IndexSummary.[Table Name]
AND Summary2.[Indexed Column Names] = IndexSummary.[Indexed Column Names]) > 1
ORDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]
查询结果如下所示:
Table Name Index Indexed Cols Included Cols Pages Size (MB) Frag %
My_Table Indx_1 Col1 Col2, Col3 123 0.96 8.94
My_Table Indx_2 Col1 Col2, Col3 123 0.96 8.94
完整说明
完整的解释见Identifying Duplicate or Redundant Indexes in SQL Server。
【讨论】:
【参考方案4】:尝试下面的脚本来显示未使用的索引,希望对您有所帮助
/****************************************************************
Description: Script to show Unused Indexes using DMVs
****************************************************************/
SELECT TOP 100
o.name AS ObjectName
, i.name AS IndexName
, i.index_id AS IndexID
, dm_ius.user_seeks AS UserSeek
, dm_ius.user_scans AS UserScans
, dm_ius.user_lookups AS UserLookups
, dm_ius.user_updates AS UserUpdates
, p.TableRows
, 'DROP INDEX ' + QUOTENAME(i.name)
+ ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(OBJECT_NAME(dm_ius.object_id)) as 'drop statement'
FROM sys.dm_db_index_usage_stats dm_ius
INNER JOIN sys.indexes i ON i.index_id = dm_ius.index_id AND dm_ius.object_id = i.object_id
INNER JOIN sys.objects o on dm_ius.object_id = o.object_id
INNER JOIN sys.schemas s on o.schema_id = s.schema_id
INNER JOIN (SELECT SUM(p.rows) TableRows, p.index_id, p.object_id
FROM sys.partitions p GROUP BY p.index_id, p.object_id) p
ON p.index_id = dm_ius.index_id AND dm_ius.object_id = p.object_id
WHERE OBJECTPROPERTY(dm_ius.object_id,'IsUserTable') = 1
AND dm_ius.database_id = DB_ID()
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
ORDER BY (dm_ius.user_seeks + dm_ius.user_scans + dm_ius.user_lookups) ASC
GO
【讨论】:
【参考方案5】:我只是在阅读一些 MSDN 博客,注意到 script to do this 并记住了这个问题。
我没有费心将它与 Andomar's 并排测试,看看其中一个是否比另一个有什么特别的好处。
我可能会对两者进行修改,但在评估冗余时考虑两个索引的大小。
编辑:
另请参阅 Kimberley Tripp 在Removing duplicate indexes 上的帖子
【讨论】:
以上是关于用于查找冗余索引的 T-SQL的主要内容,如果未能解决你的问题,请参考以下文章