SQL Query: partition count varies based on variable or hard coded parameter

Posted: 2015-10-14 10:54:58

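I have a partitioned table. On that table we create a view that uses PARTITION BY. When we query the view with the date range passed in through variables, the query does not restrict itself to the relevant partition. Please help me resolve this.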

The following script creates the table, populates it with data, and creates the required index:


IF EXISTS(SELECT 1 FROM sys.indexes WHERE name='IX_TRAN_DATE' AND object_id = OBJECT_ID('TEST_TRANSACTION'))
BEGIN
PRINT 'Dropping Index IX_TRAN_DATE on TEST_TRANSACTION'

DROP INDEX IX_TRAN_DATE
    ON TEST_TRANSACTION;

END   
GO

IF EXISTS (SELECT 1 FROM DBO.SYSOBJECTS WHERE ID = OBJECT_ID(N'TEST_TRANSACTION') AND OBJECTPROPERTY(ID, N'ISUSERTABLE') = 1)
AND NOT EXISTS(SELECT * FROM sys.indexes WHERE name='IX_TRAN_DATE' AND object_id = OBJECT_ID('TEST_TRANSACTION'))
BEGIN
PRINT 'Creating Index IX_TRAN_DATE on TEST_TRANSACTION with primary'

CREATE CLUSTERED INDEX [IX_TRAN_DATE] ON [dbo].[TEST_TRANSACTION]
(
    [TRAN_DATE]
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

END   
GO

IF EXISTS(SELECT * FROM sys.indexes WHERE name='IX_TRAN_DATE' AND object_id = OBJECT_ID('TEST_TRANSACTION'))
BEGIN
PRINT 'Dropping Index IX_TRAN_DATE on TEST_TRANSACTION with primary'

DROP INDEX IX_TRAN_DATE
    ON TEST_TRANSACTION;

END   
GO

IF EXISTS (SELECT * FROM sys.partition_schemes WHERE type = 'PS' AND name = 'DATETIME_PS')
    BEGIN
        PRINT 'Dropping Partition Scheme DATETIME_PS'
        DROP PARTITION SCHEME DATETIME_PS
    END
GO

IF EXISTS (SELECT * FROM sys.partition_functions WHERE type = 'R' AND name = 'DATETIME_PF')
    BEGIN        
        PRINT 'Dropping Partition Function DATETIME_PF'
        DROP PARTITION FUNCTION DATETIME_PF
    END

GO

IF EXISTS (SELECT * FROM DBO.SYSOBJECTS WHERE ID = OBJECT_ID(N'TEST_TRANSACTION') AND OBJECTPROPERTY(ID, N'ISUSERTABLE') = 1)
BEGIN
    DROP TABLE TEST_TRANSACTION
END
GO

PRINT 'Creating Partition Function DATETIME_PF'
GO

CREATE PARTITION FUNCTION DATETIME_PF (datetime)
AS RANGE RIGHT FOR VALUES 
(
    '01/01/2000',
    '01/01/2001',
    '01/01/2002',
    '01/01/2003',
    '01/01/2004',
    '01/01/2005',
    '01/01/2006',
    '01/01/2007',
    '01/01/2008',
    '01/01/2009',
    '01/01/2010',
    '01/01/2011',
    '01/01/2012',
    '01/01/2013',
    '01/01/2014',
    '01/01/2015',
    '01/01/2016',
    '01/01/2017',
    '01/01/2018',
    '01/01/2019',
    '01/01/2020',
    '01/01/2021',
    '01/01/2022',
    '01/01/2023',
    '01/01/2024',
    '01/01/2025'
);

PRINT 'Creating Partition Scheme DATETIME_PS'
GO

CREATE PARTITION SCHEME DATETIME_PS
AS PARTITION DATETIME_PF
ALL TO ([PRIMARY]);

GO

CREATE TABLE [dbo].[TEST_TRANSACTION](
    [TEST_TRAN_ID] [bigint] IDENTITY(100000, 1) NOT NULL,
    [TRAN_DATE] [datetime] NOT NULL,
    [CREATED_DATE] [datetime] NOT NULL,
    [CREATED_BY] nvarchar(64) NULL
)
GO

DECLARE @dateVar date 
SET @dateVar = '01/01/2013'

While (YEAR(@dateVar) < 2017)
BEGIN

    INSERT INTO [TEST_TRANSACTION] ([TRAN_DATE], [CREATED_DATE], [CREATED_BY]) 
    VALUES (@dateVar, GETDATE(), 'admin')

    SET @dateVar = DATEADD(DAY, 15, @dateVar)

END
GO

CREATE CLUSTERED INDEX [IX_TRAN_DATE] ON [dbo].[TEST_TRANSACTION]
(
    [TRAN_DATE]
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [DATETIME_PS]([TRAN_DATE])

I create the following view on the TEST_TRANSACTION table:

IF EXISTS (SELECT * FROM sysobjects WHERE type = 'V' AND name = 'TICKET_SHIPPER_DISTRIBUTION')
    BEGIN
        PRINT 'Dropping View TICKET_SHIPPER_DISTRIBUTION'
        DROP View TICKET_SHIPPER_DISTRIBUTION
    END
GO

PRINT 'Creating View TICKET_SHIPPER_DISTRIBUTION'
GO

CREATE VIEW dbo.TICKET_SHIPPER_DISTRIBUTION
AS

SELECT
    bitr.[TRAN_DATE]
    ,ROW_NUMBER() OVER (
    PARTITION BY 
       bitr.[TRAN_DATE]
    ORDER BY
        bitr.TRAN_DATE,
        bitr.CREATED_DATE ASC
    ) AS ROW_NUM
FROM
    TEST_TRANSACTION bitr

Now, if we run the following queries:

-- Query 1: partition count = 27

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

DECLARE @startDate datetime
DECLARE @endDate datetime
SET @startDate='2015-07-10 00:00:00'
SET @endDate = '2015-08-01 00:00:00'
Select * from TICKET_SHIPPER_DISTRIBUTION where TRAN_DATE >= @startDate and TRAN_DATE <= @endDate

execution plan snapshot for query 1

-- Query 2: partition count = 1

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
Select * from TICKET_SHIPPER_DISTRIBUTION WHERE TRAN_DATE > '2015-07-10 00:00:00' AND TRAN_DATE < '2015-08-01 00:00:00'

execution plan snapshot for query 2

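Please help me understand why queries 1 and 2 execute differently, and how to make query 1 respect the partitioning so that it scans only one partition.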

【Answer 1】:

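I ran this on a clean database and each query returned a row count of 1. I am using SQL Server 2014 Developer Edition.

The comparison operators differ between your two queries (>= and <= in the first, > and < in the second), but that should have no effect on the results.

【Comments】:

Thanks for the reply. The row count is 1 in both cases, but if we look at the execution plans, the partition count is 27 in case 1 and 1 in case 2.

【Answer 2】: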

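I have figured it out. It actually has nothing to do with the table's partitioning. It is about how we pass the date parameters.

Of the three options suggested at http://www.sqlservercentral.com/Forums/Topic547887-149-1.aspx, option 2 (a dynamic query) helped me.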
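A minimal sketch of what a dynamic-query approach could look like against the view from the question. It assumes the point is to have the dates reach the optimizer as literals, as in query 2. It has not been checked against the linked thread's exact wording. The variable `@sql` is introduced here for illustration only.

```sql
DECLARE @startDate datetime = '2015-07-10T00:00:00';
DECLARE @endDate   datetime = '2015-08-01T00:00:00';

-- Build the statement with the dates embedded as ISO 8601 literals
-- (CONVERT style 126), so the compiled query contains constants.
-- The inputs are typed datetime values, not free text.
DECLARE @sql nvarchar(max) =
      N'SELECT TRAN_DATE, ROW_NUM '
    + N'FROM dbo.TICKET_SHIPPER_DISTRIBUTION '
    + N'WHERE TRAN_DATE >= ''' + CONVERT(nvarchar(23), @startDate, 126) + N''' '
    + N'AND TRAN_DATE <= '''   + CONVERT(nvarchar(23), @endDate, 126)   + N''';';

PRINT @sql;                 -- inspect the generated statement
EXEC sys.sp_executesql @sql;
```

Note that passing the dates to `sp_executesql` as parameters instead of concatenating them would keep them as parameters in the compiled statement, which may well reproduce the behaviour of query 1.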

I am still trying to make this work in a single stored procedure without dynamic SQL. Any suggestions would be appreciated.
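One candidate worth testing for the stored-procedure case is a statement-level `OPTION (RECOMPILE)` hint. On reasonably recent SQL Server builds this lets the optimizer compile the statement using the current parameter values, much as if they were literals. This is a sketch under that assumption and has not been run against the setup above. The procedure name is hypothetical.

```sql
-- Hypothetical procedure name; not part of the original post.
CREATE PROCEDURE dbo.GET_TICKET_SHIPPER_DISTRIBUTION
    @startDate datetime,
    @endDate   datetime
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TRAN_DATE, ROW_NUM
    FROM dbo.TICKET_SHIPPER_DISTRIBUTION
    WHERE TRAN_DATE >= @startDate
      AND TRAN_DATE <= @endDate
    OPTION (RECOMPILE);  -- recompile this statement on every execution
END
GO

EXEC dbo.GET_TICKET_SHIPPER_DISTRIBUTION
     @startDate = '2015-07-10T00:00:00',
     @endDate   = '2015-08-01T00:00:00';
```

The trade-off is a compilation on each call, so it is worth comparing the Actual Partition Count in the plan with and without the hint before adopting it.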

【Answer 3】:

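OK, I get it.

There is a clustered index on the table, but when you run SELECT * against the view, the query turns into a clustered index scan, which is why it hits every partition.

If you issue the query as follows, you get a clustered index seek and only one partition is hit, because you reference only the column that makes up the clustered index.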

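-- Same date range as query 1
DECLARE @startDate datetime = '2015-07-10 00:00:00';
DECLARE @endDate   datetime = '2015-08-01 00:00:00';

SELECT  TRAN_DATE
FROM    TICKET_SHIPPER_DISTRIBUTION
WHERE   TRAN_DATE >= @startDate AND
        TRAN_DATE <= @endDate;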

Or without the view:

-- Assumes @startDate and @endDate are declared as in query 1
SELECT  bitr.TRAN_DATE
       ,ROW_NUMBER() OVER ( PARTITION BY bitr.TRAN_DATE ORDER BY bitr.TRAN_DATE, bitr.CREATED_DATE ASC ) AS ROW_NUM
FROM    TEST_TRANSACTION bitr
WHERE   TRAN_DATE > @startDate AND
        TRAN_DATE < @endDate;

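You just need to pay attention to how you design and use clustered and covering indexes, and you can avoid dynamic SQL.

There are some very good articles on index design and usage here: http://www.sqlskills.com/blogs/kimberly/category/indexes/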
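Whichever variant is used, it can help to confirm independently of the execution plan which partitions the date range maps to and how the rows are distributed. A small sketch, assuming the partition function `DATETIME_PF` and the clustered index from the question's setup script are in place:

```sql
-- Which partition does each end of the date range fall into?
SELECT $PARTITION.DATETIME_PF(CAST('2015-07-10T00:00:00' AS datetime)) AS start_partition,
       $PARTITION.DATETIME_PF(CAST('2015-08-01T00:00:00' AS datetime)) AS end_partition;

-- Row counts per partition of the clustered index (index_id = 1)
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.TEST_TRANSACTION')
  AND p.index_id = 1
ORDER BY p.partition_number;
```

If both ends of the range map to the same partition number, a plan that reports an Actual Partition Count greater than 1 is touching partitions it does not need.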
