TSQL Count Over a Window

[Posted]: 2014-12-12 22:38:13

[Question]:

I couldn't find a question, answer, or example that covers what I need. I would like to use a window function.

I have the following schema, which records stored procedures together with the tables and columns used in those procedures:

CREATE TABLE [dbo].[ProcedureDependencies](
[DatabaseName] [varchar](256) NOT NULL,
[ProcedureId] [int] NOT NULL,
[ProcedureSchemaName] [varchar](256) NOT NULL,
[ProcedureName] [varchar](256) NOT NULL,
[TableSchemaName] [varchar](256) NOT NULL,
[TableName] [varchar](256) NOT NULL,
[FieldName] [varchar](256) NOT NULL)

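I want to count the number of different procedures in which each table name appears.

I have been trying variations of the following: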

select 
    DatabaseName,
    TableName, 
    count(tablename) over (partition by DatabaseName,ProcedureName) cnt
from ProcedureDependencies
order by cnt desc

However, I get bad results. For example, with the script below I want...

databasename    tablename    cnt
db1             tbl1         3
db1             tbl2         1
db1             tbl3         1

But I get...

databasename    tablename    cnt
db1             tbl1         3
db1             tbl2         3
db1             tbl3         3
db1             tbl1         1

Script:

drop table #tmprmd;
create table #tmprmd (
    DatabaseName varchar(max),
    TableName varchar(max), 
    ProcedureName varchar(max), 
    FieldName varchar(max));
Insert Into #tmprmd
Values  ('db1',     'tbl1',     'proc1',    'field1'),
        ('db1',     'tbl1',     'proc1',    'field2'),
        ('db1',     'tbl2',     'proc1',    'field1'),
        ('db1',     'tbl1',     'proc2',    'field1'),
        ('db1',     'tbl3',     'proc1',    'field1'),
        ('db1',     'tbl1',     'proc3',    'field1');
with 
dist as (
    select 
        --distinct
        databasename,
        procedurename,
        tablename
    from #tmprmd--ProcedureDependencies
)
select 
distinct
    DatabaseName,
    TableName, 
    count(tablename) over (partition by DatabaseName,procedurename) cnt
from dist
order by cnt desc

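[Comments]:

Shouldn't the desired result be 4,1,1?
@Mihai No, it has to be 3,1,1.

[Answer 1]:

I think you are making this harder than it needs to be. A plain GROUP BY with COUNT(DISTINCT ProcedureName) counts each procedure once per table: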

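-- drop the temp table only if it already exists, so the script also runs the first time
IF OBJECT_ID('tempdb..#tmprmd') IS NOT NULL
    DROP TABLE #tmprmd;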
create table #tmprmd (
    DatabaseName varchar(max),
    TableName varchar(max), 
    ProcedureName varchar(max), 
    FieldName varchar(max));
Insert Into #tmprmd
Values  ('db1',     'tbl1',     'proc1',    'field1'),
        ('db1',     'tbl1',     'proc1',    'field2'),
        ('db1',     'tbl2',     'proc1',    'field1'),
        ('db1',     'tbl3',     'proc1',    'field1'),
        ('db1',     'tbl1',     'proc2',    'field1'),       
        ('db1',     'tbl1',     'proc3',    'field1');
-- count each procedure once per table, however many fields it references
select dist.DatabaseName, dist.TableName, count(distinct dist.ProcedureName) as cnt
from #tmprmd as dist
group by dist.DatabaseName, dist.TableName
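
Applied to the real table from the question, the same idea might look like the sketch below. It assumes that `ProcedureId` identifies a procedure within a database and that tables with the same name in different schemas should be counted separately. Neither assumption is confirmed by the question, so adjust the grouping columns if they do not hold.

```sql
-- Sketch only: assumes ProcedureId is unique per procedure within a database,
-- and that (TableSchemaName, TableName) together identify a table.
SELECT  pd.DatabaseName,
        pd.TableSchemaName,
        pd.TableName,
        COUNT(DISTINCT pd.ProcedureId) AS cnt   -- each procedure counted once per table
FROM    dbo.ProcedureDependencies AS pd
GROUP BY pd.DatabaseName, pd.TableSchemaName, pd.TableName
ORDER BY cnt DESC, pd.TableName;
```

Counting `ProcedureId` rather than `ProcedureName` avoids merging two procedures that share a name but live in different schemas.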

[Answer 2]:
    IF OBJECT_ID('tempdb..#tmprmd') IS NOT NULL
        DROP TABLE #tmprmd;
    CREATE TABLE #tmprmd
        (
          DatabaseName VARCHAR(MAX) ,
          TableName VARCHAR(MAX) ,
          ProcedureName VARCHAR(MAX) ,
          FieldName VARCHAR(MAX)
        );
    INSERT  INTO #tmprmd
    VALUES  ( 'db1', 'tbl1', 'proc1', 'field1' ),
            ( 'db1', 'tbl1', 'proc1', 'field2' ),
            ( 'db1', 'tbl2', 'proc1', 'field1' ),
            ( 'db1', 'tbl1', 'proc2', 'field1' ),
            ( 'db1', 'tbl3', 'proc1', 'field1' ),
            ( 'db1', 'tbl1', 'proc3', 'field1' );
----------------------------------------------------------
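    -- variant 1: de-duplicate (database, table, procedure) first, then count rows per table.
    -- Counting rows per procedure and summing after DISTINCT only happens to give 3 on the
    -- sample data, because that DISTINCT collapses procedures that have equal row counts.
    WITH cte AS (
        SELECT DISTINCT T.DatabaseName,
                        T.TableName,
                        T.ProcedureName
        FROM #tmprmd AS T
    )
    SELECT DISTINCT DatabaseName,
                    TableName,
                    COUNT(*) OVER (PARTITION BY DatabaseName, TableName) cnt
    FROM cte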
----------------------------------------------------------
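    -- variant 2: the highest DENSE_RANK over procedure names equals the number of distinct procedures
    -- (DatabaseName is part of the partition so tables with the same name in other databases stay separate)
    SELECT DISTINCT dist.DatabaseName,
                    dist.TableName,
                    MAX(cnt) OVER (PARTITION BY dist.DatabaseName, dist.TableName) cnt
    FROM (  SELECT DISTINCT T.DatabaseName,
                            T.TableName,
                            DENSE_RANK() OVER (PARTITION BY T.DatabaseName, T.TableName ORDER BY T.ProcedureName) cnt
            FROM #tmprmd AS T
         ) dist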
----------------------------------------------------------
    -- variant 3: the inner DISTINCT leaves one row per (table, procedure rank), so counting rows gives the same number
    SELECT DISTINCT dist.DatabaseName,
                    dist.TableName,
                    COUNT(cnt) OVER (PARTITION BY dist.DatabaseName, dist.TableName) cnt
    FROM (  SELECT DISTINCT T.DatabaseName,
                            T.TableName,
                            DENSE_RANK() OVER (PARTITION BY T.DatabaseName, T.TableName ORDER BY T.ProcedureName) cnt
            FROM #tmprmd AS T
         ) dist
----------------------------------------------------------
    -- Variant 4, without using window function
    SELECT T.DatabaseName,
           T.TableName,
           COUNT(DISTINCT T.ProcedureName ) cnt
    FROM #tmprmd AS T
    GROUP BY T.DatabaseName,T.TableName
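
A side note on the window-function variants: SQL Server, at least in the versions current when this was asked, does not accept DISTINCT inside a windowed aggregate, so `COUNT(DISTINCT ProcedureName) OVER (...)` is rejected. If the goal is to keep every detail row and attach the distinct count to it, one commonly used workaround is to add an ascending and a descending DENSE_RANK and subtract 1. The sketch below runs against the `#tmprmd` sample table and assumes `ProcedureName` is never NULL (a NULL would be ranked and counted as its own value).

```sql
-- Sketch: distinct procedures per table as a window value, without collapsing rows.
-- For any row, (rank from the start) + (rank from the end) - 1 = number of distinct values.
SELECT  T.DatabaseName,
        T.TableName,
        T.ProcedureName,
        T.FieldName,
        DENSE_RANK() OVER (PARTITION BY T.DatabaseName, T.TableName
                           ORDER BY T.ProcedureName ASC)
      + DENSE_RANK() OVER (PARTITION BY T.DatabaseName, T.TableName
                           ORDER BY T.ProcedureName DESC)
      - 1 AS cnt
FROM    #tmprmd AS T
ORDER BY cnt DESC, T.TableName, T.ProcedureName;
```

On the sample data every `tbl1` row gets cnt = 3, and the `tbl2` and `tbl3` rows get cnt = 1. Wrapping this in `SELECT DISTINCT DatabaseName, TableName, cnt` reduces it to the three rows the question asks for.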

[Comments]:

Variant 4 is the answer I posted 13 hours before your edit. Did you peek?
@Blam, yes, and I upvoted it earlier. If you object, I can remove it from my post.
