T-SQL Window Function

Posted: 2017-11-17 16:37:42

Question:

For the sake of example, I have the following dataset with the columns ToyReferenceNumber, Company, CompositeCalculatedField, PermanentRecallDate, ReturnToStoreDate.

Basically, each item may have 3-4 entries (for the fictional company TheToyCompany, or TTC):

TTC-011-0934, TTC, calculation, NULL, 2017-03-01 12:01:01.00
TTC-011-0934, TTC, calculation, NULL, 2014-05-01 12:01:01.00
TTC-011-0934, TTC, calculation, NULL, 2011-08-27 12:01:01.00
TTC-994-0132, TTC, calculation, 2017-06-12 12:01:01.00, NULL
TTC-994-0132, TTC, calculation, NULL, 2017-02-01 12:01:01.00
TTC-354-0122, TTC, calculation, NULL, 2015-03-01 12:01:01.00
TTC-354-0122, TTC, calculation, NULL, NULL

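From a business-logic standpoint, the first product (0934) has been recalled several times (we don't care which print or production batch), but each time it was fixed and returned to the store.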

For 0132, the company tried to fix the defect, but then decided to scrap the product because it wasn't selling anyway.

For 0122, a product batch was recalled, fixed, and sent back to stores on 2015-03-01, but the current batch is being repaired right now (hence NULL, NULL).

Management needs a report of what the repair shop is currently billing for (e.g. the timesheets of the people repairing the toys).

Pseudo-query:

 For a given product, return only the record with NULL, NULL dates (actively being fixed)
    IF there is no NULL, NULL record, return only the record with the (latest) PermanentRecallDate
    IF there is no PermanentRecallDate, return only the record with the latest ReturnToStoreDate
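The three rules above describe a priority order within each ToyReferenceNumber. T-SQL can express that order with `ROW_NUMBER()` and then keep the whole top-ranked row. The following is a minimal sketch and is not taken from either answer below. It assumes SQL Server 2012 or later and guesses at column types the question does not give. The table variable `@ToyStaging` is hypothetical and simply holds the sample rows above.

```sql
-- Hypothetical stand-in for ToyStaging; the column types are assumptions.
DECLARE @ToyStaging TABLE (
    ToyReferenceNumber       varchar(20)  NOT NULL,
    Company                  varchar(10)  NOT NULL,
    CompositeCalculatedField varchar(50)  NULL,
    PermanentRecallDate      datetime2(2) NULL,
    ReturnToStoreDate        datetime2(2) NULL
);

-- The sample rows from the question.
INSERT INTO @ToyStaging VALUES
    ('TTC-011-0934', 'TTC', 'calculation', NULL, '2017-03-01 12:01:01.00'),
    ('TTC-011-0934', 'TTC', 'calculation', NULL, '2014-05-01 12:01:01.00'),
    ('TTC-011-0934', 'TTC', 'calculation', NULL, '2011-08-27 12:01:01.00'),
    ('TTC-994-0132', 'TTC', 'calculation', '2017-06-12 12:01:01.00', NULL),
    ('TTC-994-0132', 'TTC', 'calculation', NULL, '2017-02-01 12:01:01.00'),
    ('TTC-354-0122', 'TTC', 'calculation', NULL, '2015-03-01 12:01:01.00'),
    ('TTC-354-0122', 'TTC', 'calculation', NULL, NULL);

WITH Ranked AS (
    SELECT ts.*,
           ROW_NUMBER() OVER (
               PARTITION BY ts.ToyReferenceNumber
               ORDER BY
                   CASE
                       -- Rule 1: a NULL, NULL row (actively being fixed) wins.
                       WHEN ts.PermanentRecallDate IS NULL
                        AND ts.ReturnToStoreDate IS NULL THEN 0
                       -- Rule 2: otherwise prefer rows that have a PermanentRecallDate.
                       WHEN ts.PermanentRecallDate IS NOT NULL THEN 1
                       -- Rule 3: otherwise fall back to rows with a ReturnToStoreDate.
                       ELSE 2
                   END,
                   ts.PermanentRecallDate DESC,  -- latest recall first (rule 2)
                   ts.ReturnToStoreDate DESC     -- latest return first (rule 3)
           ) AS rn
    FROM @ToyStaging AS ts
)
SELECT ToyReferenceNumber, Company, CompositeCalculatedField,
       PermanentRecallDate, ReturnToStoreDate
FROM Ranked
WHERE rn = 1
ORDER BY ToyReferenceNumber;
```

On the sample data this should keep these rows:

- the 2017-03-01 row for TTC-011-0934
- the NULL, NULL row for TTC-354-0122
- the 2017-06-12 recall row for TTC-994-0132

If two rows tie on the deciding date, `ROW_NUMBER` keeps an arbitrary one of them. Filtering on `RANK()` = 1 instead would keep all tied rows.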

In Oracle, the query is essentially the following pseudocode:

SELECT <normal columns> 
,MAX(ts.PermanentRecallDate) KEEP (dense_rank last order by ts.PermanentRecallDate NULLS LAST) PERMANENTRECALLDATE
,MAX(ts.ReturnToStoreDate) KEEP (dense_rank last order by ts.ReturnToStoreDate NULLS LAST) RETURNTOSTOREDATE

The Oracle query is very simple, but I need it in T-SQL:

WITH CTE_ToyReferenceExport
AS
(
SELECT ts.ToyReferenceNumber AS TOYNUM
    ,tsh.Company AS COMPANY
    ,MAX(largeSetofCalculations) AS CompositeCalculatedField
    ,MAX(ts.PermanentRecallDate) AS PERMANENTRECALL
    ,MAX(ts.ReturnToStoreDate) AS RETURNTOSTORE
    -- Rank the dates within each toy (ascending, so NULLs sort first in SQL Server)
    ,DENSE_RANK() OVER (PARTITION BY ts.ToyReferenceNumber ORDER BY ts.PermanentRecallDate) AS PRDRank
    ,DENSE_RANK() OVER (PARTITION BY ts.ToyReferenceNumber ORDER BY ts.ReturnToStoreDate) AS RTSDRank

    FROM origin.ToyStaging ts
    -- "to" is a reserved word in T-SQL, so the orders alias is renamed to "tord"
    JOIN origin.ToyOrders tord ON ts.ordernumer = tord.ordno
    JOIN origin.ToyShipment tsh ON tord.packno = tsh.crateno
    LEFT JOIN origin.Shippers sh ON tord.packno = sh.cratenum AND 'calcField' = sh.originfield
GROUP BY ts.ToyReferenceNumber, tsh.Company, ts.PermanentRecallDate, ts.ReturnToStoreDate
)
-- ... (rest of the query omitted)
)

There is a lot more to it, but the main thing that has me stumped is getting the result set that the "MAX ... KEEP (DENSE_RANK LAST ORDER BY ... NULLS LAST)" logic returns.

Any help would be greatly appreciated. The version is SQL Server 2012.

Answer 1:

If I understand correctly, you want the maximum date when there are no NULLs, and NULL when there is at least one. If that's right:

SELECT <normal columns>,
       (CASE WHEN COUNT(*) <> COUNT(ts.PermanentRecallDate) THEN NULL
             ELSE MAX(ts.PermanentRecallDate)
        END) as PERMANENTRECALLDATE,
       (CASE WHEN COUNT(*) <> COUNT(ts.ReturnToStoreDate) THEN NULL
             ELSE MAX(ts.ReturnToStoreDate)
        END) as RETURNTOSTOREDATE
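These expressions rely on a difference between the two counts. `COUNT(column)` skips NULLs, while `COUNT(*)` counts every row, so the two differ exactly when the group contains a NULL. That mirrors Oracle's `MAX(...) KEEP (DENSE_RANK LAST ORDER BY ... NULLS LAST)`, where any NULLs form the last rank. Below is a runnable sketch of the same idea as a complete grouped query. The `@ToyStaging` table variable and its column types are hypothetical, and `ToyReferenceNumber, Company` stand in for `<normal columns>`.

```sql
-- Hypothetical stand-in for ToyStaging; the column types are assumptions.
DECLARE @ToyStaging TABLE (
    ToyReferenceNumber  varchar(20)  NOT NULL,
    Company             varchar(10)  NOT NULL,
    PermanentRecallDate datetime2(2) NULL,
    ReturnToStoreDate   datetime2(2) NULL
);

INSERT INTO @ToyStaging VALUES
    ('TTC-011-0934', 'TTC', NULL, '2017-03-01 12:01:01.00'),
    ('TTC-011-0934', 'TTC', NULL, '2014-05-01 12:01:01.00'),
    ('TTC-011-0934', 'TTC', NULL, '2011-08-27 12:01:01.00'),
    ('TTC-994-0132', 'TTC', '2017-06-12 12:01:01.00', NULL),
    ('TTC-994-0132', 'TTC', NULL, '2017-02-01 12:01:01.00'),
    ('TTC-354-0122', 'TTC', NULL, '2015-03-01 12:01:01.00'),
    ('TTC-354-0122', 'TTC', NULL, NULL);

SELECT ToyReferenceNumber, Company,
       -- NULL if any row in the group has a NULL recall date, else the latest one
       CASE WHEN COUNT(*) <> COUNT(PermanentRecallDate) THEN NULL
            ELSE MAX(PermanentRecallDate)
       END AS PERMANENTRECALLDATE,
       -- Same rule for the return-to-store date
       CASE WHEN COUNT(*) <> COUNT(ReturnToStoreDate) THEN NULL
            ELSE MAX(ReturnToStoreDate)
       END AS RETURNTOSTOREDATE
FROM @ToyStaging
GROUP BY ToyReferenceNumber, Company
ORDER BY ToyReferenceNumber;
```

With the sample rows, TTC-011-0934 comes back as NULL and 2017-03-01 12:01:01.00. TTC-994-0132 and TTC-354-0122 come back NULL in both columns, because each of those columns contains at least one NULL in the group. The query therefore yields one aggregated value per column rather than one whole source row.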

Comments:

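Actually, given a partition such as:
TTC-354-0122, TTC, calculation, NULL, 2015-03-01 12:01:01.00
TTC-354-0122, TTC, calculation, NULL, NULL
I need the row (the entire row) that contains NULL, NULL. If there is no NULL, NULL row, I need the row with the latest PermanentRecallDate, unless that is NULL for the toy, in which case I need the row with the latest ReturnToStoreDate.

Answer 2: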

Here is one way to implement your requirements.

WITH Limits AS (
  SELECT DISTINCT ToyReferenceNumber
       , FIRST_VALUE(PermanentRecallDate) 
               OVER (PARTITION BY ToyReferenceNumber
                         ORDER BY CASE WHEN ReturnToStoreDate IS NULL THEN 1 ELSE 2 END
                                , CASE WHEN PermanentRecallDate IS NULL THEN 1 ELSE 2 END
                                , PermanentRecallDate DESC) PermanentRecallDate
       , FIRST_VALUE(ReturnToStoreDate)
               OVER (PARTITION BY ToyReferenceNumber
                         ORDER BY CASE WHEN ReturnToStoreDate IS NULL THEN 1 ELSE 2 END
                                , ReturnToStoreDate DESC) ReturnToStoreDate
    FROM ToyStaging
)
SELECT ts.*
  FROM ToyStaging ts
  JOIN Limits l
    ON ts.ToyReferenceNumber = l.ToyReferenceNumber
   AND ((l.PermanentRecallDate IS NULL
         AND ts.PermanentRecallDate IS NULL)
        OR ts.PermanentRecallDate = l.PermanentRecallDate)
   AND ((l.ReturnToStoreDate IS NULL
         AND ts.ReturnToStoreDate IS NULL)
        OR ts.ReturnToStoreDate = l.ReturnToStoreDate)

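In the Limits CTE, if a NULL, NULL record exists for a given ToyReferenceNumber, both FIRST_VALUE analytic functions return NULL. Otherwise, the PermanentRecallDate column returns the latest PermanentRecallDate. The ReturnToStoreDate column returns the latest ReturnToStoreDate when no PermanentRecallDate exists. Together these implement the rules for which record to return. The DISTINCT removes the duplicate rows, and Limits is then joined back to ToyStaging to retrieve the required data.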
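It can help to see which row each `FIRST_VALUE` call lands on. One way is to drop the `DISTINCT` and print the two `CASE` sort keys beside every row. This diagnostic is a sketch and not part of the answer. The `@ToyStaging` table variable, its column types, and the `RtsKey`/`PrdKey`/`LimitPrd`/`LimitRts` aliases are all hypothetical.

```sql
-- Hypothetical stand-in for ToyStaging; the column types are assumptions.
DECLARE @ToyStaging TABLE (
    ToyReferenceNumber  varchar(20)  NOT NULL,
    Company             varchar(10)  NOT NULL,
    PermanentRecallDate datetime2(2) NULL,
    ReturnToStoreDate   datetime2(2) NULL
);

INSERT INTO @ToyStaging VALUES
    ('TTC-011-0934', 'TTC', NULL, '2017-03-01 12:01:01.00'),
    ('TTC-011-0934', 'TTC', NULL, '2014-05-01 12:01:01.00'),
    ('TTC-011-0934', 'TTC', NULL, '2011-08-27 12:01:01.00'),
    ('TTC-994-0132', 'TTC', '2017-06-12 12:01:01.00', NULL),
    ('TTC-994-0132', 'TTC', NULL, '2017-02-01 12:01:01.00'),
    ('TTC-354-0122', 'TTC', NULL, '2015-03-01 12:01:01.00'),
    ('TTC-354-0122', 'TTC', NULL, NULL);

SELECT ToyReferenceNumber, PermanentRecallDate, ReturnToStoreDate,
       -- 1 sorts before 2, so rows with a NULL date come first
       CASE WHEN ReturnToStoreDate IS NULL THEN 1 ELSE 2 END AS RtsKey,
       CASE WHEN PermanentRecallDate IS NULL THEN 1 ELSE 2 END AS PrdKey,
       -- Same window definitions as the Limits CTE
       FIRST_VALUE(PermanentRecallDate) OVER (
           PARTITION BY ToyReferenceNumber
           ORDER BY CASE WHEN ReturnToStoreDate IS NULL THEN 1 ELSE 2 END,
                    CASE WHEN PermanentRecallDate IS NULL THEN 1 ELSE 2 END,
                    PermanentRecallDate DESC) AS LimitPrd,
       FIRST_VALUE(ReturnToStoreDate) OVER (
           PARTITION BY ToyReferenceNumber
           ORDER BY CASE WHEN ReturnToStoreDate IS NULL THEN 1 ELSE 2 END,
                    ReturnToStoreDate DESC) AS LimitRts
FROM @ToyStaging
ORDER BY ToyReferenceNumber, RtsKey, PrdKey, PermanentRecallDate DESC;
```

Every row of a partition shows the same LimitPrd/LimitRts pair. That is why the answer applies DISTINCT, which collapses each toy to a single row before the join back to ToyStaging.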

Given your sample data, the Limits CTE returns:

| ToyReferenceNumber |    PermanentRecallDate |      ReturnToStoreDate |
|--------------------|------------------------|------------------------|
|       TTC-011-0934 |                 (null) | 2017-03-01 12:01:01.00 |
|       TTC-354-0122 |                 (null) |                 (null) |
|       TTC-994-0132 | 2017-06-12 12:01:01.00 |                 (null) |

The full query returns:

| ToyReferenceNumber | Company | CompositeCalculatedField |    PermanentRecallDate |      ReturnToStoreDate |
|--------------------|---------|--------------------------|------------------------|------------------------|
|       TTC-011-0934 |     TTC |              calculation |                 (null) | 2017-03-01 12:01:01.00 |
|       TTC-354-0122 |     TTC |              calculation |                 (null) |                 (null) |
|       TTC-994-0132 |     TTC |              calculation | 2017-06-12 12:01:01.00 |                 (null) |

Check out this SQL Fiddle to see it in action.
