Using a (Recursive?) CTE + Window Functions to zero out sales orders?

Posted: 2017-03-11 08:30:23

Question:

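I'm trying to use a recursive CTE + window functions to find the final outcome of a series of buy/return orders.

First, some terminology:

field_id is the store's ID. field_number is the order number, but it can be reused by the same person. field_date is the date of the initial order. field_inserted is when this particular transaction occurred. field_sale is whether it was a purchase or a return.

Unfortunately, because of the way the system works, I can't get the cost on a return, so determining the final outcome of an order (did we ultimately sell anything?) is complicated. I need to match each Buy against its Return, which usually works well. However, it fails in cases like the ones below, and I'm trying to find a way to do it in one pass, possibly with a recursive CTE.

Here's some code.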

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES 
(1, 100, '20170311','20170311 01:00:00', 'Buy'), 
(1, 100, '20170311','20170311 01:01:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:01', 'buy'),
(2, 100, '20170311','20170311 01:03:00', 'REtu'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');

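Now remove the buys that are immediately followed by a return. The ISNULL is there because, inside a NOT (...), a comparison against NULL evaluates to UNKNOWN, so without it every row whose _lead/_lag value is NULL would be dropped.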

WITH cte AS 
(SELECT 
        ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
        field_id,
        field_number, 
        field_date,
        field_sale,
       lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,      
       lag(field_sale)  OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag    
FROM   @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy'  AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
  AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') =  'buy' )--AND field_sale_lag  IS NOT NULL)
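The role of ISNULL in the filter above comes down to SQL's three-valued logic. The following is a minimal Python sketch of that behaviour, not T-SQL itself: None stands in for NULL/UNKNOWN, and the helper names (sql_eq, sql_and, sql_not, isnull, keeps_row) are made up for illustration.

```python
# None models SQL NULL / UNKNOWN. All helper names here are hypothetical.

def sql_eq(a, b):
    """SQL '=': UNKNOWN when either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_and(a, b):
    """SQL AND: FALSE wins, otherwise UNKNOWN is contagious."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_not(a):
    """SQL NOT: NOT UNKNOWN is still UNKNOWN."""
    return None if a is None else not a

def isnull(value, default):
    """Rough stand-in for T-SQL ISNULL(value, default)."""
    return default if value is None else value

def keeps_row(field_sale, lead, use_isnull):
    """Evaluate NOT (field_sale = 'Buy' AND lead = 'Retu') for one row."""
    lead_value = isnull(lead, '') if use_isnull else lead
    predicate = sql_not(sql_and(sql_eq(field_sale, 'Buy'),
                                sql_eq(lead_value, 'Retu')))
    # A WHERE clause keeps a row only when the predicate is TRUE.
    return predicate is True

# The last Buy in a partition has no following row, so its lead is NULL.
print(keeps_row('Buy', None, use_isnull=False))  # False: the row is lost
print(keeps_row('Buy', None, use_isnull=True))   # True: the row is kept
```

Without ISNULL, the trailing Buy of every partition would silently disappear, which is exactly the row the query needs to keep.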

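I was feeling pretty smug, thinking I had it. However, that was the simple case: Buy, Return, Buy, Return. Let's try another case, Buy Buy Return Return, which is still valid but should obviously net out to 0.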

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES 
(1, 100, '20170311','20170311 01:00:00', 'Buy'), 
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');


WITH cte AS 
(SELECT 
        ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
        field_id,
        field_number, 
        field_date,
        field_sale,
       lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,      
       lag(field_sale)  OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag    
FROM   @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy'  AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
  AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') =  'buy' )--AND field_sale_lag  IS NOT NULL)

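However, when you do that, you realize that it finds the immediately adjacent match, but a Buy/Return pair is still left over, and I want to cancel that out as well.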
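To make the leftover pair concrete, here is a small Python sketch of what a single lead/lag pass does to one partition. It only imitates the filter logic on a list of field_sale values already ordered by field_inserted; the function name one_pass_filter is hypothetical.

```python
def one_pass_filter(sales):
    """Apply the lead/lag filter once.

    sales: 'Buy'/'Retu' values for one (field_id, field_number, field_date)
    partition, ordered by field_inserted.
    Returns (row_num, field_sale) for the rows that survive.
    """
    kept = []
    for i, sale in enumerate(sales):
        lead = sales[i + 1] if i + 1 < len(sales) else None
        lag = sales[i - 1] if i > 0 else None
        if sale == 'Buy' and lead == 'Retu':
            continue  # a Buy directly followed by a Return
        if sale == 'Retu' and lag == 'Buy':
            continue  # a Return directly preceded by a Buy
        kept.append((i + 1, sale))
    return kept

print(one_pass_filter(['Buy', 'Retu', 'Buy', 'Retu']))  # []
print(one_pass_filter(['Buy', 'Buy', 'Retu', 'Retu']))  # [(1, 'Buy'), (4, 'Retu')]
```

In the second case only the inner pair (rows 2 and 3) is adjacent, so rows 1 and 4 survive a single pass even though the order nets out to zero.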

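This is where I'm stuck. I've written recursive CTEs before, but for whatever reason I can't figure out how to recurse so that it cancels out 1/1/100 and 4/1/100 (rows 1 and 4 of field_id 1, field_number 100). All I've managed to do is make it choke on the recursion.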

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES 
(1, 100, '20170311','20170311 01:00:00', 'Buy'), 
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');

WITH cte AS 
(SELECT 
        ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
        field_id,
        field_number, 
        field_date,
        field_sale,
        field_inserted,
       lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,      
       lag(field_sale)  OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag    
FROM   @tablea
--) 
--SELECT * FROM cte
--WHERE NOT (cte.field_sale = 'Buy'  AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
--AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') =  'buy' )--AND field_sale_lag  IS NOT NULL)

UNION ALL
SELECT 
        ROW_NUMBER() OVER (PARTITION BY  cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS row_num,
        cte.field_id,
        cte.field_number, 
        cte.field_date,
        cte.field_sale,
        cte.field_inserted,
       lead(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lead,      
       lag(cte.field_sale)  OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lag    
FROM   @tablea INNER JOIN cte ON cte.field_date = [@tablea].field_date AND cte.field_id = [@tablea].field_id AND cte.field_number = [@tablea].field_number
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy'  AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
  AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') =  'buy' )--AND field_sale_lag  IS NOT NULL)

Comments:

If the sequence is (Buy Buy Buy Return Return), which Buys must be removed?

@serg Good question. I think it would be the last two.

Answer 1:

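We can solve this without loops or recursion by using a common table expression and row_number(), like so:

If I understand your question correctly, you want to remove the sales that were returned, and each 'retu' should remove the most recent 'buy'.

First, we add an id to our rowset with row_number() so that we can uniquely identify each row.

Next, we add br_rn (short for Buy/Return RowNumber), partitioned by field_id, field_number, field_date, with field_sale added to the partition as well, and ordered by field_inserted desc. This lets us match each 'retu' with the most recent 'buy'; once we can do that, we can eliminate all the pairs with not exists():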

;with cte as (
  select 
      id = row_number() over (
        order by field_id, field_number, field_date, field_inserted asc
        ) 
    , field_id
    , field_number
    , field_date 
    , field_inserted 
    , field_sale
    , br_rn = row_number() over (
        partition by field_id, field_number, field_date, field_sale
        order by field_inserted desc
        ) 
  from @tablea
)
select 
    id
  , field_id
  , field_number
  , field_date
  , field_inserted
  , field_sale
from cte
where not exists (
  select 1
  from cte as i
  where i.field_id = cte.field_id
    and i.field_number = cte.field_number
    and i.field_date = cte.field_date
    and i.br_rn = cte.br_rn
    and i.id <> cte.id
    )
order by id
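The pairing idea in the query above can be sketched outside SQL as well. The Python below is an illustration under stated assumptions, not a translation of the exact query plan: rows are plain tuples, field_inserted is kept as a sortable string, sale types are compared case-insensitively (as a case-insensitive collation would), and the name surviving_rows is hypothetical.

```python
from collections import defaultdict

def surviving_rows(rows):
    """rows: (field_id, field_number, field_date, field_inserted, field_sale)."""
    def group(row):
        return (row[0], row[1], row[2])

    # br_rn: number rows newest-first within (group, sale type), so 1 = most recent.
    counts = defaultdict(int)
    br_rn = {}
    newest_first = sorted(range(len(rows)), key=lambda i: rows[i][3], reverse=True)
    for i in newest_first:
        key = group(rows[i]) + (rows[i][4].lower(),)
        counts[key] += 1
        br_rn[i] = counts[key]

    # Every (group, sale type, br_rn) combination that exists in the data.
    present = {group(rows[i]) + (rows[i][4].lower(), rn) for i, rn in br_rn.items()}

    survivors = []
    for i, row in enumerate(rows):
        other = 'retu' if row[4].lower() == 'buy' else 'buy'
        # A row is dropped when the opposite type has a row with the same br_rn.
        if group(row) + (other, br_rn[i]) not in present:
            survivors.append(row)
    return sorted(survivors, key=lambda r: r[:4])

rows = [
    (1, 100, '20170311', '20170311 01:00:00', 'Buy'),
    (1, 100, '20170311', '20170311 01:01:00', 'Buy'),
    (1, 100, '20170311', '20170311 01:02:00', 'Retu'),
    (1, 100, '20170311', '20170311 01:03:00', 'Retu'),
    (2, 100, '20170311', '20170311 01:03:00', 'Buy'),
    (1, 110, '20170311', '20170311 01:03:00', 'Buy'),
]
for row in surviving_rows(rows):
    print(row)
# (1, 110, '20170311', '20170311 01:03:00', 'Buy')
# (2, 100, '20170311', '20170311 01:03:00', 'Buy')
```

Because the pairing is by rank alone, this sketch (like the query) never checks that a return was inserted after the buy it cancels.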

rextester demo: http://rextester.com/TKXOC61533

For this input:

  (1, 100, '20170311','20170311 01:00:00', 'Buy') 
, (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Retu')
, (1, 100, '20170311','20170311 01:03:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');

Returns:

+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date |   field_inserted    | field_sale |
+----+----------+--------------+------------+---------------------+------------+
|  5 |        1 |          110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
|  6 |        2 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
+----+----------+--------------+------------+---------------------+------------+

For this input:

  (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:03:00', 'Buy') 
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Buy') 
, (1, 100, '20170311','20170311 01:06:00', 'Retu')
, (1, 100, '20170311','20170311 01:07:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');

Returns:

+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date |   field_inserted    | field_sale |
+----+----------+--------------+------------+---------------------+------------+
|  1 |        1 |          100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy        |
|  8 |        1 |          110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
|  9 |        2 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
+----+----------+--------------+------------+---------------------+------------+

For this input:

  (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Retu')
, (1, 100, '20170312','20170311 01:06:00', 'Buy')
, (1, 100, '20170312','20170311 01:07:00', 'Buy')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy')

Returns:

+----+----------+--------------+------------+---------------------+------------+
| id | field_id | field_number | field_date |   field_inserted    | field_sale |
+----+----------+--------------+------------+---------------------+------------+
|  5 |        1 |          100 | 2017-03-12 | 2017-03-11 01:06:00 | Buy        |
|  6 |        1 |          100 | 2017-03-12 | 2017-03-11 01:07:00 | Buy        |
|  7 |        1 |          110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
|  8 |        2 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
+----+----------+--------------+------------+---------------------+------------+

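It may help to look at what the cte returns before we eliminate any pairs.

Looking only at the set that needs filtering (the field_id 1, field_number 100 rows of the second input), before filtering: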

+----+----------+--------------+------------+---------------------+------------+-------+
| id | field_id | field_number | field_date |   field_inserted    | field_sale | br_rn |
+----+----------+--------------+------------+---------------------+------------+-------+
|  1 |        1 |          100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy        |     4 |
|  2 |        1 |          100 | 2017-03-11 | 2017-03-11 01:02:00 | Buy        |     3 |
|  3 |        1 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |     2 |
|  4 |        1 |          100 | 2017-03-11 | 2017-03-11 01:04:00 | Retu       |     3 |
|  5 |        1 |          100 | 2017-03-11 | 2017-03-11 01:05:00 | Buy        |     1 |
|  6 |        1 |          100 | 2017-03-11 | 2017-03-11 01:06:00 | Retu       |     2 |
|  7 |        1 |          100 | 2017-03-11 | 2017-03-11 01:07:00 | Retu       |     1 |
+----+----------+--------------+------------+---------------------+------------+-------+

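Viewed this way, we can easily see that the 'buy' with id 1 has a br_rn of 4 and no associated 'retu'.

Comments:

I found an odd case where it doesn't work. Writing up the details now.

Not sure why this set doesn't work properly. I'd expect it to return the buys for 1/100 on 20170312, but only the last two rows show up. Rereading the explanation now. (1, 100, '20170311','20170311 01:01:00', 'Buy'), (1, 100, '20170311','20170311 01:02:00', 'Buy'), (1, 100, '20170311','20170311 01:04:00', 'Retu'), (1, 100, '20170311','20170311 01:05:00', 'Retu'), (1, 100, '20170312','20170311 01:06:00', 'Buy'), (1, 100, '20170312','20170311 01:07:00', 'Buy'), (2, 100, '20170311','20170311 01:03:00', 'Buy'), (1, 110, '20170311','20170311 01:03:00', 'Buy')

This solution doesn't take field_inserted into account when pairing buys with returns. My guess is that, going by field_inserted, a return must be paired with an earlier buy, since you can't return something you haven't bought yet.

@mbourgon I've updated the answer to correct my oversight. I originally failed to include and i.field_date = cte.field_date in the not exists(), even though it was in the partition by. I've also updated the rextester demo with your additional data set and included the results in the answer.

Makes sense, and that fixes it! If you ever make it to NTSSUG or DFW SQLSat, let me know and I'll buy you a drink or something after the next meeting!

Answer 2: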

I'd suggest deleting pairs of consecutive Buy/Return rows in a loop for as long as any such pair exists. Try:

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES 
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'), 
(1, 100, '20170311','20170311 01:03:00', 'Buy'), 
(1, 100, '20170311','20170311 01:04:00', 'Retu'),
(1, 100, '20170311','20170311 01:05:00', 'Buy'), 
(1, 100, '20170311','20170311 01:06:00', 'Retu'),
(1, 100, '20170311','20170311 01:07:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');

select * from @tablea
order by field_id,
        field_number, 
        field_inserted 

declare @eoj int =1;
while @eoj > 0
begin
    WITH cte AS 
    (
        SELECT 
            case field_sale when 'Buy' then 
                  lead (field_sale)  OVER (PARTITION BY field_id, field_number  ORDER BY field_inserted)
                  when 'Retu' then 
                  lag (field_sale)  OVER (PARTITION BY field_id, field_number  ORDER BY field_inserted)
                  end nbr_type,
            field_id,
            field_number, 
            field_date,
            field_sale,
            field_inserted 
    FROM   @tablea 
    ) 
    delete  
    from cte 
    where nbr_type is not null and nbr_type <> field_sale;
    set @eoj = @@rowcount;
    -- check it
    select * from @tablea
    order by field_id,
            field_number, 
            field_inserted; 
end;

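It will iterate N+1 times, where N is the length of the longest run of consecutive returns. In the example above N=2.

Comments:

Ah! I hadn't thought about doing it that way and actually deleting the rows. Thanks for taking a look.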
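The loop can be traced by hand with a short Python sketch. This is only an imitation of the delete-until-nothing-changes idea on a list of field_sale values for a single (field_id, field_number) partition ordered by field_inserted; the function name cancel_adjacent_pairs is hypothetical.

```python
def cancel_adjacent_pairs(sales):
    """Repeatedly drop adjacent Buy/Retu pairs until a pass deletes nothing.

    Returns (remaining_sales, number_of_passes), counting the final empty pass.
    """
    remaining = list(sales)
    passes = 0
    while True:
        passes += 1
        doomed = set()
        for i, sale in enumerate(remaining):
            if sale == 'Buy' and i + 1 < len(remaining) and remaining[i + 1] == 'Retu':
                doomed.add(i)  # Buy whose next row is a Return
            elif sale == 'Retu' and i > 0 and remaining[i - 1] == 'Buy':
                doomed.add(i)  # Return whose previous row is a Buy
        if not doomed:
            return remaining, passes  # mirrors @@rowcount = 0 ending the loop
        remaining = [s for i, s in enumerate(remaining) if i not in doomed]

# field_id 1, field_number 100 from the example: 01:01 through 01:07
print(cancel_adjacent_pairs(['Buy', 'Buy', 'Buy', 'Retu', 'Buy', 'Retu', 'Retu']))
# (['Buy'], 3)
```

The first pass removes the 01:03/01:04 and 01:05/01:06 pairs, the second removes 01:02/01:07, and the third finds nothing to delete, which gives the three iterations expected for N=2.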
