Oracle: a sort of gaps-and-islands query

Original title: Oracle Sort of gaps and island query
Posted: 2021-05-31 18:29:26

Question:

Instead of writing long sentences and paragraphs, let me show the data and what I want to achieve:

create table ssb_price (itm_no varchar2(10), price number, price_code varchar2(10), valid_from_dt date, valid_to_dt date);

insert into ssb_price values ('A001', 83, 'AB', DATE '2021-01-01', DATE '2021-01-05');
insert into ssb_price values ('A001', 83, 'AB', DATE '2021-01-06', DATE '2021-01-12');
insert into ssb_price values ('A001', 98, 'SPQ', DATE '2021-01-13', DATE '2021-01-17');
insert into ssb_price values ('A001', 83, 'AB', DATE '2021-01-19', DATE '2021-01-24');
insert into ssb_price values ('A001', 83, 'DE', DATE '2021-01-25', DATE '2021-01-30');
insert into ssb_price values ('A001', 83, 'DE', DATE '2021-01-31', DATE '2021-02-04');
insert into ssb_price values ('A001', 77, 'XY', DATE '2021-02-07', DATE '2021-02-12');
insert into ssb_price values ('A001', 77, 'XY', DATE '2021-02-15', DATE '2021-02-20');
insert into ssb_price values ('A001', 62, 'SD', DATE '2021-02-23', DATE '2021-02-26');
insert into ssb_price values ('A001', 59, 'SD', DATE '2021-02-26', DATE '2021-03-03');

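For a given itm_no and price, if the from and to dates are contiguous, I should get the combined from and to dates of the whole run. For price 77, there is a 2-day gap (the 13th and 14th) between the first row's to date and the next row's from date, so those rows are not contiguous. Here is what the required output should look like (screenshot from Excel):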

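I already asked this question in another post, but that post is old and got no feedback, so I created this one. Please let me know if I should merge this post with the previous one.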

Question comments:

Can I edit the post above, or should I create a new post? The scenario above needs a small change.

Answer 1:

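This is basically a gaps-and-islands problem. However, in the final step you want window functions rather than aggregation, so that the number of rows is not reduced.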
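To see the difference between the two final steps, suppose a grouping step has already labelled each row with an island id (the labels below are hard-coded, hypothetical values, not query output). Aggregation returns one row per island, while the window-function approach keeps every source row and attaches its island's bounds. A minimal Python sketch:

```python
from datetime import date

# (price, valid_from_dt, valid_to_dt, island_id); the island ids are
# hypothetical labels standing in for the output of a grouping step
rows = [
    (83, date(2021, 1, 1), date(2021, 1, 5), 1),
    (83, date(2021, 1, 6), date(2021, 1, 12), 1),
    (98, date(2021, 1, 13), date(2021, 1, 17), 2),
]

# Bounds of each island: earliest start and latest end
bounds = {}
for _, start, end, island in rows:
    lo, hi = bounds.get(island, (start, end))
    bounds[island] = (min(lo, start), max(hi, end))

# Aggregation (like GROUP BY island_id): one output row per island
aggregated = [(island, lo, hi) for island, (lo, hi) in bounds.items()]

# Window style (like MIN/MAX ... OVER (PARTITION BY island_id)):
# every source row is kept and annotated with its island's bounds
windowed = [(price, start, end) + bounds[island]
            for price, start, end, island in rows]

print(len(aggregated), len(windowed))  # 2 3
```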

In your data, the time frames tile neatly. That suggests using lag() and a cumulative sum to define the groups:

select p.*,
       -- island bounds, computed per group without collapsing any rows
       min(valid_from_dt) over (partition by itm_no, price, price_code, grp) as new_valid_from_dt,
       max(valid_to_dt) over (partition by itm_no, price, price_code, grp) as new_valid_to_dt
from (select p.*,
             -- running count of breaks: a row opens a new group unless it starts the day after the previous row ends
             sum(case when valid_from_dt = prev_valid_to_dt + interval '1' day then 0 else 1 end) over 
                   (partition by itm_no, price, price_code order by valid_from_dt) as grp
      from (select p.*,
                   -- end date of the previous row with the same itm_no, price and price_code
                   lag(valid_to_dt) over (partition by itm_no, price, price_code order by valid_from_dt) as prev_valid_to_dt
            from ssb_price p 
           ) p
     ) p
order by itm_no, valid_from_dt;

Here is a db<>fiddle.
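The lag()/running-sum logic can be mirrored outside the database to see how each row gets labelled. A minimal Python sketch, assuming the rows have already been fetched as tuples (the `islands` and `partition_key` names are hypothetical). It keeps the query's partition keys (itm_no, price, price_code), so the 83/AB row starting 2021-01-19 ends up as its own island rather than being merged with the 83/DE rows that follow it:

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows: (itm_no, price, price_code, valid_from_dt, valid_to_dt)
ROWS = [
    ("A001", 83, "AB",  date(2021, 1, 1),  date(2021, 1, 5)),
    ("A001", 83, "AB",  date(2021, 1, 6),  date(2021, 1, 12)),
    ("A001", 98, "SPQ", date(2021, 1, 13), date(2021, 1, 17)),
    ("A001", 83, "AB",  date(2021, 1, 19), date(2021, 1, 24)),
    ("A001", 83, "DE",  date(2021, 1, 25), date(2021, 1, 30)),
    ("A001", 83, "DE",  date(2021, 1, 31), date(2021, 2, 4)),
    ("A001", 77, "XY",  date(2021, 2, 7),  date(2021, 2, 12)),
    ("A001", 77, "XY",  date(2021, 2, 15), date(2021, 2, 20)),
    ("A001", 62, "SD",  date(2021, 2, 23), date(2021, 2, 26)),
    ("A001", 59, "SD",  date(2021, 2, 26), date(2021, 3, 3)),
]

def partition_key(row):
    # PARTITION BY itm_no, price, price_code
    return row[0], row[1], row[2]

def islands(rows):
    """Return each row extended with (new_valid_from_dt, new_valid_to_dt)."""
    bounds = {}
    ordered = sorted(rows, key=lambda r: (partition_key(r), r[3]))
    for _, part in groupby(ordered, key=partition_key):
        prev_to, grp, tagged = None, 0, []
        for row in part:  # ORDER BY valid_from_dt within the partition
            # lag(valid_to_dt) is None for the first row, so it always opens a group;
            # any other row opens a group unless it starts the day after prev_to
            if prev_to is None or row[3] != prev_to + timedelta(days=1):
                grp += 1  # running SUM(CASE ... THEN 0 ELSE 1 END)
            tagged.append((grp, row))
            prev_to = row[4]
        # MIN(valid_from_dt) / MAX(valid_to_dt) OVER (PARTITION BY ..., grp)
        for _, members in groupby(tagged, key=lambda t: t[0]):
            group_rows = [row for _, row in members]
            lo = min(row[3] for row in group_rows)
            hi = max(row[4] for row in group_rows)
            for row in group_rows:
                bounds[row] = (lo, hi)
    # ORDER BY itm_no, valid_from_dt
    return [row + bounds[row] for row in sorted(rows, key=lambda r: (r[0], r[3]))]

for row in islands(ROWS):
    print(row)
```

Dropping price_code from `partition_key` (and from the query's PARTITION BY clauses) would merge rows that share a price but differ in price_code, so the 83/AB and 83/DE rows from 2021-01-19 to 2021-02-04 would form one island.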

Comments:

Thanks for the quick reply! Worked like a charm!

Answer 2:

From Oracle 12c, you can use MATCH_RECOGNIZE:

SELECT itm_no,
       price,
       price_code,
       valid_from_dt,
       valid_to_dt,
       MIN( valid_from_dt ) OVER ( PARTITION BY itm_no, mnum ) AS new_valid_from_dt,
       MAX( valid_to_dt ) OVER ( PARTITION BY itm_no, mnum ) AS new_valid_to_dt
FROM   ssb_price
MATCH_RECOGNIZE(
  PARTITION BY itm_no
  ORDER     BY valid_from_dt, valid_to_dt
  MEASURES
    MATCH_NUMBER() AS mnum
  ALL ROWS PER MATCH
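  PATTERN ( start_range continued_range* )
  DEFINE
    -- start_range has no condition, so any row can begin a new match;
    -- a row continues the match if it starts the day after the previous row ends, at the same price
    continued_range AS (
      valid_from_dt = PREV( valid_to_dt ) + 1
      AND price = PREV( price )
    )
)
ORDER BY itm_no, valid_from_dt;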
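The match numbering can be traced by hand. A minimal Python sketch of what the pattern does for a single itm_no, assuming the rows are already sorted by valid_from_dt and valid_to_dt (the `match_numbers` function is hypothetical): a row extends the current match only when it starts the day after the previous row ends and has the same price; otherwise it becomes a new start_range. Since price_code is not part of the DEFINE condition, the 83/AB row from 2021-01-19 and the two 83/DE rows after it share one match.

```python
from datetime import date, timedelta

# Rows for one itm_no in MATCH_RECOGNIZE order: (price, valid_from_dt, valid_to_dt)
rows = [
    (83, date(2021, 1, 1),  date(2021, 1, 5)),
    (83, date(2021, 1, 6),  date(2021, 1, 12)),
    (98, date(2021, 1, 13), date(2021, 1, 17)),
    (83, date(2021, 1, 19), date(2021, 1, 24)),
    (83, date(2021, 1, 25), date(2021, 1, 30)),
    (83, date(2021, 1, 31), date(2021, 2, 4)),
    (77, date(2021, 2, 7),  date(2021, 2, 12)),
    (77, date(2021, 2, 15), date(2021, 2, 20)),
    (62, date(2021, 2, 23), date(2021, 2, 26)),
    (59, date(2021, 2, 26), date(2021, 3, 3)),
]

def match_numbers(rows):
    """Emulate PATTERN (start_range continued_range*) and return MATCH_NUMBER() per row."""
    numbers, match_no, prev = [], 0, None
    for row in rows:
        price, valid_from, _ = row
        # continued_range: starts the day after PREV(valid_to_dt) with the same price
        continued = (prev is not None
                     and valid_from == prev[2] + timedelta(days=1)
                     and price == prev[0])
        if not continued:
            match_no += 1  # start_range has no condition, so it opens a new match
        numbers.append(match_no)
        prev = row
    return numbers

print(match_numbers(rows))  # [1, 1, 2, 3, 3, 3, 4, 5, 6, 7]
```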

And, from Oracle 10g, you can use the MODEL clause:

SELECT itm_no,
       price,
       price_code,
       valid_from_dt,
       valid_to_dt,
       mn,
       MIN( valid_from_dt ) OVER ( PARTITION BY itm_no, mn ) AS new_valid_from_dt,
       MAX( valid_to_dt ) OVER ( PARTITION BY itm_no, mn ) AS new_valid_to_dt
FROM   (
  SELECT *
  FROM   (
    SELECT s.*,
           ROW_NUMBER() OVER ( PARTITION BY itm_no ORDER BY valid_from_dt ) AS rn
    FROM   ssb_price s
  )
  MODEL
    PARTITION BY ( itm_no )
    DIMENSION BY ( rn )
    MEASURES ( price, price_code, valid_from_dt, valid_to_dt, 1 AS mn )
    RULES (
      -- evaluate the rn > 1 cells in ascending rn order so mn[cv(rn)-1] is already computed
      mn[rn>1] ORDER BY rn = mn[cv(rn)-1]
                 +
                 CASE
                 WHEN valid_from_dt[cv(rn)] = valid_to_dt[cv(rn)-1] + 1
                 AND  price[cv(rn)] = price[cv(rn) - 1]
                 THEN 0
                 ELSE 1
                 END
    )
)
ORDER BY itm_no, valid_from_dt;
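The MODEL rule behaves like a spreadsheet formula filled down the rn column: each mn cell adds 0 or 1 to the cell above it. A minimal Python sketch of that recurrence, assuming the cells are held in plain dictionaries keyed by rn (an illustration only, not how Oracle stores them; `number_islands` is a hypothetical name). It also shows why the evaluation order of the rn > 1 cells matters; in Oracle, an ORDER BY on the rule's left-hand side (for example `mn[rn>1] ORDER BY rn = ...`) fixes that order explicitly.

```python
from datetime import date, timedelta

# Cells of one MODEL partition (itm_no = 'A001'), dimensioned by rn.
# Only the first four sample rows are used to keep the sketch short.
price = {1: 83, 2: 83, 3: 98, 4: 83}
valid_from = {1: date(2021, 1, 1), 2: date(2021, 1, 6),
              3: date(2021, 1, 13), 4: date(2021, 1, 19)}
valid_to = {1: date(2021, 1, 5), 2: date(2021, 1, 12),
            3: date(2021, 1, 17), 4: date(2021, 1, 24)}

def number_islands(cell_order):
    """Apply mn[rn] = mn[rn-1] + (0 or 1) to the rn > 1 cells in the given order."""
    mn = {rn: 1 for rn in price}  # MEASURES (..., 1 AS mn)
    for rn in cell_order:
        contiguous = (valid_from[rn] == valid_to[rn - 1] + timedelta(days=1)
                      and price[rn] == price[rn - 1])
        mn[rn] = mn[rn - 1] + (0 if contiguous else 1)
    return mn

# Ascending rn: each mn[rn - 1] is final before mn[rn] reads it
print(number_islands([2, 3, 4]))  # {1: 1, 2: 1, 3: 2, 4: 3}
# Descending rn: mn[4] reads mn[3] while it still holds its initial value 1
print(number_islands([4, 3, 2]))  # {1: 1, 2: 1, 3: 2, 4: 2}
```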

Given the sample data:

create table ssb_price (itm_no, price, price_code, valid_from_dt, valid_to_dt) AS
SELECT 'A001', 83, 'AB', DATE '2021-01-01', DATE '2021-01-05' FROM DUAL UNION ALL
SELECT 'A001', 83, 'AB', DATE '2021-01-06', DATE '2021-01-12' FROM DUAL UNION ALL
SELECT 'A001', 98, 'SPQ', DATE '2021-01-13', DATE '2021-01-17' FROM DUAL UNION ALL
SELECT 'A001', 83, 'AB', DATE '2021-01-19', DATE '2021-01-24' FROM DUAL UNION ALL
SELECT 'A001', 83, 'DE', DATE '2021-01-25', DATE '2021-01-30' FROM DUAL UNION ALL
SELECT 'A001', 83, 'DE', DATE '2021-01-31', DATE '2021-02-04' FROM DUAL UNION ALL
SELECT 'A001', 77, 'XY', DATE '2021-02-07', DATE '2021-02-12' FROM DUAL UNION ALL
SELECT 'A001', 77, 'XY', DATE '2021-02-15', DATE '2021-02-20' FROM DUAL UNION ALL
SELECT 'A001', 62, 'SD', DATE '2021-02-23', DATE '2021-02-26' FROM DUAL UNION ALL
SELECT 'A001', 59, 'SD', DATE '2021-02-26', DATE '2021-03-03' FROM DUAL;

Outputs:

ITM_NO  PRICE  PRICE_CODE  VALID_FROM_DT        VALID_TO_DT          NEW_VALID_FROM_DT    NEW_VALID_TO_DT
A001       83  AB          2021-01-01 00:00:00  2021-01-05 00:00:00  2021-01-01 00:00:00  2021-01-12 00:00:00
A001       83  AB          2021-01-06 00:00:00  2021-01-12 00:00:00  2021-01-01 00:00:00  2021-01-12 00:00:00
A001       98  SPQ         2021-01-13 00:00:00  2021-01-17 00:00:00  2021-01-13 00:00:00  2021-01-17 00:00:00
A001       83  AB          2021-01-19 00:00:00  2021-01-24 00:00:00  2021-01-19 00:00:00  2021-02-04 00:00:00
A001       83  DE          2021-01-25 00:00:00  2021-01-30 00:00:00  2021-01-19 00:00:00  2021-02-04 00:00:00
A001       83  DE          2021-01-31 00:00:00  2021-02-04 00:00:00  2021-01-19 00:00:00  2021-02-04 00:00:00
A001       77  XY          2021-02-07 00:00:00  2021-02-12 00:00:00  2021-02-07 00:00:00  2021-02-12 00:00:00
A001       77  XY          2021-02-15 00:00:00  2021-02-20 00:00:00  2021-02-15 00:00:00  2021-02-20 00:00:00
A001       62  SD          2021-02-23 00:00:00  2021-02-26 00:00:00  2021-02-23 00:00:00  2021-02-26 00:00:00
A001       59  SD          2021-02-26 00:00:00  2021-03-03 00:00:00  2021-02-26 00:00:00  2021-03-03 00:00:00

db<>fiddle here

Comments:

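Thanks, mate! I'm not on 12c, but I will definitely check out the MODEL clause. Honestly, I have never used MODEL; I'll read up on it. Thanks for introducing a new concept!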
