Rank rows with conditions


【Posted】: 2021-03-01 09:01:27 【Question】:

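I have the following table, and I need to compute a "rank index" from the "index" column (given in the table). That is, the index should advance only when 6 hours or more have passed since the previous timestamp. The calculation should be partitioned by key1 + key2.

Any ideas?

【Comments】:

【Answer 1】:

If you are on 11g, you can use this as well: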

with Your_table (key1, key2, datetime, IDX) as (
  select 11, 22, to_date('2021-01-01 00:00', 'yyyy-mm-dd hh24:mi'), 321 from dual union all
  select 11, 22, to_date('2021-01-01 01:00', 'yyyy-mm-dd hh24:mi'), 322 from dual union all
  select 11, 22, to_date('2021-01-01 02:00', 'yyyy-mm-dd hh24:mi'), 323 from dual union all
  select 11, 22, to_date('2021-01-01 08:30', 'yyyy-mm-dd hh24:mi'), 324 from dual union all
  select 11, 22, to_date('2021-01-01 09:00', 'yyyy-mm-dd hh24:mi'), 325 from dual union all
  select 11, 22, to_date('2021-01-01 16:00', 'yyyy-mm-dd hh24:mi'), 326 from dual union all
  select 11, 22, to_date('2021-01-01 17:00', 'yyyy-mm-dd hh24:mi'), 327 from dual union all
  select 11, 22, to_date('2021-01-02 04:00', 'yyyy-mm-dd hh24:mi'), 328 from dual union all
  ---
  select 999, 777, to_date('2021-01-01 00:00', 'yyyy-mm-dd hh24:mi'), 17 from dual union all
  select 999, 777, to_date('2021-01-01 01:00', 'yyyy-mm-dd hh24:mi'), 18 from dual union all
  select 999, 777, to_date('2021-01-22 02:00', 'yyyy-mm-dd hh24:mi'), 19 from dual union all
  select 999, 777, to_date('2021-01-22 04:00', 'yyyy-mm-dd hh24:mi'), 20 from dual
)
, temp_rws_ordered (key1, key2, datetime, IDX, rnb) as (
  -- Number all rows in (key1, key2, datetime) order so the recursion can walk them one by one
  select KEY1, KEY2, DATETIME, IDX, row_number() over (order by KEY1, KEY2, DATETIME) rnb
  from Your_table
), cte (key1, key2, datetime, IDX, rnb, threshold, rank_index) as (
  -- Anchor member: the very first row opens the first group
  select key1, key2, datetime, IDX, rnb, datetime threshold, IDX rank_index
  from temp_rws_ordered
  where rnb = 1
  union all
  select t.key1, t.key2, t.datetime, t.IDX, t.rnb
  -- threshold: start time of the current group; reset on a new key pair or once 6 hours have passed
  , case
      when t.KEY1 = c.KEY1 and t.KEY2 = c.KEY2 then
        case
          -- ">=" because the question asks for "6 hours or more"
          when (t.datetime - c.threshold)*24 >= 6 then t.datetime
          else c.threshold
        end
      else t.datetime
    end threshold
  -- rank_index: incremented when a new group starts, restarted from IDX on a new key pair
  , case
      when t.KEY1 = c.KEY1 and t.KEY2 = c.KEY2 then
        case
          when (t.datetime - c.threshold)*24 >= 6 then c.rank_index + 1
          else c.rank_index
        end
      else t.IDX
    end rank_index
  from temp_rws_ordered t
  join cte c on (t.rnb = c.rnb + 1)
)
select KEY1, KEY2, DATETIME, IDX, RANK_INDEX
from cte
;
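To cross-check the recursive query without an Oracle instance, here is a minimal Python sketch of the same row-by-row walk. It is only an illustration: the function name `rank_anchored` and the tuple layout are my own assumptions, and the boundary test uses `>=` to follow the question's "6 hours or more" wording. Note that this approach measures elapsed time from the first row of the current group, not from the immediately preceding row, so it can split a group even when no single gap reaches 6 hours.

```python
from datetime import datetime, timedelta


def rank_anchored(rows, window=timedelta(hours=6)):
    """Assign rank_index per (key1, key2), measuring time from the group's first row.

    rows: iterable of (key1, key2, timestamp, idx) tuples (hypothetical layout).
    """
    out = []
    prev_key = threshold = rank_index = None
    for key1, key2, ts, idx in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        if (key1, key2) != prev_key:
            # New key pair: restart from this row's own idx.
            prev_key, threshold, rank_index = (key1, key2), ts, idx
        elif ts - threshold >= window:
            # 6 hours or more since the group started: open a new group.
            threshold, rank_index = ts, rank_index + 1
        out.append((key1, key2, ts, idx, rank_index))
    return out


# Rows spaced 5 hours apart: no single gap reaches 6 hours, yet the third
# row is 10 hours after the group's first row, so it starts a new group.
day = datetime(2021, 1, 1)
demo = [(11, 22, day + timedelta(hours=h), 321 + i) for i, h in enumerate((0, 5, 10))]
ranks = [row[4] for row in rank_anchored(demo)]
print(ranks)  # [321, 321, 322]
```

If the intended rule is "6 hours since the previous row", the demo rows would all stay in one group, so it is worth deciding which reading you need before picking a query.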

【Comments】:

【Answer 2】:

From Oracle 12c onward, you can use MATCH_RECOGNIZE:

SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY key1, key2
  ORDER     BY datetime
  MEASURES
    FIRST( idx ) AS rank_idx
  ALL ROWS PER MATCH
  PATTERN ( within_6_hours* last_row  )
  DEFINE
    within_6_hours AS (
      NEXT( datetime ) < LAST( datetime ) + INTERVAL '6' HOUR
    )
)

Which, for your sample data:

CREATE TABLE table_name ( key1, key2, datetime, idx ) AS
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '00:00' HOUR TO MINUTE, 321 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '01:00' HOUR TO MINUTE, 322 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '02:00' HOUR TO MINUTE, 323 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '08:30' HOUR TO MINUTE, 324 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '09:00' HOUR TO MINUTE, 325 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '16:00' HOUR TO MINUTE, 326 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-01' + INTERVAL '17:00' HOUR TO MINUTE, 327 FROM DUAL UNION ALL
SELECT 11, 22, DATE '2020-01-02' + INTERVAL '04:00' HOUR TO MINUTE, 328 FROM DUAL UNION ALL
SELECT 999, 777, DATE '2020-01-01' + INTERVAL '00:00' HOUR TO MINUTE, 17 FROM DUAL UNION ALL
SELECT 999, 777, DATE '2020-01-01' + INTERVAL '01:00' HOUR TO MINUTE, 18 FROM DUAL UNION ALL
SELECT 999, 777, DATE '2020-01-22' + INTERVAL '02:00' HOUR TO MINUTE, 19 FROM DUAL UNION ALL
SELECT 999, 777, DATE '2020-01-22' + INTERVAL '04:00' HOUR TO MINUTE, 20 FROM DUAL;

Outputs:

| KEY1 | KEY2 | DATETIME            | RANK_IDX | IDX |
| ---: | ---: | :------------------ | -------: | --: |
|   11 |   22 | 2020-01-01 00:00:00 |      321 | 321 |
|   11 |   22 | 2020-01-01 01:00:00 |      321 | 322 |
|   11 |   22 | 2020-01-01 02:00:00 |      321 | 323 |
|   11 |   22 | 2020-01-01 08:30:00 |      324 | 324 |
|   11 |   22 | 2020-01-01 09:00:00 |      324 | 325 |
|   11 |   22 | 2020-01-01 16:00:00 |      326 | 326 |
|   11 |   22 | 2020-01-01 17:00:00 |      326 | 327 |
|   11 |   22 | 2020-01-02 04:00:00 |      328 | 328 |
|  999 |  777 | 2020-01-01 00:00:00 |       17 |  17 |
|  999 |  777 | 2020-01-01 01:00:00 |       17 |  18 |
|  999 |  777 | 2020-01-22 02:00:00 |       19 |  19 |
|  999 |  777 | 2020-01-22 04:00:00 |       19 |  20 |
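For readers who want to reproduce the RANK_IDX column without MATCH_RECOGNIZE, the following Python sketch shows the grouping it performs on this data: a row stays in the current group while it is less than 6 hours after the previous row, and every row in a group is labelled with the first `idx` of that group. The function name `rank_first_idx` and the tuple layout are assumptions made for illustration, not part of the answer.

```python
from datetime import datetime, timedelta
from itertools import groupby


def rank_first_idx(rows, window=timedelta(hours=6)):
    """Label each row with the idx of the first row in its run of close timestamps.

    rows: iterable of (key1, key2, timestamp, idx) tuples (hypothetical layout).
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (key1, key2), partition in groupby(ordered, key=lambda r: (r[0], r[1])):
        prev_ts = rank_idx = None
        for _, _, ts, idx in partition:
            # A gap of 6 hours or more from the previous row opens a new group.
            if prev_ts is None or ts - prev_ts >= window:
                rank_idx = idx
            out.append((key1, key2, ts, rank_idx, idx))
            prev_ts = ts
    return out


day = datetime(2020, 1, 1)
hours = (0, 1, 2, 8.5, 9, 16, 17, 28)  # offsets of the key1=11, key2=22 rows
sample = [(11, 22, day + timedelta(hours=h), 321 + i) for i, h in enumerate(hours)]
rank_idx_column = [row[3] for row in rank_first_idx(sample)]
print(rank_idx_column)  # [321, 321, 321, 324, 324, 326, 326, 328]
```

Note that the labels here are the group's first `idx` (321, 324, 326, 328), not consecutive numbers; if consecutive values are required, a dense rank over these labels would be one way to get them.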

You can also apply LAG twice, which works in versions before Oracle 12:

SELECT key1,
       key2,
       datetime,
       idx,
       COALESCE(
         rank_idx,
         LAG( rank_idx ) IGNORE NULLS OVER ( PARTITION BY key1, key2 ORDER BY datetime )
       ) AS rank_idx
FROM   (
  SELECT t.*,
         CASE
         WHEN datetime
              < LAG( datetime ) OVER ( PARTITION BY key1, key2 ORDER BY datetime )
                + INTERVAL '6' HOUR
         THEN NULL
         ELSE idx
         END AS rank_idx 
  FROM   table_name t
)

db<>fiddle here
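The double-LAG query works in two passes, which may be easier to see outside SQL. In the rough Python sketch below (helper names `mark_group_starts` and `forward_fill` are my own, purely illustrative), the first function plays the role of the inner query, keeping `idx` only on rows that open a group, and the second mimics `COALESCE(rank_idx, LAG(rank_idx) IGNORE NULLS ...)` by carrying the latest non-null value forward. It handles a single (key1, key2) partition that is already sorted by timestamp.

```python
from datetime import datetime, timedelta


def mark_group_starts(partition, window=timedelta(hours=6)):
    """Inner query: keep idx on rows that open a group, None elsewhere."""
    markers, prev_ts = [], None
    for ts, idx in partition:  # partition is sorted by timestamp
        within = prev_ts is not None and ts < prev_ts + window
        markers.append(None if within else idx)
        prev_ts = ts
    return markers


def forward_fill(values):
    """Outer query: replace each None with the latest non-None value before it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


start = datetime(2020, 1, 1)
partition = [
    (start, 17),
    (start + timedelta(hours=1), 18),
    (datetime(2020, 1, 22, 2), 19),
    (datetime(2020, 1, 22, 4), 20),
]
markers = mark_group_starts(partition)
filled = forward_fill(markers)
print(markers)  # [17, None, 19, None]
print(filled)   # [17, 17, 19, 19]
```

The first row of a partition has no previous timestamp, so it always keeps its `idx`, just as the SQL comparison against a NULL `LAG` falls through to the `ELSE` branch.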

【Comments】:

【Answer 3】:

I would use window functions for this:

select t.*,
       (min_index - 1 +
        sum(case when prev_datetime > datetime - interval '6' hour then 0 else 1 end) over
            (partition by key1, key2 order by datetime)
       ) as rank_index
from (select t.*,
             -- idx stands for the question's "index" column; INDEX is a reserved word in Oracle
             min(idx) over (partition by key1, key2) as min_index,
             lag(datetime) over (partition by key1, key2 order by datetime) as prev_datetime
      from t
     ) t;

Here is a db<>fiddle.
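The arithmetic behind this query is a running count of "group start" flags added to the partition's smallest index. A small Python sketch of that idea follows, assuming one (key1, key2) partition already sorted by timestamp; the name `rank_running_sum` is hypothetical and the code is only meant to make the formula `min_index - 1 + running sum` concrete.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def rank_running_sum(partition, window=timedelta(hours=6)):
    """partition: non-empty list of (timestamp, idx) for one key pair, sorted by timestamp."""
    min_idx = min(idx for _, idx in partition)
    flags, prev_ts = [], None
    for ts, _ in partition:
        # 0 when the previous row is less than 6 hours back, otherwise 1
        # (the first row has no previous row, so it always counts as 1).
        flags.append(0 if prev_ts is not None and prev_ts > ts - window else 1)
        prev_ts = ts
    # Running total of the flags, shifted so the first group gets min_idx.
    return [min_idx - 1 + running for running in accumulate(flags)]


day = datetime(2021, 1, 1)
hours = (0, 1, 2, 8.5, 9, 16, 17, 28)  # offsets of the key1=11, key2=22 rows
partition = [(day + timedelta(hours=h), 321 + i) for i, h in enumerate(hours)]
rank_index = rank_running_sum(partition)
print(rank_index)  # [321, 321, 321, 322, 322, 323, 323, 324]
```

Unlike a "first idx of the group" labelling, this produces consecutive rank values (321, 322, 323, 324) within each partition.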

【Comments】:
