Select the first record of the last group when there are repeating groups
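【Title】: Select the first record of the last group when there are repeating groups
【Posted】: 2018-12-25 20:01:52
【Question】: I am trying to select the first record of the latest repeated STATUS group for each POLICY_ID. How can I do this?
Edit/Note: A status can repeat more than twice, as the last three rows show.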
View of the data:
Desired output:
SQL for the data:
--drop table mytable;
create table mytable (ROW_ID Number(5), POLICY_ID Number(5),
CHANGE_NO Number(5), STATUS VARCHAR(50), CHANGE_DATE DATE);
insert into mytable values ( 81, 1, 1, 'A', date '2018-01-01');
insert into mytable values ( 95, 1, 2, 'A', date '2018-01-02');
insert into mytable values ( 100, 1, 3, 'B', date '2018-01-03');
insert into mytable values ( 150, 1, 4, 'C', date '2018-01-04');
insert into mytable values ( 165, 1, 5, 'A', date '2018-01-05');
insert into mytable values ( 175, 1, 6, 'A', date '2018-01-06');
insert into mytable values ( 599, 2, 1, 'S', date '2018-01-11');
insert into mytable values ( 602, 2, 2, 'S', date '2018-01-12');
insert into mytable values ( 611, 2, 3, 'S', date '2018-01-13');
insert into mytable values ( 629, 2, 4, 'T', date '2018-01-14');
insert into mytable values ( 720, 2, 5, 'U', date '2018-01-15');
insert into mytable values ( 790, 2, 6, 'S', date '2018-01-16');
insert into mytable values ( 812, 2, 7, 'S', date '2018-01-17');
insert into mytable values ( 825, 2, 8, 'S', date '2018-01-18');
select * from mytable;
【Comments】:
Which version of Oracle are you using? If it is 12c, you can use MATCH_RECOGNIZE.
【Answer 1】: Hmmm . . .
select t.*
from (select t.*,
             row_number() over (partition by policy_id order by change_date asc) as seqnum
      from mytable t
      where not exists (select 1
                        from mytable t2
                        where t2.policy_id = t.policy_id and
                              t2.status <> t.status and
                              t2.change_date > t.change_date
                       )
     ) t
where seqnum = 1;
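The inner subquery finds all rows for which, within the same policy_id, no later row has a different status. This defines the last group of records.
It then uses row_number() to enumerate those rows, and the outer query selects the first row for each policy_id.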
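To see what the inner filter does on its own, the following sketch runs just the NOT EXISTS condition against the mytable sample data from the question. It is only an illustration and not part of the answer; the explicit column list and the final ORDER BY are added here for readability.

```sql
-- Keep only rows that have no later row with a different status
-- for the same policy, i.e. the members of each policy's final status group.
select t.row_id, t.policy_id, t.change_no, t.status, t.change_date
from mytable t
where not exists (select 1
                  from mytable t2
                  where t2.policy_id = t.policy_id
                    and t2.status <> t.status
                    and t2.change_date > t.change_date)
order by t.policy_id, t.change_date;
```

With the sample data this should leave ROW_IDs 165 and 175 for POLICY_ID 1 and 790, 812 and 825 for POLICY_ID 2. The row with the earliest change_date in each policy is the one the outer query keeps.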
【Discussion】:
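It worked, thanks! Could you explain the "select 1" part a bit more? You don't select any columns from t2, yet the where clause uses columns from that table.
@kzmlbyrk . . . not exists checks whether the subquery returns any row. The values in that row make no difference; select 1 is simply the easiest thing to type.
【Answer 2】: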
You can use the LEAD and LAG functions to identify the rows where a "repeat" starts. The condition status <> previous status and status = next status identifies such rows.
SELECT *
FROM (
SELECT cte1.*, ROW_NUMBER() OVER (PARTITION BY POLICY_ID ORDER BY CHANGE_DATE DESC) AS rn
FROM (
SELECT mytable.*, CASE WHEN
STATUS <> LAG(STATUS, 1, '!') OVER (PARTITION BY POLICY_ID ORDER BY CHANGE_DATE) AND
STATUS = LEAD(STATUS) OVER (PARTITION BY POLICY_ID ORDER BY CHANGE_DATE)
THEN 1 END AS toselect
FROM mytable
) cte1
WHERE toselect = 1
) cte2
WHERE rn = 1
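As a rough illustration of that condition, the sketch below lists each row next to its previous and next status, using the same LAG and LEAD calls as the query above. The prev_status and next_status aliases are names chosen here, and '!' is assumed never to occur as a real status value.

```sql
-- Show the neighbouring statuses that the CASE expression compares.
-- A row starts a repeat when status <> prev_status and status = next_status.
select row_id, policy_id, status,
       lag(status, 1, '!') over (partition by policy_id order by change_date) as prev_status,
       lead(status)        over (partition by policy_id order by change_date) as next_status
from mytable
order by policy_id, change_date;
```

On the sample data the condition holds for ROW_IDs 81 and 165 (POLICY_ID 1) and 599 and 790 (POLICY_ID 2), and ordering by CHANGE_DATE DESC then keeps 165 and 790. Because LEAD returns NULL on a policy's last row, a final status that occurs only once is never flagged by this approach.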
【Discussion】:
【Answer 3】: If you are using Oracle 12c, you can use MATCH_RECOGNIZE:
SELECT ROW_ID, POLICY_ID, CHANGE_NO, STATUS, CHANGE_DATE
FROM mytable
MATCH_RECOGNIZE (
PARTITION BY POLICY_ID
ORDER BY CHANGE_DATE
MEASURES MATCH_NUMBER() m,FIRST(R.ROW_ID) r
ALL ROWS PER MATCH
PATTERN (R+)
DEFINE R AS STATUS=NEXT(STATUS)
) MR
WHERE ROW_ID = R
ORDER BY ROW_NUMBER() OVER(PARTITION BY POLICY_ID ORDER BY M DESC)
FETCH FIRST 1 ROW WITH TIES;
Alternatively:
SELECT *
FROM mytable
MATCH_RECOGNIZE (
PARTITION BY POLICY_ID
ORDER BY CHANGE_DATE DESC
MEASURES MATCH_NUMBER() m
,LAST(R.ROW_ID) ROW_ID
,LAST(R.STATUS) STATUS
,LAST(R.CHANGE_NO) CHANGE_NO
,LAST(R.CHANGE_DATE) CHANGE_DATE
ONE ROW PER MATCH
PATTERN (R+)
DEFINE R AS STATUS=PREV(STATUS)
) MR
WHERE M = 1
【Discussion】:
【Answer 4】: Another approach with match_recognize:
select row_id, policy_id, change_no, status, change_date
from mytable
match_recognize (
partition by policy_id
order by change_date
measures
strt.row_id as row_id
, strt.change_no as change_no
, strt.change_date as change_date
, strt.status as status
pattern (strt unchanged* final)
define
unchanged as next(unchanged.status) = prev(unchanged.status)
, final as next(final.status) is null
) mr
order by mr.policy_id;
【Discussion】: