How to get the last change timestamp for a row?
【Posted】: 2017-04-29 12:07:17 【Question】: I have a data set like this (DDL below):
+----+------------------+----------------------+---------------------+
| ID | NAME             | EMAIL                | LAST_UPD            |
+----+------------------+----------------------+---------------------+
| 1  | JOHN SMITH       | JOHN.SMITH@GMAIL.COM | 29/04/2017 10:50:51 |
+----+------------------+----------------------+---------------------+
| 1  | J SMITH          | JOHN.SMITH@GMAIL.COM | 29/04/2017 10:51:15 |
+----+------------------+----------------------+---------------------+
| 1  | J SMITH          | JOHN.SMITH@GMAIL.COM | 29/04/2017 10:51:36 |
+----+------------------+----------------------+---------------------+
| 1  | JOHN JAMES SMITH | JOHN.SMITH@GMAIL.COM | 29/04/2017 10:52:11 |
+----+------------------+----------------------+---------------------+
| 2  | JAMES FORD       | JAMES.FORD@GMAIL.COM | 29/04/2017 10:52:57 |
+----+------------------+----------------------+---------------------+
| 2  | JAMES FORD       | J.FORD@GMAIL.COM     | 29/04/2017 10:53:17 |
+----+------------------+----------------------+---------------------+
| 2  | JAMES FORD       | J.FORD@GMAIL.COM     | 29/04/2017 11:47:15 |
+----+------------------+----------------------+---------------------+
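For each ID, I am trying to get the last change timestamp of the NAME and EMAIL columns, together with the corresponding current values of those fields. If a given attribute never changed, the smallest LAST_UPD for that ID should be returned. I tried the query below and it produces the values I want, but how do I "squash" them into a single row per ID?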
SELECT
    ID,
    NAME,
    CASE
        WHEN LAG(NAME) OVER (PARTITION BY ID ORDER BY LAST_UPD) != NAME
            THEN LAST_UPD
        WHEN LEAD(NAME) OVER (PARTITION BY ID ORDER BY LAST_UPD) = NAME
            THEN LAST_UPD
    END NAME_CHANGED,
    EMAIL,
    CASE
        WHEN LAG(EMAIL) OVER (PARTITION BY ID ORDER BY LAST_UPD) != EMAIL
            THEN LAST_UPD
        WHEN LEAD(EMAIL) OVER (PARTITION BY ID ORDER BY LAST_UPD) = EMAIL
            THEN LAST_UPD
    END EMAIL_CHANGED
FROM CUSTOMER
;
The result should look like this:
+----+------------------+---------------------+----------------------+---------------------+
| ID | NAME             | NAME_CHANGED        | EMAIL                | EMAIL_CHANGED       |
+----+------------------+---------------------+----------------------+---------------------+
| 1  | JOHN JAMES SMITH | 29/04/2017 10:52:11 | JOHN.SMITH@GMAIL.COM | 29/04/2017 10:50:51 |
+----+------------------+---------------------+----------------------+---------------------+
| 2  | JAMES FORD       | 29/04/2017 10:52:57 | J.FORD@GMAIL.COM     | 29/04/2017 10:53:17 |
+----+------------------+---------------------+----------------------+---------------------+
DDL:
CREATE TABLE CUSTOMER
(
ID VARCHAR2(20)
, NAME VARCHAR2(50)
, EMAIL VARCHAR2(50)
, LAST_UPD DATE
);
REM INSERTING into CUSTOMER
SET DEFINE OFF;
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','JOHN SMITH','JOHN.SMITH@GMAIL.COM',to_date('29/04/2017 10:50:51','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','J SMITH','JOHN.SMITH@GMAIL.COM',to_date('29/04/2017 10:51:15','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','J SMITH','JOHN.SMITH@GMAIL.COM',to_date('29/04/2017 10:51:36','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('1','JOHN JAMES SMITH','JOHN.SMITH@GMAIL.COM',to_date('29/04/2017 10:52:11','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('2','JAMES FORD','JAMES.FORD@GMAIL.COM',to_date('29/04/2017 10:52:57','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('2','JAMES FORD','J.FORD@GMAIL.COM',to_date('29/04/2017 10:53:17','DD/MM/YYYY HH24:MI:SS'));
Insert into CUSTOMER (ID,NAME,EMAIL,LAST_UPD) values ('2','JAMES FORD','J.FORD@GMAIL.COM',to_date('29/04/2017 11:47:15','DD/MM/YYYY HH24:MI:SS'));
COMMIT;
SELECT * FROM CUSTOMER;
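【Answer 1】: I think the key idea is to start with flags that indicate whether the name or the email changed on each row. You can use lag() to get this, and with the right logic you can even flag the first record of each ID as a change.
Then you want, for each column, the last record that is flagged as a change. The following code uses the first_value() function to do this, because it can ignore NULL values: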
select distinct id,
       first_value((case when name_changed = 1 then name end) ignore nulls)
           over (partition by id order by last_upd desc
                 range between unbounded preceding and unbounded following) as name,
       max(case when name_changed = 1 then last_upd end) over (partition by id) as last_upd_name,
       first_value((case when email_changed = 1 then email end) ignore nulls)
           over (partition by id order by last_upd desc
                 range between unbounded preceding and unbounded following) as email,
       max(case when email_changed = 1 then last_upd end) over (partition by id) as last_upd_email
from (select c.*,
             -- 1 when the value differs from the previous row; lag() is NULL on
             -- the first row of each id, so that row is flagged as a change too
             (case when c.name = lag(c.name) over (partition by c.id order by c.last_upd)
                   then 0 else 1
              end) as name_changed,
             (case when c.email = lag(c.email) over (partition by c.id order by c.last_upd)
                   then 0 else 1
              end) as email_changed
      from customer c
     ) c;
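The idea in this answer (flag every row where a column differs from the previous row of the same ID, then keep the last flagged row per column) can be sketched outside the database as well. The following is a minimal Python illustration on the sample rows from the question, not part of the original answer; the helper name `last_changes`, the tuple layout, and the assumption that NAME and EMAIL are never NULL are all choices made for this example only.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

FMT = "%d/%m/%Y %H:%M:%S"

# Sample rows from the question: (id, name, email, last_upd).
ROWS = [
    ("1", "JOHN SMITH", "JOHN.SMITH@GMAIL.COM", "29/04/2017 10:50:51"),
    ("1", "J SMITH", "JOHN.SMITH@GMAIL.COM", "29/04/2017 10:51:15"),
    ("1", "J SMITH", "JOHN.SMITH@GMAIL.COM", "29/04/2017 10:51:36"),
    ("1", "JOHN JAMES SMITH", "JOHN.SMITH@GMAIL.COM", "29/04/2017 10:52:11"),
    ("2", "JAMES FORD", "JAMES.FORD@GMAIL.COM", "29/04/2017 10:52:57"),
    ("2", "JAMES FORD", "J.FORD@GMAIL.COM", "29/04/2017 10:53:17"),
    ("2", "JAMES FORD", "J.FORD@GMAIL.COM", "29/04/2017 11:47:15"),
]

_NO_PREVIOUS = object()  # stands in for lag() returning NULL on the first row


def last_changes(rows):
    """Hypothetical helper: map each id to
    (name, name_changed, email, email_changed)."""
    parsed = [(i, n, e, datetime.strptime(ts, FMT)) for i, n, e, ts in rows]
    # Equivalent of PARTITION BY id ORDER BY last_upd.
    parsed.sort(key=itemgetter(0, 3))
    result = {}
    for cust_id, group in groupby(parsed, key=itemgetter(0)):
        prev_name = prev_email = _NO_PREVIOUS
        last_name = last_email = None
        for _, name, email, ts in group:
            if name != prev_name:    # change flag; the first row always counts
                last_name = (name, ts)
            if email != prev_email:
                last_email = (email, ts)
            prev_name, prev_email = name, email
        result[cust_id] = last_name + last_email
    return result


if __name__ == "__main__":
    for cust_id, row in last_changes(ROWS).items():
        print(cust_id, *row)
```

On the sample data, `last_changes(ROWS)["1"]` holds JOHN JAMES SMITH with the 10:52:11 name timestamp and JOHN.SMITH@GMAIL.COM with the 10:50:51 email timestamp, which matches the expected result in the question. For the 55-million-row table mentioned in the thread, the SQL versions are the appropriate tool; this sketch only shows the logic.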
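【Comments】:
Thanks, this does not seem to work on an Oracle database; I am trying to figure out how to change the syntax.
@jrara . . . This is designed for Oracle. It is probably some simple syntax error, such as a mismatched parenthesis or a missing comma.
【Answer 2】:
select id, name, max(nc) name_changed, email, max(mc) email_changed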
from (
    select id,
           first_value(name) over (partition by id order by last_upd desc) name,
           case lead(name) over (partition by id order by last_upd desc)
               when name then NULL else last_upd end nc,
           first_value(email) over (partition by id order by last_upd desc) email,
           case lead(email) over (partition by id order by last_upd desc)
               when email then NULL else last_upd end mc
    from CUSTOMER
)
group by id, name, email
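【Comments】:
Thanks, this works too. The solution provided by @Gordon Linoff is slightly slower on my original data set (55 million rows).
@jrara Slightly corrected: one window sort fewer. Compare the speed, if it is not too much trouble. How many result rows does your data sample produce?
【Answer 3】:
with data as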
(
    select ROWNUM AS RN, I.*
    from
    (
        select id, COL, VAL, LAST_UPD from customer
        unpivot (val for (col) in (NAME, EMAIL)) order by id, col, last_upd
    ) I
)
,
cte (rn, id, col, val, last_upd, dummy) as
(
    select rn, id, col, val, last_upd, 1
    from data
    where rn in (select rn from (select rn, min(rn) over (partition by id, col) m from data) where rn = m)
    union all
    select
        data.rn, data.id, data.col,
        case when cte.val = data.val then cte.val else data.val end,
        case when cte.val = data.val then cte.last_upd else data.last_upd end,
        cte.dummy + 1
    from
        data, cte
    where
        cte.rn + 1 = data.rn and cte.col = data.col and cte.id = data.id
)
,
rs as
(
    select * from (
        select cte.*, max(dummy) over (partition by id, col) m from cte
        order by rn, id, col) where dummy = m
)
SELECT
    n.ID, n.val as NAME, n.last_upd as NAME_CHANGED,
    m.VAL as EMAIL, m.LAST_UPD as EMAIL_CHANGED
FROM
    (select * from rs where col = 'NAME') n
    join
    (select * from rs where col = 'EMAIL') m
    on (n.id = m.id)
;
【Answer 4】: Based on @Gordon Linoff's answer, modified for Oracle Database, this works:
WITH CUST AS (
    SELECT ID,
           NAME,
           EMAIL,
           LAST_UPD,
           CASE WHEN NAME = LAG(NAME) OVER (PARTITION BY ID ORDER BY LAST_UPD) THEN 0 ELSE 1 END AS NAME_CHANGED,
           CASE WHEN EMAIL = LAG(EMAIL) OVER (PARTITION BY ID ORDER BY LAST_UPD) THEN 0 ELSE 1 END AS EMAIL_CHANGED
    FROM CUSTOMER
)
SELECT DISTINCT CUST.ID,
       FIRST_VALUE(CASE WHEN NAME_CHANGED = 1 THEN CUST.NAME END) IGNORE NULLS
           OVER (PARTITION BY ID ORDER BY LAST_UPD DESC
                 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS NAME,
       MAX(CASE WHEN CUST.NAME_CHANGED = 1 THEN CUST.LAST_UPD END) OVER (PARTITION BY CUST.ID) AS LAST_UPD_NAME,
       FIRST_VALUE(CASE WHEN EMAIL_CHANGED = 1 THEN EMAIL END) IGNORE NULLS
           OVER (PARTITION BY ID ORDER BY LAST_UPD DESC
                 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EMAIL,
       MAX(CASE WHEN CUST.EMAIL_CHANGED = 1 THEN CUST.LAST_UPD END) OVER (PARTITION BY CUST.ID) AS LAST_UPD_EMAIL
FROM CUST
ORDER BY CUST.ID
;
Result:
+----+------------------+---------------------+----------------------+---------------------+
| ID | NAME             | NAME_CHANGED        | EMAIL                | EMAIL_CHANGED       |
+----+------------------+---------------------+----------------------+---------------------+
| 1  | JOHN JAMES SMITH | 29/04/2017 10:52:11 | JOHN.SMITH@GMAIL.COM | 29/04/2017 10:50:51 |
+----+------------------+---------------------+----------------------+---------------------+
| 2  | JAMES FORD       | 29/04/2017 10:52:57 | J.FORD@GMAIL.COM     | 29/04/2017 10:53:17 |
+----+------------------+---------------------+----------------------+---------------------+