如何从事务事实表生成时间点快照表?
Posted
技术标签:
【中文标题】如何从事务事实表生成时间点快照表?【英文标题】:How to generate point in time snapshot table from a transaction fact table? 【发布时间】:2021-07-14 12:26:54 【问题描述】:我有一个交易表,通过关闭前一条记录的结束日期并使用当前系统时间和结束日期打开一条新记录来记录客户状态(A、B、C、D)的变化新的记录将被设置为高开放日期。
FactID | Cust_ID | Status | EffectiveDate | EndDate |
---|---|---|---|---|
1 | 1 | A | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM |
2 | 1 | B | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM |
3 | 1 | C | 24/05/2021 2:47:28 PM | 24/05/2021 4:15:45 PM |
4 | 1 | A | 24/05/2021 4:15:45 PM | 24/05/2021 8:05:09 PM |
5 | 1 | D | 24/05/2021 8:05:09 PM | 31/12/9000 |
我正在尝试根据上述事务表在时间点(日终报告)构建快照。
ReportDate | Cust_ID | EODStatus | A_SDate | A_EDate | B_SDate | B_EDate | C_SDate | C_EDate | D_SDate | D_EDate |
---|---|---|---|---|---|---|---|---|---|---|
20/05/2021 11:59:59 PM | 1 | A | 20/05/2021 8:52:29 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
21/05/2021 11:59:59 PM | 1 | B | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
22/05/2021 11:59:59 PM | 1 | B | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
23/05/2021 11:59:59 PM | 1 | B | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
24/05/2021 11:59:59 PM | 1 | D | 20/05/2021 8:52:29 PM | 24/05/2021 8:05:09 PM | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 24/05/2021 2:47:28 PM | 24/05/2021 4:15:45 PM | 24/05/2021 8:05:09 PM | 31/12/9000 |
25/05/2021 11:59:59 PM | 1 | D | 20/05/2021 8:52:29 PM | 24/05/2021 8:05:09 PM | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 24/05/2021 2:47:28 PM | 24/05/2021 4:15:45 PM | 24/05/2021 8:05:09 PM | 31/12/9000 |
当我尝试在构建快照之前扩展事务表时,我现在卡住了。任何指针将不胜感激。
WITH
date_ranges
AS
(SELECT ROWNUM, TO_DATE ('21-05-2021', 'dd-mm-yyyy') + ROWNUM - 1.00001 reportdate
FROM all_objects
WHERE ROWNUM <= 6),
transactions (factid, cust_id, status, effectivedate, enddate)
AS
(SELECT 1, 1, 'A', TO_DATE ('20/05/2021 8:52:29 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 2, 1, 'B', TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 3, 1, 'C', TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 4, 1, 'A', TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 5, 1, 'D', TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('31/12/9000', 'DD/MM/YYYY') FROM DUAL),
dataset
AS
(SELECT DISTINCT reportdate,
cust_id,
status AS eodstatus,
effectivedate,
enddate
FROM transactions CROSS JOIN date_ranges)
SELECT reportdate,
cust_id,
eodstatus,
effectivedate,
enddate,
CASE
WHEN eodstatus = 'A' THEN MIN (effectivedate)
ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS a_sdate,
CASE WHEN eodstatus = 'A' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS a_edate,
CASE
WHEN eodstatus = 'B' THEN MIN (effectivedate)
ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS b_sdate,
CASE WHEN eodstatus = 'B' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS b_edate,
CASE
WHEN eodstatus = 'C' THEN MIN (effectivedate)
ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS c_sdate,
CASE WHEN eodstatus = 'C' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS c_edate,
CASE
WHEN eodstatus = 'D' THEN MIN (effectivedate)
ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS d_sdate,
CASE WHEN eodstatus = 'D' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS d_edate
FROM dataset t
WHERE reportdate BETWEEN effectivedate AND enddate
GROUP BY reportdate, cust_id, eodstatus, effectivedate, enddate
ORDER BY reportdate, cust_id, eodstatus;
REPORTDATE | CUST_ID | EODSTATUS | EFFECTIVEDATE | ENDDATE | A_SDATE | A_EDATE | B_SDATE | B_EDATE | C_SDATE | C_EDATE | D_SDATE | D_EDATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
20/05/2021 11:59:59 PM | 1 | "A" | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
21/05/2021 11:59:59 PM | 1 | "B" | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 31/12/9000 | 31/12/9000 | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
22/05/2021 11:59:59 PM | 1 | "B" | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 31/12/9000 | 31/12/9000 | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
23/05/2021 11:59:59 PM | 1 | "B" | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 31/12/9000 | 31/12/9000 | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 |
24/05/2021 11:59:59 PM | 1 | "D" | 24/05/2021 8:05:09 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 24/05/2021 8:05:09 PM | 31/12/9000 |
25/05/2021 11:59:59 PM | 1 | "D" | 24/05/2021 8:05:09 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 24/05/2021 8:05:09 PM | 31/12/9000 |
SQLFiddle here
PS:我查看了另一个thread in SO,它的标题几乎相同,但没有多大帮助。
更新 1:
我现在能够获得所有报告日期的每日状态,但开始和结束日期的计算以及将值转发到后续行仍然没有发生(因为我还没有弄清楚)。
开始日期 - MIN(给定状态的生效日期) 结束日期 - MAX(给定状态的结束日期)更新 2: 计算的开始日期和结束日期不得早于报告日期。请参阅展示当前问题的 SQL 输出
【问题讨论】:
你可以用9000-12-31
代替日期NULL
会更一致。
这里是 SQL Fiddle sqlfiddle.com/#!4/c50f8/1
那么,您说您在尝试扩展事务表时被卡住了——您能否更明确地帮助我理解具体问题——问题是什么? :)
NULL
从理论上看可能会更好@WernfriedDomscheit,但会使查询更加复杂(并且可能不一致)。您必须使用一些额外的NVL
或OR
条件,而不是some_date >= EffectiveDate and some_date < EndDate
(以获取某个时间戳的版本)。通常是 EndDate
减少最低精度(例如 DATE 为 1 秒)以获得更简单的谓词 some_date BETWEEN EffectiveDate and EndDate
。你在实际项目中遇到过NULL
吗?
我认为您需要回复此评论 所以,您说您在尝试扩展事务表时被卡住了——您能否更明确地帮助我理解具体的问题 -- 问题是什么? 继续。
【参考方案1】:
我已经有一段时间没有做 Oracle 了,但你需要两个组件:
-
当前快照
固定的历史快照
这会为给定的硬编码日期生成快照。我没有 Oracle 来检查变量是如何工作的,所以你必须自己做日期变量部分。
注意:
我假设 Cust_ID 一次只能有一个状态 现实世界的数据比这更复杂,而且总是有边缘情况 如果 Cust 没有当前状态,则不会有行 刚刚注意到您的日期重叠。这是一个问题,因为客户同时处于两种状态您可以加入日历表以针对所有日期运行此操作,但这可能会非常耗费性能,并且您通常只想每天生成以添加到现有表中。
以下是从小提琴复制的代码
设置代码
CREATE TABLE t
(FactID int, Cust_ID int, Status varchar2(1), EffectiveDate DATE, EndDate DATE)
;
INSERT ALL
INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
VALUES (1, 1, 'A', TIMESTAMP'2021-05-20 08:52:29.000', TIMESTAMP'2021-05-21 03:08:22.000')
INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
VALUES (2, 1, 'B', TIMESTAMP'2021-05-21 03:08:22.000', TIMESTAMP'2021-05-24 02:47:28.000')
INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
VALUES (3, 1, 'C', TIMESTAMP'2021-05-24 02:47:28.000', TIMESTAMP'2021-05-24 04:15:45.000')
INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
VALUES (4, 1, 'A', TIMESTAMP'2021-05-24 04:15:45.000', TIMESTAMP'2021-05-24 08:05:09.000')
INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
VALUES (5, 1, 'D', TIMESTAMP'2021-05-24 08:05:09.000', TIMESTAMP'9000-12-31 00:00:00.000')
SELECT * FROM dual
;
查询
SELECT
T.Cust_ID, DATE '2021-05-25' ReportDate, T.Status, T.EffectiveDate,T.EndDate,
H.A_SDATE, H.A_EDATE, H.B_SDATE, H.B_EDATE, H.C_SDATE, H.C_EDATE
FROM
(
-- Todays snapshot
SELECT Cust_ID,Status, EffectiveDate,EndDate
FROM t
WHERE DATE '2021-05-25' BETWEEN EffectiveDate AND EndDate
) T
LEFT OUTER JOIN
(
-- Static capture of all states
SELECT Cust_ID,
MIN(CASE WHEN Status = 'A' THEN EffectiveDate ELSE NULL END) A_SDATE,
MAX(CASE WHEN Status = 'A' THEN LEAST(DATE '2021-07-10',EndDate) ELSE NULL END) A_EDATE,
MIN(CASE WHEN Status = 'B' THEN EffectiveDate ELSE NULL END) B_SDATE,
MAX(CASE WHEN Status = 'B' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) B_EDATE,
MIN(CASE WHEN Status = 'C' THEN EffectiveDate ELSE NULL END) C_SDATE,
MAX(CASE WHEN Status = 'C' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) C_EDATE
FROM t
-- Exclude state changes after the process date
WHERE EffectiveDate < DATE '2021-05-25'
GROUP BY Cust_ID
) H
ON T.Cust_ID = H.Cust_ID
【讨论】:
MIN 和 MAX 应该考虑的值只有首先,我真诚地感谢所有试图帮助我的人。我以某种复杂的逻辑设法完成了这个几乎不可能完成的任务(但它确实有效)。我尝试提供内联 cmets 来解释推导。特别提到@Wernfried Domscheit,他编写了 PIVOT 逻辑并删除了答案,这在很大程度上帮助了我。
WITH
date_ranges
-- Generate dates
AS
(SELECT ROWNUM, TO_DATE ('21-05-2021', 'dd-mm-yyyy') + ROWNUM - 1.00001 reportdate
FROM all_objects
WHERE ROWNUM <= 6),
-- Mock up source records
transactions (factid, cust_id,status,effectivedate,enddate)
AS
(SELECT 1,1,'A',
TO_DATE ('20/05/2021 8:52:29 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 2,1,'B',
TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 3,1,'C',
TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 4,1,'A',
TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
UNION ALL
SELECT 5,1,'D',
TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
TO_DATE ('31/12/9000', 'DD/MM/YYYY') FROM DUAL),
dataset
-- Apply cross join to get report date into transactions
-- Could've been much better; time crunched
AS
(SELECT DISTINCT reportdate,cust_id,status AS eodstatus,effectivedate,enddate
FROM transactions CROSS JOIN date_ranges),
dataset1
-- Ignore start and end dates if they are older than the reporting date
AS
( SELECT reportdate,
cust_id,
eodstatus,
CASE
WHEN reportdate > effectivedate THEN effectivedate
ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS effectivedate,
CASE
WHEN reportdate > enddate THEN enddate
ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
END AS enddate
FROM dataset
WHERE reportdate > effectivedate),
dataset2
-- Grab the min of start and max of end for all reporting days
AS
( SELECT reportdate,
cust_id,
eodstatus,
eodstatus AS status,
MIN (effectivedate) effectivedate,
MAX (enddate) enddate
FROM dataset1
GROUP BY reportdate, cust_id, eodstatus),
dataset_new
-- Apply PIVOT to capture the start and end date per known statues and replacing NULLs with high open end dates
AS
( SELECT reportdate,
cust_id,
eodstatus,
COALESCE ('A','B','C','D') AS status,
NVL (a_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) a_sdate,
NVL (a_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) a_edate,
NVL (b_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) b_sdate,
NVL (b_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) b_edate,
NVL (c_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) c_sdate,
NVL (c_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) c_edate,
NVL (d_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) d_sdate,
NVL (d_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY')) d_edate
FROM dataset2
PIVOT (MIN (effectivedate) AS "SDATE", MAX (enddate) AS "EDATE"
FOR status
IN ('A' AS "A", 'B' AS "B", 'C' AS "C", 'D' AS "D"))
ORDER BY reportdate),
date_manipulations
-- Merging multiple entries into one record a day
AS
( SELECT reportdate,
cust_id,
MIN (a_sdate) a_sdate,
MIN (a_edate) a_edate,
MIN (b_sdate) b_sdate,
MIN (b_edate) b_edate,
MIN (c_sdate) c_sdate,
MIN (c_edate) c_edate,
MIN (d_sdate) d_sdate,
MIN (d_edate) d_edate
FROM dataset_new
GROUP BY reportdate, cust_id
ORDER BY 1)
-- JOIN with transaction to report the original status
SELECT a.*, b.status
FROM date_manipulations a JOIN transactions b ON reportdate BETWEEN effectivedate AND enddate;
REPORTDATE | CUST_ID | A_SDATE | A_EDATE | B_SDATE | B_EDATE | C_SDATE | C_EDATE | D_SDATE | D_EDATE | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
20/05/2021 11:59:59 PM | 1 | 20/05/2021 8:52:29 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | "A" |
21/05/2021 11:59:59 PM | 1 | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | "B" |
22/05/2021 11:59:59 PM | 1 | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | "B" |
23/05/2021 11:59:59 PM | 1 | 20/05/2021 8:52:29 PM | 21/05/2021 3:08:22 PM | 21/05/2021 3:08:22 PM | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | 31/12/9000 | "B" |
24/05/2021 11:59:59 PM | 1 | 20/05/2021 8:52:29 PM | 24/05/2021 8:05:09 PM | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 24/05/2021 2:47:28 PM | 24/05/2021 4:15:45 PM | 24/05/2021 8:05:09 PM | 31/12/9000 | "D" |
25/05/2021 11:59:59 PM | 1 | 20/05/2021 8:52:29 PM | 24/05/2021 8:05:09 PM | 21/05/2021 3:08:22 PM | 24/05/2021 2:47:28 PM | 24/05/2021 2:47:28 PM | 24/05/2021 4:15:45 PM | 24/05/2021 8:05:09 PM | 31/12/9000 | "D" |
【讨论】:
以上是关于如何从事务事实表生成时间点快照表?的主要内容,如果未能解决你的问题,请参考以下文章