How to generate a point-in-time snapshot table from a transaction fact table?

Posted: 2021-07-14 12:26:54

Question:

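I have a transaction table that records changes in a customer's status (A, B, C, D). Each change closes the previous record by setting its end date to the current system time, then opens a new record whose effective date is that same system time and whose end date is set to an open-ended high date.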

FactID  Cust_ID  Status  EffectiveDate          EndDate
1       1        A       20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM
2       1        B       21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM
3       1        C       24/05/2021 2:47:28 PM  24/05/2021 4:15:45 PM
4       1        A       24/05/2021 4:15:45 PM  24/05/2021 8:05:09 PM
5       1        D       24/05/2021 8:05:09 PM  31/12/9000
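The close-and-open mechanism described above can be sketched in Python. This is only an illustration of how such rows come about, not the actual load process: the function name `apply_status_change`, the dictionary keys, and the `HIGH_DATE` constant are assumptions made for the example.

```python
from datetime import datetime

# Assumption: 31/12/9000 is the open-ended "high date" shown in the table.
HIGH_DATE = datetime(9000, 12, 31)


def apply_status_change(history, cust_id, new_status, now):
    """Close the customer's open row at `now`, then append a new open row."""
    for row in history:
        if row["cust_id"] == cust_id and row["end_date"] == HIGH_DATE:
            row["end_date"] = now  # close the previous record
    history.append({
        "fact_id": len(history) + 1,
        "cust_id": cust_id,
        "status": new_status,
        "effective_date": now,   # new record starts at the same instant
        "end_date": HIGH_DATE,   # and stays open until the next change
    })
    return history


history = []
apply_status_change(history, 1, "A", datetime(2021, 5, 20, 20, 52, 29))
apply_status_change(history, 1, "B", datetime(2021, 5, 21, 15, 8, 22))
```

After the second call, the first row's end date equals the second row's effective date, which is the pattern visible in the sample data.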

I am trying to build a point-in-time snapshot (an end-of-day report) from the transaction table above.

ReportDate Cust_ID EODStatus A_SDate A_EDate B_SDate B_EDate C_SDate C_EDate D_SDate D_EDate
20/05/2021 11:59:59 PM 1 A 20/05/2021 8:52:29 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000
21/05/2021 11:59:59 PM 1 B 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000
22/05/2021 11:59:59 PM 1 B 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000
23/05/2021 11:59:59 PM 1 B 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000
24/05/2021 11:59:59 PM 1 D 20/05/2021 8:52:29 PM 24/05/2021 8:05:09 PM 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 24/05/2021 2:47:28 PM 24/05/2021 4:15:45 PM 24/05/2021 8:05:09 PM 31/12/9000
25/05/2021 11:59:59 PM 1 D 20/05/2021 8:52:29 PM 24/05/2021 8:05:09 PM 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 24/05/2021 2:47:28 PM 24/05/2021 4:15:45 PM 24/05/2021 8:05:09 PM 31/12/9000
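As a rough sketch of the EODStatus column only, the following Python picks the row in effect at 11:59:59 PM of each report day. The helper names `report_dates` and `eod_status` are hypothetical, and the half-open comparison (`effective <= report_date < end`) is an assumption chosen so that a change timestamp belongs to exactly one row.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9000, 12, 31)  # assumed open-ended end date

# (status, effective_date, end_date) for Cust_ID 1, taken from the sample table
ROWS = [
    ("A", datetime(2021, 5, 20, 20, 52, 29), datetime(2021, 5, 21, 15, 8, 22)),
    ("B", datetime(2021, 5, 21, 15, 8, 22), datetime(2021, 5, 24, 14, 47, 28)),
    ("C", datetime(2021, 5, 24, 14, 47, 28), datetime(2021, 5, 24, 16, 15, 45)),
    ("A", datetime(2021, 5, 24, 16, 15, 45), datetime(2021, 5, 24, 20, 5, 9)),
    ("D", datetime(2021, 5, 24, 20, 5, 9), HIGH_DATE),
]


def report_dates(first_day, days):
    """End-of-day timestamps (23:59:59) for `days` consecutive days."""
    return [first_day + timedelta(days=i, hours=23, minutes=59, seconds=59)
            for i in range(days)]


def eod_status(rows, report_date):
    """Status of the row whose validity interval contains report_date."""
    for status, effective, end in rows:
        if effective <= report_date < end:
            return status
    return None  # no row in effect at that time


eod = [eod_status(ROWS, d) for d in report_dates(datetime(2021, 5, 20), 6)]
print(eod)
```

For the six report days starting 20/05/2021 this reproduces the EODStatus values in the desired output (A, B, B, B, D, D).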

I am stuck at the step of expanding the transaction table before building the snapshot. Any pointers would be appreciated.

WITH
    date_ranges
    AS
        (SELECT ROWNUM, TO_DATE ('21-05-2021', 'dd-mm-yyyy') + ROWNUM - 1.00001 reportdate
           FROM all_objects
          WHERE ROWNUM <= 6),
    transactions (factid, cust_id, status, effectivedate, enddate)
    AS
        (SELECT 1, 1, 'A', TO_DATE ('20/05/2021 8:52:29 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 2, 1, 'B', TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 3, 1, 'C', TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 4, 1, 'A', TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 5, 1, 'D', TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('31/12/9000', 'DD/MM/YYYY') FROM DUAL),
    dataset
    AS
        (SELECT DISTINCT reportdate,
                         cust_id,
                         status     AS eodstatus,
                         effectivedate,
                         enddate
           FROM transactions CROSS JOIN date_ranges)
  SELECT reportdate,
         cust_id,
         eodstatus,
         effectivedate,
         enddate,
         CASE
             WHEN eodstatus = 'A' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS a_sdate,
         CASE WHEN eodstatus = 'A' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
         END             AS a_edate,
         CASE
             WHEN eodstatus = 'B' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS b_sdate,
         CASE WHEN eodstatus = 'B' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
         END             AS b_edate,
         CASE
             WHEN eodstatus = 'C' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS c_sdate,
         CASE WHEN eodstatus = 'C' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
         END             AS c_edate,
         CASE
             WHEN eodstatus = 'D' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS d_sdate,
         CASE WHEN eodstatus = 'D' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
          END             AS d_edate
    FROM dataset t
   WHERE reportdate BETWEEN effectivedate AND enddate
GROUP BY reportdate, cust_id, eodstatus, effectivedate, enddate
ORDER BY reportdate, cust_id, eodstatus;

Output:

REPORTDATE CUST_ID EODSTATUS EFFECTIVEDATE ENDDATE A_SDATE A_EDATE B_SDATE B_EDATE C_SDATE C_EDATE D_SDATE D_EDATE
20/05/2021 11:59:59 PM 1 "A" 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000
21/05/2021 11:59:59 PM 1 "B" 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 31/12/9000 31/12/9000 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000
22/05/2021 11:59:59 PM 1 "B" 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 31/12/9000 31/12/9000 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000
23/05/2021 11:59:59 PM 1 "B" 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 31/12/9000 31/12/9000 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000
24/05/2021 11:59:59 PM 1 "D" 24/05/2021 8:05:09 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 24/05/2021 8:05:09 PM 31/12/9000
25/05/2021 11:59:59 PM 1 "D" 24/05/2021 8:05:09 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 24/05/2021 8:05:09 PM 31/12/9000

SQLFiddle here

PS: I looked at another thread on SO with an almost identical title, but it did not help much.

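Update 1:

I am now able to get the daily status for every report date, but the start and end date calculation, and carrying those values forward to subsequent rows, is still not happening (I have not figured it out yet).

Start date - MIN (effective date for the given status)
End date - MAX (end date for the given status)

Update 2: The calculated start and end dates must not be later than the report date, unless the value is the open-ended high date. See the SQL output above, which shows the current problem.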
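The rule from the two updates can be sketched in Python as follows. This is one reading of the requirement, checked against the desired output table: only rows already effective at the report date count, the start is the earliest effective date, and the end is the latest end date, with any end date after the report date replaced by the high date. The function name `status_windows` and the constant `HIGH_DATE` are made up for the example.

```python
from datetime import datetime

HIGH_DATE = datetime(9000, 12, 31)  # assumed open-ended high date

# (status, effective_date, end_date) for Cust_ID 1, from the sample table
ROWS = [
    ("A", datetime(2021, 5, 20, 20, 52, 29), datetime(2021, 5, 21, 15, 8, 22)),
    ("B", datetime(2021, 5, 21, 15, 8, 22), datetime(2021, 5, 24, 14, 47, 28)),
    ("C", datetime(2021, 5, 24, 14, 47, 28), datetime(2021, 5, 24, 16, 15, 45)),
    ("A", datetime(2021, 5, 24, 16, 15, 45), datetime(2021, 5, 24, 20, 5, 9)),
    ("D", datetime(2021, 5, 24, 20, 5, 9), HIGH_DATE),
]


def status_windows(rows, report_date, statuses=("A", "B", "C", "D")):
    """Per status: (MIN effective date, MAX end date) as seen at report_date."""
    windows = {}
    for status in statuses:
        # Rows that had already started by the report date
        visible = [(eff, end) for st, eff, end in rows
                   if st == status and eff < report_date]
        if not visible:
            windows[status] = (HIGH_DATE, HIGH_DATE)  # status not seen yet
            continue
        start = min(eff for eff, _ in visible)
        # An end date after the report date is still "open" at that time
        end = max(end if end < report_date else HIGH_DATE for _, end in visible)
        windows[status] = (start, end)
    return windows


snapshot = status_windows(ROWS, datetime(2021, 5, 24, 23, 59, 59))
```

For the 24/05/2021 report date this gives status A the window 20/05/2021 8:52:29 PM to 24/05/2021 8:05:09 PM, and status D an end date of 31/12/9000, as in the desired output.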

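Comments:

Using NULL instead of the date 9000-12-31 would be more consistent.

Here is the SQL Fiddle: sqlfiddle.com/#!4/c50f8/1

So, you say you are stuck when trying to expand the transaction table. Could you be more explicit and help me understand the specific problem? What is the issue? :)

NULL may look better in theory @WernfriedDomscheit, but it makes queries more complex (and possibly inconsistent). You have to use some extra NVL or OR conditions instead of some_date >= EffectiveDate and some_date < EndDate (to get the version valid at some timestamp). Often the EndDate is reduced by the lowest precision (e.g. 1 second for DATE) to get the simpler predicate some_date BETWEEN EffectiveDate and EndDate. Have you encountered NULL in a real project?

I think you need to reply to this comment: "So, you say you are stuck when trying to expand the transaction table. Could you be more explicit and help me understand the specific problem? What is the issue?" Go on.

Answer 1: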

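It has been a while since I last worked with Oracle, but you need two components:

    The current snapshot
    The fixed historical snapshot

The query below generates the snapshot for a given hard-coded date. I do not have Oracle at hand to check how variables work, so you will have to do the date-variable part yourself.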

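Notes:

    I assume a Cust_ID can only have one status at a time.
    Real-world data is more complex than this, and there are always edge cases.
    If a customer has no current status, there will be no row.
    I just noticed that your dates overlap. This is a problem because the customer is in two states at the same time.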
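The overlap in the last note shows up at the exact change timestamp: because each row's EndDate equals the next row's EffectiveDate, an inclusive BETWEEN-style test matches both rows, while a half-open test (`effective <= ts < end`) matches one. A small Python sketch, with helper names invented for the example:

```python
from datetime import datetime

# (status, effective_date, end_date): the first two rows of the sample data
ROWS = [
    ("A", datetime(2021, 5, 20, 20, 52, 29), datetime(2021, 5, 21, 15, 8, 22)),
    ("B", datetime(2021, 5, 21, 15, 8, 22), datetime(2021, 5, 24, 14, 47, 28)),
]


def statuses_closed(rows, ts):
    """Inclusive on both ends, like SQL BETWEEN."""
    return [s for s, eff, end in rows if eff <= ts <= end]


def statuses_half_open(rows, ts):
    """Inclusive start, exclusive end."""
    return [s for s, eff, end in rows if eff <= ts < end]


boundary = datetime(2021, 5, 21, 15, 8, 22)  # moment A closes and B opens
print(statuses_closed(boundary=None, rows=ROWS, ts=boundary) if False else statuses_closed(ROWS, boundary))
print(statuses_half_open(ROWS, boundary))
```

At the boundary the inclusive test returns both A and B, and the half-open test returns only B. A report timestamp such as 11:59:59 PM rarely lands on a change instant, but the half-open form avoids the double match when it does.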

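You could join to a calendar table to run this for all dates, but that can be very expensive, and usually you only want to generate one day at a time and append it to an existing table.

The code below is copied from the fiddle.

Setup code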

CREATE TABLE t
    (FactID int, Cust_ID int, Status varchar2(1), EffectiveDate DATE, EndDate DATE)
;

INSERT ALL 
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (1, 1, 'A', TIMESTAMP'2021-05-20 08:52:29.000', TIMESTAMP'2021-05-21 03:08:22.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (2, 1, 'B', TIMESTAMP'2021-05-21 03:08:22.000', TIMESTAMP'2021-05-24 02:47:28.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (3, 1, 'C', TIMESTAMP'2021-05-24 02:47:28.000', TIMESTAMP'2021-05-24 04:15:45.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (4, 1, 'A', TIMESTAMP'2021-05-24 04:15:45.000', TIMESTAMP'2021-05-24 08:05:09.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (5, 1, 'D', TIMESTAMP'2021-05-24 08:05:09.000', TIMESTAMP'9000-12-31 00:00:00.000')         

SELECT * FROM dual
;

Query

SELECT
T.Cust_ID, DATE '2021-05-25' ReportDate, T.Status, T.EffectiveDate,T.EndDate,
H.A_SDATE, H.A_EDATE, H.B_SDATE, H.B_EDATE, H.C_SDATE, H.C_EDATE
FROM
(
    -- Today's snapshot
    SELECT Cust_ID,Status, EffectiveDate,EndDate
    FROM t 
    WHERE DATE '2021-05-25' BETWEEN EffectiveDate AND EndDate 
) T
LEFT OUTER JOIN
(
-- Static capture of all states
    SELECT Cust_ID, 
    MIN(CASE WHEN Status = 'A' THEN EffectiveDate ELSE NULL END) A_SDATE, 
    MAX(CASE WHEN Status = 'A' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) A_EDATE,
    MIN(CASE WHEN Status = 'B' THEN EffectiveDate ELSE NULL END) B_SDATE, 
    MAX(CASE WHEN Status = 'B' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) B_EDATE,
    MIN(CASE WHEN Status = 'C' THEN EffectiveDate ELSE NULL END) C_SDATE, 
    MAX(CASE WHEN Status = 'C' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) C_EDATE

    FROM t 
    -- Exclude state changes after the process date
    WHERE EffectiveDate < DATE '2021-05-25'
    GROUP BY Cust_ID
) H
ON T.Cust_ID = H.Cust_ID

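Comments:

The only values MIN and MAX should consider are

I will update the WHERE in the H subquery. Do you want the MAX value to show the report date when it is actually after the report date?

The report datetime is there to provide a point-in-time snapshot. So the MAX value should not be greater than the report date, unless it is the open-ended high date.

OK, I have edited that into the query using LEAST.

If this is extended to all statuses, it ignores status C for the given example and still returns end dates greater than the report date.

Answer 2:

First of all, my sincere thanks to everyone who tried to help me. I managed to complete this almost impossible task with some rather convoluted logic (but it does work). I have tried to add inline comments to explain the derivation. Special mention to @Wernfried Domscheit, who wrote the PIVOT logic in an answer that has since been deleted; it helped me a great deal.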

WITH
    date_ranges
-- Generate dates 
    AS
        (SELECT ROWNUM, TO_DATE ('21-05-2021', 'dd-mm-yyyy') + ROWNUM - 1.00001 reportdate
           FROM all_objects
          WHERE ROWNUM <= 6),
-- Mock up source records
    transactions (factid, cust_id,status,effectivedate,enddate)
    AS
        (SELECT 1,1,'A',
                TO_DATE ('20/05/2021 8:52:29 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 2,1,'B',
                TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 3,1,'C',
                TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 4,1,'A',
                TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 5,1,'D',
                TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('31/12/9000', 'DD/MM/YYYY') FROM DUAL),
    dataset
-- Apply cross join to get report date into transactions
-- Could've been much better; time crunched
    AS
        (SELECT DISTINCT reportdate,cust_id,status     AS eodstatus,effectivedate,enddate
           FROM transactions CROSS JOIN date_ranges),
    dataset1
-- Keep only rows already effective at the reporting date; end dates later than the reporting date become the high date
    AS
        (  SELECT reportdate,
                  cust_id,
                  eodstatus,
                  CASE
                      WHEN reportdate > effectivedate THEN effectivedate
                      ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
                  END    AS effectivedate,
                  CASE
                      WHEN reportdate > enddate THEN enddate
                      ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
                  END    AS enddate
             FROM dataset
            WHERE reportdate > effectivedate),
    dataset2
-- Grab the min of start and max of end for all reporting days
    AS
        (  SELECT reportdate,
                  cust_id,
                  eodstatus,
                  eodstatus               AS status,
                  MIN (effectivedate)     effectivedate,
                  MAX (enddate)           enddate
             FROM dataset1
         GROUP BY reportdate, cust_id, eodstatus),
    dataset_new
-- Apply PIVOT to capture the start and end date per known status, replacing NULLs with the open-ended high date
    AS
        (  SELECT reportdate,
                  cust_id,
                  eodstatus,
                  COALESCE ('A','B','C','D')                           AS status,
                  NVL (a_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    a_sdate,
                  NVL (a_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    a_edate,
                  NVL (b_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    b_sdate,
                  NVL (b_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    b_edate,
                  NVL (c_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    c_sdate,
                  NVL (c_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    c_edate,
                  NVL (d_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    d_sdate,
                  NVL (d_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    d_edate
             FROM dataset2
                  PIVOT (MIN (effectivedate) AS "SDATE", MAX (enddate) AS "EDATE"
                        FOR status
                        IN ('A' AS "A", 'B' AS "B", 'C' AS "C", 'D' AS "D"))
         ORDER BY reportdate),
    date_manipulations
-- Merging multiple entries into one record a day
    AS
        (  SELECT reportdate,
                  cust_id,
                  MIN (a_sdate)     a_sdate,
                  MIN (a_edate)     a_edate,
                  MIN (b_sdate)     b_sdate,
                  MIN (b_edate)     b_edate,
                  MIN (c_sdate)     c_sdate,
                  MIN (c_edate)     c_edate,
                  MIN (d_sdate)     d_sdate,
                  MIN (d_edate)     d_edate
             FROM dataset_new
         GROUP BY reportdate, cust_id
         ORDER BY 1)
-- JOIN with transactions (per customer) to report the status in effect at the report date
SELECT a.*, b.status
  FROM date_manipulations a
       JOIN transactions b
           ON a.cust_id = b.cust_id
          AND a.reportdate BETWEEN b.effectivedate AND b.enddate;

Output:

REPORTDATE CUST_ID A_SDATE A_EDATE B_SDATE B_EDATE C_SDATE C_EDATE D_SDATE D_EDATE STATUS
20/05/2021 11:59:59 PM 1 20/05/2021 8:52:29 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 "A"
21/05/2021 11:59:59 PM 1 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 "B"
22/05/2021 11:59:59 PM 1 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 "B"
23/05/2021 11:59:59 PM 1 20/05/2021 8:52:29 PM 21/05/2021 3:08:22 PM 21/05/2021 3:08:22 PM 31/12/9000 31/12/9000 31/12/9000 31/12/9000 31/12/9000 "B"
24/05/2021 11:59:59 PM 1 20/05/2021 8:52:29 PM 24/05/2021 8:05:09 PM 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 24/05/2021 2:47:28 PM 24/05/2021 4:15:45 PM 24/05/2021 8:05:09 PM 31/12/9000 "D"
25/05/2021 11:59:59 PM 1 20/05/2021 8:52:29 PM 24/05/2021 8:05:09 PM 21/05/2021 3:08:22 PM 24/05/2021 2:47:28 PM 24/05/2021 2:47:28 PM 24/05/2021 4:15:45 PM 24/05/2021 8:05:09 PM 31/12/9000 "D"
