Aggregate function in recursive SQL

Posted: 2016-02-24 19:05:54

[Question]:

This question is an extended and simplified version of this question.

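I have been trying to solve the following iterative equation in SQL:

$$U^{F,D}_{t,p} = \left( \sum_{D} U^{F,D}_{t-1,p} + C_{t-1,p} \right) R^{F,D}_{t-1,p}$$

The closest analogy I can think of is that $U^{F,D}_{t,p}$ is the number of units of cars of brand F, in a certain color (D), that a car dealer (p) has available at time t. So the equation above basically says: take the car units of the previous day t-1 (i.e., $U^{F,D}_{t-1,p}$), sum them over the colors ($\sum_{D}$), then add the previous day's value of C ($C_{t-1,p}$, whatever that is), and then multiply by some other number R from the previous day ($R^{F,D}_{t-1,p}$, whatever that is).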

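Simplified problem

I have managed to solve a simplified form of the above equation, namely:

$$U^{F,D}_{t,p} = \left( U^{F,D}_{t-1,p} + C_{t-1,p} \right) R^{F,D}_{t-1,p}$$

i.e., without the sum over car colors (D). The sample data and the SQL query are in the fiddle that I link, but I also paste them here for reference:

Full data: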

CREATE TABLE DYNAMICS ( T DATE, T_M1 DATE, P INTEGER, F VARCHAR(255), DELTA_F VARCHAR(255), R_T_M1 NUMBER, C_T_M1 NUMBER, U_T_M1 NUMBER, R_T NUMBER, C_T NUMBER, U_T NUMBER );  

-- DAY 1, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,100.0 );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,50.0 );  
-- DAY 1, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,10.0 );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,5.0 );  
-- DAY 2, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.7, 0.8, 100, 0.9,0.9, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.7, 0.8, 50, 0.6,0.5, NULL );  
-- DAY 2, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, 10, 0.7,0.8, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.7, 0.8, 5, 0.3,0.3, NULL );  
-- DAY 3, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.9, 0.9, NULL, 0.2,0.3, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.6, 0.5, NULL, 1.7,1.8, NULL );  
-- DAY 3, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, NULL, 0.2,0.3, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.3, 0.3, NULL, 0.8,0.9, NULL );  

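Sample data:

The following shows the sample data for car dealer p=1 and car brand F=BMW in color D=RED (the D of the mathematical equation is called DELTA_F in SQL). The initial condition (t=0) is 2015-01-01 here. For every day t, all parameters at t (R_T, C_T) and at t-1 (R_T_M1, C_T_M1) are given. Knowing them, the task is to compute the car units U_T for all days t > 0.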

|                         T |                       T_M1 | P |   F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |    100 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 1 | BMW |     RED |    0.7 |    0.8 |    100 | 0.9 | 0.9 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 1 | BMW |     RED |    0.9 |    0.9 | (null) | 0.2 | 0.3 | (null) |

Query:

To solve the simplified problem, I have come up with the query in the linked fiddle, which I also paste here for reference:

-- 
-- SQL
-- T -> t 
-- T_M1 -> t-1 
-- 
WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.
    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
            U_T
    FROM DYNAMICS 
        -- Initial condition: U_t-1 does not exist, and U_t=0 is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.
    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T, 
            -- Here the magic happens, i.e., (U_t-1 + C_t-1)*R_t-1 = U_t
            (RECU.U_T+NEW.C_T_M1)*NEW.R_T_M1 AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM  RECU ORDER BY P, F, T;

For the sample data pasted above, this query results in:

|                         T |                       T_M1 | P |   F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |    100 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 1 | BMW |     RED |    0.7 |    0.8 |    100 | 0.9 | 0.9 |  70.56 |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 1 | BMW |     RED |    0.9 |    0.9 |  70.56 | 0.2 | 0.3 | 64.314 |

This works well: for 2015-01-02, U_t = (100+0.8)*0.7 = 70.56, and for 2015-01-03, U_t = (70.56+0.9)*0.9 = 64.314.
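The two-step arithmetic above can be reproduced outside the database as a quick sanity check. The following is only a minimal Python sketch of the simplified recurrence, not the SQL solution itself; the function name `simulate_simplified` is made up for illustration, and the inputs are the C_T_M1 and R_T_M1 values of the BMW/RED rows for dealer 1.

```python
def simulate_simplified(u0, params):
    """Iterate U_t = (U_{t-1} + C_{t-1}) * R_{t-1} for one (P, F, DELTA_F) series.

    u0     -- the given initial units (U_T of the anchor row)
    params -- list of (C_T_M1, R_T_M1) pairs, one per following day
    """
    units = [u0]
    for c_prev, r_prev in params:
        units.append((units[-1] + c_prev) * r_prev)
    return units


# (C_T_M1, R_T_M1) for dealer 1, BMW, RED on 2015-01-02 and 2015-01-03
bmw_red = simulate_simplified(100.0, [(0.8, 0.7), (0.9, 0.9)])
print([round(u, 3) for u in bmw_red])  # [100.0, 70.56, 64.314]
```

Each loop iteration plays the role of one pass through the recursive member of the CTE: it takes the previous day's U_T and produces the next one.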

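The query is written so that it works for different car dealers and different car brands, which can be checked by running the query in the linked fiddle.

Back to the full problem

The query above does not correctly handle the sum over car colors in the original equation:

$$U^{F,D}_{t,p} = \left( \sum_{D} U^{F,D}_{t-1,p} + C_{t-1,p} \right) R^{F,D}_{t-1,p}$$

This did not matter for the simplified data, because all cars (BMW and MERCEDES) come only in RED, so the sum over colors effectively vanishes.

The full logic should probably be implemented with a GROUP BY/SUM expression built into the original query above. Unfortunately, I do not know how to do that.

So, suppose you have data shaped like in the simplified problem section, but now every car brand comes in two colors, e.g., like in this linked fiddle: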

|                         T |                       T_M1 | P |        F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|----------|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES |   BLACK |    0.2 |    0.6 | (null) | 0.5 | 0.8 |    5.5 |
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |      5 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 2 | MERCEDES |   BLACK |    0.5 |    0.8 |    5.5 | 1.3 | 0.5 | (null) |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 2 | MERCEDES |     RED |    0.7 |    0.8 |      5 | 4.3 | 0.5 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 2 | MERCEDES |   BLACK |    1.3 |    0.5 | (null) | 0.3 | 0.9 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 2 | MERCEDES |     RED |    4.3 |    0.5 | (null) | 0.4 | 0.9 | (null) |

Given these data, you would expect the dynamics of F=MERCEDES cars at dealer p=2 to look like this:

U^MERCEDES,BLACK_T=2015-01-02,P=2 = ( (5.5 + 5) + 0.8 )*0.5 = 11.3*0.5 = 5.65 
U^MERCEDES,RED_T=2015-01-02,P=2 = ( (5.5 + 5) + 0.8 )*0.7 = 11.3*0.7 = 7.91

U^MERCEDES,BLACK_T=2015-01-03,P=2 = ( (5.65 + 7.91) + 0.5 )*1.3 = 14.06*1.3 = 18.278
U^MERCEDES,RED_T=2015-01-03,P=2 = ( (5.65 + 7.91) + 0.5 )*4.3 = 14.06*4.3 = 60.458
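These expected values can be cross-checked with a short Python sketch of the full recurrence. It is only an illustration under stated assumptions: the dictionary layout and the names `params`, `units` and `history` are invented here, and the numbers are the R_T_M1, C_T_M1 and initial U_T values of the MERCEDES rows for dealer 2 in the table above.

```python
# (day, color) -> (R_T_M1, C_T_M1) for dealer P=2, brand MERCEDES
params = {
    ("2015-01-02", "BLACK"): (0.5, 0.8),
    ("2015-01-02", "RED"): (0.7, 0.8),
    ("2015-01-03", "BLACK"): (1.3, 0.5),
    ("2015-01-03", "RED"): (4.3, 0.5),
}

# Initial condition: U_T on 2015-01-01 for each color
units = {"BLACK": 5.5, "RED": 5.0}

history = {}
for day in ["2015-01-02", "2015-01-03"]:
    # Sum over all colors of the previous day's units
    total_prev = sum(units.values())
    # Every color uses the same total, but its own R_T_M1 and C_T_M1
    units = {
        color: (total_prev + params[(day, color)][1]) * params[(day, color)][0]
        for color in units
    }
    history[day] = dict(units)

for day, by_color in history.items():
    print(day, {color: round(u, 3) for color, u in by_color.items()})
# 2015-01-02 {'BLACK': 5.65, 'RED': 7.91}
# 2015-01-03 {'BLACK': 18.278, 'RED': 60.458}
```

The point to notice is that `total_prev` is computed once per day from all colors before any color is updated, which is exactly the step the simplified query lacks.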

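The question is how to adapt the simplified query above to solve this problem.

[Comments]:

Given the repeated "whatever that is" above, consider posting this formula-and-data question on math.stackexchange, so that it can be translated into a relational-table context for the SQL people here.

[Answer 1]:

The solution turned out to be easier than I thought (although I spent a day trying all sorts of things, it now all seems trivial).

The query that works (tested) on the original fiddle data is as follows: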

WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.
    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
            U_T
    FROM DYNAMICS 
        -- Initial condition: U_t-1 does not exist, and U_t=0 is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.
    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T,
            -- Here the magic happens, i.e., (SUM over colors of U_t-1 + C_t-1)*R_t-1 = U_t
            ( (( SUM(RECU.U_T) OVER (PARTITION BY NEW.T, NEW.T_M1, NEW.P, NEW.F) ) + NEW.C_T_M1)*NEW.R_T_M1 ) AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM RECU 
ORDER BY P, F, T, DELTA_F;

This is a minimal change to the original query (it affects only one line of it), and it uses an Oracle analytic function.
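The reason an analytic SUM fits here, where a GROUP BY would not, is that it attaches the group total to every row instead of collapsing the rows of a group into one. The Python sketch below only illustrates that semantics on a few hand-written rows; it is not Oracle code, the names `rows`, `totals` and `windowed` are hypothetical, and the BMW row is included purely to show a second partition.

```python
from collections import defaultdict

# A few previous-day rows; the partition key here is (P, F)
rows = [
    {"P": 2, "F": "MERCEDES", "DELTA_F": "BLACK", "U_T": 5.5},
    {"P": 2, "F": "MERCEDES", "DELTA_F": "RED", "U_T": 5.0},
    {"P": 2, "F": "BMW", "DELTA_F": "RED", "U_T": 10.0},  # illustrative extra partition
]

# First pass: one total per partition (what GROUP BY alone would return)
totals = defaultdict(float)
for row in rows:
    totals[(row["P"], row["F"])] += row["U_T"]

# Second pass: keep every row and attach its partition total,
# similar in spirit to SUM(U_T) OVER (PARTITION BY P, F)
windowed = [{**row, "SUM_U_T": totals[(row["P"], row["F"])]} for row in rows]

for row in windowed:
    print(row["F"], row["DELTA_F"], row["U_T"], row["SUM_U_T"])
# MERCEDES BLACK 5.5 10.5
# MERCEDES RED 5.0 10.5
# BMW RED 10.0 10.0
```

Because the row count is unchanged, each color still has its own row on which to apply its own R_T_M1, which is what the recursive member needs.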

[Answer 2]:

I don't think this is the best answer, but I think it can give you the result you are looking for.

WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.

    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
-- Start SUM of u_t
              (select sum(u_t) from DYNAMICS d2
               where d2.T=d1.T and d2.T_M1=d1.T_M1 and d2.P=d1.P and d2.F=d1.F
               group by T, T_M1, P, F) as u_t
-- End SUM of u_t   
    FROM DYNAMICS d1
        -- Initial condition: U_t-1 does not exist, and U_t=0 is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.

    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T
              , 
            -- Here the magic happens, i.e., (U_t-1 + C_t-1)*R_t-1 = U_t
            (
              RECU.U_T
              +NEW.C_T_M1)*NEW.R_T_M1 AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM  RECU ORDER BY P, F, T;

What I added is between the "Start SUM of u_t" and "End SUM of u_t" comments; here is the fiddle.
