Aggregate function in recursive SQL
【Title】: aggregate function in recursive SQL
【Posted】: 2016-02-24 19:05:54
【Question】: This question is an extended and simplified version of this question.
I have been trying to solve the following iterative equation in SQL:
$$U^{F,D}_{t,p} = \left( \sum_{D} U^{F,D}_{t-1,p} + C_{t-1,p} \right) R^{F,D}_{t-1,p}$$
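The closest analogy I can think of is that $U^{F,D}_{t,p}$ is the number of cars of brand $F$, in some color ($D$), that car dealer ($p$) has in stock at time $t$. So the equation above basically says: take the car units of the previous day $t-1$ (i.e., $U^{F,D}_{t-1,p}$), sum them over the colors ($\sum_D$), add the previous day's value of $C$ ($C_{t-1,p}$, whatever that is), and then multiply by some other number $R$ of the previous day ($R^{F,D}_{t-1,p}$, whatever that is).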
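Simplified problem
I have managed to solve a simplified form of the above equation, namely:
$$U^{F,D}_{t,p} = \left( U^{F,D}_{t-1,p} + C_{t-1,p} \right) R^{F,D}_{t-1,p}$$
i.e., without the sum over car colors ($D$). The sample data and the SQL query are in the fiddle that I link, but I also paste them here for reference:
Full data: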
CREATE TABLE DYNAMICS ( T DATE, T_M1 DATE, P INTEGER, F VARCHAR(255), DELTA_F VARCHAR(255), R_T_M1 NUMBER, C_T_M1 NUMBER, U_T_M1 NUMBER, R_T NUMBER, C_T NUMBER, U_T NUMBER );
-- DAY 1, P_1
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,100.0 );
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,50.0 );
-- DAY 1, P_2
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,10.0 );
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,5.0 );
-- DAY 2, P_1
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.7, 0.8, 100, 0.9,0.9, NULL );
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.7, 0.8, 50, 0.6,0.5, NULL );
-- DAY 2, P_2
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, 10, 0.7,0.8, NULL );
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.7, 0.8, 5, 0.3,0.3, NULL );
-- DAY 3, P_1
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.9, 0.9, NULL, 0.2,0.3, NULL );
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.6, 0.5, NULL, 1.7,1.8, NULL );
-- DAY 3, P_2
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, NULL, 0.2,0.3, NULL );
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.3, 0.3, NULL, 0.8,0.9, NULL );
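Sample data:
The following shows the sample data for car dealer p=1 and car model F=BMW in color D=RED (D in the math equation is called DELTA_F in SQL). The initial condition (t=0) is 2015-01-01 here. For all days t, all parameters at t (R_T, C_T) and at t-1 (R_T_M1, C_T_M1) are given. Knowing them, the task is to compute the car units U_T for all days t > t=0.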
| T | T_M1 | P | F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T | U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW | RED | 0.5 | 0.6 | (null) | 0.7 | 0.8 | 100 |
| January, 02 2015 00:00:00 | January, 01 2015 00:00:00 | 1 | BMW | RED | 0.7 | 0.8 | 100 | 0.9 | 0.9 | (null) |
| January, 03 2015 00:00:00 | January, 02 2015 00:00:00 | 1 | BMW | RED | 0.9 | 0.9 | (null) | 0.2 | 0.3 | (null) |
Query:
To solve the simplified problem, I have come up with the query in the linked fiddle, which I also paste here for reference:
--
-- SQL
-- T -> t
-- T_M1 -> t-1
--
WITH RECU( T, T_M1, P, F, DELTA_F,
R_T_M1, C_T_M1, U_T_M1,
R_T, C_T, U_T ) AS (
-- Anchor member.
SELECT T, T_M1, P, F, DELTA_F,
R_T_M1, C_T_M1,
U_T_M1,
R_T, C_T,
U_T
FROM DYNAMICS
-- Initial condition: U_t-1 does not exist, and U_t=0 is given
WHERE ( U_T_M1 IS NULL AND U_T IS NOT NULL )
UNION ALL
-- Recursive member.
SELECT NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,
NEW.R_T_M1, NEW.C_T_M1,
RECU.U_T AS U_T_M1,
NEW.R_T, NEW.C_T,
-- Here the magic happens, i.e., (U_t-1 + C_t-1)*R_t-1 = U_t
(RECU.U_T+NEW.C_T_M1)*NEW.R_T_M1 AS U_T
FROM DYNAMICS NEW
INNER JOIN RECU
ON
-- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
NEW.T_M1 = RECU.T AND
NEW.P = RECU.P AND
NEW.F = RECU.F AND
NEW.DELTA_F = RECU.DELTA_F
)
SELECT * FROM RECU ORDER BY P, F, T;
For the sample data pasted above, this query results in:
| T | T_M1 | P | F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T | U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW | RED | 0.5 | 0.6 | (null) | 0.7 | 0.8 | 100 |
| January, 02 2015 00:00:00 | January, 01 2015 00:00:00 | 1 | BMW | RED | 0.7 | 0.8 | 100 | 0.9 | 0.9 | 70.56 |
| January, 03 2015 00:00:00 | January, 02 2015 00:00:00 | 1 | BMW | RED | 0.9 | 0.9 | 70.56 | 0.2 | 0.3 | 64.314 |
This works fine: for 2015-01-02, U_t = (100+0.8)*0.7 = 70.56, and for 2015-01-03, U_t = (70.56+0.9)*0.9 = 64.314.
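The arithmetic for dealer p=1, BMW, RED can be replayed outside the database. The following is only a minimal Python sketch of the simplified recurrence, not part of the original query; the names `step`, `params` and `history` are made up for illustration, and `Decimal` is used so that the comparison with the table values is exact.

```python
from decimal import Decimal


def step(u_prev, c_prev, r_prev):
    """One step of the simplified recurrence: U_t = (U_{t-1} + C_{t-1}) * R_{t-1}."""
    return (u_prev + c_prev) * r_prev


# (C_T_M1, R_T_M1) for dealer 1, BMW, RED on 2015-01-02 and 2015-01-03,
# copied from the sample table.
params = [
    (Decimal("0.8"), Decimal("0.7")),
    (Decimal("0.9"), Decimal("0.9")),
]

u = Decimal("100")  # initial condition: U_T on 2015-01-01
history = []
for c_prev, r_prev in params:
    u = step(u, c_prev, r_prev)
    history.append(u)

print(history)
```

The two values in `history` correspond to the U_T column of the last two rows in the result table.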
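The query is written so that it works for different car dealers and different car brands, which can be checked by running the query in the linked fiddle.
Back to the full problem
The query above does not correctly handle the sum over car colors in the original equation:
$$U^{F,D}_{t,p} = \left( \sum_{D} U^{F,D}_{t-1,p} + C_{t-1,p} \right) R^{F,D}_{t-1,p}$$
This does not matter for the simplified data, because all cars (BMW and MERCEDES) only come in RED, so the sum over colors effectively disappears.
The full logic should probably be implemented through a GROUP BY/SUM expression built into the original query above. Unfortunately, I don't know how to do that.
So, suppose you have data shaped like in the simplified problem section, but now each car brand exists in two colors, e.g., like in this linked fiddle: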
| T | T_M1 | P | F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T | U_T |
|---------------------------|----------------------------|---|----------|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES | BLACK | 0.2 | 0.6 | (null) | 0.5 | 0.8 | 5.5 |
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES | RED | 0.5 | 0.6 | (null) | 0.7 | 0.8 | 5 |
| January, 02 2015 00:00:00 | January, 01 2015 00:00:00 | 2 | MERCEDES | BLACK | 0.5 | 0.8 | 5.5 | 1.3 | 0.5 | (null) |
| January, 02 2015 00:00:00 | January, 01 2015 00:00:00 | 2 | MERCEDES | RED | 0.7 | 0.8 | 5 | 4.3 | 0.5 | (null) |
| January, 03 2015 00:00:00 | January, 02 2015 00:00:00 | 2 | MERCEDES | BLACK | 1.3 | 0.5 | (null) | 0.3 | 0.9 | (null) |
| January, 03 2015 00:00:00 | January, 02 2015 00:00:00 | 2 | MERCEDES | RED | 4.3 | 0.5 | (null) | 0.4 | 0.9 | (null) |
Given this data, you would expect the dynamics of the F=MERCEDES cars at dealer p=2 to look as follows:
U^MERCEDES,BLACK_T=2015-01-02,P=2 = ( (5.5 + 5) + 0.8 )*0.5 = 11.3*0.5 = 5.65
U^MERCEDES,RED_T=2015-01-02,P=2 = ( (5.5 + 5) + 0.8 )*0.7 = 11.3*0.7 = 7.91
U^MERCEDES,BLACK_T=2015-01-03,P=2 = ( (5.65 + 7.91) + 0.5 )*1.3 = 14.06*1.3 = 18.278
U^MERCEDES,RED_T=2015-01-03,P=2 = ( (5.65 + 7.91) + 0.5 )*4.3 = 14.06*4.3 = 60.458
The question is how to adapt the simplified query above to solve this.
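As a reference for what any SQL solution should produce, the expected values can be reproduced with a few lines of Python. This is only a sketch under the assumption, taken from the table above, that C_T_M1 is shared by both colors while R_T_M1 differs per color; the names `units`, `days` and `results` are illustrative.

```python
from decimal import Decimal as D

# Day-1 units per color for dealer 2, MERCEDES (U_T of the initial rows).
units = {"BLACK": D("5.5"), "RED": D("5")}

# Per day: C_T_M1 (same for both colors) and R_T_M1 per color, from the table.
days = [
    ("2015-01-02", D("0.8"), {"BLACK": D("0.5"), "RED": D("0.7")}),
    ("2015-01-03", D("0.5"), {"BLACK": D("1.3"), "RED": D("4.3")}),
]

results = {}
for day, c_prev, r_prev in days:
    total = sum(units.values())  # sum over colors of U_{t-1}
    # Every color starts from the same total but applies its own R_{t-1}.
    units = {color: (total + c_prev) * r for color, r in r_prev.items()}
    results[day] = units

for day, per_color in results.items():
    print(day, per_color)
```

The key point is that the sum runs over the previous day's rows of all colors, while the multiplication by R stays per color.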
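【Comments】:
Given the repeated "whatever that is" above, consider posting this formula and data problem on math.stack.exchange, so that it can be translated into a relational-table context for the SQL folks here.
【Answer 1】: The solution turned out to be easier than I thought (although I spent a day trying all sorts of things, and now it all seems trivial).
The query that works (tested) on the original fiddle data is as follows: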
WITH RECU( T, T_M1, P, F, DELTA_F,
R_T_M1, C_T_M1, U_T_M1,
R_T, C_T, U_T ) AS (
-- Anchor member.
SELECT T, T_M1, P, F, DELTA_F,
R_T_M1, C_T_M1,
U_T_M1,
R_T, C_T,
U_T
FROM DYNAMICS
-- Initial condition: U_t-1 does not exist, and U_t=0 is given
WHERE ( U_T_M1 IS NULL AND U_T IS NOT NULL )
UNION ALL
-- Recursive member.
SELECT NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,
NEW.R_T_M1, NEW.C_T_M1,
RECU.U_T AS U_T_M1,
NEW.R_T, NEW.C_T,
-- Here the magic happens, i.e., (SUM over colors of U_t-1 + C_t-1)*R_t-1 = U_t
( (( SUM(RECU.U_T) OVER (PARTITION BY NEW.T, NEW.T_M1, NEW.P, NEW.F) ) + NEW.C_T_M1)*NEW.R_T_M1 ) AS U_T
FROM DYNAMICS NEW
INNER JOIN RECU
ON
-- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
NEW.T_M1 = RECU.T AND
NEW.P = RECU.P AND
NEW.F = RECU.F AND
NEW.DELTA_F = RECU.DELTA_F
)
SELECT * FROM RECU
ORDER BY P, F, T, DELTA_F;
This is a minimal change to the original query (it affects only one line of it), and it uses an ORACLE analytic function.
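To illustrate why an analytic SUM fits here, the following Python sketch mimics what SUM(...) OVER (PARTITION BY ...) does to the joined rows of one recursion step: it attaches the partition total to every row instead of collapsing rows the way GROUP BY would. It is a rough emulation, not Oracle's actual evaluation; the row layout is hypothetical, T and T_M1 are left out of the partition key because all rows shown share the same day, and the BMW row is taken from the simplified sample only to show a second partition.

```python
from collections import defaultdict
from decimal import Decimal as D

# Rows as they might look after the join for T = 2015-01-02:
# (P, F, DELTA_F, parent U_T, C_T_M1, R_T_M1)
joined = [
    (2, "MERCEDES", "BLACK", D("5.5"), D("0.8"), D("0.5")),
    (2, "MERCEDES", "RED", D("5"), D("0.8"), D("0.7")),
    (1, "BMW", "RED", D("100"), D("0.8"), D("0.7")),
]

# SUM(U_T) OVER (PARTITION BY P, F): one total per (dealer, brand) partition ...
totals = defaultdict(D)
for p, f, _, u_parent, _, _ in joined:
    totals[(p, f)] += u_parent

# ... attached to every row of that partition, so each color keeps its own row
# and its own R_T_M1.
new_units = {
    (p, f, delta): (totals[(p, f)] + c) * r
    for p, f, delta, _, c, r in joined
}

for key, value in new_units.items():
    print(key, value)
```

Each color row survives with its own multiplier, which is what the recurrence needs for the next step.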
【Comments】:
【Answer 2】: I don't think this is the best answer, but I think it gives you the result you are looking for.
WITH RECU( T, T_M1, P, F, DELTA_F,
R_T_M1, C_T_M1, U_T_M1,
R_T, C_T, U_T ) AS (
-- Anchor member.
SELECT T, T_M1, P, F, DELTA_F,
R_T_M1, C_T_M1,
U_T_M1,
R_T, C_T,
-- Start SUM of u_t
(select sum(u_t) from DYNAMICS d2
where d2.T=d1.T and d2.T_M1=d1.T_M1 and d2.P=d1.P and d2.F=d1.F
group by T, T_M1, P, F) as u_t
-- End SUM of u_t
FROM DYNAMICS d1
-- Initial condition: U_t-1 does not exist, and U_t=0 is given
WHERE ( U_T_M1 IS NULL AND U_T IS NOT NULL )
UNION ALL
-- Recursive member.
SELECT NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,
NEW.R_T_M1, NEW.C_T_M1,
RECU.U_T AS U_T_M1,
NEW.R_T, NEW.C_T
,
-- Here the magic happens, i.e., (U_t-1 + C_t-1)*R_t-1 = U_t
(
RECU.U_T
+NEW.C_T_M1)*NEW.R_T_M1 AS U_T
FROM DYNAMICS NEW
INNER JOIN RECU
ON
-- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
NEW.T_M1 = RECU.T AND
NEW.P = RECU.P AND
NEW.F = RECU.F AND
NEW.DELTA_F = RECU.DELTA_F
)
SELECT * FROM RECU ORDER BY P, F, T;
What I added is between the Start SUM of u_t and End SUM of u_t comments, and here is the fiddle.
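A quick check of this approach against the expected values from the question may be useful. The Python sketch below assumes the query is run on the two-color MERCEDES rows of dealer p=2 and mirrors its logic as written: the color sum is taken once in the anchor member, and the recursive member then follows each color separately. The names `anchor_total`, `days` and `trace` are illustrative.

```python
from decimal import Decimal as D

# Anchor member: every day-1 row carries the color total 5.5 + 5
# instead of its own U_T.
anchor_total = D("5.5") + D("5")
units = {"BLACK": anchor_total, "RED": anchor_total}

# C_T_M1 and per-color R_T_M1 for the two following days.
days = [
    (D("0.8"), {"BLACK": D("0.5"), "RED": D("0.7")}),  # 2015-01-02
    (D("0.5"), {"BLACK": D("1.3"), "RED": D("4.3")}),  # 2015-01-03
]

trace = []
for c_prev, r_prev in days:
    # Recursive member: each color only sees its own parent row (no sum).
    units = {color: (units[color] + c_prev) * r_prev[color] for color in units}
    trace.append(units)

for step_units in trace:
    print(step_units)
```

Under these assumptions the values for 2015-01-02 agree with the question (5.65 and 7.91), but those for 2015-01-03 do not (7.995 and 36.163 instead of 18.278 and 60.458), because the sum over colors is not repeated in later steps. The anchor rows would also report the summed 10.5 as U_T rather than the per-color 5.5 and 5.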
【Comments】: