Calculate forecast average using recursive CTE

Posted: 2016-02-02 09:23:05

Question:

I was trying to answer a question here, where I need to calculate a sales forecast from the previous 3 months of actual sales or forecasts.

| Month | Actuals | Forecast |
|-------|---------|----------|
|     1 |      10 |          |
|     2 |      15 |          |
|     3 |      17 |          |
|     4 |         |    14.00 |
|     5 |         |    15.33 |
|     6 |         |    15.44 |
|     7 |         |    14.93 |

Month 4 = (10+15+17)/3 
Month 5 = (15+17+14)/3 
Month 6 = (17+14+15.33)/3
Month 7 = (14+15.33+15.44)/3
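
The recurrence can be checked with a short Python sketch. The names `moving_average_forecast` and `to_2dp` are illustrative and not from the question. It extends the actuals one month at a time and compares two rounding policies: storing every forecast at two decimals before it is reused (roughly what a `DECIMAL(28,2)` column does) versus keeping exact fractions and rounding only for display.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

TWO_PLACES = Decimal("0.01")


def to_2dp(x):
    """Round a Fraction to two decimal places, with halves rounded up."""
    return (Decimal(x.numerator) / Decimal(x.denominator)).quantize(
        TWO_PLACES, rounding=ROUND_HALF_UP
    )


def moving_average_forecast(actuals, last_month, round_each_step):
    """Extend `actuals` to `last_month` values; each new value is the mean of the previous three.

    round_each_step=True stores every forecast at two decimals before it is reused;
    False keeps exact fractions and only rounds the returned values for display.
    """
    values = [Fraction(a) for a in actuals]
    while len(values) < last_month:
        nxt = sum(values[-3:]) / 3  # mean of the last three values (actual or forecast)
        if round_each_step:
            nxt = Fraction(to_2dp(nxt))  # keep only the rounded value for later months
        values.append(nxt)
    return [to_2dp(v) for v in values]
```

Both policies agree except at months 7 and 11: rounding every step gives 14.92 and 15.18, while exact arithmetic gives 403/27 = 14.9259... (shown as 14.93) and 15.1856... (shown as 15.19). The expected output further down uses 14.93 for month 7 but 15.18 for month 11, so it matches neither policy exactly.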

I have been trying to use a recursive CTE:

;WITH cte([month],forecast) AS (
    SELECT 1,CAST(10 AS DECIMAL(28,2))
    UNION ALL
    SELECT 2,CAST(15 AS DECIMAL(28,2))
    UNION ALL 
    SELECT 3,CAST(17 AS DECIMAL(28,2))
    UNION ALL
    SELECT
        [month]=[month]+1,
        forecast=CAST(AVG(forecast) OVER (ORDER BY [month] ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS DECIMAL(28,2))
    FROM
        cte
    WHERE
        [month]<=12
)
SELECT * FROM cte WHERE month<=12;

Fiddle: http://sqlfiddle.com/#!6/9ac4a/3

But it does not work as expected. It returns the following result:

| month | forecast |
|-------|----------|
|     1 |       10 |
|     2 |       15 |
|     3 |       17 |
|     4 |   (null) |
|     5 |   (null) |
|     6 |   (null) |
|     7 |   (null) |
|     8 |   (null) |
|     9 |   (null) |
|    10 |   (null) |
|    11 |   (null) |
|    12 |   (null) |
|     3 |   (null) |
|     4 |   (null) |
|     5 |   (null) |
|     6 |   (null) |
|     7 |   (null) |
|     8 |   (null) |
|     9 |   (null) |
|    10 |   (null) |
|    11 |   (null) |
|    12 |   (null) |
|     2 |   (null) |
|     3 |   (null) |
|     4 |   (null) |
|     5 |   (null) |
|     6 |   (null) |
|     7 |   (null) |
|     8 |   (null) |
|     9 |   (null) |
|    10 |   (null) |
|    11 |   (null) |
|    12 |   (null) |
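
One way to see where this shape comes from: SQL Server's documentation for recursive CTEs states that analytic and aggregate functions in the recursive part are applied only to the set for the current recursion level, not to the whole CTE. The Python sketch below is a rough model under stated assumptions. It assumes each recursion level holds a single row and that rows are expanded last-in, first-out. That is an implementation detail that happens to match the row order above, not something SQL guarantees. The function `simulate_question_cte` is hypothetical. With one row in the set, the frame `ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING` is empty, so `AVG` returns NULL. Each of the three anchor rows also starts its own chain up to month 12, which produces the repeated months.

```python
def simulate_question_cte(anchors, limit=12):
    """Rough model of the question's recursive CTE, assuming rows are expanded one at a time (LIFO)."""
    output = list(anchors)   # anchor members are emitted first
    pending = list(anchors)  # rows still waiting to feed the recursive member
    while pending:
        month, _ = pending.pop()  # take the most recently produced row
        if month <= limit:        # WHERE [month] <= 12 filters the *input* row
            # A one-row set has nothing in the frame "3 PRECEDING AND 1 PRECEDING",
            # so AVG(...) yields NULL (None here).
            new_row = (month + 1, None)
            output.append(new_row)
            pending.append(new_row)
    # The outer SELECT keeps month <= 12, which drops the month-13 rows.
    return [row for row in output if row[0] <= limit]
```

Called with the anchors `[(1, 10), (2, 15), (3, 17)]`, the model returns 33 rows: months 1 to 12, then 3 to 12, then 2 to 12, with every non-anchor value set to None. That is the same layout as the result above.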

Expected output:

| month | forecast |
|-------|----------|
|     1 |       10 |
|     2 |       15 |
|     3 |       17 |
|     4 |    14.00 |
|     5 |    15.33 |
|     6 |    15.44 |
|     7 |    14.93 |
|     8 |    15.23 |
|     9 |    15.20 |
|    10 |    15.12 |
|    11 |    15.18 |
|    12 |    15.17 |

Can anyone tell me what is wrong with this query?

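Comments:

Can you provide the expected output?

The expected output is the first table in my question: Month and Forecast. Basically I only have values (actuals) for the first 3 months. For months > 3 I need to forecast the value as the average of the last 3 values.

Added the expected result, courtesy of @TT.

Answer 1: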

I suggest something like this:

WITH T AS
(
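    -- Months 1-3 (actuals). The *_ago columns are seed values chosen so that each actual
    -- equals the average of the three values before it, e.g. (26 + 10 + 15) / 3 = 17.
    -- Only month 3's seeds are used by the recursion below.
    SELECT 1 AS [month], CAST(10 AS DECIMAL(28,2)) AS [forecast], CAST(-5 AS DECIMAL(28,2)) AS three_months_ago_forecast, CAST(9 AS decimal(28,2)) AS two_months_ago_forecast, CAST(26 AS decimal(28,2)) as one_month_ago_forecast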
    UNION ALL
    SELECT 2,CAST(15 AS DECIMAL(28,2)), CAST(9 AS decimal(28,2)), CAST(26 AS decimal(28,2)), CAST(10 AS DECIMAL(28,2))
    UNION ALL 
    SELECT 3,CAST(17 AS DECIMAL(28,2)), CAST(26 AS decimal(28,2)), CAST(10 AS DECIMAL(28,2)), CAST(15 AS DECIMAL(28,2))
),
LT AS -- LastForecast
(
    SELECT *
    FROM T
    WHERE [month] = 3
),
FF AS -- Future Forecast
(
    SELECT *
    FROM LT

    UNION ALL

    SELECT 
        FF.[month] + 1 AS [month], 
        -- average of the last 3 values, computed as (4 * previous forecast - value 4 months back) / 3
        CAST( (FF.forecast * 4 - FF.three_months_ago_forecast) / 3 AS decimal(28,2)) AS forecast,
        FF.two_months_ago_forecast as three_months_ago_forecast,
        FF.one_month_ago_forecast as two_months_ago_forecast,
        FF.forecast as one_month_ago_forecast
    FROM FF
    WHERE
        FF.[month] < 12

)
SELECT * FROM T
WHERE [month] < 3
UNION ALL
SELECT * FROM FF
ORDER BY [month]

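The forecast expression in the recursive member `FF` relies on an identity. Write $f_n$ for the value of month $n$. If $f_{n-1}$ is itself the average of the three values before it, then:

```latex
\begin{aligned}
f_{n-1} &= \frac{f_{n-2} + f_{n-3} + f_{n-4}}{3}
  \;\Rightarrow\; f_{n-2} + f_{n-3} = 3f_{n-1} - f_{n-4} \\
f_n &= \frac{f_{n-1} + f_{n-2} + f_{n-3}}{3} = \frac{4f_{n-1} - f_{n-4}}{3}
\end{aligned}
```

Month 3 holds an actual, not an average. For that reason the month 3 row seeds `three_months_ago_forecast` with 26, which is the value for which 17 = (26 + 10 + 15) / 3. That seed gives month 4 = (4 * 17 - 26) / 3 = 14 = (10 + 15 + 17) / 3. The seeds on the month 1 and 2 rows (-5 and 9) continue the same pattern backwards but are never read by the recursion.
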
Output:

+-------+----------+---------------------------+-------------------------+------------------------+
| month | forecast | three_months_ago_forecast | two_months_ago_forecast | one_month_ago_forecast |
+-------+----------+---------------------------+-------------------------+------------------------+
|     1 | 10.00    | -5.00                     | 9.00                    | 26.00                  |
|     2 | 15.00    | 9.00                      | 26.00                   | 10.00                  |
|     3 | 17.00    | 26.00                     | 10.00                   | 15.00                  |
|     4 | 14.00    | 10.00                     | 15.00                   | 17.00                  |
|     5 | 15.33    | 15.00                     | 17.00                   | 14.00                  |
|     6 | 15.44    | 17.00                     | 14.00                   | 15.33                  |
|     7 | 14.92    | 14.00                     | 15.33                   | 15.44                  |
|     8 | 15.23    | 15.33                     | 15.44                   | 14.92                  |
|     9 | 15.20    | 15.44                     | 14.92                   | 15.23                  |
|    10 | 15.12    | 14.92                     | 15.23                   | 15.20                  |
|    11 | 15.19    | 15.23                     | 15.20                   | 15.12                  |
|    12 | 15.18    | 15.20                     | 15.12                   | 15.19                  |
+-------+----------+---------------------------+-------------------------+------------------------+
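
For comparison, here is a minimal sketch of the same carry-the-state idea that averages the three carried columns directly instead of using the (4 * f - f) / 3 shortcut. It is written in SQLite's dialect and run through Python's standard `sqlite3` module so it executes as-is. The CTE name `fc`, the columns `f0`/`f1`/`f2` and the function `forecast_rows` are made up for this example. In SQL Server you would drop the `RECURSIVE` keyword and use `CAST(... AS DECIMAL(28,2))` instead of `ROUND`, because the anchor and recursive members must produce matching column types.

```python
import sqlite3

# SQLite dialect; see the lead-in for the SQL Server differences.
FORECAST_SQL = """
WITH RECURSIVE fc(month, f0, f1, f2) AS (
    -- anchor: the last actual month; f0 = its value, f1 = one month back, f2 = two months back
    SELECT ?, ?, ?, ?
    UNION ALL
    -- the recursive step only sees the previous row, so the last three values travel with it
    SELECT month + 1,
           ROUND((f0 + f1 + f2) / 3.0, 2),  -- new forecast, stored at two decimals
           f0,
           f1
    FROM fc
    WHERE month < ?
)
SELECT month, f0 AS forecast FROM fc WHERE month > ? ORDER BY month
"""


def forecast_rows(actuals, last_month):
    """Return (month, forecast) pairs for the months after the actuals, up to last_month."""
    n = len(actuals)
    params = (n, float(actuals[-1]), float(actuals[-2]), float(actuals[-3]), last_month, n)
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(FORECAST_SQL, params).fetchall()
    finally:
        conn.close()
```

`forecast_rows([10, 15, 17], 12)` returns months 4 to 12. Because each forecast is rounded to two decimals before it is reused, months 11 and 12 come out as 15.18 and 15.17, whereas the table above shows 15.19 and 15.18. Once the values are rounded, the (4 * f - f) / 3 shortcut is no longer exactly equal to averaging the three rounded values.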

Answer 2:

Try this:

WITH cte
     AS (SELECT *
         FROM   (VALUES (1,10,NULL),
                        (2,15,NULL),
                        (3,17,NULL),
                        (4,NULL,14.00),
                        (5,NULL,15.33),
                        (6,NULL,15.44),
                        (7,NULL,14.93)) tc (month, act, fore))
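-- Each row a contributes its value (actual, else forecast) to the next three months b;
-- averaging per target month then gives the mean of the three preceding values.
SELECT mon, AVG(res) AS forecast
FROM   cte a
       CROSS apply (SELECT TOP 3 ( COALESCE(a.act, a.fore) ) AS res,
                                 b.month                     AS mon
                    FROM   cte b
                    WHERE  a.month < b.month
                    ORDER  BY b.month) cs
GROUP  BY mon
ORDER  BY mon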

Or, on SQL Server 2012+, use this:

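-- Reuses the cte defined in the previous query.
-- The frame covers the current month and the two before it;
-- their average becomes the forecast for [month] + 1.
SELECT
    [month]=[month]+1,
    forecast=CAST(AVG(COALESCE(act,fore)) OVER (ORDER BY [month] ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS DECIMAL(28,2))
FROM
    cte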

Comments:

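Thanks @VR46, can you tell me what is wrong with my query? That is actually my question.

Msg 8120, Level 16, State 1, Line 1: Column 'cte.month' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

@JesúsLópez - check it now

There is no error now, but the query does not produce the expected result.

The first version needs a TALLY table covering more months (it only prints 7 months). The second version has a bug: it should be BETWEEN 2 PRECEDING AND CURRENT ROW (and it prints 8 months).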
