How can I add a new calculated column using a window function to my SQL query?

【Posted】: 2020-12-16 19:35:54

【Question】:

My data looks like this:

Trader Name      | Currency_Code | Counterparty | Traded_Amount | Total_Traded_Volume | Baseline_Avg | Variance
Jules Winnfield  | GBP           | GOLD         | 10000         | 30000               | 10000        | 0
Jules Winnfield  | GBP           | BARC         | 8000          | 30000               | 11000        | -3000
Jules Winnfield  | GBP           | JPMORG       | 12000         | 30000               | 9000         | +3000
Jules Winnfield  | EUR           | GOLD         | 15000         | 27000               | 6000         | +9000
Jules Winnfield  | EUR           | BARC         | 2000          | 27000               | 12500        | -10500
Jules Winnfield  | EUR           | JPMORG       | 10000         | 27000               | 8500         | +1500

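Let me take a moment to briefly explain this dataset:

    - The trader has traded a total of £30000 across three counterparties (in this example Goldman Sachs, Barclays and JP Morgan).
    - The individual amounts, i.e. £10000, £8000 and £12000, are simple sum() aggregations over the individual trades, while the £30000 comes from a further aggregation using OVER (PARTITION BY TRADER_NAME, CURRENCY_CODE).
    - baseline_average is the average volume traded with all the other counterparties. For example, Jules traded £8000 with Barclays, and his average volume with the other counterparties (Goldman and JP Morgan) is £11000.
    - Variance is the difference between traded_amount and baseline_average.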
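To make the two definitions concrete, here is a small Python sketch of the arithmetic for one trader and one currency. The function name `baseline_and_variance` and the dictionary layout are illustrative assumptions, not something taken from the question; returning `None` for a single counterparty mirrors what a NULLIF-guarded division would yield in SQL.

```python
def baseline_and_variance(amounts):
    """Return {counterparty: (baseline_avg, variance)} for one trader and currency.

    baseline_avg is the mean amount traded with every *other* counterparty;
    variance is the counterparty's own traded amount minus that baseline.
    """
    total = sum(amounts.values())
    others = len(amounts) - 1
    result = {}
    for counterparty, amount in amounts.items():
        if others == 0:
            # Only one counterparty: there is no baseline to compare against.
            result[counterparty] = (None, None)
            continue
        baseline = (total - amount) / others
        result[counterparty] = (baseline, amount - baseline)
    return result


# The GBP rows from the sample data.
gbp = {"GOLD": 10000, "BARC": 8000, "JPMORG": 12000}
gbp_stats = baseline_and_variance(gbp)
```

For Barclays this gives a baseline of (30000 - 8000) / 2 = 11000 and a variance of 8000 - 11000 = -3000, matching the sample table.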

The query used to generate the table above is:

SELECT
     OT.TRADER_NAME,
     OT.CURRENCY_CODE,
     OT.COUNTERPARTY,
     -- Amount per trader, currency and counterparty (plain GROUP BY aggregate)
     SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
     -- Window functions run after GROUP BY, so the window sums the grouped SUM()
     SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
     -- Average amount traded with all other counterparties; NULLIF guards against division by zero
     (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
       - SUM(OT.TRADED_AMOUNT))
       / NULLIF(COUNT(*) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0)
       AS BASELINE_AVG,
     SUM(OT.TRADED_AMOUNT)
       - (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
          - SUM(OT.TRADED_AMOUNT))
       / NULLIF(COUNT(*) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0)
       AS VARIANCE

FROM ORDERS_TRADES_DATA OT
GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY
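One way to sanity-check this per-currency logic is to load a few rows into an in-memory SQLite database from Python. This is only a sketch under assumptions: SQLite (3.25 or later, for window functions) stands in for the poster's database engine, the individual trade rows are invented so that they add up to the sample figures, and the window functions are written over the grouped totals as SUM(SUM(...)) OVER (...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS_TRADES_DATA ("
    "TRADER_NAME TEXT, CURRENCY_CODE TEXT, COUNTERPARTY TEXT, TRADED_AMOUNT INTEGER)"
)
# Hypothetical individual trades; GOLD/GBP is split in two to exercise the GROUP BY.
trades = [
    ("Jules Winnfield", "GBP", "GOLD", 6000),
    ("Jules Winnfield", "GBP", "GOLD", 4000),
    ("Jules Winnfield", "GBP", "BARC", 8000),
    ("Jules Winnfield", "GBP", "JPMORG", 12000),
    ("Jules Winnfield", "EUR", "GOLD", 15000),
    ("Jules Winnfield", "EUR", "BARC", 2000),
    ("Jules Winnfield", "EUR", "JPMORG", 10000),
]
conn.executemany("INSERT INTO ORDERS_TRADES_DATA VALUES (?, ?, ?, ?)", trades)

QUERY = """
SELECT
    OT.TRADER_NAME,
    OT.CURRENCY_CODE,
    OT.COUNTERPARTY,
    SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
    SUM(SUM(OT.TRADED_AMOUNT))
        OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
    (SUM(SUM(OT.TRADED_AMOUNT))
        OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - SUM(OT.TRADED_AMOUNT))
      / NULLIF(COUNT(*) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0)
      AS BASELINE_AVG,
    SUM(OT.TRADED_AMOUNT)
      - (SUM(SUM(OT.TRADED_AMOUNT))
            OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - SUM(OT.TRADED_AMOUNT))
      / NULLIF(COUNT(*) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0)
      AS VARIANCE
FROM ORDERS_TRADES_DATA OT
GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY
ORDER BY OT.CURRENCY_CODE, OT.COUNTERPARTY
"""

# One row per trader, currency and counterparty, ordered by currency then counterparty.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the window functions are evaluated after the GROUP BY, COUNT(*) OVER (...) counts counterparties per trader and currency rather than individual trades, which is what the baseline needs.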

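So far so good. This lets me slice the data as long as I specify the currency I am interested in. However, I would now like to add columns that aggregate the trader's entire traded volume into USD-equivalent terms - essentially one traded_volume per trader in USD, computed as a window function - that I can use for analysis. I store FX rates in a separate table and can apply a join. I tried running the following query: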

SELECT 

     OT.TRADER_NAME, 
     OT.CURRENCY_CODE, 
     OT.COUNTERPARTY, 
     SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
     SUM(OT.TRADED_AMOUNT) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
     (SUM(OT.TRADED_AMOUNT) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)- 
     SUM(OT.TRADED_AMOUNT))/NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)-1),0) 
     AS BASELINE_AVG,
     SUM(OT.TRADED_AMOUNT) - (SUM(OT.TRADED_AMOUNT) OVER (PARTITION BY OT.TRADER_NAME, 
     OT.CURRENCY_CODE)-SUM(OT.TRADED_AMOUNT))/NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, 
     OT.CURRENCY_CODE)-1),0) AS VARIANCE,
     SUM(OT.TRADED_AMOUNT)/FX.FX_RATE AS TRADED_AMOUNT_USD,
     SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE) AS TOTAL_TRADED_VOL_USD,
     (SUM(OT.TRADED_AMOUNT)/FX.FX_RATE OVER (PARTITION BY OT.TRADER_NAME)- 
     SUM(OT.TRADED_AMOUNT)/FX.FX_RATE)/NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME)-1),0) 
     AS BASELINE_AVG_USD,
     SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE) - (SUM(OT.TRADED_AMOUNT)/FX.FX_RATE OVER (PARTITION BY 
     OT.TRADER_NAME)-SUM(OT.TRADED_AMOUNT)/FX.FX_RATE)/NULLIF(SUM(1) OVER (PARTITION BY 
     OT.TRADER_NAME)-1),0) AS VARIANCE_USD

FROM ORDERS_TRADES_DATA OT
LEFT JOIN FX_RATES_TABLE FX ON OT.CURRENCY_CODE = FX.ASSET_CURRENCY_CODE
GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY, FX.FX_RATE
     

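...which does not work, as I get the error:

Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

How can I achieve my goal here?

【Answer 1】: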

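The immediate error is caused by the nested aggregate SUM call SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE), which has no OVER clause and is therefore a plain aggregate of an aggregate. In addition, because the SELECT contains non-aggregated columns that are not referenced in a GROUP BY, a missing GROUP BY clause in an aggregate query raises a further error.

Instead, consider avoiding the SUM() OVER(...) window functions altogether and joining multiple levels of aggregation (the trader/currency level and the trader/currency/counterparty level). Then run the required calculations in an outer query that does no aggregation. Please note: division by zero is undefined, which is why the divisors are wrapped in NULLIF.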

WITH trader_curr_agg AS (
     -- Totals per trader and currency, plus the number of counterparties traded with
     SELECT   OT.TRADER_NAME
            , OT.CURRENCY_CODE
            , SUM(OT.TRADED_AMOUNT) AS TOTAL_TRADED_VOL
            , COUNT(DISTINCT OT.COUNTERPARTY) AS COUNTERPARTY_COUNT
     FROM ORDERS_TRADES_DATA OT
     GROUP BY   OT.TRADER_NAME
              , OT.CURRENCY_CODE
),
    trader_counterparty_agg AS (
     -- Amounts per trader, currency and counterparty
     SELECT   OT.TRADER_NAME
            , OT.CURRENCY_CODE
            , OT.COUNTERPARTY
            , SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT
     FROM ORDERS_TRADES_DATA OT
     GROUP BY   OT.TRADER_NAME
              , OT.CURRENCY_CODE
              , OT.COUNTERPARTY
)

SELECT
         tcntr.TRADER_NAME
       , tcntr.CURRENCY_CODE
       , tcntr.COUNTERPARTY

       , tcntr.TRADED_AMOUNT
       , tcurr.TOTAL_TRADED_VOL
       , (tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
                  / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0) AS BASELINE_AVG
       -- Variance = traded amount minus the baseline average
       , tcntr.TRADED_AMOUNT - (tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
                  / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0) AS VARIANCE

       -- USD equivalents: the same figures divided by the currency's FX rate
       , tcntr.TRADED_AMOUNT / FX.FX_RATE AS TRADED_AMOUNT_USD
       , tcurr.TOTAL_TRADED_VOL / FX.FX_RATE AS TOTAL_TRADED_VOL_USD
       , ((tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
                  / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0)) / FX.FX_RATE AS BASELINE_AVG_USD
       , (tcntr.TRADED_AMOUNT - (tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
                  / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0)) / FX.FX_RATE AS VARIANCE_USD

FROM trader_counterparty_agg tcntr
INNER JOIN trader_curr_agg tcurr
    ON tcntr.TRADER_NAME = tcurr.TRADER_NAME
    AND tcntr.CURRENCY_CODE = tcurr.CURRENCY_CODE
LEFT JOIN FX_RATES_TABLE FX
    ON tcntr.CURRENCY_CODE = FX.ASSET_CURRENCY_CODE
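For comparison, a nested SUM is accepted when the outer SUM is a window function, which is one possible way to get a single USD total per trader across all currencies. The sketch below runs that idea against an in-memory SQLite database from Python. It rests on assumptions: SQLite (3.25 or later) stands in for the poster's engine, the FX rates and the second trader are invented for illustration, and FX_RATE is taken to mean units of the asset currency per USD, as the division in the question implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS_TRADES_DATA ("
    "TRADER_NAME TEXT, CURRENCY_CODE TEXT, COUNTERPARTY TEXT, TRADED_AMOUNT INTEGER)"
)
conn.execute("CREATE TABLE FX_RATES_TABLE (ASSET_CURRENCY_CODE TEXT, FX_RATE REAL)")

# Sample amounts from the question, plus one hypothetical second trader.
conn.executemany(
    "INSERT INTO ORDERS_TRADES_DATA VALUES (?, ?, ?, ?)",
    [
        ("Jules Winnfield", "GBP", "GOLD", 10000),
        ("Jules Winnfield", "GBP", "BARC", 8000),
        ("Jules Winnfield", "GBP", "JPMORG", 12000),
        ("Jules Winnfield", "EUR", "GOLD", 15000),
        ("Jules Winnfield", "EUR", "BARC", 2000),
        ("Jules Winnfield", "EUR", "JPMORG", 10000),
        ("Vincent Vega", "GBP", "GOLD", 7500),
    ],
)
# Hypothetical rates, not real market data.
conn.executemany(
    "INSERT INTO FX_RATES_TABLE VALUES (?, ?)",
    [("GBP", 0.75), ("EUR", 0.8)],
)

QUERY = """
SELECT
    OT.TRADER_NAME,
    OT.CURRENCY_CODE,
    OT.COUNTERPARTY,
    SUM(OT.TRADED_AMOUNT) / FX.FX_RATE AS TRADED_AMOUNT_USD,
    -- Inner SUM: GROUP BY aggregate. Outer SUM ... OVER: window over the grouped rows.
    SUM(SUM(OT.TRADED_AMOUNT) / FX.FX_RATE)
        OVER (PARTITION BY OT.TRADER_NAME) AS TOTAL_TRADED_VOL_USD
FROM ORDERS_TRADES_DATA OT
LEFT JOIN FX_RATES_TABLE FX ON OT.CURRENCY_CODE = FX.ASSET_CURRENCY_CODE
GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY, FX.FX_RATE
ORDER BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY
"""

usd_rows = conn.execute(QUERY).fetchall()
for row in usd_rows:
    print(row)
```

Partitioning by TRADER_NAME alone is what lets the total span currencies. The CTE approach in this answer can reach the same figure by adding a third, trader-level aggregate that is computed after the FX join.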

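【Comments】:

Many thanks, I will give this a try. Sorry that I left out part of the code; I did actually use GROUP BY. The first query works as I mentioned, but the second one (where I am attempting the USD conversion) does not... let me try your solution.

Understood. Reading your error closely, it is actually due to the nested SUM in one of your calculations: SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE). But a missing GROUP BY would be a problem too. And yes, do consider this solution. You avoid many SUM() OVER() calls, which helps readability, and the totals are computed only once, which helps efficiency. If my untested translation causes problems, adjust the formulas as needed. The only real difference is that the _USD columns are divided by FX_RATE.

Apologies for the delayed response, I spent a lot of time on this. I was able to adapt your solution successfully and also spent some time adding my own tweaks to the formulas. You are absolutely right about readability, but given the number of calculations and reports required, it is probably unavoidable!

【Answer 2】: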

You can write the query like this:

SELECT
     A.TRADER_NAME,
     A.CURRENCY_CODE,
     A.COUNTERPARTY,
     A.TRADED_AMOUNT,
     A.TOTAL_TRADED_VOL,
     A.BASELINE_AVG,
     A.VARIANCE,
     -- USD equivalents: divide the per-currency figures by the FX rate
     A.TRADED_AMOUNT/FX.FX_RATE AS TRADED_AMOUNT_USD,
     A.TOTAL_TRADED_VOL/FX.FX_RATE AS TOTAL_TRADED_VOL_USD,
     A.BASELINE_AVG/FX.FX_RATE AS BASELINE_AVG_USD,
     A.VARIANCE/FX.FX_RATE AS VARIANCE_USD

FROM
    (SELECT
         OT.TRADER_NAME,
         OT.CURRENCY_CODE,
         OT.COUNTERPARTY,
         SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
         SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
         (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
           - SUM(OT.TRADED_AMOUNT))
           / NULLIF(COUNT(*) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0)
           AS BASELINE_AVG,
         SUM(OT.TRADED_AMOUNT)
           - (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
              - SUM(OT.TRADED_AMOUNT))
           / NULLIF(COUNT(*) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0)
           AS VARIANCE

    FROM ORDERS_TRADES_DATA OT
    GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY) A
LEFT JOIN FX_RATES_TABLE FX ON FX.ASSET_CURRENCY_CODE = A.CURRENCY_CODE

【Comments】:

Many thanks for the proposed solution; I have actually gone with the CTE-based solution now. Much appreciated.
