How can I add a new calculated column using a window function to my SQL query?
Posted: 2020-12-16 19:35:54
Question:
My data looks like this:
Trader Name | Currency_Code | Counterparty | Traded_Amount | Total_Traded_Volume | Baseline_Avg | Variance
Jules Winnfield | GBP | GOLD | 10000 | 30000 | 10000 | 0
Jules Winnfield | GBP | BARC | 8000 | 30000 | 11000 | -3000
Jules Winnfield | GBP | JPMORG | 12000 | 30000 | 9000 | +3000
Jules Winnfield | EUR | GOLD | 15000 | 27000 | 6000 | +9000
Jules Winnfield | EUR | BARC | 2000 | 27000 | 12500 | -10500
Jules Winnfield | EUR | JPMORG | 10000 | 27000 | 8500 | +1500
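Let me take a moment to briefly explain this dataset:
- A trader has executed trades worth a total of £30000 with three counterparties (in this example Goldman Sachs, Barclays and JP Morgan).
- The individual amounts, i.e. £10000, £8000 and £12000, are simple sum() aggregations over the individual trades, while the £30000 comes from a further aggregation using OVER (PARTITION BY TRADER_NAME, CURRENCY_CODE).
- baseline_average is the average volume traded with all the other counterparties. For example, Jules traded £8000 with Barclays, and his average with the other counterparties (Goldman Sachs and JP Morgan) is £11000. Variance is the difference between traded_amount and baseline_average.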
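The baseline and variance definitions above amount to a leave-one-out average. As a minimal sketch (plain Python rather than SQL, with a hypothetical helper name `baseline_and_variance`), the GBP rows of the sample table can be checked like this:

```python
# Per-counterparty GBP totals taken from the sample table above.
traded = {"GOLD": 10000, "BARC": 8000, "JPMORG": 12000}


def baseline_and_variance(amounts):
    """Return {counterparty: (baseline_avg, variance)} using a leave-one-out average."""
    total = sum(amounts.values())
    others = len(amounts) - 1  # number of *other* counterparties
    result = {}
    for name, amount in amounts.items():
        # None plays the role of SQL's NULLIF(..., 0) guard when there is
        # only one counterparty and the divisor would be zero.
        baseline = (total - amount) / others if others else None
        variance = amount - baseline if baseline is not None else None
        result[name] = (baseline, variance)
    return result


stats = baseline_and_variance(traded)
for name, (baseline, variance) in stats.items():
    print(name, baseline, variance)
```

For Barclays this gives a baseline of (30000 - 8000) / 2 = 11000 and a variance of 8000 - 11000 = -3000, matching the table.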
The code used to generate the above output is:
SELECT
    OT.TRADER_NAME,
    OT.CURRENCY_CODE,
    OT.COUNTERPARTY,
    SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
    -- The windowed SUM runs over the grouped rows, so it wraps the per-counterparty SUM.
    SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
    (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
        - SUM(OT.TRADED_AMOUNT))
        / NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0) AS BASELINE_AVG,
    SUM(OT.TRADED_AMOUNT)
        - (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
            - SUM(OT.TRADED_AMOUNT))
        / NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0) AS VARIANCE
FROM ORDERS_TRADES_DATA OT
GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY
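So far so good. This lets me slice the data as long as I specify the currency I am interested in. However, I now want to add a column that aggregates a trader's entire volume as a USD equivalent: essentially one traded_volume figure per trader, in USD, computed as a window function, which I can then use for analysis. I store the FX rates in a separate table and can apply a join. I have tried running the following query: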
SELECT
OT.TRADER_NAME,
OT.CURRENCY_CODE,
OT.COUNTERPARTY,
SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
SUM(OT.TRADED_AMOUNT) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
(SUM(OT.TRADED_AMOUNT) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)-
SUM(OT.TRADED_AMOUNT))/NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)-1),0)
AS BASELINE_AVG,
SUM(OT.TRADED_AMOUNT) - (SUM(OT.TRADED_AMOUNT) OVER (PARTITION BY OT.TRADER_NAME,
OT.CURRENCY_CODE)-SUM(OT.TRADED_AMOUNT))/NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME,
OT.CURRENCY_CODE)-1),0) AS VARIANCE,
SUM(OT.TRADED_AMOUNT)/FX.FX_RATE AS TRADED_AMOUNT_USD,
SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE) AS TOTAL_TRADED_VOL_USD,
(SUM(OT.TRADED_AMOUNT)/FX.FX_RATE OVER (PARTITION BY OT.TRADER_NAME)-
SUM(OT.TRADED_AMOUNT)/FX.FX_RATE)/NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME)-1),0)
AS BASELINE_AVG_USD,
SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE) - (SUM(OT.TRADED_AMOUNT)/FX.FX_RATE OVER (PARTITION BY
OT.TRADER_NAME)-SUM(OT.TRADED_AMOUNT)/FX.FX_RATE)/NULLIF(SUM(1) OVER (PARTITION BY
OT.TRADER_NAME)-1),0) AS VARIANCE_USD
FROM ORDERS_TRADES_DATA OT
LEFT JOIN FX_RATES_TABLE FX ON OT.CURRENCY_CODE = FX.ASSET_CURRENCY_CODE
GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY, FX.FX_RATE
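...which does not work, as I get the error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
How can I achieve my goal here?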
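Answer 1:
The immediate error is due to the layered aggregate SUM call: SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE). However, a missing GROUP BY clause in an aggregate query raises a further error, because the SELECT list contains non-aggregated columns that are not referenced in GROUP BY.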
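To illustrate the difference between a layered aggregate and an aggregate wrapped by a window function, here is a small sketch that runs both forms against an in-memory SQLite database through Python's `sqlite3` module. SQLite (3.25 or later, for window function support) is an assumption made only so the example is self-contained; the asker's engine is different and words its error differently. The table and column names follow the question; the four trade rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS_TRADES_DATA ("
    "TRADER_NAME TEXT, CURRENCY_CODE TEXT, COUNTERPARTY TEXT, TRADED_AMOUNT REAL)"
)
# Illustrative rows: two GOLD trades that add up to 10000, plus one each for BARC and JPMORG.
conn.executemany(
    "INSERT INTO ORDERS_TRADES_DATA VALUES (?, ?, ?, ?)",
    [
        ("Jules Winnfield", "GBP", "GOLD", 4000),
        ("Jules Winnfield", "GBP", "GOLD", 6000),
        ("Jules Winnfield", "GBP", "BARC", 8000),
        ("Jules Winnfield", "GBP", "JPMORG", 12000),
    ],
)

# 1) An aggregate nested directly inside another aggregate is rejected.
try:
    conn.execute(
        "SELECT SUM(SUM(TRADED_AMOUNT)) FROM ORDERS_TRADES_DATA "
        "GROUP BY TRADER_NAME, CURRENCY_CODE, COUNTERPARTY"
    )
    nested_ok = True
except sqlite3.OperationalError:
    nested_ok = False

# 2) With OVER, the outer SUM is a window function evaluated after GROUP BY,
#    so it may legally consume the inner per-counterparty SUM.
rows = conn.execute(
    """
    SELECT COUNTERPARTY,
           SUM(TRADED_AMOUNT) AS TRADED_AMOUNT,
           SUM(SUM(TRADED_AMOUNT)) OVER (
               PARTITION BY TRADER_NAME, CURRENCY_CODE) AS TOTAL_TRADED_VOL
    FROM ORDERS_TRADES_DATA
    GROUP BY TRADER_NAME, CURRENCY_CODE, COUNTERPARTY
    ORDER BY COUNTERPARTY
    """
).fetchall()

print("nested aggregate accepted:", nested_ok)
for row in rows:
    print(row)
```

The second form works because window functions are evaluated after grouping, which is why only the calls without an OVER clause trip the error.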
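However, consider avoiding the SUM() OVER(...) window functions altogether and instead joining two levels of aggregation (the trader/currency level and the trader/currency/counterparty level). Then run the required calculations in an outer query that performs no aggregation. Note that division by zero is undefined, so the divisor is wrapped in NULLIF.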
WITH trader_currency_agg AS (
    SELECT OT.TRADER_NAME
         , OT.CURRENCY_CODE
         , SUM(OT.TRADED_AMOUNT) AS TOTAL_TRADED_VOL
         -- count counterparties rather than raw trades, so the divisor
         -- matches the per-counterparty rows of the second CTE
         , COUNT(DISTINCT OT.COUNTERPARTY) AS COUNTERPARTY_COUNT
    FROM ORDERS_TRADES_DATA OT
    GROUP BY OT.TRADER_NAME
           , OT.CURRENCY_CODE
),
trader_counterparty_agg AS (
    SELECT OT.TRADER_NAME
         , OT.CURRENCY_CODE
         , OT.COUNTERPARTY
         , SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT
    FROM ORDERS_TRADES_DATA OT
    GROUP BY OT.TRADER_NAME
           , OT.CURRENCY_CODE
           , OT.COUNTERPARTY
)
SELECT
      tcntr.TRADER_NAME
    , tcntr.CURRENCY_CODE
    , tcntr.COUNTERPARTY
    , tcntr.TRADED_AMOUNT
    , tcurr.TOTAL_TRADED_VOL
    -- average traded with the other counterparties
    , (tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
        / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0) AS BASELINE_AVG
    -- traded amount minus that baseline average
    , tcntr.TRADED_AMOUNT - (tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
        / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0) AS VARIANCE
    , tcntr.TRADED_AMOUNT / FX.FX_RATE AS TRADED_AMOUNT_USD
    , tcurr.TOTAL_TRADED_VOL / FX.FX_RATE AS TOTAL_TRADED_VOL_USD
    , ((tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
        / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0)) / FX.FX_RATE AS BASELINE_AVG_USD
    , (tcntr.TRADED_AMOUNT - (tcurr.TOTAL_TRADED_VOL - tcntr.TRADED_AMOUNT)
        / NULLIF(tcurr.COUNTERPARTY_COUNT - 1, 0)) / FX.FX_RATE AS VARIANCE_USD
FROM trader_counterparty_agg tcntr
INNER JOIN trader_currency_agg tcurr
    ON tcntr.TRADER_NAME = tcurr.TRADER_NAME
    AND tcntr.CURRENCY_CODE = tcurr.CURRENCY_CODE
LEFT JOIN FX_RATES_TABLE FX
    ON tcntr.CURRENCY_CODE = FX.ASSET_CURRENCY_CODE
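The question also asks for a single USD figure per trader across all currencies. One possible way to get it, sketched here and not taken from either answer, is to convert each per-counterparty total to USD first and then sum over a window partitioned by trader only. The example runs on in-memory SQLite via Python's `sqlite3` (an assumption for self-containment; window functions need SQLite 3.25 or later). The FX rates 0.8 and 0.9 are hypothetical, and FX_RATE is assumed to be quoted as local currency units per USD, as the division in the thread implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ORDERS_TRADES_DATA (
        TRADER_NAME TEXT, CURRENCY_CODE TEXT, COUNTERPARTY TEXT, TRADED_AMOUNT REAL);
    CREATE TABLE FX_RATES_TABLE (ASSET_CURRENCY_CODE TEXT, FX_RATE REAL);
    """
)
# Per-counterparty amounts from the sample table in the question.
conn.executemany(
    "INSERT INTO ORDERS_TRADES_DATA VALUES (?, ?, ?, ?)",
    [
        ("Jules Winnfield", "GBP", "GOLD", 10000),
        ("Jules Winnfield", "GBP", "BARC", 8000),
        ("Jules Winnfield", "GBP", "JPMORG", 12000),
        ("Jules Winnfield", "EUR", "GOLD", 15000),
        ("Jules Winnfield", "EUR", "BARC", 2000),
        ("Jules Winnfield", "EUR", "JPMORG", 10000),
    ],
)
# Hypothetical rates, assumed to mean "local currency units per 1 USD".
conn.executemany(
    "INSERT INTO FX_RATES_TABLE VALUES (?, ?)", [("GBP", 0.8), ("EUR", 0.9)]
)

rows = conn.execute(
    """
    WITH cpty_agg AS (
        SELECT TRADER_NAME, CURRENCY_CODE, COUNTERPARTY,
               SUM(TRADED_AMOUNT) AS TRADED_AMOUNT
        FROM ORDERS_TRADES_DATA
        GROUP BY TRADER_NAME, CURRENCY_CODE, COUNTERPARTY
    )
    SELECT a.TRADER_NAME, a.CURRENCY_CODE, a.COUNTERPARTY,
           a.TRADED_AMOUNT / fx.FX_RATE AS TRADED_AMOUNT_USD,
           -- the outer query has no GROUP BY, so this is a plain window sum
           SUM(a.TRADED_AMOUNT / fx.FX_RATE)
               OVER (PARTITION BY a.TRADER_NAME) AS TRADER_TOTAL_USD
    FROM cpty_agg a
    LEFT JOIN FX_RATES_TABLE fx ON fx.ASSET_CURRENCY_CODE = a.CURRENCY_CODE
    ORDER BY a.CURRENCY_CODE, a.COUNTERPARTY
    """
).fetchall()

for row in rows:
    print(row)
```

With these made-up rates the GBP trades convert to 37500 USD and the EUR trades to 30000 USD, so every row for the trader carries a total of roughly 67500 USD (subject to floating-point rounding). A USD baseline and variance per trader could be derived from that column in the same way as the per-currency versions.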
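Comments:
- Thanks very much, I will give it a go. Apologies that I did not post the complete code; I did actually use GROUP BY. The first query works as I mentioned, but the second one (where I attempt the USD conversion) does not... let me try your solution.
- Understood. On a closer read, your error is actually due to layering SUM in one of your calculations: SUM((SUM(OT.TRADED_AMOUNT)/FX.FX_RATE). The missing GROUP BY was also an issue, though. Yes, do consider this solution: you avoid many SUM() OVER() calls, which helps readability, and the sums are even computed only once, which helps efficiency. If my untested translation causes problems, adjust the formulas as needed. The only real difference is that the _USD columns are divided by FX_RATE.
- Apologies for the delayed response, I spent a lot of time on this. I was able to adapt your solution successfully, and also spent some time adding my own tweaks to the formulas. You are absolutely right about readability, but given the number of calculations and reports required, that was most likely unavoidable!
Answer 2: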
You can write the query like this:
SELECT
    A.TRADER_NAME,
    A.CURRENCY_CODE,
    A.COUNTERPARTY,
    A.TRADED_AMOUNT,
    A.TOTAL_TRADED_VOL,
    A.BASELINE_AVG,
    A.VARIANCE,
    A.TRADED_AMOUNT/FX.FX_RATE AS TRADED_AMOUNT_USD,
    A.TOTAL_TRADED_VOL/FX.FX_RATE AS TOTAL_TRADED_VOL_USD,
    A.BASELINE_AVG/FX.FX_RATE AS BASELINE_AVG_USD,
    A.VARIANCE/FX.FX_RATE AS VARIANCE_USD
FROM
    -- derived table: all aggregation happens here, the outer query only converts to USD
    (SELECT
        OT.TRADER_NAME,
        OT.CURRENCY_CODE,
        OT.COUNTERPARTY,
        SUM(OT.TRADED_AMOUNT) AS TRADED_AMOUNT,
        SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) AS TOTAL_TRADED_VOL,
        (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
            - SUM(OT.TRADED_AMOUNT))
            / NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0) AS BASELINE_AVG,
        SUM(OT.TRADED_AMOUNT)
            - (SUM(SUM(OT.TRADED_AMOUNT)) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE)
                - SUM(OT.TRADED_AMOUNT))
            / NULLIF(SUM(1) OVER (PARTITION BY OT.TRADER_NAME, OT.CURRENCY_CODE) - 1, 0) AS VARIANCE
    FROM ORDERS_TRADES_DATA OT
    GROUP BY OT.TRADER_NAME, OT.CURRENCY_CODE, OT.COUNTERPARTY) A
LEFT JOIN FX_RATES_TABLE FX ON FX.ASSET_CURRENCY_CODE = A.CURRENCY_CODE
Comments:
- Thank you very much for the proposed solution; I have actually gone with the CTE-based solution now. Much appreciated.