Aggregate time series by change events on BigQuery
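【Title】: Aggregate time series by change events on BigQuery 【Posted】: 2022-01-21 06:20:49 【Question】: On BigQuery, I have time series data representing snapshots of DEX pools on Ethereum. Each row has a timestamp, a pool address, and a balance. I need a query that returns only the rows where the balance has changed.
For example, given the following rows: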
ts | pool | balance
------------------------------
1 | 0x123 | 100
2 | 0x123 | 100
3 | 0x123 | 80
4 | 0x123 | 80
5 | 0x123 | 100
the query should return:
ts | pool | balance
------------------------------
1 | 0x123 | 100
3 | 0x123 | 80
5 | 0x123 | 100
Could I get some help with this?
【Answer 1】: Consider the following option:
select * from pools where true
qualify ifnull(balance != lag(balance) over win, true)
window win as (partition by pool order by ts)
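If applied to the sample data in your question, the output is the three rows where the balance changes (ts = 1, 3, and 5).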
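To make the logic of the `qualify` clause easier to follow, here is a minimal Python sketch of the same idea: within each pool, ordered by `ts`, a row is kept when there is no previous row or when its balance differs from the previous one. The function name `change_events` and the tuple layout are illustrative assumptions, not part of the answer; treating `None` like SQL `NULL` is an approximation of what `ifnull(..., true)` does.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question, as (ts, pool, balance) tuples.
rows = [
    (1, "0x123", 100),
    (2, "0x123", 100),
    (3, "0x123", 80),
    (4, "0x123", 80),
    (5, "0x123", 100),
]


def change_events(rows):
    """Return the rows whose balance differs from the previous row of the same pool.

    Hypothetical helper that mirrors
    `ifnull(balance != lag(balance) over (partition by pool order by ts), true)`.
    """
    kept = []
    # Sorting by (pool, ts) plays the role of PARTITION BY pool ORDER BY ts.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for _pool, group in groupby(ordered, key=itemgetter(1)):
        prev_balance = None  # no previous row yet, like LAG returning NULL
        for ts, pool, balance in group:
            # A comparison involving NULL yields NULL in SQL, and IFNULL turns
            # that into TRUE, so a missing value on either side keeps the row.
            if prev_balance is None or balance is None or balance != prev_balance:
                kept.append((ts, pool, balance))
            prev_balance = balance
    # Order by ts (then pool) for display.
    return sorted(kept, key=lambda r: (r[0], r[1]))


print(change_events(rows))
# → [(1, '0x123', 100), (3, '0x123', 80), (5, '0x123', 100)]
```

Because the previous balance is reset for every pool, the first snapshot of each pool is always returned, which matches the expected result in the question.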
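【Answer 2】: While writing up and simplifying my question, I ended up solving it myself :)
So here is the query I wrote; I hope it helps you with a similar problem: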
WITH pools AS (
  -- Sample data from the question
  SELECT 1 AS ts, "a" AS pool, 100 AS balance UNION ALL
  SELECT 2 AS ts, "a" AS pool, 100 AS balance UNION ALL
  SELECT 3 AS ts, "a" AS pool, 80 AS balance UNION ALL
  SELECT 4 AS ts, "a" AS pool, 80 AS balance UNION ALL
  SELECT 5 AS ts, "a" AS pool, 100 AS balance
),
data AS (
  -- Attach the previous timestamp and balance within each pool
  SELECT pool, ts, balance,
    LAG(ts) OVER (PARTITION BY pool ORDER BY ts ASC) AS prev_ts,
    LAG(balance) OVER (PARTITION BY pool ORDER BY ts ASC) AS prev_balance
  FROM pools
)
-- Keep the first row of each pool and every row whose balance changed
SELECT ts, pool, balance, prev_balance
FROM data
WHERE balance != prev_balance OR prev_balance IS NULL
ORDER BY ts
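If you want to try the `LAG`-based approach without a BigQuery project, the sketch below runs an equivalent query against an in-memory SQLite database from Python. This rests on assumptions that are not part of the answer: SQLite is a different dialect from BigQuery (it has no `QUALIFY`, for instance), window functions require SQLite 3.25 or newer, and the table is populated with the `0x123` sample rows from the question.

```python
import sqlite3

# In-memory database holding the sample snapshots from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pools (ts INTEGER, pool TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO pools VALUES (?, ?, ?)",
    [
        (1, "0x123", 100),
        (2, "0x123", 100),
        (3, "0x123", 80),
        (4, "0x123", 80),
        (5, "0x123", 100),
    ],
)

# Same idea as the answer: fetch the previous balance per pool with LAG,
# then keep the first row and every row whose balance changed.
QUERY = """
WITH data AS (
  SELECT ts, pool, balance,
         LAG(balance) OVER (PARTITION BY pool ORDER BY ts) AS prev_balance
  FROM pools
)
SELECT ts, pool, balance, prev_balance
FROM data
WHERE balance != prev_balance OR prev_balance IS NULL
ORDER BY pool, ts
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
# → (1, '0x123', 100, None)
# → (3, '0x123', 80, 100)
# → (5, '0x123', 100, 80)
```

A local run like this only checks the filtering logic; behavior on real data, such as `NULL` balances or duplicate timestamps within a pool, should still be verified on BigQuery itself.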