Aggregate time series by change events on BigQuery

Posted: 2022-01-21 06:20:49

Question:

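On BigQuery, I have time series data representing snapshots of DEX pools on Ethereum. Each row has a timestamp, a pool address, and a balance. I need a query that returns only the rows where the balance changed.

For example, given the following rows: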

 ts | pool  | balance
------------------------------
 1  | 0x123 | 100
 2  | 0x123 | 100
 3  | 0x123 | 80
 4  | 0x123 | 80
 5  | 0x123 | 100

the query would return:

 ts | pool  | balance
------------------------------
 1  | 0x123 | 100
 3  | 0x123 | 80
 5  | 0x123 | 100

Could I get some help with this?


Answer 1:

Consider the following option:

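-- Keep a row when its balance differs from the previous row in the same pool.
-- LAG is NULL on each pool's first row, so IFNULL(..., TRUE) keeps that row too.
SELECT * FROM pools WHERE TRUE
QUALIFY IFNULL(balance != LAG(balance) OVER win, TRUE)
WINDOW win AS (PARTITION BY pool ORDER BY ts)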

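Applied to the sample data in your question, this returns the rows with ts 1, 3, and 5, which matches the expected result.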


Answer 2:

While writing up and simplifying my question, I ended up solving it myself :)

Here is the query I wrote; I hope it helps with similar problems:

WITH pools AS (
    -- Sample data matching the question
    SELECT 1 AS ts, "a" AS pool, 100 AS balance UNION ALL
    SELECT 2 AS ts, "a" AS pool, 100 AS balance UNION ALL
    SELECT 3 AS ts, "a" AS pool, 80 AS balance UNION ALL
    SELECT 4 AS ts, "a" AS pool, 80 AS balance UNION ALL
    SELECT 5 AS ts, "a" AS pool, 100 AS balance
),
data AS (
    -- Attach the previous snapshot's balance within each pool
    SELECT pool, ts, balance,
        LAG(balance) OVER (PARTITION BY pool ORDER BY ts ASC) AS prev_balance
    FROM pools
)
SELECT ts, pool, balance, prev_balance
FROM data
-- Keep the first row of each pool and every row whose balance changed
WHERE balance != prev_balance OR prev_balance IS NULL
ORDER BY pool, ts
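
To sanity-check the logic outside BigQuery, here is a minimal Python sketch of what both queries appear to compute: within each pool, ordered by ts, keep a row when it is the first one or when its balance differs from the previous row's. It is not part of either answer. The function name `change_events` and the `(ts, pool, balance)` tuple layout are assumptions made for illustration, and the sketch assumes balances are never NULL, since SQL and Python treat missing values differently.

```python
from itertools import groupby
from operator import itemgetter


def change_events(rows):
    """Return rows whose balance differs from the previous row in the same pool.

    rows: iterable of (ts, pool, balance) tuples (hypothetical layout).
    """
    result = []
    # Sort by pool, then ts, mirroring PARTITION BY pool ORDER BY ts.
    ordered = sorted(rows, key=itemgetter(1, 0))
    for _pool, group in groupby(ordered, key=itemgetter(1)):
        first = True
        prev_balance = None
        for ts, pool, balance in group:
            # The first row of a pool has no predecessor, so it is always kept.
            if first or balance != prev_balance:
                result.append((ts, pool, balance))
            first = False
            prev_balance = balance
    return result


sample = [
    (1, "0x123", 100),
    (2, "0x123", 100),
    (3, "0x123", 80),
    (4, "0x123", 80),
    (5, "0x123", 100),
]
print(change_events(sample))
# → [(1, '0x123', 100), (3, '0x123', 80), (5, '0x123', 100)]
```

Grouping by pool before comparing is what keeps a change in one pool from being measured against another pool's last balance, which is the role PARTITION BY plays in the SQL.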
