Select the latest event for each distinct pair of two columns in BigQuery
[Title]: Select the latest event with a distinct 2 columns in BigQuery
[Posted]: 2020-11-11 22:48:00
[Question]: I have a BigQuery table with the following schema:
"name": "timeCreated", "type": "datetime",
"name": "userid", "type": "string",
"name": "textid", "type": "string",
"name": "textvalue", "type": "float"
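I'm trying to write a query that returns, for each combination of userid and textid, the row with the latest timeCreated. I've tried GROUP BY and similar approaches, but I can't work out how to order by the timeCreated field and then drop every row that isn't the top one for its userid and textid pair.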
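[Answer 1]: To get the latest (last) or earliest (first) element of a group in Google BigQuery, you can use ARRAY_AGG with [OFFSET(0)] and an appropriate ORDER BY (DESC for the latest, ASC for the earliest):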
WITH test_table AS (
SELECT DATETIME '2020-11-01 01:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.1 AS textvalue UNION ALL
SELECT DATETIME '2020-11-01 03:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.2 AS textvalue UNION ALL
SELECT DATETIME '2020-11-01 02:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.3 AS textvalue UNION ALL
SELECT DATETIME '2020-11-01 02:00:00' AS timeCreated, 'user1' AS userid, 'text2' AS textid, 1.4 AS textvalue UNION ALL
SELECT DATETIME '2020-11-01 01:00:00' AS timeCreated, 'user1' AS userid, 'text2' AS textid, 1.5 AS textvalue UNION ALL
SELECT DATETIME '2020-11-01 00:00:00' AS timeCreated, 'user2' AS userid, 'text1' AS textid, 1.6 AS textvalue
)
SELECT
  userid,
  textid,
  ARRAY_AGG(timeCreated ORDER BY timeCreated DESC)[OFFSET(0)] AS latest
FROM test_table
GROUP BY userid, textid
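Note that this query returns only the latest timeCreated for each pair, not the other columns of that row (textvalue is dropped). If the whole row is needed, one common alternative is to rank rows with ROW_NUMBER() and keep the top-ranked one. The following is a minimal sketch, assuming the column names from the question's schema; test_table, ranked, and rn are illustrative names, and the sample rows are made up.

```sql
-- Sketch: keep the full latest row per (userid, textid) pair.
-- test_table holds made-up sample rows; ranked and rn are illustrative names.
WITH test_table AS (
  SELECT DATETIME '2020-11-01 01:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.1 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 03:00:00', 'user1', 'text1', 1.2 UNION ALL
  SELECT DATETIME '2020-11-01 02:00:00', 'user1', 'text2', 1.4 UNION ALL
  SELECT DATETIME '2020-11-01 00:00:00', 'user2', 'text1', 1.6
),
ranked AS (
  SELECT
    *,
    -- Number the rows of each pair from newest (1) to oldest.
    ROW_NUMBER() OVER (
      PARTITION BY userid, textid
      ORDER BY timeCreated DESC
    ) AS rn
  FROM test_table
)
SELECT * EXCEPT (rn)  -- drop the helper column from the output
FROM ranked
WHERE rn = 1          -- keep only the newest row of each pair
```

With these sample rows, the result should contain three rows: the 03:00 row for user1/text1, the 02:00 row for user1/text2, and the single row for user2/text1. If two rows in a pair share the same timeCreated, which one gets rn = 1 is arbitrary unless a tie-breaking column is added to the ORDER BY.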
[Answer 2]: The following is for BigQuery Standard SQL:
#standardSQL
select as value array_agg(t order by timeCreated desc limit 1)[offset(0)]
from `project.dataset.table` t
group by userid, textid
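To make it easier to see what this returns, here is the same idiom run against a few inline rows instead of `project.dataset.table`. This is only a sketch: the test_table data is made up for illustration, and the column names are assumed from the question's schema. ARRAY_AGG(t ...) collects whole rows as STRUCTs, LIMIT 1 keeps only the newest one per group, [OFFSET(0)] extracts it, and SELECT AS VALUE turns that STRUCT back into an ordinary row.

```sql
#standardSQL
-- Sketch: same idiom as above, run on made-up inline rows.
WITH test_table AS (
  SELECT DATETIME '2020-11-01 01:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.1 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 03:00:00', 'user1', 'text1', 1.2 UNION ALL
  SELECT DATETIME '2020-11-01 02:00:00', 'user1', 'text2', 1.4 UNION ALL
  SELECT DATETIME '2020-11-01 00:00:00', 'user2', 'text1', 1.6
)
SELECT AS VALUE
  -- t is the whole row; the array holds at most one STRUCT per group.
  ARRAY_AGG(t ORDER BY timeCreated DESC LIMIT 1)[OFFSET(0)]
FROM test_table t
GROUP BY userid, textid
```

Unlike aggregating timeCreated alone, this keeps every column of the winning row, so textvalue comes back along with timeCreated, userid, and textid. The order of the output rows is not guaranteed without an outer ORDER BY.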