Select the latest event for each distinct combination of 2 columns in BigQuery


【Posted】2020-11-11 22:48:00

【Question】:

I have a BigQuery table with the following schema:

[
  {"name": "timeCreated", "type": "datetime"},
  {"name": "userid", "type": "string"},
  {"name": "textid", "type": "string"},
  {"name": "textvalue", "type": "float"}
]

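I'm trying to write a query that returns, for each combination of userid and textid, the row with the latest timeCreated. I've tried GROUP BY and similar approaches, but I can't work out how to ORDER BY timeCreated and then drop every row that isn't at the top for its userid/textid pair.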
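
To pin down the intended result, here is a minimal plain-Python sketch of "keep the newest row per (userid, textid) pair". It is not BigQuery code; the sample rows and the helper name latest_per_pair are hypothetical, chosen only to mirror the schema above.

```python
from datetime import datetime

# Hypothetical sample rows shaped like the table above:
# (timeCreated, userid, textid, textvalue)
rows = [
    (datetime(2020, 11, 1, 1), "user1", "text1", 1.1),
    (datetime(2020, 11, 1, 3), "user1", "text1", 1.2),
    (datetime(2020, 11, 1, 2), "user1", "text2", 1.4),
    (datetime(2020, 11, 1, 0), "user2", "text1", 1.6),
]


def latest_per_pair(rows):
    """Return {(userid, textid): row} keeping the row with the greatest timeCreated."""
    latest = {}
    for row in rows:
        time_created, userid, textid, _textvalue = row
        key = (userid, textid)
        # Replace the stored row only if this one is strictly newer.
        if key not in latest or time_created > latest[key][0]:
            latest[key] = row
    return latest


result = latest_per_pair(rows)
for key in sorted(result):
    print(key, result[key])
```

Each pair ends up with exactly one row, carrying its textvalue along with the newest timestamp.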

【Answer 1】:

To get the latest (last) or earliest (first) element of a group in Google BigQuery, you can use ARRAY_AGG with [OFFSET(0)] and the appropriate ORDER BY (DESC for latest, ASC for earliest):

WITH test_table AS (
  SELECT DATETIME '2020-11-01 01:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.1 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 03:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.2 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 02:00:00' AS timeCreated, 'user1' AS userid, 'text1' AS textid, 1.3 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 02:00:00' AS timeCreated, 'user1' AS userid, 'text2' AS textid, 1.4 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 01:00:00' AS timeCreated, 'user1' AS userid, 'text2' AS textid, 1.5 AS textvalue UNION ALL
  SELECT DATETIME '2020-11-01 00:00:00' AS timeCreated, 'user2' AS userid, 'text1' AS textid, 1.6 AS textvalue
)
SELECT
  userid,
  textid,
  ARRAY_AGG(timeCreated ORDER BY timeCreated DESC)[OFFSET(0)] AS latest
FROM test_table
GROUP BY userid, textid
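
As a rough illustration of what this query computes, the sketch below emulates GROUP BY plus ARRAY_AGG(... ORDER BY timeCreated DESC)[OFFSET(0)] in plain Python on the same six rows as test_table. It is an emulation, not BigQuery itself. Note that, as written, the query returns only the latest timeCreated per pair, not the matching textvalue.

```python
from collections import defaultdict
from datetime import datetime

# The six rows of test_table above: (timeCreated, userid, textid, textvalue)
test_table = [
    (datetime(2020, 11, 1, 1), "user1", "text1", 1.1),
    (datetime(2020, 11, 1, 3), "user1", "text1", 1.2),
    (datetime(2020, 11, 1, 2), "user1", "text1", 1.3),
    (datetime(2020, 11, 1, 2), "user1", "text2", 1.4),
    (datetime(2020, 11, 1, 1), "user1", "text2", 1.5),
    (datetime(2020, 11, 1, 0), "user2", "text1", 1.6),
]

# GROUP BY userid, textid: collect each group's timeCreated values into a list ("array").
groups = defaultdict(list)
for time_created, userid, textid, _textvalue in test_table:
    groups[(userid, textid)].append(time_created)

# ARRAY_AGG(... ORDER BY timeCreated DESC)[OFFSET(0)]: sort descending, take element 0.
latest = {key: sorted(times, reverse=True)[0] for key, times in groups.items()}

for (userid, textid), ts in sorted(latest.items()):
    print(userid, textid, ts)
```

Swapping DESC for ASC (here, reverse=False) would pick the earliest timestamp instead.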

【Answer 2】:

The following is for BigQuery Standard SQL; it aggregates the whole row t and returns it with SELECT AS VALUE:

#standardSQL
select as value array_agg(t order by timeCreated desc limit 1)[offset(0)]
from `project.dataset.table` t
group by userid, textid
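
Another common way to return whole rows is the ROW_NUMBER() window function, which BigQuery Standard SQL also supports; this is a swapped-in technique, not the ARRAY_AGG query above. The sketch runs that pattern against an in-memory SQLite database from Python purely so it can be tried locally. The table name events, storing timeCreated as ISO-8601 text (which sorts chronologically), and an SQLite build of 3.25 or later (needed for window functions) are all assumptions.

```python
import sqlite3

# Hypothetical table "events" loaded with the sample rows from the first answer.
rows = [
    ("2020-11-01 01:00:00", "user1", "text1", 1.1),
    ("2020-11-01 03:00:00", "user1", "text1", 1.2),
    ("2020-11-01 02:00:00", "user1", "text1", 1.3),
    ("2020-11-01 02:00:00", "user1", "text2", 1.4),
    ("2020-11-01 01:00:00", "user1", "text2", 1.5),
    ("2020-11-01 00:00:00", "user2", "text1", 1.6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (timeCreated TEXT, userid TEXT, textid TEXT, textvalue REAL)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Number the rows inside each (userid, textid) partition, newest first,
# then keep only rn = 1, i.e. the latest row of each pair.
query = """
SELECT timeCreated, userid, textid, textvalue
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY userid, textid
           ORDER BY timeCreated DESC
         ) AS rn
  FROM events
)
WHERE rn = 1
ORDER BY userid, textid
"""
latest_rows = conn.execute(query).fetchall()
conn.close()

for row in latest_rows:
    print(row)
```

If two rows in a pair share the same timeCreated, which one is ranked first is not guaranteed (the same holds for the ARRAY_AGG versions); adding a tiebreaker column to the ORDER BY makes the choice deterministic.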
