BigQuery row_number to remove duplicates
【Posted】: 2021-09-01 16:21:04
【Question】: I want to keep only the row with the latest timestamp for each ID in a table. Is there a more optimized and efficient way to do this?
A query I tried:
SELECT * EXCEPT(row_number)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY ID) row_number
  FROM employees
)
WHERE row_number = 1
The employees table:
ID NAME DEPARTMENT UPDATED_AT
1 James IT 2019-05-21 12:13:14
1 James IT 2019-05-21 12:14:14
1 James IT 2019-05-21 12:18:14
2 Pam HR 2019-05-26 13:18:14
2 Pam HR 2019-05-26 14:18:14
3 David IT 2019-06-22 14:18:14
3 David IT 2019-06-23 12:18:14
Expected result:
ID NAME DEPARTMENT UPDATED_AT
1 James IT 2019-05-21 12:18:14
2 Pam HR 2019-05-26 14:18:14
3 David IT 2019-06-23 12:18:14
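【Answer 1】: You are only missing an ORDER BY clause in the window specification of the subquery. Without it, ROW_NUMBER() numbers the rows within each ID partition in an arbitrary order, so the row numbered 1 is not guaranteed to be the latest one.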
WITH DATA AS (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS _row,
    *
  FROM employees
)
SELECT * EXCEPT(_row)
FROM DATA
WHERE _row = 1
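To sanity-check the ORDER BY fix without a BigQuery project, here is a minimal sketch that runs the same ROW_NUMBER() logic against the sample rows from the question using Python's built-in sqlite3 module. It is an approximation under stated assumptions, not BigQuery itself: it assumes SQLite 3.25 or newer (for window functions), stores UPDATED_AT as ISO-formatted text so that string order matches time order, and lists the columns explicitly because SQLite has no `* EXCEPT`. The helper name `latest_per_id` is made up for this illustration.

```python
import sqlite3

# Sample rows copied from the employees table in the question.
ROWS = [
    (1, "James", "IT", "2019-05-21 12:13:14"),
    (1, "James", "IT", "2019-05-21 12:14:14"),
    (1, "James", "IT", "2019-05-21 12:18:14"),
    (2, "Pam", "HR", "2019-05-26 13:18:14"),
    (2, "Pam", "HR", "2019-05-26 14:18:14"),
    (3, "David", "IT", "2019-06-22 14:18:14"),
    (3, "David", "IT", "2019-06-23 12:18:14"),
]


def latest_per_id(rows):
    """Return one row per ID, keeping the row with the greatest UPDATED_AT.

    Hypothetical helper for illustration; requires SQLite >= 3.25.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees "
            "(ID INTEGER, NAME TEXT, DEPARTMENT TEXT, UPDATED_AT TEXT)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
        # Same idea as the answer: number rows per ID, newest first,
        # then keep only the row numbered 1.
        query = """
            SELECT ID, NAME, DEPARTMENT, UPDATED_AT
            FROM (
                SELECT
                    *,
                    ROW_NUMBER() OVER (
                        PARTITION BY ID ORDER BY UPDATED_AT DESC
                    ) AS _row
                FROM employees
            )
            WHERE _row = 1
            ORDER BY ID
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_id(ROWS):
        print(row)
```

On the sample data this returns the three rows listed under the expected result. If two rows can share the same ID and UPDATED_AT, the choice between them is still arbitrary; adding a further tie-breaking column to the ORDER BY would make it deterministic.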
【Answer 2】: You can also filter on the window function directly with QUALIFY:
SELECT *
FROM employees
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) = 1