How is this query using a window function returning multiple results per key?
Posted: 2021-06-05 16:38:24
Question:
I wrote the following query:
    SELECT
        data.id,
        LAST_VALUE(data.access_type) OVER (
            PARTITION BY data.id
            ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)
        ) AS access_type,
        LAST_VALUE(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
            PARTITION BY data.id
            ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)
        ) AS access_type_timestamp
    FROM
        table
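data is a struct.
I expect one row per id, containing the most recent access_type and ts for that id. However, the query sometimes still returns multiple rows per id.
What am I doing wrong?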
Comments:
Window functions are applied to every row of the table and do not affect the number of rows returned. Use SELECT DISTINCT ....
Answer 1:
Use SELECT DISTINCT. I would suggest FIRST_VALUE() with a descending sort:
    SELECT DISTINCT
        data.id,
        -- Sorting DESC puts the newest row first in each partition
        FIRST_VALUE(data.access_type) OVER (
            PARTITION BY data.id
            ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP) DESC
        ) AS access_type,
        FIRST_VALUE(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
            PARTITION BY data.id
            ORDER BY CAST(FROM_UNIXTIME(ts) AS TIMESTAMP) DESC
        ) AS access_type_timestamp
    FROM table;
Window functions do not reduce the number of rows; each input row still produces one output row.
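To see the row-count behaviour in isolation, here is a minimal sketch using Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added). The events table, its flat columns, and the sample rows are hypothetical stand-ins for the struct-based table in the question, and raw ts integers are used instead of FROM_UNIXTIME.

```python
import sqlite3

# Hypothetical stand-in for the question's table: flat columns instead of a struct.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, access_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "read", 100), (1, "write", 200), (1, "admin", 300), (2, "read", 150)],
)

# Same shape as the answer's query; {distinct} is filled in below.
window_sql = """
    SELECT {distinct}
        id,
        FIRST_VALUE(access_type) OVER (PARTITION BY id ORDER BY ts DESC) AS access_type,
        MAX(ts) OVER (PARTITION BY id) AS access_type_ts
    FROM events
    ORDER BY id
"""

# Without DISTINCT: one output row per input row (4 rows, three identical for id 1).
all_rows = conn.execute(window_sql.format(distinct="")).fetchall()

# With DISTINCT: the identical rows collapse to one row per id.
distinct_rows = conn.execute(window_sql.format(distinct="DISTINCT")).fetchall()

print(len(all_rows))   # 4
print(distinct_rows)   # [(1, 'admin', 300), (2, 'read', 150)]
```

DISTINCT only helps here because every row of a partition carries the same computed values; if the window expressions differed from row to row, the duplicates would not collapse.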
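Also, I would assume that ts sorts in the same order as the timestamp derived from it, so the ORDER BY can be simplified to use ts directly. In addition, the second expression is simply MAX():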
    SELECT DISTINCT
        data.id,
        FIRST_VALUE(data.access_type) OVER (
            PARTITION BY data.id
            ORDER BY ts DESC
        ) AS access_type,
        -- No ORDER BY needed: MAX() is taken over the whole partition
        MAX(CAST(FROM_UNIXTIME(ts) AS TIMESTAMP)) OVER (
            PARTITION BY data.id
        ) AS access_type_timestamp
    FROM table;
If you use LAST_VALUE(), you need an explicit window frame clause:
    LAST_VALUE(data.access_type) OVER (
        PARTITION BY data.id
        ORDER BY ts
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS access_type
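The default window frame when ORDER BY is present is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which has the nasty habit of making LAST_VALUE() return just the value in the current row (more precisely, the value from the last row that ties with the current row in the sort order). That is why the original query produced different values for different rows of the same id.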
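The effect of the default frame can be reproduced in a small sketch with Python's built-in sqlite3 module (again assuming SQLite 3.25 or newer). The events table and its rows are hypothetical, with flat columns in place of the question's struct; the two queries differ only in the frame clause.

```python
import sqlite3

# Hypothetical single-id sample: three accesses at increasing timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, access_type TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "read", 100), (1, "write", 200), (1, "admin", 300)],
)

# Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
# the frame ends at the current row, so LAST_VALUE() sees nothing newer.
default_frame = conn.execute("""
    SELECT ts,
           LAST_VALUE(access_type) OVER (PARTITION BY id ORDER BY ts) AS access_type
    FROM events
    ORDER BY ts
""").fetchall()

# Explicit frame extending to the end of the partition:
# every row now sees the newest access_type.
full_frame = conn.execute("""
    SELECT ts,
           LAST_VALUE(access_type) OVER (
               PARTITION BY id
               ORDER BY ts
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS access_type
    FROM events
    ORDER BY ts
""").fetchall()

print(default_frame)  # [(100, 'read'), (200, 'write'), (300, 'admin')]
print(full_frame)     # [(100, 'admin'), (200, 'admin'), (300, 'admin')]
```

With the default frame each row reports its own access_type, so SELECT DISTINCT alone would not collapse the rows; with the explicit frame all rows of an id agree and DISTINCT leaves one row per id.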