BigQuery FIRST_VALUE and IGNORE_NULLS - why it works this way?

Posted: 2017-10-26 19:16:05

Question:

My problem is finding the first value of a column within some window. Here is sample data together with the query:

WITH finishers AS
 (SELECT 'Bob' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT NULL, TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Mary', TIMESTAMP '2016-10-18 2:59:01', 'F35-39'
  UNION ALL SELECT 'John', TIMESTAMP '2016-10-18 3:01:17', 'F35-39')
SELECT *,
 FIRST_VALUE (name IGNORE NULLS) OVER(PARTITION BY division ORDER BY finish_time) AS fastest_in_division
FROM finishers
ORDER by division

The result is:

Row name    finish_time             division  fastest_in_division    
1   Bob     2016-10-18 02:51:45 UTC F30-34    Bob    
2   null    2016-10-18 02:54:11 UTC F35-39    **null**   
3   Mary    2016-10-18 02:59:01 UTC F35-39    Mary   
4   John    2016-10-18 03:01:17 UTC F35-39    Mary   

While my expectation was:

Row name    finish_time             division  fastest_in_division    
1   Bob     2016-10-18 02:51:45 UTC F30-34    Bob    
2   null    2016-10-18 02:54:11 UTC F35-39    **Mary**
3   Mary    2016-10-18 02:59:01 UTC F35-39    Mary   
4   John    2016-10-18 03:01:17 UTC F35-39    Mary   

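It seems that IGNORE NULLS skips the row when 'name' is null and that row comes first in the ordering: for that row it returns 'null' instead of 'Mary', which is what the other rows in the division get. Is there a way to work around this behavior?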

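Comments:

With ascending order, the row with the NULL name comes first, so at that row the window contains no non-NULL name. Perhaps you meant to use ORDER BY with DESC?

I thought the window was the whole 'division' partition, which contains other non-NULL names, but maybe I'm wrong. ORDER BY DESC would have the same problem whenever the NULL row sorts last, so that doesn't help me.

Answer 1: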

To get the result you expect, the query should look like this:

#standardSQL
WITH finishers AS  (
  SELECT 'Bob' AS name, TIMESTAMP '2016-10-18 2:51:45' AS finish_time, 'F30-34' AS division UNION ALL 
  SELECT NULL, TIMESTAMP '2016-10-18 2:54:11', 'F35-39' UNION ALL 
  SELECT 'Mary', TIMESTAMP '2016-10-18 2:59:01', 'F35-39' UNION ALL 
  SELECT 'John', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
)
SELECT *,
  FIRST_VALUE(name IGNORE NULLS) OVER (
    PARTITION BY division
    ORDER BY finish_time
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS fastest_in_division
FROM finishers
ORDER BY finish_time, division

The result matches your expectation:

Row name    finish_time             division    fastest_in_division  
1   Bob     2016-10-18 02:51:45 UTC F30-34      Bob  
2   null    2016-10-18 02:54:11 UTC F35-39      Mary     
3   Mary    2016-10-18 02:59:01 UTC F35-39      Mary     
4   John    2016-10-18 03:01:17 UTC F35-39      Mary

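The behavior you ran into comes from the default window frame. When OVER has an ORDER BY but no explicit frame, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, applied within each partition in the given order. The NULL row sorts first in division F35-39, so its frame contains only that row and IGNORE NULLS has no non-NULL name to return. It looks like you want the whole partition to take part, which is what the explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame does.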
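To make the difference between the two frames concrete, here is a small plain-Python sketch that emulates FIRST_VALUE(name IGNORE NULLS) over the sample rows. It is only an illustration of the frame semantics, not BigQuery code: the helper name `first_value_ignore_nulls` and its `whole_partition` flag are made up for this example, and the default frame is modeled as "all rows in the partition whose finish_time is less than or equal to the current row's".

```python
from datetime import datetime

# Same sample rows as in the question: (name, finish_time, division).
FINISHERS = [
    ("Bob", datetime(2016, 10, 18, 2, 51, 45), "F30-34"),
    (None, datetime(2016, 10, 18, 2, 54, 11), "F35-39"),
    ("Mary", datetime(2016, 10, 18, 2, 59, 1), "F35-39"),
    ("John", datetime(2016, 10, 18, 3, 1, 17), "F35-39"),
]


def first_value_ignore_nulls(rows, whole_partition):
    """Hypothetical helper: emulate FIRST_VALUE(name IGNORE NULLS)
    OVER (PARTITION BY division ORDER BY finish_time) for each row."""
    results = []
    for _name, finish_time, division in rows:
        # PARTITION BY division ORDER BY finish_time
        partition = sorted(
            (r for r in rows if r[2] == division), key=lambda r: r[1]
        )
        if whole_partition:
            # UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING: the entire partition
            frame = partition
        else:
            # Default frame: UNBOUNDED PRECEDING .. CURRENT ROW
            frame = [r for r in partition if r[1] <= finish_time]
        # IGNORE NULLS: first non-null name in the frame, or None if there is none
        results.append(next((r[0] for r in frame if r[0] is not None), None))
    return results


default_frame = first_value_ignore_nulls(FINISHERS, whole_partition=False)
full_frame = first_value_ignore_nulls(FINISHERS, whole_partition=True)
print(default_frame)  # ['Bob', None, 'Mary', 'Mary']
print(full_frame)     # ['Bob', 'Mary', 'Mary', 'Mary']
```

The first list reproduces the null from the question, because the frame of the NULL row ends at that row. The second list corresponds to the query with the explicit full-partition frame.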

Comments:

Accepted :)
