BigQuery duplicated rank() numbers

Posted: 2020-01-15 08:02:02

Question:

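I have a transactions table with account_id, order_id, start_date, end_date and a ranking column. I'm trying to rank the transactions by their start and end dates. The problem is that all of the transactions have identical start and end dates, so I can't rank them by date.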

My code:

select distinct account_id, order_id, order_validfrom_date as start_date, order_validto_date as end_date,
  rank() OVER (PARTITION BY account_id ORDER BY order_validfrom_date desc, order_validto_date desc) AS ranking
from `datamart_dimsum.rpt_dly_dimsum_subscription_details`
where order_validfrom_date <= '2020-01-14' and account_id in (216223)
order by account_id, order_id, order_validfrom_date, order_validto_date

Output:

account_id | order_id |  start_date  | end_date   | ranking
  216223     482847      2017-10-09    2017-11-08      1
  216223     472121      2017-10-09    2017-11-08      1
  216223     312312      2017-10-09    2017-11-08      1

Is there any way to rank the first transaction as 1 even though the start and end dates are the same? I tried the ROW_NUMBER() function, but it didn't work.
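To see why RANK() returns 1 for every row, here is a minimal local sketch that runs the question's window over the three rows shown above. It assumes Python's built-in sqlite3 module as a stand-in for BigQuery (window functions need SQLite 3.25 or later) and a hypothetical table name, subscription_details. RANK() follows the same rule in both engines: rows that tie on every ORDER BY key get the same rank.

```python
import sqlite3

# Hypothetical local stand-in for the BigQuery table, holding only the
# columns used in the question (requires SQLite 3.25+ for window functions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription_details ("
    "account_id INTEGER, order_id INTEGER, "
    "order_validfrom_date TEXT, order_validto_date TEXT)"
)
conn.executemany(
    "INSERT INTO subscription_details VALUES (?, ?, ?, ?)",
    [
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 472121, "2017-10-09", "2017-11-08"),
        (216223, 312312, "2017-10-09", "2017-11-08"),
    ],
)

# Same window as in the question: the rows tie on both ORDER BY keys,
# so RANK() gives every one of them rank 1.
rows = conn.execute(
    """
    SELECT order_id,
           RANK() OVER (PARTITION BY account_id
                        ORDER BY order_validfrom_date DESC,
                                 order_validto_date DESC) AS ranking
    FROM subscription_details
    ORDER BY order_id DESC
    """
).fetchall()
print(rows)  # [(482847, 1), (472121, 1), (312312, 1)]
```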

Comments:

What is your expected output? Reply: for the account_id, the first transaction should be ranked 1 and the others 2 and 3.

Answer 1:

Use row_number(). rank() is supposed to return duplicates when rows tie:

 row_number() over (partition by account_id
                    order by order_validfrom_date desc, order_validto_date desc
                   ) as ranking, 
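A quick local check of this answer, assuming Python's sqlite3 as a stand-in for BigQuery (SQLite 3.25+ for window functions) and a hypothetical table name, subscription_details. ROW_NUMBER() always hands out distinct numbers 1..n, but when the rows tie on every ORDER BY key, which row gets 1 is not specified, so the sketch only checks the set of numbers.

```python
import sqlite3

# Hypothetical local stand-in for the BigQuery table (SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription_details ("
    "account_id INTEGER, order_id INTEGER, "
    "order_validfrom_date TEXT, order_validto_date TEXT)"
)
conn.executemany(
    "INSERT INTO subscription_details VALUES (?, ?, ?, ?)",
    [
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 472121, "2017-10-09", "2017-11-08"),
        (216223, 312312, "2017-10-09", "2017-11-08"),
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           ROW_NUMBER() OVER (PARTITION BY account_id
                              ORDER BY order_validfrom_date DESC,
                                       order_validto_date DESC) AS ranking
    FROM subscription_details
    """
).fetchall()

# Distinct numbers are guaranteed; their assignment to order_ids is not,
# because all rows tie on the ORDER BY keys.
rankings = sorted(ranking for _, ranking in rows)
print(rankings)  # [1, 2, 3]
```

If a stable, repeatable numbering matters, a unique column such as order_id has to be added to the ORDER BY.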


Answer 2:

Have you tried adding order_id to the ORDER BY clause?

select distinct account_id, order_id, order_validfrom_date as start_date, order_validto_date as end_date,
  rank() OVER (PARTITION BY account_id ORDER BY order_validfrom_date desc, order_validto_date desc, order_id desc) AS ranking -- order_id breaks ties between identical dates
from `datamart_dimsum.rpt_dly_dimsum_subscription_details`
where order_validfrom_date <= '2020-01-14' and account_id in (216223)
order by account_id, order_id, order_validfrom_date, order_validto_date
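Adding order_id as the last ORDER BY key makes every row distinct within the partition (assuming order_id is unique), so RANK() no longer produces ties and the numbering is deterministic. A minimal sketch, assuming Python's sqlite3 as a local stand-in for BigQuery (SQLite 3.25+) and a hypothetical table name, subscription_details:

```python
import sqlite3

# Hypothetical local stand-in for the BigQuery table (SQLite 3.25+ needed
# for window functions); only the columns used in the query are modeled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription_details ("
    "account_id INTEGER, order_id INTEGER, "
    "order_validfrom_date TEXT, order_validto_date TEXT)"
)
conn.executemany(
    "INSERT INTO subscription_details VALUES (?, ?, ?, ?)",
    [
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 472121, "2017-10-09", "2017-11-08"),
        (216223, 312312, "2017-10-09", "2017-11-08"),
    ],
)

# order_id is the last sort key, so rows with identical dates are still
# ordered and RANK() assigns 1, 2, 3 deterministically.
rows = conn.execute(
    """
    SELECT order_id,
           RANK() OVER (PARTITION BY account_id
                        ORDER BY order_validfrom_date DESC,
                                 order_validto_date DESC,
                                 order_id DESC) AS ranking
    FROM subscription_details
    ORDER BY ranking
    """
).fetchall()
print(rows)  # [(482847, 1), (472121, 2), (312312, 3)]
```

With order_id DESC the largest order_id gets rank 1. Whether that is also the "first" transaction depends on how order IDs are generated, which the thread does not say; use ASC if the smallest ID should win.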


Answer 3:

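Below is for BigQuery Standard SQL.

I see two equally reasonable options for you.

Option 1 - just add another field as a tie-breaker

In your case, order_id looks like it should work, since it is most likely unique in your table, so the following should work: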

#standardSQL
SELECT DISTINCT 
  account_id,
  order_id,
  order_validfrom_date AS start_date,
  order_validto_date AS end_date, 
  RANK() OVER(
    PARTITION BY account_id 
    ORDER BY order_validfrom_date DESC, order_validto_date DESC, order_id DESC -- added order_id here < this is the only change 
  ) AS ranking, 
FROM `datamart_dimsum.rpt_dly_dimsum_subscription_details` 
WHERE order_validfrom_date <= '2020-01-14'  
AND account_id IN (216223)
ORDER BY account_id, order_id DESC, order_validfrom_date, order_validto_date    
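Option 1 relies on order_id being unique within each account_id. A quick way to test that assumption is a GROUP BY/HAVING query; the same SQL should run unchanged in BigQuery against the real table. The sketch below runs it locally, assuming Python's sqlite3 as a stand-in, a hypothetical table name subscription_details, and a hypothetical helper duplicate_order_ids:

```python
import sqlite3

# Hypothetical local stand-in for the BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription_details ("
    "account_id INTEGER, order_id INTEGER, "
    "order_validfrom_date TEXT, order_validto_date TEXT)"
)
conn.executemany(
    "INSERT INTO subscription_details VALUES (?, ?, ?, ?)",
    [
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 472121, "2017-10-09", "2017-11-08"),
        (216223, 312312, "2017-10-09", "2017-11-08"),
    ],
)


def duplicate_order_ids(conn):
    """Hypothetical helper: list (account_id, order_id, count) for
    order_ids that occur more than once within an account."""
    return conn.execute(
        """
        SELECT account_id, order_id, COUNT(*) AS n
        FROM subscription_details
        GROUP BY account_id, order_id
        HAVING COUNT(*) > 1
        """
    ).fetchall()


print(duplicate_order_ids(conn))  # []

# Simulate a duplicated order to show what the check reports.
conn.execute(
    "INSERT INTO subscription_details "
    "VALUES (216223, 482847, '2017-10-09', '2017-11-08')"
)
print(duplicate_order_ids(conn))  # [(216223, 482847, 2)]
```

An empty result means order_id is a safe tie-breaker for the RANK() window.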

Option 2 - just replace RANK with ROW_NUMBER, as in the example below

#standardSQL
SELECT DISTINCT 
  account_id,
  order_id,
  order_validfrom_date AS start_date,
  order_validto_date AS end_date, 
  ROW_NUMBER() OVER( -- ROW_NUMBER instead of RANK  is the only change here
    PARTITION BY account_id 
    ORDER BY order_validfrom_date DESC, order_validto_date DESC
  ) AS ranking, 
FROM `datamart_dimsum.rpt_dly_dimsum_subscription_details` 
WHERE order_validfrom_date <= '2020-01-14'  
AND account_id IN (216223)
ORDER BY account_id, order_id DESC, order_validfrom_date, order_validto_date     

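Both options produce the following output (with option 2 the numbers are unique, but which of the tied rows gets which number is not guaranteed, because the rows tie on every ORDER BY key):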

Row account_id  order_id    start_date  end_date    ranking  
1   216223      482847      2017-10-09  2017-11-08  1    
2   216223      472121      2017-10-09  2017-11-08  2    
3   216223      312312      2017-10-09  2017-11-08  3    
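One caveat when comparing the two options: both queries use SELECT DISTINCT, and window functions are evaluated before DISTINCT is applied. If the source table contains fully duplicated rows, RANK() gives the copies the same rank, so DISTINCT collapses them (leaving a gap in the ranking), while ROW_NUMBER() numbers them differently, so both copies survive. A minimal sketch, assuming Python's sqlite3 as a local stand-in for BigQuery (SQLite 3.25+) and a hypothetical table subscription_details in which one row is deliberately duplicated:

```python
import sqlite3

# Hypothetical local stand-in; order 482847 is inserted twice on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription_details ("
    "account_id INTEGER, order_id INTEGER, "
    "order_validfrom_date TEXT, order_validto_date TEXT)"
)
conn.executemany(
    "INSERT INTO subscription_details VALUES (?, ?, ?, ?)",
    [
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 472121, "2017-10-09", "2017-11-08"),
        (216223, 312312, "2017-10-09", "2017-11-08"),
    ],
)

window_keys = "order_validfrom_date DESC, order_validto_date DESC"

# Option 1: RANK() with order_id as a tie-breaker.
option1 = conn.execute(
    f"""
    SELECT DISTINCT account_id, order_id,
           RANK() OVER (PARTITION BY account_id
                        ORDER BY {window_keys}, order_id DESC) AS ranking
    FROM subscription_details
    """
).fetchall()

# Option 2: ROW_NUMBER() without a tie-breaker.
option2 = conn.execute(
    f"""
    SELECT DISTINCT account_id, order_id,
           ROW_NUMBER() OVER (PARTITION BY account_id
                              ORDER BY {window_keys}) AS ranking
    FROM subscription_details
    """
).fetchall()

# The duplicated row collapses under RANK() but not under ROW_NUMBER().
print(len(option1), sorted(r for *_, r in option1))  # 3 [1, 3, 4]
print(len(option2), sorted(r for *_, r in option2))  # 4 [1, 2, 3, 4]
```

If such duplicates are possible, removing them in a subquery before applying the window function keeps both options consistent.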
