BigQuery duplicated rank() numbers
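【Posted】: 2020-01-15 08:02:02
【Question】: I have a transaction table with account_no, order_id, start_date, end_date and a ranking. I am trying to rank the transactions by their start date and end date. The problem is that all the transactions have identical start and end dates, so ranking by those dates alone cannot tell them apart.
My code: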
select distinct account_id,order_id,order_validfrom_date as start_date,order_validto_date as end_date,
rank() OVER (PARTITION BY account_id ORDER BY order_validfrom_date desc ,order_validto_date desc ) AS ranking,
from `datamart_dimsum.rpt_dly_dimsum_subscription_details`
where order_validfrom_date <= '2020-01-14' and account_id in (216223
) order by account_id, order_id,order_validfrom_date,order_validto_date
Output:
account_id | order_id | start_date | end_date | ranking
216223 482847 2017-10-09 2017-11-08 1
216223 472121 2017-10-09 2017-11-08 1
216223 312312 2017-10-09 2017-11-08 1
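Even though the start and end dates are identical, is there any way to rank the first transaction as 1? I tried the ROW_NUMBER() function but it failed.
【Comments】:
What is your expected output?
The first row for the account_id ranked 1, and the rest 2 and 3.
【Answer 1】: Use row_number(). rank() is supposed to return duplicates: rows that tie on every ORDER BY key receive the same rank.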
row_number() over (partition by account_id
order by order_validfrom_date desc, order_validto_date desc
) as ranking,
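To see the difference on the question's three rows without running anything in BigQuery, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions); the table name `orders` and the in-memory database are made up for illustration and are not the asker's real table. The way RANK() and ROW_NUMBER() treat ties is standard SQL, so it should carry over to BigQuery.

```python
import sqlite3

# Hypothetical in-memory copy of the three rows shown in the question.
ROWS = [
    (216223, 482847, "2017-10-09", "2017-11-08"),
    (216223, 472121, "2017-10-09", "2017-11-08"),
    (216223, 312312, "2017-10-09", "2017-11-08"),
]

QUERY = """
SELECT order_id,
       RANK() OVER (
           PARTITION BY account_id
           ORDER BY order_validfrom_date DESC, order_validto_date DESC
       ) AS rnk,
       ROW_NUMBER() OVER (
           PARTITION BY account_id
           ORDER BY order_validfrom_date DESC, order_validto_date DESC
       ) AS rn
FROM orders
"""


def rank_vs_row_number():
    """Return (order_id, rank, row_number) tuples for the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "account_id INTEGER, order_id INTEGER, "
        "order_validfrom_date TEXT, order_validto_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for order_id, rnk, rn in rank_vs_row_number():
        print(order_id, rnk, rn)
```

Every row ties on both dates, so RANK() gives all three the same value, while ROW_NUMBER() hands out 1, 2 and 3. Which row gets which number is not specified, because nothing in the ORDER BY separates the rows.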
【Comments】:
【Answer 2】: Have you tried adding order_id to the ORDER BY clause?
select distinct account_id,order_id,order_validfrom_date as start_date,order_validto_date as end_date,
rank() OVER (PARTITION BY account_id ORDER BY order_validfrom_date desc ,order_validto_date desc, order_id desc ) AS ranking,
from `datamart_dimsum.rpt_dly_dimsum_subscription_details`
where order_validfrom_date <= '2020-01-14' and account_id in (216223
) order by account_id, order_id,order_validfrom_date,order_validto_date
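【Comments】:
【Answer 3】: The following is for BigQuery Standard SQL.
I see two equally reasonable options for you.
Option 1 - just add another field as a tie-breaker.
In your case, order_id looks like it should work, since it is most likely unique in your table, so the query below should do it: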
#standardSQL
SELECT DISTINCT
account_id,
order_id,
order_validfrom_date AS start_date,
order_validto_date AS end_date,
RANK() OVER(
PARTITION BY account_id
ORDER BY order_validfrom_date DESC, order_validto_date DESC, order_id DESC -- added order_id here < this is the only change
) AS ranking,
FROM `datamart_dimsum.rpt_dly_dimsum_subscription_details`
WHERE order_validfrom_date <= '2020-01-14'
AND account_id IN (216223)
ORDER BY account_id, order_id DESC, order_validfrom_date, order_validto_date
Option 2 - just replace RANK with ROW_NUMBER, as in the example below:
#standardSQL
SELECT DISTINCT
account_id,
order_id,
order_validfrom_date AS start_date,
order_validto_date AS end_date,
ROW_NUMBER() OVER( -- ROW_NUMBER instead of RANK is the only change here
PARTITION BY account_id
ORDER BY order_validfrom_date DESC, order_validto_date DESC
) AS ranking,
FROM `datamart_dimsum.rpt_dly_dimsum_subscription_details`
WHERE order_validfrom_date <= '2020-01-14'
AND account_id IN (216223)
ORDER BY account_id, order_id DESC, order_validfrom_date, order_validto_date
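Both options give the three rows distinct ranks. Option 1 always produces the output below; with option 2 the order among tied rows is not guaranteed, so the numbers 1 to 3 may land on different rows.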
Row account_id order_id start_date end_date ranking
1 216223 482847 2017-10-09 2017-11-08 1
2 216223 472121 2017-10-09 2017-11-08 2
3 216223 312312 2017-10-09 2017-11-08 3
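The effect of the tie-breaker can be checked locally with a small sketch built on Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The `orders` table and the helper `rank_with_tiebreaker` are hypothetical names used only for this illustration, and SQLite stands in for BigQuery on the assumption that both follow the standard RANK() semantics.

```python
import sqlite3

QUERY = """
SELECT order_id,
       RANK() OVER (
           PARTITION BY account_id
           ORDER BY order_validfrom_date DESC,
                    order_validto_date DESC,
                    order_id DESC  -- tie-breaker, assumed unique per account
       ) AS ranking
FROM orders
"""


def rank_with_tiebreaker(rows):
    """Load rows into a throwaway table and return {order_id: ranking}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "account_id INTEGER, order_id INTEGER, "
        "order_validfrom_date TEXT, order_validto_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (216223, 312312, "2017-10-09", "2017-11-08"),
        (216223, 482847, "2017-10-09", "2017-11-08"),
        (216223, 472121, "2017-10-09", "2017-11-08"),
    ]
    print(rank_with_tiebreaker(sample))
```

Because order_id is part of the sort key, the ranking does not depend on the order in which the rows were loaded. If order_id were not unique within an account, RANK() would still return duplicates for the rows that share it.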
【Comments】: