count(distinct) over (partition by... doesn't work in Oracle SQL
【Posted】: 2019-03-25 22:16:27
【Question】: I want to count the distinct day_number values over the last 30 days. However, DISTINCT cannot be used together with OVER. If I remove DISTINCT, I get the total number of day_number rows, but day_number can contain many duplicates, which is why I want to add DISTINCT.
select tr.*,
count( distinct day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as result
from table tr;
Can anyone tell me how to count distinct values in an OVER (PARTITION BY ...) clause? Thanks in advance.
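【Comments】:
It would be easier if you provided data, but take a look at COLLECT (an aggregate function that assembles a collection of values), SET (which turns a collection into a set by removing duplicates) and CARDINALITY (which returns the number of elements in a collection). You may need to handle nulls explicitly.
Thanks for your reply. To be more specific, day_number is a list of days in numeric form; for example, 1970.1.1 is considered 1, and so on. The table contains the date (day_number) of every transaction for each account. An account can have many associated day_numbers. My goal is to count the distinct day_number values over the last 30 days for each account. The code I gave above works well if I want the total count, but I want the distinct count.
What do you mean by "1970.1.1 is considered 1, and so on"? Is it the first day of January 1970 (so the data type of DAY_NUMBER is DATE), or ...? Does it have to be an analytic count? What about an aggregate COUNT function with a WHERE clause (restricting the days to the last 30)?
【Answer 1】: count(distinct ...) works fine with the OVER clause; the real problem is the ORDER BY. You cannot write count(distinct ...) over (partition by ... order by ...), because DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY. So I did this: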
select tr.*, count (distinct day_number) over (partition by account)
from (select t.*, row_number() over (partition by account order by day_number) row_number from table t) tr
where row_number < 30;
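I tested this against the EMPLOYEES table in the HR schema (the free Oracle sample schema that can be found everywhere). I'm not sure it will work on your schema, since I don't have a copy of it, but if it doesn't, it should at least give you some ideas: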
select count (distinct manager_id) over (partition by department_id), department_id, manager_id
from (select e.*, row_number() over (partition by department_id order by employee_id) row_number from employees e)
where row_number < 30;
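One of the comments on the question suggested an aggregate COUNT with a WHERE clause instead of an analytic count. The following is a minimal sketch of that idea, written as a correlated scalar subquery. The sample rows and the distinct_days_30 alias are made up for illustration, and the sketch assumes day_number is numeric, as the asker described. A row_number filter limits the number of rows per account, whereas the BETWEEN condition here limits the window by day value.

```sql
-- Hypothetical sample data; the column names follow the question.
WITH tr AS (
  SELECT 'A' AS account, 1 AS day_number FROM dual UNION ALL
  SELECT 'A', 1  FROM dual UNION ALL
  SELECT 'A', 5  FROM dual UNION ALL
  SELECT 'A', 40 FROM dual
)
SELECT t.account,
       t.day_number,
       -- Distinct days for the same account in the 30-day window
       -- that ends at the current row's day_number.
       (SELECT COUNT(DISTINCT t2.day_number)
          FROM tr t2
         WHERE t2.account = t.account
           AND t2.day_number BETWEEN t.day_number - 29 AND t.day_number
       ) AS distinct_days_30
FROM tr t
ORDER BY t.account, t.day_number;
```

For these four rows the subquery yields 1, 1, 2 and 1. On a large table a correlated subquery like this may be noticeably slower than an analytic solution, so treat it as a starting point rather than a drop-in replacement.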
【Comments】:
【Answer 2】: You can first create a column that lists each ID only once, and then do a range count on that column, for example:
WITH sd AS (SELECT 1 ID, 10 val FROM dual UNION ALL
SELECT 1 ID, 20 val FROM dual UNION ALL
SELECT 2 ID, 30 val FROM dual UNION ALL
SELECT 2 ID, 40 val FROM dual UNION ALL
SELECT 4 ID, 50 val FROM dual UNION ALL
SELECT 4 ID, 60 val FROM dual UNION ALL
SELECT 6 ID, 70 val FROM dual)
SELECT ID,
val,
COUNT(id_distinct) OVER (ORDER BY ID RANGE 3 PRECEDING) cnt_disinct_ids
FROM (SELECT ID,
val,
CASE WHEN row_number() OVER (PARTITION BY ID ORDER BY val) = 1 THEN ID END id_distinct
FROM sd);
        ID        VAL CNT_DISINCT_IDS
---------- ---------- ---------------
         1         10               1
         1         20               1
         2         30               2
         2         40               2
         4         50               3
         4         60               3
         6         70               2
ETA: proof that the above technique works with your data:
WITH your_table AS (SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 24 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 25 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 55 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 10 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 12 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual)
SELECT account_sk,
day_sk,
COUNT(day_sk_distinct) OVER (PARTITION BY account_sk ORDER BY day_sk RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) count_distinct_day_sk
FROM (SELECT account_sk,
day_sk,
CASE WHEN row_number() OVER (PARTITION BY account_sk, day_sk ORDER BY day_sk) = 1 THEN day_sk END day_sk_distinct
FROM your_table);
ACCOUNT_SK     DAY_SK COUNT_DISTINCT_DAY_SK
---------- ---------- ---------------------
ABCDE              23                     1
ABCDE              23                     1
ABCDE              24                     2
ABCDE              25                     3
ABCDE              53                     3
ABCDE              53                     3
ABCDE              55                     2
VWXYZ              10                     1
VWXYZ              12                     2
VWXYZ              40                     2
VWXYZ              40                     2
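Two details of this technique are easy to miss when adapting it: ROW_NUMBER has to be partitioned by both the account and the day, and the outer COUNT has to count the flag column rather than the raw day column. The sketch below shows both counts side by side. It assumes a table named aggTxn2 with columns account_sk and day_sk (names taken from the asker's follow-up comments); the sample rows and the output aliases are hypothetical.

```sql
-- Hypothetical sample rows standing in for the asker's aggTxn2 table.
WITH aggTxn2 AS (
  SELECT 'ABCDE' AS account_sk, 23 AS day_sk FROM dual UNION ALL
  SELECT 'ABCDE', 23 FROM dual UNION ALL
  SELECT 'ABCDE', 24 FROM dual UNION ALL
  SELECT 'ABCDE', 60 FROM dual
)
SELECT account_sk,
       day_sk,
       -- Counting day_sk itself counts every row, duplicates included.
       COUNT(day_sk) OVER (PARTITION BY account_sk ORDER BY day_sk
                           RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS cnt_all_rows,
       -- Counting the flag column counts each distinct day once.
       COUNT(day_sk_distinct) OVER (PARTITION BY account_sk ORDER BY day_sk
                                    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) AS cnt_distinct_days
FROM (SELECT tr.account_sk,
             tr.day_sk,
             -- Partition by account AND day so that exactly one row
             -- per distinct day carries a non-null flag.
             CASE WHEN ROW_NUMBER() OVER (PARTITION BY tr.account_sk, tr.day_sk
                                          ORDER BY tr.day_sk) = 1
                  THEN tr.day_sk
             END AS day_sk_distinct
      FROM aggTxn2 tr)
ORDER BY account_sk, day_sk;
```

With these rows, cnt_all_rows comes out as 2, 2, 3, 1 and cnt_distinct_days as 1, 1, 2, 1. Day 60 falls outside the 30-day window of the earlier days, so it is counted on its own.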
【Comments】:
Thanks for your reply! I tried the code above, but it still doesn't work. select count(tr.day_sk) over (order by tr.ACCOUNT_SK range 29 preceding) cnt_distinct_ids from (select tr.* , case when row_number() over (PARTITION BY tr.ACCOUNT_SK ORDER BY tr.DAY_SK)=1 then tr.ACCOUNT_SK end id_distinct from aggTxn2 tr)
What do you mean by "doesn't work"? Do you get an error? Wrong results? Something else?
I get an error message like this: ERROR: ORACLE prepare error: ORA-00904: "TR"."ACCOUNT_SK": invalid identifier. SQL statement: select count(tr.day_sk) over (order by tr.ACCOUNT_SK range 29 preceding) cnt_distinct_ids from (select tr.* , case when row_number() over (PARTITION BY tr.ACCOUNT_SK ORDER BY tr.DAY_SK)=1 then tr.ACCOUNT_SK end id_distinct from aggTxn2 tr). I don't think "TR" is the problem, though.
Yes, it is; your outer query has no alias "tr", only your inline view does. Remove the tr. prefixes from the outer query.
Thanks! The "TR" issue is fixed and the code runs now. I get the total number of transactions by day_sk over the last 30 days, but I want the count of distinct day_sk values over the last 30 days. The only problem is the "distinct". Here is my code; do you have any suggestions for fixing it? Thanks a lot! select count(day_sk) over (PARTITION BY ACCOUNT_SK ORDER BY DAY_SK range 29 PRECEDING) cnt_distinct_ids from (select tr.* , case when row_number() over (PARTITION BY tr.ACCOUNT_SK ORDER BY tr.DAY_SK)=1 then tr.ACCOUNT_SK end id_distinct from aggTxn2 tr)
【Answer 3】:
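You can simulate this by creating a column that is 1 when a "new break" appears in the ordered list and null otherwise. Then you only need to count or sum these break indicators. Both count() and sum() support over (order by ...). For your example it would be: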
with TR as
(select 1 as PK, 'ABCDE' as ACCOUNT, 23 as DAY_number from dual
union all select 2, 'ABCDE', 23 from dual
union all select 3, 'ABCDE', 24 from dual
union all select 4, 'ABCDE', 25 from dual
)
select tr.*
, count( /*distinct*/ day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as wrong_result
, count(IS_NEW_BREAK) over(PARTITION BY ACCOUNT order by day_number range 29 PRECEDING) as desired_output
from
(select tr.*
, case min(case when day_number is not null then PK end) over(PARTITION BY ACCOUNT, day_number) when PK then 1 end as IS_NEW_BREAK
from tr
) tr;
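Since the answer notes that sum() works as well as count(), here is a minimal sketch of the SUM variant on the same four sample rows. The flagged alias and the distinct_days column name are made up for illustration; the break-indicator expression is the one used above.

```sql
WITH tr AS (
  SELECT 1 AS pk, 'ABCDE' AS account, 23 AS day_number FROM dual UNION ALL
  SELECT 2, 'ABCDE', 23 FROM dual UNION ALL
  SELECT 3, 'ABCDE', 24 FROM dual UNION ALL
  SELECT 4, 'ABCDE', 25 FROM dual
)
SELECT flagged.pk,
       flagged.account,
       flagged.day_number,
       -- Each distinct day contributes exactly one 1 to the running sum.
       SUM(flagged.is_new_break) OVER (PARTITION BY flagged.account
                                       ORDER BY flagged.day_number
                                       RANGE 29 PRECEDING) AS distinct_days
FROM (SELECT tr.*,
             -- 1 on the row with the lowest pk for each (account, day_number),
             -- null on every other row and on rows with a null day_number.
             CASE MIN(CASE WHEN day_number IS NOT NULL THEN pk END)
                      OVER (PARTITION BY account, day_number)
                  WHEN pk THEN 1
             END AS is_new_break
      FROM tr) flagged
ORDER BY flagged.pk;
```

For these rows distinct_days is 1, 1, 2 and 3. Both rows for day 23 show 1 because a RANGE window includes all peer rows that share the same day_number.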
【Comments】: