count(distinct) over (partition by ...) doesn't work in Oracle SQL

【Title】: count(distinct) over (partition by ...) doesn't work in Oracle SQL 【Posted】: 2019-03-25 22:16:27 【Question】:

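I want to count the distinct day_number values over the past 30 days. However, distinct doesn't work together with over.

If I remove distinct, I get the total count of day_number, but day_number can contain many duplicates. That is why I want to add distinct.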

select tr.*,
       count( distinct day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as result
from table tr;

Can anyone tell me how to count distinct values in an over (partition by ...) clause? Thanks in advance.

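【Comments on the question】:

It would be easier if you provided data, but take a look at COLLECT (an aggregate function that assembles a collection of values), SET (which turns a collection into a set by removing duplicates) and CARDINALITY (which returns the number of elements in a collection). You may need to handle nulls explicitly.

Thanks for your reply. To be more specific, day_number is a list of days in numeric form; for example, 1970.1.1 is considered 1, and so on. The table contains all transactions by date (day_number) for each account. One account can have many associated day_numbers. My goal is to count the distinct day_numbers over the past 30 days for each account. The code I posted above works fine if I want the total count, but I want the distinct count.

What does "1970.1.1 is considered 1, and so on" mean? Is it the first day of January 1970 (so the datatype of DAY_NUMBER is DATE), or ...? Does it have to be an analytic count? How about the aggregate COUNT function with a WHERE clause (restricting the "days" to the last 30 days)?

【Answer 1】: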

count(distinct ...) works fine with an over clause; the main problem is the order by. You can't do count (distinct ..) over (partition by ... order by ...), because DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY. So I did this:

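-- your_table is a placeholder for the asker's table
-- note: row_number < 30 keeps the first 29 rows per account; it is not a sliding 30-day range
select tr.*, count (distinct day_number) over (partition by account)
from (select t.*, row_number() over (partition by account order by day_number) row_number from your_table t) tr
where row_number < 30;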

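I tested this on the EMPLOYEES table of the HR schema (a free Oracle sample schema that you can find anywhere). I'm not sure it will work on your schema, since I don't have a copy of it, but if it doesn't, it should at least give you some ideas: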

 select count (distinct manager_id) over (partition by department_id), department_id, manager_id
from (select e.*, row_number() over (partition by department_id order by employee_id) row_number from employees e)
where row_number < 30;

【Comments】:

【Answer 2】:

You can first create a column that lists each id only once, and then do the ranged count over that column, for example:

WITH sd AS (SELECT 1 ID, 10 val FROM dual UNION ALL
            SELECT 1 ID, 20 val FROM dual UNION ALL
            SELECT 2 ID, 30 val FROM dual UNION ALL
            SELECT 2 ID, 40 val FROM dual UNION ALL
            SELECT 4 ID, 50 val FROM dual UNION ALL
            SELECT 4 ID, 60 val FROM dual UNION ALL
            SELECT 6 ID, 70 val FROM dual)
SELECT ID,
       val,
       COUNT(id_distinct) OVER (ORDER BY ID RANGE 3 PRECEDING) cnt_disinct_ids
FROM   (SELECT ID,
               val,
               CASE WHEN row_number() OVER (PARTITION BY ID ORDER BY val) = 1 THEN ID END id_distinct
        FROM   sd);

        ID        VAL CNT_DISINCT_IDS
---------- ---------- ---------------
         1         10               1
         1         20               1
         2         30               2
         2         40               2
         4         50               3
         4         60               3
         6         70               2

ETA: a demonstration that the above technique works with your data:

WITH your_table AS (SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 24 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 25 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 55 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 10 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 12 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual)
SELECT account_sk,
       day_sk,
       COUNT(day_sk_distinct) OVER (PARTITION BY account_sk ORDER BY day_sk RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) count_distinct_day_sk
FROM   (SELECT account_sk,
               day_sk,
               CASE WHEN row_number() OVER (PARTITION BY account_sk, day_sk ORDER BY day_sk) = 1 THEN day_sk END day_sk_distinct
        FROM   your_table);

ACCOUNT_SK     DAY_SK COUNT_DISTINCT_DAY_SK
---------- ---------- ---------------------
ABCDE              23                     1
ABCDE              23                     1
ABCDE              24                     2
ABCDE              25                     3
ABCDE              53                     3
ABCDE              53                     3
ABCDE              55                     2
VWXYZ              10                     1
VWXYZ              12                     2
VWXYZ              40                     2
VWXYZ              40                     2
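
As a cross-check, one of the comments on the question suggests the ordinary aggregate COUNT with a WHERE clause instead of an analytic count. A minimal sketch of that idea, assuming the same your_table, account_sk and day_sk as in the example above (the aliases t and t2 are only illustrative): a correlated scalar subquery can use COUNT(DISTINCT ...) because no analytic ORDER BY is involved.

```sql
-- Sketch only: count distinct day_sk values in the trailing 30-day range
-- with a correlated scalar subquery instead of an analytic function.
WITH your_table AS (SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 24 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 25 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 55 day_sk FROM dual)
SELECT t.account_sk,
       t.day_sk,
       -- same account, day_sk within [current day - 29, current day]
       (SELECT COUNT(DISTINCT t2.day_sk)
        FROM   your_table t2
        WHERE  t2.account_sk = t.account_sk
        AND    t2.day_sk BETWEEN t.day_sk - 29 AND t.day_sk) count_distinct_day_sk
FROM   your_table t
ORDER  BY t.account_sk, t.day_sk;
```

For these ABCDE rows it should give the same counts as the analytic version above (1, 1, 2, 3, 3, 3, 2). The subquery is evaluated per row, so on a large table it will probably be slower than the single-pass analytic query.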

【Comments】:

Thanks for your reply! I tried the code above, but it still doesn't work. select count(tr.day_sk) over (order by tr.ACCOUNT_SK range 29 preceding) cnt_distinct_ids from (select tr.* , case when row_number() over (PARTITION BY tr.ACCOUNT_SK ORDER BY tr.DAY_SK)=1 then tr.ACCOUNT_SK end id_distinct from aggTxn2 tr)

What do you mean by "doesn't work"? Do you get an error? Wrong results? Something else?

I get an error message like this: ERROR: ORACLE prepare error: ORA-00904: "TR"."ACCOUNT_SK": invalid identifier. SQL statement: select count(tr.day_sk) over (order by tr.ACCOUNT_SK range 29 preceding) cnt_distinct_ids from (select tr.* , case when row_number() over (PARTITION BY tr.ACCOUNT_SK ORDER BY tr.DAY_SK)=1 then tr.ACCOUNT_SK end id_distinct from aggTxn2 tr). I don't think "TR" is the problem, though.

Yes, it is; your outer query has no alias "tr", only your inline view does. Remove the tr.s from the outer query.

Thanks! The "TR" problem is fixed and the code runs now. I get the total number of transactions by day_sk over the past 30 days, but I want the count of distinct day_sk values over the past 30 days. The only problem is the "distinct". Here is my code; do you have any suggestions for fixing it? Many thanks! select count(day_sk) over (PARTITION BY ACCOUNT_SK ORDER BY DAY_SK range 29 PRECEDING) cnt_distinct_ids from (select tr.* , case when row_number() over (PARTITION BY tr.ACCOUNT_SK ORDER BY tr.DAY_SK)=1 then tr.ACCOUNT_SK end id_distinct from aggTxn2 tr)

【Answer 3】:

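You can simulate this by creating a column that is 1 when a "new break" occurs in the ordered list and null otherwise. Then you just need to count or sum these break indicators. Both count() and sum() support "over (order by ...)". In your example, it would be: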

with TR as
(select 1 as PK, 'ABCDE' as ACCOUNT, 23 as DAY_number from dual
union all select 2, 'ABCDE', 23 from dual
union all select 3, 'ABCDE', 24 from dual
union all select 4, 'ABCDE', 25 from dual
)
select tr.*
    ,  count( /*distinct*/ day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as wrong_result
    ,  count(IS_NEW_BREAK) over(PARTITION BY ACCOUNT order by day_number range 29 PRECEDING) as desired_output
    from
    (select tr.*
        ,   case min(case when day_number is not null then PK end) over(PARTITION BY ACCOUNT, day_number) when PK then 1 end as IS_NEW_BREAK
        from tr
    ) tr;
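
The query above relies on a unique PK column to pick one row per day. If no such key is available, the break indicator could perhaps be derived with LAG instead, and then summed rather than counted. This is only a sketch under that assumption: the names t, is_new_break and distinct_days are illustrative, and rows with a null day_number are deliberately left unflagged.

```sql
with tr as
(select 'ABCDE' as account, 23 as day_number from dual
union all select 'ABCDE', 23 from dual
union all select 'ABCDE', 24 from dual
union all select 'ABCDE', 25 from dual
)
select t.account
    ,  t.day_number
       -- summing the 1/null flags inside the 30-day range gives the distinct count
    ,  sum(t.is_new_break) over (partition by t.account order by t.day_number range 29 preceding) as distinct_days
    from
    (select tr.*
            -- 1 on the first row of each run of equal day_number values, null on repeats
        ,   case
                when day_number is null then null
                when lag(day_number) over (partition by account order by day_number) = day_number then null
                else 1
            end as is_new_break
        from tr
    ) t;
```

With these four rows, distinct_days should come out as 1, 1, 2, 3. Because RANGE includes all peers of the current row, both rows for day 23 see the single flagged row for that day.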

【Comments】:
