Sum over a range interval within a CASE SQL statement

Posted: 2018-05-18 21:17:08

Question:

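I am trying to get the average spend per purchase date after each customer's start date (this is for a recency-frequency-monetary analysis). The query below computes the monetary value element: I want the sum of all transactions after the customer's start date, divided by the number of days on which they purchased. I am using Oracle 12c.

I have the following working, but it includes the full date range.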

RFM AS (
SELECT SRC_USER_ID,
  COUNT(distinct PICKUP_DATE) -1 as frequency,
  (MAX(PICKUP_DATE) - MIN(PICKUP_DATE)) as recency,
  (TO_DATE ('2018/05/12', 'yyyy/mm/dd') - MIN(PICKUP_DATE)) as T,
  (CASE WHEN COUNT(distinct PICKUP_DATE)-1=0 THEN 0 ELSE
         SUM(PRICE_TOTAL)/COUNT(distinct PICKUP_DATE) END) AS monetary_value
FROM TRANSACTIONS
group by SRC_USER_ID

I think I need to use a windowed aggregate function (https://ss64.com/ora/syntax-analytic-aggregate.html). However, when I try the following, it does not work.

RFM AS (
SELECT SRC_USER_ID,
  COUNT(distinct PICKUP_DATE) -1 as frequency,
  (MAX(PICKUP_DATE) - MIN(PICKUP_DATE)) as recency,
  (TO_DATE ('2018/05/12', 'yyyy/mm/dd') - MIN(PICKUP_DATE)) as T,
  (CASE WHEN COUNT(distinct PICKUP_DATE)-1=0 THEN 0 ELSE
    SUM(PRICE_TOTAL) OVER (ORDER BY PICKUP_DATE) RANGE INTERVAL '1' DAY FOLLOWING UNBOUNDED/COUNT(distinct PICKUP_DATE) END) AS monetary_value
FROM TRANSACTIONS
group by SRC_USER_ID

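Any help would be greatly appreciated.

Comments:

It would help if you prepared sample input data and the expected result for that data, preferably on one of these sites: sqlfiddle.com or dbfiddle.uk/?rdbms=oracle_11.2. Looking at these queries alone, it is hard to guess what is going on, and I am not sure this function is the best solution.

Answer 1: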

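When learning analytic functions, it is probably a good idea to look at the documentation and at the examples on oracle-base. Here is a small test table with 3 columns whose names are similar to those in your query. (Note: the dates and prices are random values.)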

create table transactions
as
select
  mod( level, 3 ) + 1 as srcuserid
, to_date( trunc( dbms_random.value( 2451925, 2458258 ) ), 'J' ) pickupdate
, round( dbms_random.value() * 10000, 2 ) pricetotal
from dual
connect by level <= 12 ;

select * from transactions order by srcuserid, pickupdate ;

SRCUSERID  PICKUPDATE  PRICETOTAL  
1          27-JUL-03   9447.05     
1          04-APR-05   9595.6      
1          28-SEP-07   408.09      
1          16-AUG-13   5643.33     
2          20-JAN-01   6253.87     
2          26-OCT-05   5981.7      
2          16-DEC-08   8138.03     
2          20-JUL-17   49.67       
3          08-AUG-03   7411.74     
3          29-OCT-06   2218.95     
3          11-FEB-10   111.07      
3          26-JUL-17   600.15  

12 rows selected. 

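To develop your query, try using analytic functions to compute all the column values (as needed). Avoid GROUP BY for this, because in this context it throws a "not a GROUP BY expression" error. You will also find that the result set contains one row for every row of the original table. You can use DISTINCT here, because we are only dealing with aggregates.

select distinct -- without "distinct", you'll get multiple identical rows "per window"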
  srcuserid
, count( pickupdate ) over ( partition by srcuserid ) as frequency
, max( pickupdate ) over ( partition by srcuserid )   as max_date
, min( pickupdate ) over ( partition by srcuserid )   as min_date
, sum( pricetotal ) over ( partition by srcuserid )   as sum_pricetotal
from transactions 
-- group by srcuserid  -- ORA-00979: not a GROUP BY expression
;

SRCUSERID  FREQUENCY  MAX_DATE   MIN_DATE   SUM_PRICETOTAL  
2          4          20-JUL-17  20-JAN-01  20423.27        
3          4          26-JUL-17  08-AUG-03  10341.91        
1          4          16-AUG-13  27-JUL-03  25094.07 

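Once this (sort of) works, use the query as an inline view and add some finishing touches in the outer SELECT. Note that the final query here also uses first_value(), which may be one way for you to find the first entry of a "window".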

select
  srcuserid
, count_ - 1          as frequency
, max_date - min_date as recency
, trunc( sysdate - min_date )  as T
, case
    when count_ - 1 = 0 then 0
    else round( ( sum_pricetotal - firstpricetotal ) / ( count_ - 1 ), 2 ) 
  end as monetary_value 
from (
  select distinct
    srcuserid
  , count( pickupdate ) over ( partition by srcuserid ) as count_
  , max( pickupdate ) over ( partition by srcuserid )   as max_date
  , min( pickupdate ) over ( partition by srcuserid )   as min_date
  , sum( pricetotal ) over ( partition by srcuserid )   as sum_pricetotal
-- first_value(): find the first ie oldest "pricetotal" for each client
  , first_value( pricetotal ) over ( 
      partition by srcuserid order by pickupdate )      as firstpricetotal
  from transactions
) 
;

-- result
SRCUSERID  FREQUENCY  RECENCY  T     MONETARY_VALUE  
2          3          6025     6328  4723.13         
3          3          5101     5398  976.72          
1          3          3673     5410  5215.67 

See also: dbfiddle here.
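For reference, the windowing clause attempted in the question has to sit inside the OVER ( ... ) parentheses and be written as RANGE BETWEEN ... AND .... The following is only a minimal sketch of that syntax, not part of the original answer: it assumes the column names of the test table above, and the handful of rows in the CTE as well as the alias sum_after_this_day are made up for illustration.

```sql
-- Hypothetical sample rows (not from the original post).
with transactions ( srcuserid, pickupdate, pricetotal ) as (
  select 1, date '2018-01-01', 10 from dual union all
  select 1, date '2018-01-01', 20 from dual union all
  select 1, date '2018-01-05', 30 from dual union all
  select 1, date '2018-01-09', 50 from dual
)
select
  srcuserid
, pickupdate
, pricetotal
  -- sum of all rows of the same client dated at least one day later
, sum( pricetotal ) over (
    partition by srcuserid
    order by pickupdate
    range between interval '1' day following and unbounded following
  ) as sum_after_this_day
from transactions
order by srcuserid, pickupdate, pricetotal ;
```

With these rows, both 01-JAN rows should show 80 (30 + 50), the 05-JAN row 50, and the 09-JAN row NULL, because no later rows exist. The value on a client's earliest date is therefore the spend after the start date. This is still one row per transaction, so it would need the same DISTINCT or inline-view treatment as the queries above before it could replace the GROUP BY version.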

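Discussion:

Awesome, thank you! I ended up with 4 CTEs, but this is better and faster. One detail remains: what you have does not quite match what I am looking for. Multiple transactions happen each day, so I want them aggregated (e.g. firstpricetotal should cover the whole first day). Is that a small change to include?

Replacing transactions with (SELECT srcuserid, sum(pricetotal) AS pricetotal, pickupdate FROM transactions GROUP BY srcuserid, pickupdate) does the trick. Thanks again, much appreciated. It means I can now put this into production without needing Python.

Great! You have worked out how to "make the window smaller". (If time allows, you could also try ... over ( partition by srcuserid, pickupdate ) ...) Thanks for the feedback. Good luck!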
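The change described in the discussion can be sketched as follows: aggregate to one row per client and day first, then run the answer's query against that daily result. This is an illustration under stated assumptions, not the asker's production query. The sample rows and the CTE name daily are hypothetical, and trunc( pickupdate ) is an added assumption in case the dates carry a time-of-day part.

```sql
-- Hypothetical sample rows: client 1 has two transactions on the first day.
with transactions ( srcuserid, pickupdate, pricetotal ) as (
  select 1, date '2018-01-01', 10 from dual union all
  select 1, date '2018-01-01', 20 from dual union all
  select 1, date '2018-01-05', 30 from dual union all
  select 1, date '2018-01-09', 50 from dual union all
  select 2, date '2018-02-01', 40 from dual
),
daily as (
  -- one row per client per day
  select srcuserid
       , trunc( pickupdate ) as pickupdate
       , sum( pricetotal )   as pricetotal
  from transactions
  group by srcuserid, trunc( pickupdate )
)
select
  srcuserid
, count_ - 1 as frequency
, case
    when count_ - 1 = 0 then 0
    else round( ( sum_pricetotal - firstpricetotal ) / ( count_ - 1 ), 2 )
  end as monetary_value
from (
  select distinct
    srcuserid
  , count( pickupdate ) over ( partition by srcuserid ) as count_
  , sum( pricetotal ) over ( partition by srcuserid )   as sum_pricetotal
    -- first_value() now returns the total of the whole first day
  , first_value( pricetotal ) over (
      partition by srcuserid order by pickupdate )      as firstpricetotal
  from daily
)
order by srcuserid ;
```

For client 1 the daily totals are 30, 30 and 50, so frequency is 2 and monetary_value is (110 - 30) / 2 = 40. Client 2 has a single purchase day, so frequency and monetary_value are both 0.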
