SQL: Calculate a cumulative distinct count per day since the first day of the month
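【Title】:SQL Calculate a cumulative Distinct count per day since the first day of the month 【Posted】:2019-04-01 13:34:46
【Question】:
I have a table with the columns Date, Segment, Area, Province and Billing_nbr. I want two counts.
The first is a count distinct on Billing_nbr where total_revenue > 0, based on the date of each row and reaching back to the first day of that month (a month-to-date count). In other words, it should be a cumulative count of all billing_nbrs from the first day of the month up to the observation date (the ddate of each row). If a billing_nbr occurs on the first day, it should also be included in the second day's count even if it does not occur on the second day; if it occurs on both days, it should be counted only once.
The second is a normal count of Billing_nbr where total_revenue > 0, grouped by the other columns.
Below is the data I have: [sample data not preserved in this copy]
I have the join below, but it gives me the same numbers for both counts: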
SELECT
MAIN_TABLE.*,
TOT_SUBS_COUNT.N AS A1_SUBSCRIBERS_TOTAL,
TOT_SUBS_COUNT_MTD.N AS TOTAL_MTD_A1_SUBSCRIBERS
FROM (
select
ddate,
SUM(TOTAL_REVENUE) AS REVENUE_TOTAL,
segment,
province,
area
from CADA_PERMSISDN_DASHBOARD
GROUP BY province, area, segment, ddate
order by ddate
) MAIN_TABLE
JOIN(
select DDATE, count(DISTINCT BILLING_NBR) AS N, province, area, SEGMENT from CADA_PERMSISDN_DASHBOARD
WHERE TOTAL_REVENUE > 0
GROUP BY province, area, segment, ddate
ORDER BY DDATE
) TOT_SUBS_COUNT ON MAIN_TABLE.DDATE = TOT_SUBS_COUNT.DDATE
AND MAIN_TABLE.SEGMENT = TOT_SUBS_COUNT.SEGMENT
AND MAIN_TABLE.PROVINCE = TOT_SUBS_COUNT.PROVINCE
AND MAIN_TABLE.AREA = TOT_SUBS_COUNT.AREA
JOIN(
select DDATE, count(DISTINCT BILLING_NBR) AS N, province, area, SEGMENT from CADA_PERMSISDN_DASHBOARD
WHERE TOTAL_REVENUE > 0
AND DDATE BETWEEN trunc((DDATE),'month') AND DDATE
GROUP BY province, area, segment, ddate
ORDER BY DDATE
) TOT_SUBS_COUNT_MTD ON MAIN_TABLE.DDATE = TOT_SUBS_COUNT_MTD.DDATE
AND MAIN_TABLE.SEGMENT = TOT_SUBS_COUNT_MTD.SEGMENT
AND MAIN_TABLE.PROVINCE = TOT_SUBS_COUNT_MTD.PROVINCE
AND MAIN_TABLE.AREA = TOT_SUBS_COUNT_MTD.AREA
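The first join is for the grouped count. The second join is for the cumulative count from the first day of the month up to the observation date (the date of each row), and it must also be grouped by the other grouping columns. The count columns are aliased as A1_SUBSCRIBERS_TOTAL and TOTAL_MTD_A1_SUBSCRIBERS respectively. Below is the output I get; as you can see, both columns show the same counts: [output not preserved in this copy]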
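A likely reason the two columns match: in the second subquery, the condition DDATE BETWEEN trunc((DDATE),'month') AND DDATE compares each row's date with itself, so it is true for every row and the subquery collapses to the same per-day count as the first one. The sketch below shows one possible way to anchor the date range on the reporting row instead. It is not from the original thread. It assumes Oracle, assumes DDATE carries no time-of-day component, and uses the illustrative aliases d and t.

```sql
-- Sketch only: month-to-date distinct subscribers per day and group.
-- Assumes Oracle and that DDATE has no time-of-day component.
SELECT d.ddate,
       d.segment,
       d.province,
       d.area,
       COUNT(DISTINCT t.billing_nbr) AS total_mtd_a1_subscribers
FROM (
    -- one row per reporting day and group
    SELECT DISTINCT ddate, segment, province, area
    FROM CADA_PERMSISDN_DASHBOARD
) d
LEFT JOIN CADA_PERMSISDN_DASHBOARD t
       ON t.segment  = d.segment
      AND t.province = d.province
      AND t.area     = d.area
      AND t.total_revenue > 0
      -- the range is anchored on the reporting row's date (d.ddate),
      -- not on the date of the row being counted (t.ddate)
      AND t.ddate BETWEEN TRUNC(d.ddate, 'MM') AND d.ddate
GROUP BY d.ddate, d.segment, d.province, d.area
ORDER BY d.ddate;
```

This result could replace the TOT_SUBS_COUNT_MTD subquery and be joined to MAIN_TABLE on the same four columns. Two caveats: the equality joins do not match rows where segment, province or area is NULL, and the self-join may be expensive on a large table.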
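【Comments】:
Why not do the counting at insert time by selecting count(*)?
【Answer 1】:
I don't know how your data relates to the query. But for a running sum within a month, you can use an analytic function, like this: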
select ddate, segment, province, area,
sum(total_revenue) as revenue_total,
sum(sum(total_revenue)) over (partition by trunc(ddate, 'MON') order by ddate) as mtd_revenue_total
from CADA_PERMSISDN_DASHBOARD
group by province, area, segment, ddate
order by ddate
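The same analytic-function pattern can be stretched to a distinct count, because a running distinct count equals a running sum of first appearances. The following sketch is not part of the original answer. It assumes Oracle, and month_start and rn are illustrative names. It numbers each subscriber's rows within a month and group, then keeps a running total of the rows numbered 1.

```sql
-- Sketch only: running count of first appearances within each month and group.
SELECT ddate, segment, province, area,
       -- inner COUNT: subscribers first seen on this day;
       -- outer SUM ... OVER: running total since the start of the month
       SUM(COUNT(CASE WHEN rn = 1 THEN 1 END)) OVER (
           PARTITION BY month_start, segment, province, area
           ORDER BY ddate
       ) AS total_mtd_a1_subscribers
FROM (
    SELECT t.ddate, t.segment, t.province, t.area,
           TRUNC(t.ddate, 'MM') AS month_start,
           -- rn = 1 marks the first qualifying row of a subscriber
           -- in this month and group
           ROW_NUMBER() OVER (
               PARTITION BY TRUNC(t.ddate, 'MM'), t.segment, t.province,
                            t.area, t.billing_nbr
               ORDER BY t.ddate
           ) AS rn
    FROM CADA_PERMSISDN_DASHBOARD t
    WHERE t.total_revenue > 0
      AND t.billing_nbr IS NOT NULL
)
GROUP BY month_start, ddate, segment, province, area
ORDER BY ddate;
```

One limitation of this sketch: a day on which a group has no row with total_revenue > 0 does not appear in the output at all, so it may need an outer join back to the full list of days.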
【Comments】:
Thanks, but I need counts (especially the cumulative count per day), not sums. The sum is straightforward.
【Answer 2】:
SELECT
TO_CHAR(A.ddate, 'YYYY-MM'),
A.segment,
A.province,
A.area,
SUM(CASE WHEN total_revenue > 0 THEN 1 ELSE 0 END) TOTAL_GT_ZERO,
COUNT(*) TOTAL_SUBSCRIBERS
from CADA_PERMSISDN_DASHBOARD A
group by TO_CHAR(A.ddate, 'YYYY-MM'), A.segment, A.province, A.area
That is how I understand the question. Could you show the output you expect?
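For reference, SUM(CASE WHEN total_revenue > 0 THEN 1 ELSE 0 END) adds 1 for every row with positive revenue, so it counts rows rather than distinct subscribers, and COUNT(*) counts every row. If a billing_nbr can appear on several rows within a month, a distinct variant of the same monthly query might look like the sketch below. It is not part of the original answer, it assumes Oracle, and the column aliases are illustrative.

```sql
-- Sketch only: distinct subscribers per month and group (not cumulative per day).
SELECT TO_CHAR(a.ddate, 'YYYY-MM') AS report_month,
       a.segment,
       a.province,
       a.area,
       -- CASE yields NULL for non-qualifying rows, and COUNT ignores NULLs
       COUNT(DISTINCT CASE WHEN a.total_revenue > 0 THEN a.billing_nbr END)
           AS subscribers_gt_zero,
       COUNT(DISTINCT a.billing_nbr) AS subscribers_total
FROM CADA_PERMSISDN_DASHBOARD a
GROUP BY TO_CHAR(a.ddate, 'YYYY-MM'), a.segment, a.province, a.area
ORDER BY report_month;
```

This still returns one figure per month and group, so it does not give the per-day cumulative count the question asks for.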
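【Comments】:
I think this is what I was looking for. Could you explain how the CASE accumulates into the total?
The CASE returns 1 or 0 depending on the value of total_revenue. The SUM is then the total.
Then this is just a count; it is not cumulative per day. Thanks for your help anyway.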