Group by in columns and rows, counts and percentages per day
[Posted]: 2018-08-13 12:26:10
[Question]: I have a table containing data like the following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group the rows by the attr column and count them, and add further columns that show the count and percentage for each day, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
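I can get a single count with GROUP BY, but I cannot figure out how to split the counts into multiple columns. To generate the day 1 percentage I tried:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this does not give me the right answer either: every percentage comes out as zero and every count as 1. Any help is appreciated. I am trying to do this in Redshift, which follows PostgreSQL syntax.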
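[Comments]:
Do you only need 2 days? What if your table contains more than 2 days?
My requirement is for 6 days, but once I have an answer for two I want to extend it.
Four nearly identical answers. Were you expecting something else?
No, I don't have access to the database to test the queries. I am trying to get the queries run. Once I can test them, I will mark an answer.
[Answer 1]: Let's establish the logic before the presentation: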
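with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time) -- group by added: attr and the day are selected alongside count(*)
)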
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot the result into per-day columns if you feel you need to.
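As a rough way to try the logic above without database access, here is a minimal sketch that runs the same two-step idea (count per attr and day, then divide by the day total) on the sample rows from the question. It assumes an in-memory SQLite database rather than Redshift, so `substr(time, 1, 10)` stands in for the date extraction, and the names `ROWS`, `QUERY` and `per_day_shares` are made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

# Assumption: SQLite syntax. substr(time, 1, 10) keeps the YYYY-MM-DD part
# and plays the role of the date extraction in the answer's query.
QUERY = """
WITH cte1 AS (
    SELECT attr, substr(time, 1, 10) AS theday, COUNT(*) AS thecount
    FROM my_table
    GROUP BY attr, substr(time, 1, 10)
),
cte2 AS (
    SELECT theday, SUM(thecount) AS daytotal
    FROM cte1
    GROUP BY theday
)
SELECT t1.attr, t1.theday, t1.thecount,
       ROUND(100.0 * t1.thecount / t2.daytotal, 1) AS percentofday
FROM cte1 t1
JOIN cte2 t2 ON t1.theday = t2.theday
ORDER BY t1.theday, t1.attr
"""


def per_day_shares(rows):
    """Return (attr, day, count, percent of that day's rows) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in per_day_shares(ROWS):
        print(row)
```

The `100.0 *` factor forces floating-point division, which avoids the truncation to zero that integer division produces.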
[Comments]:
[Answer 2]: I am trying to enhance @johnHC's query. By the way, if you need 7 days, you have to add those days as further CASE WHEN expressions.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
, CTE3 as
(
-- DOW is the day of the week: 0 = Sunday ... 6 = Saturday
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr, t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max(case when day_nmbr=1 then percentofday end) as day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
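To illustrate how the CASE WHEN pivot extends to more days, here is a hedged sketch that generates one count column and one percent column per requested day. It again assumes an in-memory SQLite database instead of Redshift, matches days by their date string rather than by day of week, and uses `ELSE 0` so that missing combinations show 0 instead of NULL. The helper `pivot_by_day` and its parameter names are hypothetical and expect at least one day.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]


def pivot_by_day(rows, days):
    """Return one row per attr with (count, percent) columns for each day in `days`.

    `days` is a non-empty list of 'YYYY-MM-DD' strings; the first entry becomes day1.
    """
    columns = []
    for i in range(1, len(days) + 1):
        # One pair of CASE WHEN columns per day, bound to the named parameter :dayN.
        columns.append(
            f"MAX(CASE WHEN theday = :day{i} THEN thecount ELSE 0 END) AS day{i}_count"
        )
        columns.append(
            f"ROUND(MAX(CASE WHEN theday = :day{i} THEN pct ELSE 0 END), 1) AS day{i}_pct"
        )
    column_sql = ",\n       ".join(columns)
    query = f"""
    WITH cte1 AS (
        SELECT attr, substr(time, 1, 10) AS theday, COUNT(*) AS thecount
        FROM my_table
        GROUP BY attr, substr(time, 1, 10)
    ),
    cte2 AS (
        SELECT theday, SUM(thecount) AS daytotal FROM cte1 GROUP BY theday
    ),
    cte3 AS (
        SELECT t1.attr, t1.theday, t1.thecount,
               100.0 * t1.thecount / t2.daytotal AS pct
        FROM cte1 t1 JOIN cte2 t2 ON t1.theday = t2.theday
    )
    SELECT attr,
       {column_sql}
    FROM cte3
    GROUP BY attr
    ORDER BY attr
    """
    params = {f"day{i}": day for i, day in enumerate(days, start=1)}
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_by_day(ROWS, ["2018-08-06", "2018-08-05"]):
        print(row)
```

Passing six or seven dates in `days` yields the wider table asked for in the comments, with the column list growing by two per day.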
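[Comments]:
[Answer 3]: If you only have 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days decreasing from left to right, as in your example)
http://sqlfiddle.com/#!17/3bdad/5 (days increasing)
The main idea has already been mentioned in the other answers. Instead of joining CTEs to calculate the values, I use window functions, which I think are shorter and more readable. The pivot is done in the same way.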
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
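A: Count the rows per day, and the rows per day AND attr.
B: For better readability I convert the date into a number. Here I take the difference between the row's date and the maximum date available in the table, so I get a counter from 0 (the first day column, which is the most recent date) to n - 1 (the last day).
C: Calculate the percentage and round it.
D: Pivot by filtering on the day numbers. COALESCE avoids NULL values and turns them into 0. To add more days, duplicate these columns for each additional day number.
Edit: made the day counter more flexible for more days; new SQL Fiddle.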
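Step B can be pictured outside SQL as well. The following is a small sketch of the same arithmetic in plain Python, assuming the dates have already been extracted from the timestamps; the function name `day_numbers` is invented for this illustration.

```python
from datetime import date


def day_numbers(dates):
    """Map each distinct date to its distance in days from the latest date."""
    latest = max(dates)
    return {d: (latest - d).days for d in set(dates)}


if __name__ == "__main__":
    # Dates taken from the sample rows in the question.
    sample = [date(2018, 8, 6), date(2018, 8, 6), date(2018, 8, 5)]
    print(day_numbers(sample))
```

Because the number is a plain date difference, a date with no rows leaves a gap in the numbering: with rows only on 2018-08-06 and 2018-08-03, the day numbers are 0 and 3, so the columns for day numbers 1 and 2 would contain only zeros.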
[Comments]:
[Answer 4]: Basically, I think of this as conditional aggregation. But you need an enumerator for the dates in order to pivot. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.
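The enumerator produced by `DENSE_RANK() OVER (ORDER BY time::date DESC)` can be sketched in plain Python as follows. This is only an illustration of the numbering, assuming the dates are already extracted; `dense_rank_desc` is a hypothetical helper name.

```python
from datetime import date


def dense_rank_desc(dates):
    """Number the distinct dates 1, 2, 3, ... starting from the most recent one."""
    distinct = sorted(set(dates), reverse=True)
    return {d: rank for rank, d in enumerate(distinct, start=1)}


if __name__ == "__main__":
    # Two dates with a gap between them.
    sample = [date(2018, 8, 6), date(2018, 8, 6), date(2018, 8, 3)]
    print(dense_rank_desc(sample))
```

Unlike a date difference, the dense rank has no gaps: 2018-08-06 and 2018-08-03 are numbered 1 and 2, so day2 always means the second most recent date that actually has rows.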
[Comments]: