Group by in columns and rows, counts and percentages per day

【Title】: Group by in columns and rows, counts and percentages per day
【Posted】: 2018-08-13 12:26:10
【Question】:

I have a table containing data like the following.

attr            |time         
----------------|--------------------------
abc             |2018-08-06 10:17:25.282546
def             |2018-08-06 10:17:25.325676
pqr             |2018-08-05 10:17:25.366823
abc             |2018-08-06 10:17:25.407941
def             |2018-08-05 10:17:25.449249

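I want to group the rows by the attr column, one row per value, and add further columns showing the count and percentage for each day, as shown below.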

attr            |day1_count| day1_%| day2_count| day2_%     
----------------|----------|-------|-----------|-------
abc             |2         |66.6%  | 0         | 0.0%
def             |1         |33.3%  | 1         | 50.0%
pqr             |0         |0.0%   | 1         | 50.0%

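I can show a single count using GROUP BY, but I cannot figure out how to split the counts into multiple columns. I tried the following to generate the day 1 percentages:
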
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
    SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
    GROUP BY attr;

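But this does not give me the right answer either: every percentage comes out as zero and every count as 1. Any help is appreciated. I am trying to do this in Redshift, which follows PostgreSQL syntax.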

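【Comments】:

Do you only need 2 days? What if your table contains more than 2 days?
My requirement is 6 days, but once I have an answer for two I want to extend it.
Four almost identical answers. Are you expecting something else?
No, I don't have access to the database to test the queries. I am trying to get the queries run. Once I can test them, I will mark an answer.

【Answer 1】:

Let's settle the logic before worrying about the presentation: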

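with CTE1 as
(
-- count the rows for each attr on each day
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
-- total number of rows on each day
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
-- 1.0 * avoids integer division, which would truncate the share to 0
select t1.attr, t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday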

From here you can pivot the result into one set of columns per day, if you feel you need to.
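
The logic above can be tried end to end on the question's sample rows. The following is a minimal sketch, not the answerer's code: it runs an equivalent query through Python's built-in sqlite3 module, with SQLite standing in for Redshift (an assumption made only so the example is runnable). The table name `my_table`, the CTE names, `date(time)` in place of `DATEPART`, and the helper `daily_share` are all illustrative.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

# SQLite dialect (assumption): date(time) replaces DATEPART / time::date.
QUERY = """
WITH per_attr AS (              -- rows per (attr, day)
    SELECT attr, date(time) AS theday, COUNT(*) AS thecount
    FROM my_table
    GROUP BY attr, date(time)
),
per_day AS (                    -- rows per day
    SELECT theday, SUM(thecount) AS daytotal
    FROM per_attr
    GROUP BY theday
)
SELECT a.attr, a.theday, a.thecount,
       ROUND(100.0 * a.thecount / d.daytotal, 1) AS percentofday
FROM per_attr a
JOIN per_day d ON a.theday = d.theday
ORDER BY a.theday, a.attr
"""


def daily_share(rows):
    """Return (attr, day, count, percent of that day's rows) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in daily_share(ROWS):
        print(row)
```

The result is in "long" form, one row per attr and day, which is the input a pivot step would need.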

【Comments】:

【Answer 2】:

I am building on @johnHC's query. By the way, if you need 7 days, you have to add a CASE expression for each of those days.

with CTE1 as
(
-- rows per attr per calendar day
select attr, time::date as theday, count(*) as thecount
from t
group by attr, time::date
)
, CTE2 as
(
-- total rows per calendar day
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
, CTE3 as
(
-- EXTRACT(DOW ...) is the day of the week: 0 = Sunday ... 6 = Saturday
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr, t1.theday, t1.thecount,
       1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
-- pivot: one count column and one percentage column per day number
select CTE3.attr,
max(case when day_nmbr = 0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr = 0 then percentofday end) as day1,
max(case when day_nmbr = 1 then CTE3.thecount end) as day2Cnt,
max(case when day_nmbr = 1 then percentofday end) as day2
from CTE3
group by CTE3.attr

http://sqlfiddle.com/#!17/54ace/20
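
The CASE-based pivot can be checked against the table the question asks for. This is a hedged sketch rather than the query from the fiddle: it uses Python's sqlite3 module with SQLite as a stand-in for Redshift, selects the two days by explicit date parameters (`:day1`, `:day2`, both hypothetical names) instead of by day of week, and wraps each column in COALESCE so that a missing combination shows 0 instead of NULL. The helper `pivot_two_days` is also illustrative.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

# SQLite dialect (assumption); :day1 and :day2 are bound from Python.
PIVOT_QUERY = """
WITH per_attr AS (
    SELECT attr, date(time) AS theday, COUNT(*) AS thecount
    FROM t
    GROUP BY attr, date(time)
),
per_day AS (
    SELECT theday, SUM(thecount) AS daytotal
    FROM per_attr
    GROUP BY theday
),
shares AS (
    SELECT a.attr, a.theday, a.thecount,
           ROUND(100.0 * a.thecount / d.daytotal, 1) AS pct
    FROM per_attr a
    JOIN per_day d ON a.theday = d.theday
)
SELECT attr,
       COALESCE(MAX(CASE WHEN theday = :day1 THEN thecount END), 0) AS day1_count,
       COALESCE(MAX(CASE WHEN theday = :day1 THEN pct END), 0.0)    AS day1_pct,
       COALESCE(MAX(CASE WHEN theday = :day2 THEN thecount END), 0) AS day2_count,
       COALESCE(MAX(CASE WHEN theday = :day2 THEN pct END), 0.0)    AS day2_pct
FROM shares
GROUP BY attr
ORDER BY attr
"""


def pivot_two_days(rows, day1, day2):
    """Return one row per attr with count and percent columns for two days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(PIVOT_QUERY, {"day1": day1, "day2": day2}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_two_days(ROWS, "2018-08-06", "2018-08-05"):
        print(row)
```

Selecting by date avoids one limitation of a day-of-week key: with more than seven days of data, two different dates would share the same day number.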

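【Comments】:

【Answer 3】:

If you only have 2 days:

http://sqlfiddle.com/#!17/3bdad/3 (days decreasing from left to right, as in your example)

http://sqlfiddle.com/#!17/3bdad/5 (days increasing)

The main idea has already been mentioned in the other answers. Instead of joining CTEs to calculate the values, I use window functions, which I find shorter and more readable. The pivot is done the same way.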

 SELECT 
    attr, 
    COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count,         -- D
    COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent, 
    COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
    COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
    /*
       Add more days here
    */
FROM(
    SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent             -- C
    FROM (
        SELECT DISTINCT
            attr, 
            MAX(time::date) OVER () - time::date as day_number,                  -- B
            count(*) OVER (partition by time::date, attr) as count,              -- A
            count(*) OVER (partition by time::date) as count_per_day 
        FROM test_table
    )s
)s

GROUP BY attr
ORDER BY attr

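A: Count the rows per day, and the rows per day AND attr.

B: For readability I convert the date into a number: the difference between the row's date and the maximum date available in the table. This gives a counter from 0 (the first day column, which is the latest date) to n - 1 (the last day column).

C: Calculate the percentage and round it.

D: Pivot by filtering on the day number. COALESCE avoids NULL values by replacing them with 0. To add more days, repeat these columns for each additional day number.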
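
Since step D means repeating the same pair of columns for every day, the column list can be generated instead of typed. The sketch below is one possible way to do that and is not part of the original answer: it builds the SQL text in Python and runs it with the built-in sqlite3 module. SQLite stands in for Redshift, `julianday` replaces the date subtraction of step B, CASE replaces FILTER, and the names `build_pivot_sql` and `pivot` are hypothetical.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]


def build_pivot_sql(n_days):
    """Build a pivot query with a count and percent column for n_days days."""
    cols = []
    for i in range(int(n_days)):  # int() keeps the interpolated value numeric
        cols.append(
            f"COALESCE(MAX(CASE WHEN day_number = {i} THEN cnt END), 0)"
            f" AS day{i + 1}_count"
        )
        cols.append(
            f"COALESCE(MAX(CASE WHEN day_number = {i} THEN pct END), 0.0)"
            f" AS day{i + 1}_percent"
        )
    select_list = ",\n       ".join(cols)
    return f"""
WITH numbered AS (              -- step B: 0 = latest date in the table
    SELECT attr,
           CAST(julianday((SELECT MAX(date(time)) FROM test_table))
                - julianday(date(time)) AS INTEGER) AS day_number
    FROM test_table
),
per_attr AS (                   -- step A: rows per day and attr
    SELECT attr, day_number, COUNT(*) AS cnt
    FROM numbered
    GROUP BY attr, day_number
),
per_day AS (                    -- step A: rows per day
    SELECT day_number, SUM(cnt) AS total
    FROM per_attr
    GROUP BY day_number
),
shares AS (                     -- step C: share of the day, rounded
    SELECT a.attr, a.day_number, a.cnt,
           ROUND(1.0 * a.cnt / d.total, 2) AS pct
    FROM per_attr a
    JOIN per_day d ON a.day_number = d.day_number
)
SELECT attr,
       {select_list}
FROM shares                     -- step D: pivot by day number
GROUP BY attr
ORDER BY attr
"""


def pivot(rows, n_days):
    """Load rows into an in-memory table and run the generated pivot query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO test_table VALUES (?, ?)", rows)
    result = conn.execute(build_pivot_sql(n_days)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot(ROWS, 2):
        print(row)
```

Calling `pivot(ROWS, 6)` would produce the six-day layout mentioned in the comments; days with no rows come out as 0 in every column.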

Edit: made the day counter more flexible for more days; new SQL Fiddles.

【Comments】:

【Answer 4】:

Basically, I think of this as conditional aggregation. But you need an enumerator for the dates in order to pivot. So:

SELECT attr, 
       COUNT(*) FILTER (WHERE day_number = 1) as day1_count, 
       COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent, 
       COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
       COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
             DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
             1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
      FROM test_table
     ) s
GROUP BY attr, cnt
ORDER BY attr;

Here is a SQL Fiddle.
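
The `1.0 *` factor in the subquery above presumably keeps the percentage from being computed with integer division, which would be consistent with the all-zero percentages reported in the question. The effect can be seen in a minimal sketch using Python's built-in sqlite3 module; SQLite is only a stand-in here, and the exact numeric types Redshift assigns to COUNT and SUM are an assumption that should be checked against its documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both operands are integers in the first expression, so the quotient is
# truncated; multiplying by 1.0 first makes the division a decimal one.
integer_ratio, decimal_ratio = conn.execute(
    "SELECT 2 / 3, 1.0 * 2 / 3"
).fetchone()
conn.close()

print(integer_ratio)
print(decimal_ratio)
```

The same pattern applies to any count divided by a total: convert one operand to a non-integer type before dividing.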

【Comments】:
