Calculating percentages with GROUP BY query

Posted: 2011-09-06 15:00:50

Question:

I have a table with 3 columns that looks like this:

File    User     Rating (1-5)
------------------------------
00001    1        3
00002    1        4
00003    2        2
00004    3        5
00005    4        3
00005    3        2
00006    2        3
Etc.

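I'd like a query that outputs the following: for each user and rating, the number of files and the fraction of that user's files they represent: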

User    Rating   Count   Percentage
-----------------------------------
1       1         3      .18
1       2         6      .35
1       3         8      .47
2       5         12     .75
2       3         4      .25

I'm using PostgreSQL. I know how to get the first 3 columns with the query below, but I can't figure out how to calculate the percentage within the GROUP BY:

SELECT
    User,
    Rating,
    Count(*)
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating

I want the percentage calculated for each user/rating group, relative to that user's total.
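For reference, the Percentage column is each group's count divided by that user's total count. User 1 has 3 + 6 + 8 = 17 files, so rating 1 gives 3/17 ≈ 0.18. A small Python sketch of just that arithmetic, using the counts from the desired output above (no database involved; the dictionary layout is an illustrative assumption):

```python
from collections import defaultdict

# Counts copied from the desired-output table: (user, rating) -> number of files.
counts = {(1, 1): 3, (1, 2): 6, (1, 3): 8, (2, 5): 12, (2, 3): 4}

# Denominator: total number of files per user (sum over all of that user's ratings).
user_totals = defaultdict(int)
for (user, _rating), n in counts.items():
    user_totals[user] += n

# Percentage = files with this rating / all files of the same user.
percentages = {
    (user, rating): round(n / user_totals[user], 2)
    for (user, rating), n in counts.items()
}

for (user, rating), pct in sorted(percentages.items()):
    print(user, rating, counts[(user, rating)], pct)
```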


Answer 1:

The best approach is to use window functions.
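A minimal sketch of what a window function adds, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: window functions need SQLite 3.25 or later). The rows are the sample rows from the question without the File column, and User is renamed user_id because USER is a reserved word in both PostgreSQL and T-SQL. GROUP BY collapses rows, while a window aggregate keeps every row and attaches the per-user total to it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (user_id INTEGER, rating INTEGER)")
# (user, rating) pairs from the question's sample table.
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(1, 3), (1, 4), (2, 2), (3, 5), (4, 3), (3, 2), (2, 3)],
)

# GROUP BY: one output row per user.
grouped = conn.execute(
    "SELECT user_id, COUNT(*) FROM results GROUP BY user_id ORDER BY user_id"
).fetchall()

# Window aggregate: every input row survives, each carrying its user's row count.
windowed = conn.execute(
    "SELECT user_id, rating, COUNT(*) OVER (PARTITION BY user_id) "
    "FROM results ORDER BY user_id, rating"
).fetchall()

print(grouped)   # [(1, 2), (2, 2), (3, 2), (4, 1)]
print(windowed)  # [(1, 3, 2), (1, 4, 2), (2, 2, 2), (2, 3, 2), (3, 2, 2), (3, 5, 2), (4, 3, 1)]
```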

Comments:

Could you elaborate on that?

Answer 2:
WITH t1 AS 
 (SELECT User, Rating, Count(*) AS n 
  FROM your_table
  GROUP BY User, Rating)
SELECT User, Rating, n, 
       (0.0+n)/(SUM(n) OVER (PARTITION BY User)) AS pct -- no integer divide! SUM(n) is the user's total; COUNT(*) would count rating groups
FROM t1;

Or:

SELECT DISTINCT User, Rating, Count(*) OVER w_user_rating AS n, -- no GROUP BY, so DISTINCT removes the one-row-per-file duplicates
        (0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);

I would check which of the two produces the better query plan, using the tools for your RDBMS (for example, EXPLAIN in PostgreSQL).
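A runnable sketch of both window-function variants in Python's sqlite3, standing in for PostgreSQL (SQLite 3.25+ assumed). The table name results, the column name user_id and the generated rows are assumptions, chosen so the output should match the desired output in the question. In the first variant the denominator must be SUM(n) over the user's partition; COUNT(*) in that position counts rating groups instead (shown as groups_per_user). The second variant has no GROUP BY, so it yields one row per file and needs DISTINCT, applied here in an outer query:

```python
import sqlite3

# Hypothetical fixture: (user, rating, count) from the question's desired output.
SPEC = [(1, 1, 3), (1, 2, 6), (1, 3, 8), (2, 5, 12), (2, 3, 4)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (user_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(u, r) for u, r, n in SPEC for _ in range(n)],
)

# Variant 1: aggregate in a CTE, then divide by the per-user SUM of the counts.
cte_query = """
WITH t1 AS (
    SELECT user_id, rating, COUNT(*) AS n
    FROM results
    GROUP BY user_id, rating
)
SELECT user_id, rating, n,
       COUNT(*) OVER (PARTITION BY user_id) AS groups_per_user,
       (0.0 + n) / SUM(n) OVER (PARTITION BY user_id) AS pct
FROM t1
ORDER BY user_id, rating
"""
cte_rows = [(u, r, n, g, round(p, 2)) for u, r, n, g, p in conn.execute(cte_query)]

# Variant 2: window functions only; DISTINCT keeps one row per (user, rating).
window_query = """
SELECT DISTINCT user_id, rating, n, pct FROM (
    SELECT user_id, rating,
           COUNT(*) OVER w_user_rating AS n,
           (0.0 + COUNT(*) OVER w_user_rating)
               / COUNT(*) OVER (PARTITION BY user_id) AS pct
    FROM results
    WINDOW w_user_rating AS (PARTITION BY user_id, rating)
)
ORDER BY user_id, rating
"""
window_rows = [(u, r, n, round(p, 2)) for u, r, n, p in conn.execute(window_query)]

for row in cte_rows:
    print(row)
print(window_rows == [(u, r, n, p) for u, r, n, _g, p in cte_rows])  # True
```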

Comments:

Thanks Andrew! I used a slightly modified version of your second query: select user, rating, cnt, cnt::float * 100/(sum(cnt) over (partition by user)) from (select user, rating, count(*) as cnt from tbl group by user, rating) a order by user,rating

Is SUM(COUNT(*)) needed in both examples, i.e. something like (SUM(COUNT(*)) OVER (PARTITION BY User)) in the first one? With SUM I get the expected values; otherwise I'm dividing by the number of ratings rather than by their sum.

@TylerDeWitt, I think so; see DeadMonkey's comment above.

Answer 3:

Alternatively, you can do it the old-school way, which is arguably easier to understand:

select usr.User                   as User   ,
       usr.Rating                 as Rating ,
       usr.N                      as N      ,
       (100.0 * usr.N) / total.N  as Pct
from ( select User, Rating , count(*) as N
       from Results
       group by User , Rating
     ) usr
join ( select User , count(*) as N
       from Results
       group by User
     ) total on total.User = usr.User
order by usr.User, usr.Rating

Cheers!
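The join version needs no window functions, so it also runs on databases without them. A sketch of it in Python's sqlite3; the results table, the user_id column (instead of User) and the rows generated from the question's desired output are assumptions. Both counts come from the two derived tables, usr and total, and the 100.0 factor makes Pct a percentage rather than a fraction:

```python
import sqlite3

# Hypothetical fixture: (user, rating, count) from the question's desired output.
SPEC = [(1, 1, 3), (1, 2, 6), (1, 3, 8), (2, 5, 12), (2, 3, 4)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (user_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(u, r) for u, r, n in SPEC for _ in range(n)],
)

# Counts per (user, rating) joined to counts per user.
query = """
SELECT usr.user_id, usr.rating, usr.n,
       (100.0 * usr.n) / total.n AS pct
FROM (SELECT user_id, rating, COUNT(*) AS n
      FROM results
      GROUP BY user_id, rating) AS usr
JOIN (SELECT user_id, COUNT(*) AS n
      FROM results
      GROUP BY user_id) AS total
  ON total.user_id = usr.user_id
ORDER BY usr.user_id, usr.rating
"""
rows = [(u, r, n, round(p, 1)) for u, r, n, p in conn.execute(query)]
for row in rows:
    print(row)
```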

Comments:

Thanks! This also works without window functions. I had to change "item" to "usr" in the query above and remove the first count(*).

Answer 4:
WITH data AS 
 (SELECT User, Rating, Count(*) AS Count 
  FROM Results
  GROUP BY User, Rating)
SELECT User, Rating, Count, 
       (0.0+Count)/(SUM(Count) OVER (PARTITION BY User)) AS Percentage
FROM data;
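The 0.0 + term in these queries is there because dividing one integer by another truncates toward zero in PostgreSQL, T-SQL and SQLite alike. Adding 0.0, multiplying by 1.0 or casting one operand makes the division non-integer. A quick check, using Python's sqlite3 as a convenient stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integer / integer is truncated; making either operand non-integer avoids it.
int_div, plus_zero, cast_div = conn.execute(
    "SELECT 3 / 17, (0.0 + 3) / 17, CAST(3 AS REAL) / 17"
).fetchone()
print(int_div, round(plus_zero, 2), round(cast_div, 2))  # 0 0.18 0.18
```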

Answer 5:

In T-SQL this should work:

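SELECT
    [User],  -- USER is a reserved word in T-SQL, hence the brackets
    Rating,
    COUNT(*) AS [Count],
    -- partition by user only; an ORDER BY inside OVER would turn SUM into a running total
    SUM(COUNT(*)) OVER (PARTITION BY [User]) AS Total,
    COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY [User]) AS Percentage  -- * 1.0 avoids integer division
FROM
    Results
GROUP BY
    [User], Rating
ORDER BY
    [User], Rating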
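A sketch of the single-query form in Python's sqlite3. This is a stand-in, not T-SQL, and it relies on SQLite (3.25+) evaluating window functions after GROUP BY, so that SUM(COUNT(*)) OVER works as it does in SQL Server. Table, column names and rows are assumptions. It contrasts PARTITION BY user and rating, whose window sum equals each row's own count (so that ratio is always 1), with PARTITION BY user alone, which gives the per-user total:

```python
import sqlite3

# Hypothetical fixture: (user, rating, count) from the question's desired output.
SPEC = [(1, 1, 3), (1, 2, 6), (1, 3, 8), (2, 5, 12), (2, 3, 4)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (user_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(u, r) for u, r, n in SPEC for _ in range(n)],
)

# The window functions run over the grouped rows, one row per (user, rating).
query = """
SELECT user_id, rating, COUNT(*) AS n,
       SUM(COUNT(*)) OVER (PARTITION BY user_id, rating) AS same_as_n,
       SUM(COUNT(*)) OVER (PARTITION BY user_id) AS user_total,
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY user_id) AS pct
FROM results
GROUP BY user_id, rating
ORDER BY user_id, rating
"""
rows = [(u, r, n, s, t, round(p, 2)) for u, r, n, s, t, p in conn.execute(query)]
for row in rows:
    print(row)
```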
