How to get Top 10 for a grouped column?


【Posted】: 2019-08-09 15:27:29 【Question】:

My data is a list of customers and products, along with the cost of each product:

Member    Product    Cost
Bob       A123       $25
Bob       A123       $25
Bob       A123       $75
Joe       A789       $50
Joe       A789       $50
Bob       C321       $50
Joe       A123       $50
etc, etc, etc

My current query returns each customer, product, and cost, plus the total cost for that customer. It gives results like this:

Member    Product    Cost    Total Cost
Bob       A123       $125    $275
Bob       A1433      $100    $275
Bob       C321       $50     $275
Joe       A123       $150    $250
Joe       A789       $100    $250

How can I get the top 10 customers by Total Cost, rather than just the first 10 records overall? My query is:

SELECT a.Member
    ,a.Product
    ,SUM(a.Cost)
    ,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'Total Cost'
FROM MyTable a
GROUP BY a.Member
    ,a.Product
ORDER BY [Total Cost] DESC

If I do SELECT TOP 10, it only gives me the first 10 rows. The actual top 10 will end up being more like 40 or 50 rows.

Thanks!

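【Comments】:

Are you intentionally using the same table (MyTable) twice in the query, or is that a typo?
It's intentional. MyTable is the only table I'm using. I use a select within the select to get the total cost for each member.
What if there are ties between different members?
Thanks everyone for the replies. Obviously I can't accept everyone's answer, but I did learn a lot from them.

【Answer 1】: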

Try this.

SELECT tbl.member,
       tbl.product,
       Sum(tbl.cost)       AS cost,
       Max(stbl.totalcost) AS totalcost
FROM   mytable tbl
       INNER JOIN (SELECT member,
                          Sum(cost) AS totalcost,
                          Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
                   FROM   mytable
                   GROUP  BY member) stbl
               ON stbl.member = tbl.member
WHERE  stbl.rn <= 10
GROUP  BY tbl.member, tbl.product
ORDER  BY Max(stbl.rn)  

Online demo: http://sqlfiddle.com/#!18/87857/1/0


Table structure and sample data

CREATE TABLE mytable
(
 member  NVARCHAR(50),
 product NVARCHAR(10),
 cost    INT
)

INSERT INTO mytable
VALUES ('Bob','A123','25'),
       ('Bob','A123','25'),
       ('Bob','A123','75'),
       ('Joe','A789','50'),
       ('Joe','A789','50'),
       ('Bob','C321','50'),
       ('Joe','A123','50'),
       ('Rock','A123','50'),
       ('Anord','A100','50'),
       ('Jack','A123','50'),
       ('Anord','A123','50'),
       ('Joe','A123','50'),
       ('Karma','A123','50'),
       ('Seetha','A123','50'),
       ('Aruna','A123','50'),
       ('Jake','A123','50'),
       ('Paul','A123','50'),
       ('Logan','A123','50'),
       ('Joe','A123','50');

Subquery: total cost per customer

SELECT member,
       Sum(cost) AS totalcost,
       Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM   mytable
GROUP  BY member

Subquery: output

+---------+------------+----+
| member  | totalcost  | rn |
+---------+------------+----+
| Joe     |       250  |  1 |
| Bob     |       175  |  2 |
| Anord   |       100  |  3 |
| Aruna   |        50  |  4 |
| Jack    |        50  |  5 |
| Jake    |        50  |  6 |
| Karma   |        50  |  7 |
| Logan   |        50  |  8 |
| Paul    |        50  |  9 |
| Rock    |        50  | 10 |
| Seetha  |        50  | 11 |
+---------+------------+----+
Record Count: 11

Main query (shown here without the rn <= 10 filter, so every rank appears in the output)

SELECT tbl.member,
       tbl.product,
       Sum(tbl.cost)       AS cost,
       Max(stbl.totalcost) AS totalcost,
       Max(stbl.rn)        AS rn
FROM   mytable tbl
       INNER JOIN (SELECT member,
                          Sum(cost) AS totalcost,
                          Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
                   FROM   mytable
                   GROUP  BY member) stbl
               ON stbl.member = tbl.member
GROUP  BY tbl.member, tbl.product
ORDER  BY Max(stbl.rn) 

Main query: output

+---------+----------+-------+------------+----+
| member  | product  | cost  | totalcost  | rn |
+---------+----------+-------+------------+----+
| Joe     | A123     |  150  |       250  |  1 |
| Joe     | A789     |  100  |       250  |  1 |
| Bob     | C321     |   50  |       175  |  2 |
| Bob     | A123     |  125  |       175  |  2 |
| Anord   | A100     |   50  |       100  |  3 |
| Anord   | A123     |   50  |       100  |  3 |
| Aruna   | A123     |   50  |        50  |  4 |
| Jack    | A123     |   50  |        50  |  5 |
| Jake    | A123     |   50  |        50  |  6 |
| Karma   | A123     |   50  |        50  |  7 |
| Logan   | A123     |   50  |        50  |  8 |
| Paul    | A123     |   50  |        50  |  9 |
| Rock    | A123     |   50  |        50  | 10 |
| Seetha  | A123     |   50  |        50  | 11 |
+---------+----------+-------+------------+----+
Record Count: 14
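
To see the effect of the rn <= 10 filter on this sample data, here is a minimal sketch that runs the same idea (total per member, number the members, join back to the detail rows) through Python's built-in sqlite3 module. Several things here are assumptions rather than part of the answer: SQLite stands in for SQL Server, the totals are computed in CTEs instead of an inline subquery, and member is added as a tie-breaker so that the numbering of the 50-cost members is deterministic. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows copied from the INSERT statement above (member, product, cost).
ROWS = [
    ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
    ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
    ("Joe", "A123", 50), ("Rock", "A123", 50), ("Anord", "A100", 50),
    ("Jack", "A123", 50), ("Anord", "A123", 50), ("Joe", "A123", 50),
    ("Karma", "A123", 50), ("Seetha", "A123", 50), ("Aruna", "A123", 50),
    ("Jake", "A123", 50), ("Paul", "A123", 50), ("Logan", "A123", 50),
    ("Joe", "A123", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

QUERY = """
WITH totals AS (              -- one row per member with the member's total
    SELECT member, SUM(cost) AS totalcost
    FROM   mytable
    GROUP  BY member
),
ranked AS (                   -- number the members, highest total first
    SELECT member, totalcost,
           ROW_NUMBER() OVER (ORDER BY totalcost DESC, member) AS rn
    FROM   totals
)
SELECT t.member, t.product, SUM(t.cost) AS cost, r.totalcost, r.rn
FROM   mytable t
       JOIN ranked r ON r.member = t.member
WHERE  r.rn <= ?              -- keep only the top-N members
GROUP  BY t.member, t.product, r.totalcost, r.rn
ORDER  BY r.rn, t.product
"""

rows = conn.execute(QUERY, (10,)).fetchall()
members = {row[0] for row in rows}

for row in rows:
    print(row)
print(len(members), "members,", len(rows), "rows")
```

With the filter in place, the member numbered 11 drops out while every product row of the remaining members is kept, which is the "top 10 customers, however many rows that takes" behaviour the question asks for.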

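【Comments】:

Thank you for taking the time to do this. This is the answer I ended up going with.

【Answer 2】:

You can use rank() with partition by, though you may need to use a window function as well: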

with temp as (
     SELECT a.Member
    ,a.Product
    ,SUM(a.Cost) AS Cost
    ,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) 
    as 'Total Cost'
    FROM MyTable a
    GROUP BY a.Member,a.Product
)
select a.*, rank() over (partition by member order by [Total Cost] 
  desc) as rank
from temp a
order by rank desc limit 10

【Comments】:

This will only return 10 rows.
@VamsiPrabhala I thought he only wanted the top 10; if not, he can remove the limit.

【Answer 3】:

You can use dense_rank() together with apply:

select mt.*
from (select mt.*, sum(mt.Cost) over (partition by Product, Member) as ProductCost,
             dense_rank() over (order by TotalCost desc) as seq 
      from MyTable mt cross apply
           (select sum(mt1.Cost) as TotalCost
            from MyTable mt1 
            where mt1.member = mt.member 
           ) mt1
    ) mt
where mt.seq <= 10;

【Comments】:

【Answer 4】:

Use a subquery to get the TOP 10 total costs and join it to your query:

SELECT
  t.Member, t.Product, t.Cost, g.[Total Cost]  
FROM (
  SELECT Member, Product, SUM(Cost) as Cost
  FROM MyTable 
  GROUP BY Member, Product
) t INNER JOIN (
  SELECT TOP (10) Member, SUM(Cost) as [Total Cost]
  FROM MyTable 
  GROUP BY Member
  ORDER BY [Total Cost] DESC
) g on g.Member = t.Member
ORDER BY g.[Total Cost] DESC, t.Member, t.Cost DESC

Depending on your requirements, you can use:

SELECT TOP (10) WITH TIES...
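
The difference between the two variants only shows up when several members share the total at the cut-off. The sketch below illustrates it with Python's built-in sqlite3 module and the sample data from Answer 1. SQLite has neither TOP nor WITH TIES, so as a stand-in (an assumption of this sketch, not part of the answer) LIMIT plays the role of TOP (10) and a RANK() <= 10 filter plays the role of TOP (10) WITH TIES: a member tied with the 10th-highest total gets the same rank as the 10th and is therefore kept. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from Answer 1 (member, product, cost).
ROWS = [
    ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
    ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
    ("Joe", "A123", 50), ("Rock", "A123", 50), ("Anord", "A100", 50),
    ("Jack", "A123", 50), ("Anord", "A123", 50), ("Joe", "A123", 50),
    ("Karma", "A123", 50), ("Seetha", "A123", 50), ("Aruna", "A123", 50),
    ("Jake", "A123", 50), ("Paul", "A123", 50), ("Logan", "A123", 50),
    ("Joe", "A123", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Stand-in for TOP (10): exactly N members, ties at the cut-off broken by name.
PLAIN_TOP = """
SELECT member, SUM(cost) AS total_cost
FROM   mytable
GROUP  BY member
ORDER  BY total_cost DESC, member
LIMIT  ?
"""

# Stand-in for TOP (10) WITH TIES: members tied with the N-th total are kept.
TOP_WITH_TIES = """
WITH totals AS (
    SELECT member, SUM(cost) AS total_cost
    FROM   mytable
    GROUP  BY member
),
ranked AS (
    SELECT member, total_cost,
           RANK() OVER (ORDER BY total_cost DESC) AS rnk
    FROM   totals
)
SELECT member, total_cost, rnk
FROM   ranked
WHERE  rnk <= ?
ORDER  BY rnk, member
"""

plain = conn.execute(PLAIN_TOP, (10,)).fetchall()
tied = conn.execute(TOP_WITH_TIES, (10,)).fetchall()

print(len(plain), "members without ties")
print(len(tied), "members with ties")
```

In the sample data eight members share a total of 50, so the tie-aware variant returns one member more than the plain one. Either member list can then be joined back to the per-product sums as in the query above.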

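【Comments】:

【Answer 5】:

You don't have to select from the same table twice. Use SUM OVER to get the total per member. Use DENSE_RANK to rank the totals (highest total = 1, second-highest total = 2, ...). Use TOP(10) WITH TIES to get all rows with the top ten totals.

Query: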

select top(10) with ties *
from
(
  select
    member,
    product,
    sum(cost) as cost,
    sum(sum(cost)) over (partition by member) as total_cost
  from mytable
  group by member, product
) results
order by dense_rank() over (order by total_cost desc);

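【Comments】:

Thanks for the SUM OVER tip. It will help me in other places too!

【Answer 6】:

If you want exactly 10 customers, even when there are ties, a small change to Thorsten's approach does it: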

select top(10) with ties t.*
from (select member, product, sum(cost) as cost,
             sum(sum(cost)) over (partition by member) as total_cost
       from t
       group by member, product
      ) t
order by dense_rank() over (order by total_cost desc, member);

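Adding member as a second key may look like a minor addition. However, it ensures that the dense_rank() value is unique for each member (still ordered by total_cost, of course). That in turn guarantees that you get exactly 10 customers.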
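
The effect of the extra key can be checked with a small sketch using Python's built-in sqlite3 module and the sample data from Answer 1. Two things are assumptions of this sketch rather than part of the answer: SQLite has no TOP ... WITH TIES, so the dense_rank() value is filtered in a WHERE clause instead, and the nested sum(sum(cost)) is split into two CTEs. The helper name top_customers is hypothetical. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from Answer 1 (member, product, cost).
ROWS = [
    ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
    ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
    ("Joe", "A123", 50), ("Rock", "A123", 50), ("Anord", "A100", 50),
    ("Jack", "A123", 50), ("Anord", "A123", 50), ("Joe", "A123", 50),
    ("Karma", "A123", 50), ("Seetha", "A123", 50), ("Aruna", "A123", 50),
    ("Jake", "A123", 50), ("Paul", "A123", 50), ("Logan", "A123", 50),
    ("Joe", "A123", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

QUERY_TEMPLATE = """
WITH per_product AS (         -- cost per member and product
    SELECT member, product, SUM(cost) AS cost
    FROM   mytable
    GROUP  BY member, product
),
with_total AS (               -- SUM OVER adds the member total to every row
    SELECT member, product, cost,
           SUM(cost) OVER (PARTITION BY member) AS total_cost
    FROM   per_product
),
ranked AS (
    SELECT member, product, cost, total_cost,
           DENSE_RANK() OVER (ORDER BY {keys}) AS seq
    FROM   with_total
)
SELECT member, product, cost, total_cost, seq
FROM   ranked
WHERE  seq <= ?
ORDER  BY seq, member, product
"""


def top_customers(limit, member_as_tiebreaker):
    """Return the rows of the top `limit` ranks (hypothetical helper)."""
    # Only these two fixed strings are ever interpolated into the SQL text.
    keys = "total_cost DESC, member" if member_as_tiebreaker else "total_cost DESC"
    return conn.execute(QUERY_TEMPLATE.format(keys=keys), (limit,)).fetchall()


by_total = top_customers(10, member_as_tiebreaker=False)
by_total_and_member = top_customers(10, member_as_tiebreaker=True)

print(len({r[0] for r in by_total}), "customers when ranking by total only")
print(len({r[0] for r in by_total_and_member}), "customers with member as second key")
```

Ranking by total alone gives all tied members the same rank, so the filter lets every one of them through. With member as the second key each customer gets a distinct rank, and the filter keeps exactly ten of them together with all of their product rows.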

【Comments】:

【Answer 7】:

You can use dense_rank() as shown below. This works in SQL Server 2016. Change the value of the @limit variable to control how many ranks are returned.

declare @limit int = 10;
SELECT *
FROM
(
  select x.*,rn = dense_rank() over (order by x.TotalCost desc)
  from (

    SELECT a.Member
        ,a.Product
        ,SUM(a.Cost) as Cost
        ,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'TotalCost'
    FROM MyTable a
    GROUP BY a.Member
        ,a.Product

  ) x
) y
where rn <= @limit
order by rn 

【Comments】:

2016, that was probably a typo. Thanks for the heads-up. @marc_s
