How to get Top 10 for a grouped column?
【Posted】: 2019-08-09 15:27:29
【Question】: My data is a list of customers and products, along with the cost of each product:
Member Product Cost
Bob A123 $25
Bob A123 $25
Bob A123 $75
Joe A789 $50
Joe A789 $50
Bob C321 $50
Joe A123 $50
etc, etc, etc
My current query returns each customer, product, and cost, along with that customer's total cost. It produces results like this:
Member Product Cost Total Cost
Bob A123 $125 $275
Bob A1433 $100 $275
Bob C321 $50 $275
Joe A123 $150 $250
Joe A789 $100 $250
How can I get the top 10 by Total Cost, rather than just the first 10 records overall? My query is:
SELECT a.Member
,a.Product
,SUM(a.Cost)
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'Total Cost'
FROM MyTable a
GROUP BY a.Member
,a.Product
ORDER BY [Total Cost] DESC
If I use SELECT TOP 10, I only get the first 10 rows. The actual top 10 would end up being more like 40 or 50 rows.
Thanks!
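【Comments】:
Are you deliberately using the same table (MyTable) twice in your query, or is that a typo?
That's deliberate. MyTable is the only table I'm using. I use a select within the select to get the total cost for each member.
What if there are ties between different members?
Thanks, everyone, for the replies. Obviously I can't accept everyone's answer, but I did learn a lot from them.
【Answer 1】: Try this.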
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
WHERE stbl.rn <= 10
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Online demo: http://sqlfiddle.com/#!18/87857/1/0
Table structure and sample data
CREATE TABLE mytable
(
member NVARCHAR(50),
product NVARCHAR(10),
cost INT
)
INSERT INTO mytable
VALUES ('Bob','A123','25'),
('Bob','A123','25'),
('Bob','A123','75'),
('Joe','A789','50'),
('Joe','A789','50'),
('Bob','C321','50'),
('Joe','A123','50'),
('Rock','A123','50'),
('Anord','A100','50'),
('Jack','A123','50'),
('Anord','A123','50'),
('Joe','A123','50'),
('Karma','A123','50'),
('Seetha','A123','50'),
('Aruna','A123','50'),
('Jake','A123','50'),
('Paul','A123','50'),
('Logan','A123','50'),
('Joe','A123','50');
Subquery: total cost per customer
SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member
Subquery: output
+---------+------------+----+
| member | totalcost | rn |
+---------+------------+----+
| Joe | 250 | 1 |
| Bob | 175 | 2 |
| Anord | 100 | 3 |
| Aruna | 50 | 4 |
| Jack | 50 | 5 |
| Jake | 50 | 6 |
| Karma | 50 | 7 |
| Logan | 50 | 8 |
| Paul | 50 | 9 |
| Rock | 50 | 10 |
| Seetha | 50 | 11 |
+---------+------------+----+
Record Count: 11
Main query (shown here without the rn <= 10 filter, so every rank appears in the output)
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost,
Max(stbl.rn) AS rn
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Main query: output
+---------+----------+-------+------------+----+
| member | product | cost | totalcost | rn |
+---------+----------+-------+------------+----+
| Joe | A123 | 150 | 250 | 1 |
| Joe | A789 | 100 | 250 | 1 |
| Bob | C321 | 50 | 175 | 2 |
| Bob | A123 | 125 | 175 | 2 |
| Anord | A100 | 50 | 100 | 3 |
| Anord | A123 | 50 | 100 | 3 |
| Aruna | A123 | 50 | 50 | 4 |
| Jack | A123 | 50 | 50 | 5 |
| Jake | A123 | 50 | 50 | 6 |
| Karma | A123 | 50 | 50 | 7 |
| Logan | A123 | 50 | 50 | 8 |
| Paul | A123 | 50 | 50 | 9 |
| Rock | A123 | 50 | 50 | 10 |
| Seetha | A123 | 50 | 50 | 11 |
+---------+----------+-------+------------+----+
Record Count: 14
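The filtered query at the top of this answer can also be tried outside SQL Server. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later). The names `ROWS`, `QUERY`, and `top_members` are illustrative, the totals are computed in a CTE rather than a derived table, and ties are broken on member name, which the original query does not do. A limit of 3 is used in the demo so that the filter visibly drops members.

```python
import sqlite3

# Sample rows copied from the INSERT statement above.
ROWS = [
    ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
    ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
    ("Joe", "A123", 50), ("Rock", "A123", 50), ("Anord", "A100", 50),
    ("Jack", "A123", 50), ("Anord", "A123", 50), ("Joe", "A123", 50),
    ("Karma", "A123", 50), ("Seetha", "A123", 50), ("Aruna", "A123", 50),
    ("Jake", "A123", 50), ("Paul", "A123", 50), ("Logan", "A123", 50),
    ("Joe", "A123", 50),
]

QUERY = """
WITH totals AS (              -- one row per member with the member's total
    SELECT member, SUM(cost) AS totalcost
    FROM mytable
    GROUP BY member
),
ranked AS (                   -- number the members, highest total first
    SELECT member, totalcost,
           ROW_NUMBER() OVER (ORDER BY totalcost DESC, member) AS rn
    FROM totals
)
SELECT t.member, t.product, SUM(t.cost) AS cost, r.totalcost
FROM mytable t
JOIN ranked r ON r.member = t.member
WHERE r.rn <= ?               -- keep every row of the top-N members
GROUP BY t.member, t.product, r.totalcost, r.rn
ORDER BY r.rn, t.product
"""


def top_members(limit):
    """Return (member, product, cost, totalcost) rows for the top `limit` members."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY, (limit,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_members(3):
        print(row)
```

With a limit of 3 this keeps all six rows for Joe, Bob, and Anord, which shows that the filter applies to members rather than to output rows.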
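【Comments】:
Thank you for taking the time to do this. This is the answer I ended up going with.
【Answer 2】: You can use rank() with partition by, though you may also need a window function: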
with temp as (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost -- every CTE column needs a name in SQL Server
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member)
as 'Total Cost'
FROM MyTable a
GROUP BY a.Member,a.Product
)
select a.*, rank() over (partition by member order by [Total Cost]
desc) as rank
from temp a
order by rank desc limit 10 -- LIMIT is not T-SQL; on SQL Server use SELECT TOP (10) instead
【Comments】:
This will return only 10 rows.
@VamsiPrabhala I thought he only wanted the top 10; if not, he can remove the limit.
【Answer 3】: You can use dense_rank() together with apply:
select mt.*
from (select mt.*,
             -- named ProductCost because mt.* already supplies a Cost column
             sum(mt.Cost) over (partition by mt.Product, mt.Member) as ProductCost,
             dense_rank() over (order by mt1.TotalCost desc) as seq
      from MyTable mt cross apply
           (select sum(mt1.Cost) as TotalCost
            from MyTable mt1
            where mt1.member = mt.member
           ) mt1
     ) mt
where mt.seq <= 10;
【Comments】:
【Answer 4】: Use a subquery to get the TOP 10 total costs and join it to your query:
SELECT
t.Member, t.Product, t.Cost, g.[Total Cost]
FROM (
SELECT Member, Product, SUM(Cost) as Cost
FROM MyTable
GROUP BY Member, Product
) t INNER JOIN (
SELECT TOP (10) Member, SUM(Cost) as [Total Cost]
FROM MyTable
GROUP BY Member
ORDER BY [Total Cost] DESC
) g on g.Member = t.Member
ORDER BY g.[Total Cost] DESC, t.Member, t.Cost DESC
Depending on your requirements, you could use:
SELECT TOP (10) WITH TIES...
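【Comments】:
【Answer 5】:
- You don't have to select from the same table twice. Use SUM OVER to get the total per member.
- Use DENSE_RANK to rank the totals (highest total = 1, second highest total = 2, ...).
- Use TOP(10) WITH TIES to get all rows with the top ten totals.
Query: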
select top(10) with ties *
from
(
select
member,
product,
sum(cost) as cost, -- a derived table needs a name for every column
sum(sum(cost)) over (partition by member) as total_cost
from mytable
group by member, product
) results
order by dense_rank() over (order by total_cost) desc;
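The SUM OVER and DENSE_RANK steps can be tried on a small data set. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) instead of SQL Server, and because SQLite has no TOP ... WITH TIES, it swaps in a plain `WHERE rnk <= ?` filter on the dense rank. The names `ROWS`, `QUERY`, and `top_by_dense_rank` are illustrative, and the rows are a subset of the sample data used elsewhere in this thread.

```python
import sqlite3

# A few rows from the thread's sample data; Rock and Jack tie on purpose.
ROWS = [
    ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
    ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
    ("Joe", "A123", 50), ("Rock", "A123", 50), ("Jack", "A123", 50),
]

QUERY = """
WITH per_product AS (         -- cost per member and product
    SELECT member, product, SUM(cost) AS cost
    FROM mytable
    GROUP BY member, product
),
with_totals AS (              -- SUM OVER adds the member total to each row
    SELECT member, product, cost,
           SUM(cost) OVER (PARTITION BY member) AS total_cost
    FROM per_product
),
ranked AS (                   -- highest total = 1, second highest = 2, ...
    SELECT *, DENSE_RANK() OVER (ORDER BY total_cost DESC) AS rnk
    FROM with_totals
)
SELECT member, product, cost, total_cost, rnk
FROM ranked
WHERE rnk <= ?                -- stand-in for TOP ... WITH TIES
ORDER BY rnk, member, product
"""


def top_by_dense_rank(limit):
    """Return all rows whose member total falls within the top `limit` distinct totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY, (limit,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_by_dense_rank(2):
        print(row)
```

Filtering on the rank counts distinct totals rather than output rows, so members with equal totals are kept or dropped together. In this data a limit of 3 returns four members because Rock and Jack share the third-highest total.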
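【Comments】:
Thanks for the SUM OVER tip. It will help me in other ways too!
【Answer 6】: If you want exactly 10 customers even when there are ties, a small variation on Thorsten's approach works: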
select top(10) with ties t.*
from (select member, product, sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from t
group by member, product
) t
order by dense_rank() over (order by total_cost) desc, member;
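Adding member as a second key might look like a small addition. However, it ensures that the ordering is unique for each member (while still sorted by total_cost, of course). That in turn guarantees that you get exactly 10 customers.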
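How ties behave depends on which ranking function is used and on whether member is part of the sort key. The sketch below is an illustration only, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server. The `totals` table, the member names Ann and Cy, and the function name `rank_totals` are hypothetical and not part of the thread's data.

```python
import sqlite3

# Hypothetical per-member totals; Ann and Bob tie on purpose.
TOTALS = [("Joe", 250), ("Bob", 175), ("Ann", 175), ("Cy", 100)]

QUERY = """
SELECT member,
       total_cost,
       -- member as a second key makes every position unique
       ROW_NUMBER() OVER (ORDER BY total_cost DESC, member) AS row_num,
       -- tied totals share a rank, and the next rank is skipped
       RANK()       OVER (ORDER BY total_cost DESC)         AS rnk,
       -- tied totals share a rank, with no gap afterwards
       DENSE_RANK() OVER (ORDER BY total_cost DESC)         AS dense_rnk
FROM totals
ORDER BY total_cost DESC, member
"""


def rank_totals():
    """Return (member, total_cost, row_num, rnk, dense_rnk) for each member."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE totals (member TEXT, total_cost INTEGER)")
        conn.executemany("INSERT INTO totals VALUES (?, ?)", TOTALS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rank_totals():
        print(row)
```

A cutoff on `row_num` always yields a fixed number of members, while a cutoff on `rnk` or `dense_rnk` can yield more when totals are equal.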
【Comments】:
【Answer 7】: You can use dense_rank() as shown below. Works in SQL Server 2016. Change the value of the @limit variable to filter the number of rows returned.
declare @limit int = 10;
SELECT *
FROM
(
    select x.*, rn = dense_rank() over (order by x.TotalCost desc)
    from (
        SELECT a.Member
              ,a.Product
              ,SUM(a.Cost) as Cost -- a derived table needs a name for every column
              ,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as TotalCost
        FROM MyTable a
        GROUP BY a.Member
                ,a.Product
        -- no ORDER BY here: SQL Server rejects it inside a derived table without TOP
    ) x
) y
where rn <= @limit
order by rn
【Comments】:
2016, it was probably a typo. Thanks for the heads-up. @marc_s