How can I aggregate by the top N categories with an "all others" and totals?

Posted: 2018-12-17 19:31:33

Question:

I have tables that list users' sales by category (each sale has at least one, and possibly several, categories).

I can get a user's top categories, but I need the user's stats both for his/her top N categories and for the rest.

I've boiled the problem down to an MCVE, as follows...

MCVE data summary:

Salesman  SaleID  Amount  Categories
--------  ------  ------  ----------------------------
       1       1       2  Service
       2       2       2  Software, Support_Contract
       2       3       3  Service
       2       4       1  Parts, Service, Software
       2       5       3  Support_Contract
       2       6       4  Promo_Gift, Support_Contract
       2       7      -2  Rebate, Support_Contract
       3       8       2  Software, Support_Contract
       3       9       3  Service
       3      10       1  Parts, Software
       3      11       3  Support_Contract
       3      12       4  Promo_Gift, Support_Contract
       3      13      -2  Rebate, Support_Contract

MCVE setup SQL:

CREATE TABLE Sales      ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags  ([SaleID] int, [TagId] int);
CREATE TABLE Tags       ([TagId] int, [TagName] varchar(100) );

INSERT INTO Sales
    ([Salesman], [SaleID], [Amount])
VALUES
    (1, 1, 2),        (2, 6, 4),        (3, 10, 1),
    (2, 2, 2),        (2, 7, -2),       (3, 11, 3),
    (2, 3, 3),        (3, 8, 2),        (3, 12, 4),
    (2, 4, 1),        (3, 9, 3),        (3, 13, -2),
    (2, 5, 3)
;
INSERT INTO SalesTags
    ([SaleID], [TagId])
VALUES
    (1, 3),           (6, 4),           (10, 1),
    (2, 1),           (6, 5),           (10, 2),
    (2, 4),           (7, 4),           (11, 4),
    (3, 3),           (7, 6),           (12, 4),
    (4, 1),           (8, 1),           (12, 5),
    (4, 2),           (8, 4),           (13, 4),
    (4, 3),           (9, 3),           (13, 6),
    (5, 4)
;
INSERT INTO Tags
    ([TagId], [TagName])
VALUES
    (1, 'Software'),
    (2, 'Parts'),
    (3, 'Service'),
    (4, 'Support_Contract'),
    (5, 'Promo_Gift'),
    (6, 'Rebate')
;

As shown in this SQL Fiddle, I can get a user's top N tags like this:

WITH usersSales AS (  -- actual base CTE is much more complex
    SELECT  s.SaleID
            , s.Amount
    FROM    Sales s
    WHERE   s.Salesman = 2
)
SELECT Top 3  -- N can be 3 to 10
            t.TagName
            , COUNT (us.SaleID)     AS tagSales
            , SUM (us.Amount)       AS tagAmount
FROM        usersSales us
INNER JOIN  SalesTags st    ON st.SaleID = us.SaleID
INNER JOIN  Tags t          ON t.TagId   = st.TagId
GROUP BY    t.TagName
ORDER BY    tagAmount DESC
            , tagSales DESC
            , t.TagName

This shows that the user's top categories are:

    "Support_Contract", "Service", "Promo_Gift"

in that order, for user 2 (and Support_Contract, Promo_Gift, Software for user 3).
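Here is a small Python sketch that makes the ordering rule concrete. It is illustrative only: the names `SALES`, `SALE_TAGS` and `top_tags` are made up for this example, and the data is hand-copied from the setup script. The sketch recomputes tagAmount and tagSales per tag and sorts with the same keys as the ORDER BY above. It shows the tie-break at work for user 3: Software and Service both total 3, but Software has two sales, so it ranks ahead.

```python
from collections import defaultdict

# SaleID -> (Salesman, Amount), hand-copied from INSERT INTO Sales
SALES = {
    1: (1, 2), 2: (2, 2), 3: (2, 3), 4: (2, 1), 5: (2, 3), 6: (2, 4), 7: (2, -2),
    8: (3, 2), 9: (3, 3), 10: (3, 1), 11: (3, 3), 12: (3, 4), 13: (3, -2),
}
# SaleID -> tag names, i.e. SalesTags joined to Tags
SALE_TAGS = {
    1: ["Service"], 2: ["Software", "Support_Contract"], 3: ["Service"],
    4: ["Software", "Parts", "Service"], 5: ["Support_Contract"],
    6: ["Support_Contract", "Promo_Gift"], 7: ["Support_Contract", "Rebate"],
    8: ["Software", "Support_Contract"], 9: ["Service"], 10: ["Software", "Parts"],
    11: ["Support_Contract"], 12: ["Support_Contract", "Promo_Gift"],
    13: ["Support_Contract", "Rebate"],
}


def top_tags(salesman, n):
    """Mimic the GROUP BY TagName / ORDER BY query and return the top n tags."""
    amount = defaultdict(int)  # SUM(us.Amount) per tag
    count = defaultdict(int)   # COUNT(us.SaleID) per tag
    for sale_id, (who, amt) in SALES.items():
        if who != salesman:
            continue
        for tag in SALE_TAGS[sale_id]:
            amount[tag] += amt
            count[tag] += 1
    # ORDER BY tagAmount DESC, tagSales DESC, TagName
    ranked = sorted(amount, key=lambda t: (-amount[t], -count[t], t))
    return [(t, amount[t], count[t]) for t in ranked[:n]]


print(top_tags(2, 3))
print(top_tags(3, 3))
```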

But, for N=3, the desired results are:

User 2:

Top Category        Amount    Number of Sales
----------------    ------    ---------------
Support_Contract       7             4
Service                4             2
Promo_Gift             0             0
- All Others -         0             0
============================================
Totals                11             6

User 3:

Top Category        Amount    Number of Sales
----------------    ------    ---------------
Support_Contract       7             4
Promo_Gift             0             0
Software               1             1
- All Others -         3             1
============================================
Totals                11             6

Where:

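    - Top Category is the user's top-ranked category (per the query above) that a given sale has.
    - The Top Category in row 2 excludes sales already counted in row 1.
    - The Top Category in row 3 excludes sales already counted in rows 1 and 2.
    - And so on.
    - All remaining sales that were not counted under the top N categories are lumped into the "- All Others -" group.
    - The totals at the bottom match the user's overall sales figures.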
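Read literally, the rules above count each sale exactly once. A sale goes under the first of the top N categories, in rank order, that it carries, or under "- All Others -" if it carries none of them. Below is a minimal Python sketch of that reading. It assumes the MCVE data, hand-copied into hypothetical `SALES`/`SALE_TAGS` dictionaries, and it reproduces both expected tables, totals included.

```python
from collections import defaultdict

# SaleID -> (Salesman, Amount), hand-copied from INSERT INTO Sales
SALES = {
    1: (1, 2), 2: (2, 2), 3: (2, 3), 4: (2, 1), 5: (2, 3), 6: (2, 4), 7: (2, -2),
    8: (3, 2), 9: (3, 3), 10: (3, 1), 11: (3, 3), 12: (3, 4), 13: (3, -2),
}
# SaleID -> tag names, i.e. SalesTags joined to Tags
SALE_TAGS = {
    1: ["Service"], 2: ["Software", "Support_Contract"], 3: ["Service"],
    4: ["Software", "Parts", "Service"], 5: ["Support_Contract"],
    6: ["Support_Contract", "Promo_Gift"], 7: ["Support_Contract", "Rebate"],
    8: ["Software", "Support_Contract"], 9: ["Service"], 10: ["Software", "Parts"],
    11: ["Support_Contract"], 12: ["Support_Contract", "Promo_Gift"],
    13: ["Support_Contract", "Rebate"],
}


def ranked_tags(salesman):
    """All of the salesman's tags, ordered like the question's TOP N query."""
    amount = defaultdict(int)
    count = defaultdict(int)
    for sale_id, (who, amt) in SALES.items():
        if who == salesman:
            for tag in SALE_TAGS[sale_id]:
                amount[tag] += amt
                count[tag] += 1
    return sorted(amount, key=lambda t: (-amount[t], -count[t], t))


def summarize(salesman, n):
    """Return ([(category, amount, number_of_sales), ...], (total_amount, total_sales))."""
    top = ranked_tags(salesman)[:n]
    buckets = {tag: [0, 0] for tag in top}  # tag -> [amount, number of sales]
    others = [0, 0]
    for sale_id, (who, amt) in SALES.items():
        if who != salesman:
            continue
        # The first top-N tag, in rank order, that this sale carries claims it;
        # sales with none of the top-N tags fall through to "- All Others -".
        owner = next((t for t in top if t in SALE_TAGS[sale_id]), None)
        bucket = buckets[owner] if owner is not None else others
        bucket[0] += amt
        bucket[1] += 1
    rows = [(tag, *buckets[tag]) for tag in top]
    rows.append(("- All Others -", *others))
    totals = (sum(r[1] for r in rows), sum(r[2] for r in rows))
    return rows, totals


for salesman in (2, 3):
    print(salesman, summarize(salesman, 3))
```

Taking the first matching tag in rank order is the same thing as "row k excludes sales already counted in rows 1 to k-1". A sale lands in row k exactly when it has tag k and none of the tags ranked above it.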

How can I aggregate results like this?

Note that this is running on MS SQL Server 2017, and I cannot change the table schema.

Comments:

This is the best SQL-related question I've seen here in a long time. I wish more people wrote questions this well formatted and clear. Kudos.

Answer 1:

Here is one approach. Run the query step by step, CTE by CTE, and examine the intermediate results to understand how it works.

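It is not the most efficient approach. I end up joining the table to itself to eliminate sales that were already summed under a higher-ranked tag, and I don't see right now how to avoid that.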

WITH usersSales 
AS 
(  -- actual base CTE is much more complex
    SELECT
        s.SaleID
        , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,TagAmount
        ,TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Final
AS
(
    SELECT
        Main.TagName
        ,Main.Amount
        ,Main.SaleID
        ,Main.TagAmount
        ,Main.TagSales
        ,Main.rnk
        ,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
        ,A.FinalTagSales
    FROM
        CTE_Rank AS Main
        OUTER APPLY
        (
            SELECT
                SUM(Detail.Amount) AS FinalTagAmount
                ,COUNT(*) AS FinalTagSales
            FROM CTE_Rank AS Detail
            WHERE
                Detail.rnk = Main.rnk
                AND Detail.SaleID NOT IN
                (
                    SELECT PrevRanks.SaleID
                    FROM CTE_Rank AS PrevRanks
                    WHERE PrevRanks.rnk < Detail.rnk
                )
        ) AS A
)
SELECT
    TagName
    ,MIN(FinalTagAmount) AS FinalTagAmount
    ,MIN(FinalTagSales) AS FinalTagSales
    ,rnk
    ,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
    TagName
    ,rnk

UNION ALL

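SELECT
    '- All Others -' AS TagName
    ,ISNULL(SUM(FinalTagAmount), 0) AS FinalTagAmount
    ,ISNULL(SUM(FinalTagSales), 0) AS FinalTagSales
    ,0 AS rnk
    ,1 AS SortOrder
FROM
(
    -- CTE_Final repeats each tag's totals on every one of its sale rows,
    -- so reduce to one row per tag before summing.
    SELECT DISTINCT TagName, rnk, FinalTagAmount, FinalTagSales
    FROM CTE_Final
    WHERE rnk > 3
) AS Others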

ORDER BY
    SortOrder
    ,rnk
;

CTE_Rank

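Don't group and sum the rows yet. Instead, use windowed aggregates to get each tag's totals and rank. We will need the individual rows, with their individual amounts and SaleIDs, later on to filter out the sales that were already used.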

+------------------+--------+--------+-----------+----------+-----+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |
| Service          |      1 |      4 |         4 |        2 |   2 |
| Service          |      3 |      3 |         4 |        2 |   2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |
| Software         |      1 |      4 |         3 |        2 |   4 |
| Software         |      2 |      2 |         3 |        2 |   4 |
| Parts            |      1 |      4 |         1 |        1 |   5 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |
+------------------+--------+--------+-----------+----------+-----+
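The following rough Python equivalent of CTE_Sums plus CTE_Rank shows what the windowed aggregates buy. It is a sketch: `cte_rank_rows` and the data dictionaries are illustrative names and are not part of the answer. The per-tag totals and the rank are attached to every (tag, sale) row rather than collapsing the rows, so user 2 still has 11 rows, which matches the table above.

```python
from collections import defaultdict

# SaleID -> (Salesman, Amount), hand-copied from INSERT INTO Sales
SALES = {
    1: (1, 2), 2: (2, 2), 3: (2, 3), 4: (2, 1), 5: (2, 3), 6: (2, 4), 7: (2, -2),
    8: (3, 2), 9: (3, 3), 10: (3, 1), 11: (3, 3), 12: (3, 4), 13: (3, -2),
}
# SaleID -> tag names, i.e. SalesTags joined to Tags
SALE_TAGS = {
    1: ["Service"], 2: ["Software", "Support_Contract"], 3: ["Service"],
    4: ["Software", "Parts", "Service"], 5: ["Support_Contract"],
    6: ["Support_Contract", "Promo_Gift"], 7: ["Support_Contract", "Rebate"],
    8: ["Software", "Support_Contract"], 9: ["Service"], 10: ["Software", "Parts"],
    11: ["Support_Contract"], 12: ["Support_Contract", "Promo_Gift"],
    13: ["Support_Contract", "Rebate"],
}


def cte_rank_rows(salesman):
    """Rows shaped like CTE_Rank: (TagName, Amount, SaleID, TagAmount, TagSales, rnk)."""
    # One row per (tag, sale), as produced by the joins; nothing is grouped away.
    pairs = [
        (tag, amount, sale_id)
        for sale_id, (who, amount) in SALES.items()
        if who == salesman
        for tag in SALE_TAGS[sale_id]
    ]
    # Per-tag totals: the equivalent of SUM/COUNT(...) OVER (PARTITION BY TagName).
    tag_amount = defaultdict(int)
    tag_sales = defaultdict(int)
    for tag, amount, _ in pairs:
        tag_amount[tag] += amount
        tag_sales[tag] += 1
    # DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName):
    # TagName is part of the sort key, so every tag gets its own rank.
    order = sorted(tag_amount, key=lambda t: (-tag_amount[t], -tag_sales[t], t))
    rank = {tag: i for i, tag in enumerate(order, start=1)}
    # Attach the totals to every row instead of collapsing the rows.
    return [
        (tag, amount, sale_id, tag_amount[tag], tag_sales[tag], rank[tag])
        for tag, amount, sale_id in pairs
    ]


for row in cte_rank_rows(2):
    print(row)
```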

CTE_Final

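OUTER APPLY does the main work. For each rank, it sums and counts the sales while filtering out any sale already encountered under a higher-ranked tag.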

+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |              7 |             4 |
| Service          |      1 |      4 |         4 |        2 |   2 |              4 |             2 |
| Service          |      3 |      3 |         4 |        2 |   2 |              4 |             2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |              0 |             0 |
| Software         |      1 |      4 |         3 |        2 |   4 |              0 |             0 |
| Software         |      2 |      2 |         3 |        2 |   4 |              0 |             0 |
| Parts            |      1 |      4 |         1 |        1 |   5 |              0 |             0 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |              0 |             0 |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
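The filtering step can be modelled in a few lines of Python. This sketch assumes the input is the CTE_Rank output for user 2, i.e. the rows of the CTE_Rank table above with tag names spelled as in the Tags table. The OUTER APPLY depends only on Main.rnk, so its result is identical for every row of a tag. For that reason the sketch computes it once per rank. The output matches the FinalTagAmount and FinalTagSales columns above. The name `final_tag_totals` is illustrative.

```python
# (TagName, Amount, SaleID, rnk), taken from the CTE_Rank table for user 2.
CTE_RANK = [
    ("Support_Contract", -2, 7, 1),
    ("Support_Contract", 3, 5, 1),
    ("Support_Contract", 4, 6, 1),
    ("Support_Contract", 2, 2, 1),
    ("Service", 1, 4, 2),
    ("Service", 3, 3, 2),
    ("Promo_Gift", 4, 6, 3),
    ("Software", 1, 4, 4),
    ("Software", 2, 2, 4),
    ("Parts", 1, 4, 5),
    ("Rebate", -2, 7, 6),
]


def final_tag_totals(rows):
    """Mirror the OUTER APPLY: for each rank, sum and count only the rows whose
    SaleID does not appear at any better (smaller) rank."""
    totals = {}
    for r in sorted({row[3] for row in rows}):
        # SaleIDs already seen under higher-ranked tags (the NOT IN subquery).
        earlier = {sale_id for _, _, sale_id, rnk in rows if rnk < r}
        kept = [amount for _, amount, sale_id, rnk in rows
                if rnk == r and sale_id not in earlier]
        # sum([]) == 0 plays the role of ISNULL(SUM(...), 0); len() is COUNT(*).
        totals[r] = (sum(kept), len(kept))
    return totals


print(final_tag_totals(CTE_RANK))
```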

Query result

Finally, put the top 3 tags and "- All Others -" together.

+------------------+----------------+---------------+-----+-----------+
|     TagName      | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support_Contract |              7 |             4 |   1 |         0 |
| Service          |              4 |             2 |   2 |         0 |
| Promo_Gift       |              0 |             0 |   3 |         0 |
| - All Others -   |              0 |             0 |   0 |         1 |
+------------------+----------------+---------------+-----+-----------+
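The start of this answer mentions a self-join. One possible way around it, offered as a sketch and not taken from the original thread, relies on an equivalence: "the highest-ranked tag claims the sale" is the same as giving every sale the minimum rank among its tags and then grouping by that rank. The version below runs the idea against an in-memory SQLite database through Python's sqlite3 module, only so that it can be executed anywhere. It needs SQLite 3.25 or later for ROW_NUMBER(), and the SQL is SQLite's dialect rather than T-SQL (it uses neither TOP nor OUTER APPLY). A T-SQL port should have the same shape, but one has not been tested here. The helper name `summarize` is hypothetical.

```python
import sqlite3

# The MCVE tables and rows, rewritten in SQLite syntax.
SETUP = """
CREATE TABLE Sales     (Salesman INTEGER, SaleID INTEGER, Amount INTEGER);
CREATE TABLE SalesTags (SaleID INTEGER, TagId INTEGER);
CREATE TABLE Tags      (TagId INTEGER, TagName TEXT);
INSERT INTO Sales VALUES
    (1, 1, 2), (2, 2, 2), (2, 3, 3), (2, 4, 1), (2, 5, 3), (2, 6, 4), (2, 7, -2),
    (3, 8, 2), (3, 9, 3), (3, 10, 1), (3, 11, 3), (3, 12, 4), (3, 13, -2);
INSERT INTO SalesTags VALUES
    (1, 3), (2, 1), (2, 4), (3, 3), (4, 1), (4, 2), (4, 3), (5, 4), (6, 4), (6, 5),
    (7, 4), (7, 6), (8, 1), (8, 4), (9, 3), (10, 1), (10, 2), (11, 4), (12, 4),
    (12, 5), (13, 4), (13, 6);
INSERT INTO Tags VALUES
    (1, 'Software'), (2, 'Parts'), (3, 'Service'),
    (4, 'Support_Contract'), (5, 'Promo_Gift'), (6, 'Rebate');
"""

QUERY = """
WITH user_sales AS (
    SELECT SaleID, Amount FROM Sales WHERE Salesman = :salesman
),
sale_tag AS (            -- one row per (sale, tag)
    SELECT us.SaleID, us.Amount, t.TagName
    FROM user_sales us
    JOIN SalesTags st ON st.SaleID = us.SaleID
    JOIN Tags t ON t.TagId = st.TagId
),
tag_totals AS (
    SELECT TagName, SUM(Amount) AS TagAmount, COUNT(*) AS TagSales
    FROM sale_tag
    GROUP BY TagName
),
tag_rank AS (            -- same ordering as the question's TOP N query
    SELECT TagName,
           ROW_NUMBER() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM tag_totals
),
sale_owner AS (          -- each sale belongs to its best-ranked (smallest rnk) tag
    SELECT x.SaleID, x.Amount, MIN(tr.rnk) AS rnk
    FROM sale_tag x
    JOIN tag_rank tr ON tr.TagName = x.TagName
    GROUP BY x.SaleID, x.Amount
)
SELECT tr.TagName, COALESCE(SUM(so.Amount), 0), COUNT(so.SaleID), tr.rnk
FROM tag_rank tr
LEFT JOIN sale_owner so ON so.rnk = tr.rnk
WHERE tr.rnk <= :n
GROUP BY tr.TagName, tr.rnk
UNION ALL
SELECT '- All Others -', COALESCE(SUM(Amount), 0), COUNT(*), :n + 1
FROM sale_owner
WHERE rnk > :n
ORDER BY 4
"""


def summarize(conn, salesman, n):
    """Return [(category, amount, number_of_sales), ...] with '- All Others -' last."""
    rows = conn.execute(QUERY, {"salesman": salesman, "n": n}).fetchall()
    return [row[:3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
for salesman in (2, 3):
    rows = summarize(conn, salesman, 3)
    totals = (sum(r[1] for r in rows), sum(r[2] for r in rows))
    print(salesman, rows, totals)
```

The value `:n + 1` serves only as a sort key that places "- All Others -" last. The totals row is computed in Python from the returned rows, and for both salesmen it comes out to an amount of 11 over 6 sales.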

Comments:

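@BrockAdams, please add this example to the question (for example, as salesman 3) and show your expected result. In general, the more non-trivial examples a question has, the better. I have to go now; most likely I can take a look tomorrow.

Ignore the previous comment; this approach seems to give the correct results. Thanks!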
