Grouping by same column but aggregating in two different ways

Posted 2023-02-16


【Title】: Grouping by same column but aggregating in two different ways 【Posted】: 2019-12-10 18:02:42 【Question】:

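I have a table with the schema shown below. I want an output table that contains, for each account, the count of all transactions (T) and the count of those transactions made after a certain date (say, today minus 30 days). The grouping column is the same in both cases, namely Account, but the counting strategy differs. It is pretty easy to do this using two different queries and joining the results, but is it possible to do it in a single SQL query?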

Input:

  Account |  T_id |  T_date 
 ---------|-------|--------- 
  A1      |  t1   |     205 
  A1      |  t2   |     420 
  A1      |  t3   |     180 
  A1      |  t5   |     290 
  A2      |  t6   |     100

Expected output (c=200):

  Account |  T_count |  T_count_greater_than_c 
 ---------|----------|------------------------- 
  A1      |        3 |                       2 
  A2      |        2 |                       1

To get the total count, we can do:

SELECT Account, COUNT(T_id) 
FROM T 
GROUP BY Account

To get the count where T_date > c, we can do:

SELECT Account, COUNT(T_id) 
FROM T 
GROUP BY Account 
HAVING T_date > c

How can the two be combined into one query, so that the join is avoided?
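For reference, the two-query-plus-join baseline mentioned above could look roughly like the sketch below. It is only an illustration under several assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module, the column types are guessed, the threshold filter is written as a `WHERE` clause, and the helper name `counts_via_join` is made up for this example.

```python
import sqlite3

# Sample rows copied from the input table above.
ROWS = [
    ("A1", "t1", 205),
    ("A1", "t2", 420),
    ("A1", "t3", 180),
    ("A1", "t5", 290),
    ("A2", "t6", 100),
]

# Two aggregations over the same table, joined on Account.
# The LEFT JOIN keeps accounts that have no rows above the threshold.
JOIN_QUERY = """
SELECT a.Account,
       a.T_count,
       COALESCE(b.T_count_greater_than_c, 0) AS T_count_greater_than_c
  FROM (SELECT Account, COUNT(T_id) AS T_count
          FROM T
         GROUP BY Account) AS a
  LEFT JOIN (SELECT Account, COUNT(T_id) AS T_count_greater_than_c
               FROM T
              WHERE T_date > :c
              GROUP BY Account) AS b
    ON a.Account = b.Account
 ORDER BY a.Account
"""


def counts_via_join(rows, c):
    """Return (Account, T_count, T_count_greater_than_c) via two subqueries and a join."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; the question does not state them.
        conn.execute("CREATE TABLE T (Account TEXT, T_id TEXT, T_date INTEGER)")
        conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
        return conn.execute(JOIN_QUERY, {"c": c}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in counts_via_join(ROWS, 200):
        print(row)
```

For the five rows exactly as listed in the input and c = 200, this returns (A1, 4, 3) and (A2, 1, 0). The LEFT JOIN with COALESCE is what keeps A2 in the result even though none of its rows pass the filter.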

【Comments on the question】:

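Please provide your sample data and expected output in tabular format.

"Its pretty easy to do this using two different queries and join the results": write out the 2 queries that give you your answer, along with sample input and output.

The second query is invalid. T_date is no longer available after aggregation. It has to be SELECT Account, COUNT(*) FROM T WHERE T_date > c GROUP BY Account. By the way, you should always tag SQL questions with the DBMS you are using. Are you asking this for MySQL? SQL Server? Oracle? ...

How do you get a count of 2 for A2 with that sample data?

Why do you talk about counting transactions within a date range, but then do something completely different in your sample?

【Answer 1】: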

Apply conditional aggregation by using a CASE expression or IF() inside sum():

with mydata as(--Replace this with your table
select stack(6,
             1, '2019-08-01', 100,
             1, '2019-08-01', 100,
             1, '2019-07-01', 200,
             2, '2019-08-01', 100,
             2, '2019-08-01', 100,
             2, '2019-07-01', 200
             ) as (account, transaction_date, amount)
)
select account, sum(amount) amount, 
       sum(case when transaction_date < date_sub(current_date,30) then amount else 0 end) amount_beyond_30
  from mydata
 group by account;

Result:

account   amount  amount_beyond_30
    1       400     200
    2       400     200
    Time taken: 40.716 seconds, Fetched: 2 row(s)

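Sorry, my example is written for Hive SQL, so some functions may differ in your database, but hopefully you now have an idea of how to do conditional aggregation in SQL.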
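As an illustration of how the same idea carries over to another engine, here is a minimal sketch of the amount example in SQLite, driven from Python's `sqlite3` module. It is not the original Hive code: `date(:today, '-30 days')` is used as SQLite's counterpart to `date_sub(current_date, 30)`, the reference date is passed in as a parameter so the result is reproducible, and the helper name `amounts_by_account` and the date 2019-08-15 are assumptions made for this example.

```python
import sqlite3

# Same six rows as the stack(...) call in the Hive example.
ROWS = [
    (1, "2019-08-01", 100),
    (1, "2019-08-01", 100),
    (1, "2019-07-01", 200),
    (2, "2019-08-01", 100),
    (2, "2019-08-01", 100),
    (2, "2019-07-01", 200),
]

# Conditional aggregation: the CASE expression contributes the amount only
# for rows older than 30 days before the reference date, and 0 otherwise.
QUERY = """
SELECT account,
       SUM(amount) AS amount,
       SUM(CASE WHEN transaction_date < date(:today, '-30 days')
                THEN amount ELSE 0 END) AS amount_beyond_30
  FROM mydata
 GROUP BY account
 ORDER BY account
"""


def amounts_by_account(rows, today):
    """Return (account, amount, amount_beyond_30) relative to the ISO date `today`."""
    conn = sqlite3.connect(":memory:")
    try:
        # Dates are stored as ISO-8601 text, so string comparison orders them correctly.
        conn.execute(
            "CREATE TABLE mydata (account INTEGER, transaction_date TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, {"today": today}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # 2019-08-15 is a hypothetical reference date chosen for this sketch.
    for row in amounts_by_account(ROWS, "2019-08-15"):
        print(row)
```

With the hypothetical reference date 2019-08-15 the cutoff is 2019-07-16, so only the 2019-07-01 rows count as older than 30 days and each account gets 400 and 200.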

Update after the sample data and SQL were added:

SELECT Account, COUNT(T_id) as cnt,
       count(case when T_date > 200 then 1 else null end) as T_count_greater_than_c
  FROM T 
 GROUP BY Account
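To try the single-query version on the sample data from the question, a small sketch along these lines may help. It assumes an in-memory SQLite database accessed through Python's `sqlite3` module, guesses the column types, and uses a made-up helper name, `counts_single_query`. It also shows the SUM form next to the COUNT form, since both express the same conditional count.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("A1", "t1", 205),
    ("A1", "t2", 420),
    ("A1", "t3", 180),
    ("A1", "t5", 290),
    ("A2", "t6", 100),
]

QUERY = """
SELECT Account,
       COUNT(T_id) AS T_count,
       -- COUNT skips NULLs, so only rows matching the condition are counted.
       COUNT(CASE WHEN T_date > :c THEN 1 END) AS count_form,
       -- Equivalent form: add 1 per matching row and 0 otherwise.
       SUM(CASE WHEN T_date > :c THEN 1 ELSE 0 END) AS sum_form
  FROM T
 GROUP BY Account
 ORDER BY Account
"""


def counts_single_query(rows, c):
    """Return (Account, T_count, count_form, sum_form) from one pass over T."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; the question does not state them.
        conn.execute("CREATE TABLE T (Account TEXT, T_id TEXT, T_date INTEGER)")
        conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, {"c": c}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in counts_single_query(ROWS, 200):
        print(row)
```

For the rows exactly as listed in the question and c = 200, both forms agree: A1 has 4 transactions with 3 above the threshold, and A2 has 1 with 0 above it. Unlike a `WHERE T_date > c` filter, the conditional count keeps A2 in the result without needing a join.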

【Comments】:
