Case query with groupings on generated column

Posted 2023-03-29

【Title】: Case query with groupings on generated column 【Posted】: 2018-08-01 23:22:45 【Question】:

Here is an example of some pseudo-SQL I am working on.

select count(*) as "count",
       time2.iso_timestamp - time1.iso_timestamp as "time_to_active",
       case
           when ("time_to_active" >= 1day and "time_to_active" <= 5days) then '1'
           when ("time_to_active" >= 6days and "time_to_active" <= 11days) then '2'
           when ("time_to_active" >= 12days and "time_to_active" <= 20days) then '3'
           when ("time_to_active" >= 21days and "time_to_active" <= 30days) then '4'
           when ("time_to_active" >= 31days) then '5'
       end as timetoactivegroup
from t
inner join t1 on t.p_id = t1.p_id
join timestamp time1 on t.timestamp_id = t1.id
join timestamp time2 on t1.timestamp_id = t2.id

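I am essentially trying to query for groups where a calculated column falls within a given range, for example orders that took between n and y days. The main problem I am having is generating counts based on that grouping.

I can get the select query to display the calculated value without any problem.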

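【Question discussion】:

I am not sure what you are trying to do. Could you elaborate on your problem with sample data and expected output? You are trying to sort some durations into different groups, and then you want to count how many elements each group contains? I once had a similar problem. I put the groups into a table as range types and joined against it with JOIN ON a.range @> b.element, taking a.id as group_id. Second step: group by group_id

【Answer 1】: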

PostgreSQL does not allow you to group by an alias, so you need to repeat the grouping expression in the group by clause.

GROUP BY case
    when ("time_to_active" >= 1day and "time_to_active" <= 5days) then '1'
    when ("time_to_active" >= 6days and "time_to_active" <= 11days) then '2'
    when ("time_to_active" >= 12days and "time_to_active" <= 20days) then '3'
    when ("time_to_active" >= 21days and "time_to_active" <= 30days) then '4'
    when ("time_to_active" >= 31days) then '5'
end

Or you can group by column number:

 GROUP BY 3
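
For a concrete picture, here is a minimal PostgreSQL sketch of grouping by ordinal position. The table joined_t and its columns are hypothetical stand-ins for the joined result, and the pseudo-SQL day literals are written as real interval values. The difference is computed in a subquery because a select-list alias cannot be referenced from another expression in the same select list. The raw time_to_active column is left out of the outer select list, since it would otherwise have to appear in GROUP BY as well, which is why the ordinal here is 1 rather than 3.

```sql
-- Hypothetical stand-in for the joined tables; all names are assumptions.
create temporary table joined_t (started_at timestamp, activated_at timestamp);

insert into joined_t values
    ('2018-11-10', '2018-11-11'),
    ('2018-11-08', '2018-11-11'),
    ('2018-10-08', '2018-11-11');

-- Half-open ranges are used so that fractional durations (e.g. 5.5 days)
-- cannot fall between two buckets.
select case
           when time_to_active >= interval '1 day'   and time_to_active < interval '6 days'  then '1'
           when time_to_active >= interval '6 days'  and time_to_active < interval '12 days' then '2'
           when time_to_active >= interval '12 days' and time_to_active < interval '21 days' then '3'
           when time_to_active >= interval '21 days' and time_to_active < interval '31 days' then '4'
           when time_to_active >= interval '31 days' then '5'
       end as time_to_active_group,
       count(*) as "count"
from (
    -- timestamp - timestamp yields an interval in PostgreSQL
    select activated_at - started_at as time_to_active
    from joined_t
) as d
group by 1   -- 1 = first select-list item, i.e. the CASE expression
order by 1;
```

PostgreSQL's documentation also describes GROUP BY as accepting the name of an output column, so `group by time_to_active_group` should behave the same way in this sketch; the ordinal form is shown because it is the one suggested above.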

【Discussion】:

In the Snowflake database you can group by a column alias. But it is confusing that two databases are named. Why is this question tagged postgresql?

【Answer 2】:

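I am ignoring the pseudo-SQL (the time literals) and also the table joins, which refer to an undefined table T2.

So if you have some rows with two timestamps, where timestamp_a is earlier than timestamp_b, then the error I think you are running into comes from selecting the difference as its own column, time2.iso_timestamp - time1.iso_timestamp as "time_to_active". That gives you two columns you need to group by, but you do not actually want time_to_active in your result; otherwise the case block used to aggregate the result does not make much sense.

So if I have a table with a few rows (this just stands in for what your joined tables look like...)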

create or replace table t (timestamp_a timestamp_ntz, timestamp_b timestamp);

insert into t values ('2018-11-10','2018-11-11')
   ,('2018-11-08','2018-11-11')
   ,('2018-10-08','2018-11-11');

select datediff('day', timestamp_a, timestamp_b) as time_to_active from t;

gives 1, 3, 34, so wrapping that in a sub-select (which could also be expressed as a CTE):

select case when (time_to_active >= 1 and time_to_active < 6) then '1'
          when (time_to_active >= 6 and time_to_active < 12) then '2'
          when (time_to_active >= 12 and time_to_active < 21) then '3'
          when (time_to_active >= 21 and time_to_active < 31) then '4'
          when (time_to_active >= 31) then '5'
    end as time_to_active_group
    ,count(*) as count 
from (
    select datediff('day', timestamp_a, timestamp_b) as time_to_active from t
) as A
group by time_to_active_group;

gives:

 1, 2
 5, 1

because two rows fall in the 1-5 bucket and one row falls in the >= 31 bucket.
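
The CTE form mentioned above might look like the following sketch. It assumes the same table t and Snowflake's datediff; the CTE name durations is made up for illustration.

```sql
-- Same logic as the sub-select version, with the day difference named in a CTE.
with durations as (
    select datediff('day', timestamp_a, timestamp_b) as time_to_active
    from t
)
select case
           when (time_to_active >= 1 and time_to_active < 6) then '1'
           when (time_to_active >= 6 and time_to_active < 12) then '2'
           when (time_to_active >= 12 and time_to_active < 21) then '3'
           when (time_to_active >= 21 and time_to_active < 31) then '4'
           when (time_to_active >= 31) then '5'
       end as time_to_active_group,
       count(*) as "count"
from durations
group by time_to_active_group;
```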

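One more question: are you deliberately not handling "same day" timestamps, or rows where the end time is earlier than the start time, i.e. time_to_active <= 0?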
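
One possible way to make such rows visible is to give them their own bucket instead of letting the CASE return NULL. The sketch below is an assumption rather than part of the answer: the '0' label and the inline sample values are hypothetical, and the shortened conditions rely on CASE evaluating its branches in order.

```sql
-- Each branch only needs an upper bound, because earlier branches
-- have already claimed the smaller values.
select case
           when time_to_active <= 0 then '0'   -- hypothetical bucket for same-day or negative durations
           when time_to_active < 6  then '1'
           when time_to_active < 12 then '2'
           when time_to_active < 21 then '3'
           when time_to_active < 31 then '4'
           else '5'
       end as time_to_active_group,
       count(*) as "count"
from (values (-2), (0), (1), (3), (34)) as v (time_to_active)
group by 1
order by 1;
```

With these sample values, groups '0' and '1' each count two rows and group '5' counts one.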
