Returning median values of time deltas across different groups
Posted: 2019-05-16 14:00:41

Question:
I'm trying to calculate the elapsed time between different steps in my data table and return the median of each calculation, using this SQL code:
SELECT median(datediff(seconds, one, two)) as step_one,
       median(datediff(seconds, two, three)) as step_two
FROM Table
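This returns the following error message:
[0A000][500310] Amazon Invalid operation: within group ORDER BY clauses for aggregate functions must be the same; java.lang.RuntimeException: com.amazon.support.exceptions.ErrorException: Amazon Invalid operation: within group ORDER BY clauses for aggregate functions must be the same;
Note: I am able to return a single median value on its own, though.
Here is a sample of my dataframe: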
one two three
2015-12-14 19:01:58.014247 2015-12-21 17:36:06.187302 2015-12-14 19:10:00.040057 2015-12-14 19:03:18.153519
2016-01-02 05:18:50.351975 2016-01-02 05:26:10.660299 2016-01-02 05:22:58.353365 2016-01-02 05:19:34.915794
2016-02-08 07:29:23.938046 2016-02-08 07:41:42.016819 2016-02-08 07:31:23.899776 2016-02-08 07:30:03.168844
2016-02-25 18:25:39.223014 2016-02-25 18:31:07.087808 2016-02-25 18:29:02.490969 2016-02-25 18:26:20.188472
2015-11-26 12:02:27.033141 2015-11-26 12:07:52.813699 2015-11-26 12:06:33.106484 2015-11-26 12:03:09.152853
2015-12-18 08:44:13.184319 2015-12-18 13:10:51.707354 2015-12-18 13:09:35.938711 2015-12-18 13:02:22.650966
2016-01-31 06:41:55.165849 2016-01-31 06:44:58.004319 2016-01-31 06:43:25.923505 2016-01-31 06:42:29.955232
2016-02-15 12:22:29.051259 2016-02-22 09:29:15.649721 2016-02-22 08:40:45.221558 2016-02-16 06:52:52.368139
The desired result is the median time delta between one and two, and between two and three (the actual data has more columns).
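As a cross-check outside the database, the desired calculation can be sketched in Python. This is only a minimal sketch under assumptions: the sample rows contain more timestamps than the header has names, so treating the first two timestamps of a row as `one` and `two` is a guess, and `median_delta_seconds` is a hypothetical helper name. It uses exact elapsed seconds, so the result can differ by a fraction of a second from the integer that DATEDIFF returns.

```python
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%d %H:%M:%S.%f"


def median_delta_seconds(pairs):
    """Median elapsed time in seconds over (start, end) timestamp strings."""
    deltas = [
        (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
        for start, end in pairs
    ]
    return median(deltas)


# Assumed mapping: (one, two) taken from three of the sample rows.
one_to_two = [
    ("2016-01-02 05:18:50.351975", "2016-01-02 05:26:10.660299"),
    ("2016-02-08 07:29:23.938046", "2016-02-08 07:41:42.016819"),
    ("2016-02-25 18:25:39.223014", "2016-02-25 18:31:07.087808"),
]

step_one = median_delta_seconds(one_to_two)
print(step_one)
```

The same helper could be called once per pair of adjacent step columns, which is the shape of result the question asks the SQL to produce.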
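Comments on the question:
median is a window function in amazon-redshift: docs.aws.amazon.com/redshift/latest/dg/r_WF_MEDIAN.html. You need to add a partition.

Answer 1:
If a statement includes multiple calls to sort-based aggregate functions (LISTAGG, PERCENTILE_CONT, or MEDIAN), they must all use the same ORDER BY values. Note that MEDIAN applies an implicit order by on the expression value.
From https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html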
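To make the quoted restriction more concrete: the Redshift documentation describes MEDIAN as a special case of PERCENTILE_CONT(0.5), so each MEDIAN call sorts by its own argument. The Python sketch below illustrates that idea with a hand-written `percentile_cont` (a hypothetical helper using linear interpolation, not Redshift's implementation) and made-up step durations.

```python
from statistics import median


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation; None values are ignored."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical step durations in seconds, for illustration only.
step_one = [440, 738, 328, 326]
step_two = [192, 618, 125, 80]

# Each median needs its own sort order: step_one is sorted by step_one values,
# step_two by step_two values. Those are the two different implicit ORDER BYs.
median_one = percentile_cont(step_one, 0.5)
median_two = percentile_cont(step_two, 0.5)
print(median_one, median_two)
```

Because the two calls order the rows by different expressions, a single statement containing both runs into the "must be the same" rule, which is what the workarounds in this thread sidestep by computing each median in its own subquery.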

Answer 2:
For this query, since there is no GROUP BY, you can simply split the query into two parts:
Select step_one,
       step_two
From
    (SELECT median(datediff(seconds, one, two)) as step_one
     FROM Table) as a,
    (SELECT median(datediff(seconds, two, three)) as step_two
     FROM Table) as b
For more complicated cases, where the select includes a GROUP BY, I found a workaround for this problem. Consider the following table:
create table test321 (i int, j int, k int, l int);
insert into test321 values(null, null, null, null);
insert into test321 values(null, 13, null, null);
insert into test321 values(17, null, null, null);
insert into test321 values(null, 15, null, 14);
insert into test321 values(15, null, null, 15);
insert into test321 values(null, 14, 10, null);
insert into test321 values(14, null, 11, null);
insert into test321 values(null, 16, 12, 12);
insert into test321 values(16, null, 13, 13);
insert into test321 values(1, 1, 1, 1);
insert into test321 values(2, 2, 1, 2);
insert into test321 values(3, 3, 1, 3);
insert into test321 values(4, 4, 2, 1);
insert into test321 values(5, 5, 2, 2);
insert into test321 values(6, 6, 2, 3);
insert into test321 values(7, 7, 3, 1);
insert into test321 values(8, 8, 3, 2);
insert into test321 values(9, 9, 3, 3);
insert into test321 values(10, 10, 4, 1);
insert into test321 values(11, 11, 4, 2);
insert into test321 values(12, 12, 4, 3);
Suppose we want the result of:
select k, l, median(i), median(j)
from test321
group by k, l
Then a general solution is:
Select case when a1.kstatus = -1 then null else a1.k end k,
       case when a1.lstatus = -1 then null else a1.l end l,
       medi,
       medj
From (
    -- median of i per (k, l); a NULL key becomes a sentinel value plus a status flag
    Select coalesce(a.k, (select max(k) from test321 where k is not null)) k,
           case when a.k is not null then 0 else -1 end kstatus,
           coalesce(a.l, (select max(l) from test321 where l is not null)) l,
           case when a.l is not null then 0 else -1 end lstatus,
           median(i) medi
    From (
        select i, j, k, l
        from test321
    ) as a
    group by a.k, a.l
) as a1
inner join (
    -- median of j per (k, l), with the same sentinel and status columns
    Select coalesce(a.k, (select max(k) from test321 where k is not null)) k,
           case when a.k is not null then 0 else -1 end kstatus,
           coalesce(a.l, (select max(l) from test321 where l is not null)) l,
           case when a.l is not null then 0 else -1 end lstatus,
           median(j) medj
    From (
        select i, j, k, l
        from test321
    ) as a
    group by a.k, a.l
) as a2
on (
    -- the status flags keep a real key apart from a NULL key mapped to the same sentinel
    a1.k = a2.k and
    a1.l = a2.l and
    a1.kstatus = a2.kstatus and
    a1.lstatus = a2.lstatus
)
;
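The reason for the coalesce and status columns is that grouping keeps NULL keys as a group, but a join on `a1.k = a2.k` never matches NULL to NULL. The sketch below shows that effect with Python's built-in `sqlite3` module on a tiny hypothetical table `t`. SQLite stands in for Redshift here, and `sum` stands in for `median` so the sketch does not depend on a MEDIAN function being available; both substitutions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (i int, j int, k int);
    insert into t values (1, 10, null);
    insert into t values (2, 20, null);
    insert into t values (5, 50, 1);
""")

# Plain equality: the NULL group is lost, because NULL = NULL is not true.
plain = conn.execute("""
    select a1.k, a1.si, a2.sj
    from (select k, sum(i) as si from t group by k) as a1
    join (select k, sum(j) as sj from t group by k) as a2
      on a1.k = a2.k
""").fetchall()

# Sentinel (max(k)) plus status flag, mirroring the coalesce/kstatus idea.
flagged = conn.execute("""
    select case when a1.kstatus = -1 then null else a1.k end as k,
           a1.si, a2.sj
    from (select coalesce(t.k, (select max(k) from t)) as k,
                 case when t.k is null then -1 else 0 end as kstatus,
                 sum(i) as si
          from t group by t.k) as a1
    join (select coalesce(t.k, (select max(k) from t)) as k,
                 case when t.k is null then -1 else 0 end as kstatus,
                 sum(j) as sj
          from t group by t.k) as a2
      on a1.k = a2.k and a1.kstatus = a2.kstatus
    order by a1.kstatus
""").fetchall()

print(plain)
print(flagged)
```

In this sketch the NULL group and the real `k = 1` group both end up with the key value 1 after the coalesce, so the join only stays correct because the status flag is part of the join condition.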
Hope this helps.