How to select corresponding value in a row when aggregating with max() in clickhouse?


Posted: 2021-09-30 12:49:36

Question:

I have a table like this in the database:

This is the slice for a single UserID; in reality there are many of them.

create table MY_TABLE
(
    UserID Nullable(String),
    OID int,
    TotalHits Nullable(int),
    DaysOfHits Nullable(int),
    UniqPrimaryEvents Nullable(int)
)
engine = Memory;

insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6564023, 4, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6546504, 9, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6538286, 12, 2, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6536273, 8, 2, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6534195, 57, 6, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6528643, 4, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6496311, 7, 2, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6492524, 7, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6475804, 9, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6424164, 5, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6403817, 8, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6403592, 9, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6400394, 13, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6383627, 8, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6364163, 4, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6349018, 7, 1, 0);
insert into MY_TABLE (UserID, OID, TotalHits, DaysOfHits, UniqPrimaryEvents) values ('1000c666-04db-4447-9ea1-ecf1e2275c81', 6270551, 6, 1, 0);

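I need to aggregate the table and include in the result both the aggregate function results and the corresponding value from OID:

I am doing something like this: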

SELECT
    UserID,
    uniq(OID) AS UniqObjects,
    sum(TotalHits) AS TotalHits,
    round(avg(DaysOfHits), 2) AS AvgObjectHitDays,
    max(DaysOfHits) AS MaxHitPeriod,
    -- here I need the OID corresponding to the max(DaysOfHits) value
    round((avg(DaysOfHits) / max(DaysOfHits)) * 100, 2) AS PerOfMaxHit
FROM MY_TABLE  -- simplified; the real source is a joined subquery (see P.S.)
GROUP BY UserID

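I tried something like anyIf(OID, DaysOfHits = max(DaysOfHits)), but you can't have an aggregate function inside an aggregate function.

P.S. The source of the SELECT is another joined SELECT, not a single table.

Please help!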

Comments:

Can you show us what you want your output to look like, in the description?

Answer 1:

argMax is the thing! I'm just a beginner, sorry :)

The answer is argMax(OID, DaysOfHits): it returns the OID of the row with the maximum DaysOfHits, as required.
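
To make the answer concrete, here is a minimal sketch of the query from the question with argMax slotted into the commented position. It assumes the data is read directly from MY_TABLE and grouped by UserID (the real source in the question is a joined subquery, which would take the place of MY_TABLE). The aliases MaxHitPeriodOID and TotalHitsSum are illustrative names and do not come from the original post.

```sql
SELECT
    UserID,
    uniq(OID) AS UniqObjects,
    -- alias renamed (illustrative) so it does not shadow the TotalHits column
    sum(TotalHits) AS TotalHitsSum,
    round(avg(DaysOfHits), 2) AS AvgObjectHitDays,
    max(DaysOfHits) AS MaxHitPeriod,
    -- OID taken from the row that holds the largest DaysOfHits in each group
    argMax(OID, DaysOfHits) AS MaxHitPeriodOID,
    round((avg(DaysOfHits) / max(DaysOfHits)) * 100, 2) AS PerOfMaxHit
FROM MY_TABLE          -- assumption: replace with the joined subquery if needed
GROUP BY UserID;
```

In the sample rows above, the only row with DaysOfHits = 6 is OID 6534195, so that is the value argMax picks for this UserID. When several rows share the maximum DaysOfHits, argMax returns one of the matching OIDs, and which one is not guaranteed.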

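Comments:

As it's currently written, your answer is unclear. Please edit it to add details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.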
