Athena DISTINCT on ONE row

Posted 2023-03-21

[Title]: Athena DISTINCT on ONE row [Posted]: 2021-11-19 22:33:24 [Question]:

I have a table like the following:

| date_partition | app_id   | location_id | a    | b    | c     | d     | app_type | ru_today | ru_this_week | ru_this_month | mau   | execution_count |
| 9/20/21        | 17ccc103 | 56a7d682    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | TRUE     | TRUE         | TRUE          | TRUE  | 402             |
| 9/20/21        | 17ccc103 | 56a7d682    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | TRUE     | TRUE         | TRUE          | FALSE | 402             |
| 9/20/21        | 9056ac49 | f4494101    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | TRUE     | TRUE         | TRUE          | TRUE  | 291             |
| 9/20/21        | 9056ac49 | f4494101    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | TRUE     | TRUE         | TRUE          | FALSE | 291             |
| 9/20/21        | cf98b87d | 59a8f889    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | FALSE    | FALSE        | TRUE          | TRUE  | 1               |
| 9/20/21        | cf98b87d | 59a8f889    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | TRUE     | TRUE         | TRUE          | TRUE  | 1               |
| 9/20/21        | cf98b87d | 59a8f889    | TRUE | TRUE | FALSE | FALSE | WEBHOOK  | TRUE     | TRUE         | TRUE          | FALSE | 1               |

I want to get a unique count per item and per flag (where a, b and c are true).

For example:

9056ac49-5c29-4366-9eb2-64576cb2a9af | f4494101 | 291 | 291 | 291
cf98b87d-3605-42a2-85b9-2993956fe927 | 59a8f889 | 1   | 1   | 1

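The problem is that when I do a GROUP BY, I get double counting (because 2 rows have the value TRUE, but I want them counted only once).

How can I get just one row per item? I'm not sure whether I can use a window function here.

[Update] The problem is double counting. For example, take the two rows with execution_count 402: I need to count them only once (i.e. just 402). If I do a conditional count, I still count them twice (804).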

[Comments]:

Can you provide the data and the desired output as text?

I updated it. Done :)

[Answer 1]:

You can use a "conditional" sum with a case expression:

SELECT date_partition, app_id, location_id, 
    sum(case when a then execution_count else 0 end) as count_a,
    sum(case when b then execution_count else 0 end) as count_b,
    sum(case when c then execution_count else 0 end) as count_c
FROM dataset
GROUP BY date_partition, app_id, location_id
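
To see the double counting concretely, here is a minimal sketch that runs the query above against the question's sample rows. It assumes SQLite (through Python's sqlite3 module) as a local stand-in for Athena, stores the boolean flags as 0/1, and keeps only the columns the query touches plus mau. The helper name naive_counts is made up for illustration.

```python
import sqlite3

# Sample rows from the question, reduced to:
# date_partition, app_id, location_id, a, b, c, mau, execution_count
# Booleans are stored as 0/1 (an assumption for SQLite; Athena has a real boolean type).
ROWS = [
    ("9/20/21", "17ccc103", "56a7d682", 1, 1, 0, 1, 402),
    ("9/20/21", "17ccc103", "56a7d682", 1, 1, 0, 0, 402),
    ("9/20/21", "9056ac49", "f4494101", 1, 1, 0, 1, 291),
    ("9/20/21", "9056ac49", "f4494101", 1, 1, 0, 0, 291),
    ("9/20/21", "cf98b87d", "59a8f889", 1, 1, 0, 1, 1),
    ("9/20/21", "cf98b87d", "59a8f889", 1, 1, 0, 1, 1),
    ("9/20/21", "cf98b87d", "59a8f889", 1, 1, 0, 0, 1),
]


def naive_counts(rows):
    """Run the conditional-sum query directly on the raw rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (date_partition TEXT, app_id TEXT, location_id TEXT, "
        "a INTEGER, b INTEGER, c INTEGER, mau INTEGER, execution_count INTEGER)"
    )
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT date_partition, app_id, location_id,
            sum(case when a then execution_count else 0 end) as count_a,
            sum(case when b then execution_count else 0 end) as count_b,
            sum(case when c then execution_count else 0 end) as count_c
        FROM dataset
        GROUP BY date_partition, app_id, location_id
        ORDER BY app_id
        """
    ).fetchall()
    conn.close()
    return result


for row in naive_counts(ROWS):
    print(row)
```

The 17ccc103 group comes out as 804 rather than 402, because both of its rows (which differ only in mau) are summed.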

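You may want to use it with a subselect that applies DISTINCT to the fields of interest.

UPD

To exclude the "duplicates", you need to filter them out first, for example with the subselect I mentioned earlier: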

SELECT date_partition, app_id, location_id, 
    sum(case when a then execution_count else 0 end) as count_a,
    sum(case when b then execution_count else 0 end) as count_b,
    sum(case when c then execution_count else 0 end) as count_c
FROM (
   SELECT distinct date_partition, app_id, location_id, a, b, c, execution_count 
   FROM dataset
)
GROUP BY date_partition, app_id, location_id
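
As a rough check that the DISTINCT subselect collapses the rows that differ only in columns it leaves out (mau, ru_today and so on), the sketch below runs it on the question's sample rows. Again this assumes SQLite via Python's sqlite3 as a stand-in for Athena, with flags stored as 0/1 and only the relevant columns kept. The helper name deduplicated_counts is hypothetical.

```python
import sqlite3

# Sample rows from the question, reduced to:
# date_partition, app_id, location_id, a, b, c, mau, execution_count
# Booleans are stored as 0/1 (an assumption for SQLite; Athena has a real boolean type).
ROWS = [
    ("9/20/21", "17ccc103", "56a7d682", 1, 1, 0, 1, 402),
    ("9/20/21", "17ccc103", "56a7d682", 1, 1, 0, 0, 402),
    ("9/20/21", "9056ac49", "f4494101", 1, 1, 0, 1, 291),
    ("9/20/21", "9056ac49", "f4494101", 1, 1, 0, 0, 291),
    ("9/20/21", "cf98b87d", "59a8f889", 1, 1, 0, 1, 1),
    ("9/20/21", "cf98b87d", "59a8f889", 1, 1, 0, 1, 1),
    ("9/20/21", "cf98b87d", "59a8f889", 1, 1, 0, 0, 1),
]


def deduplicated_counts(rows):
    """Deduplicate on the columns of interest, then apply the conditional sums."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (date_partition TEXT, app_id TEXT, location_id TEXT, "
        "a INTEGER, b INTEGER, c INTEGER, mau INTEGER, execution_count INTEGER)"
    )
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT date_partition, app_id, location_id,
            sum(case when a then execution_count else 0 end) as count_a,
            sum(case when b then execution_count else 0 end) as count_b,
            sum(case when c then execution_count else 0 end) as count_c
        FROM (
            -- mau is deliberately left out, so rows differing only in mau collapse
            SELECT DISTINCT date_partition, app_id, location_id, a, b, c, execution_count
            FROM dataset
        )
        GROUP BY date_partition, app_id, location_id
        ORDER BY app_id
        """
    ).fetchall()
    conn.close()
    return result


for row in deduplicated_counts(ROWS):
    print(row)
```

On this sample each item ends up with a single contribution (402, 291 and 1). Note that the approach relies on the remaining selected columns being identical within an item; if a, b, c or execution_count differed between rows, DISTINCT would keep both.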

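[Discussion]:

The problem with this approach is the double counting. I updated the problem statement.

@Yogi check the last line of my answer: "You may want to use it with a subselect that applies DISTINCT to the fields of interest." Basically you should use FROM (select date_partition, app_id, location_id,app_id, a,b,c from dataset) instead of FROM dataset.

But that only works if all the values are the same. In my case only one column is toggled, and I want to ignore that one.

@Yogi TBH I don't completely follow. Can you provide the input as text?

I updated the table as text. I can't do DISTINCT. If you look at my update, if I do DISTINCT I get two rows, but I only want 1.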
