How to get distinct values on GROUP_CONCAT using Google Big Query



Posted: 2015-04-22 16:21:27

Question:

I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.

I'll recreate the situation with a simpler, static example:

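EDIT: I've modified the example to better represent my real situation: there are 2 columns aggregated with GROUP_CONCAT, and both need distinct values: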

SELECT 
  category, 
  GROUP_CONCAT(id) as ids, 
  GROUP_CONCAT(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category

This example returns:

Row category    ids products
1   a   1,2,3,1 car,car,car,truck
2   b   4,5,2   car,car,bike

I'd like to remove the duplicate values, returning the following:

Row category    ids products 
1   a   1,2,3   car,truck
2   b   4,5,2   car,bike

In MySQL, GROUP_CONCAT has a DISTINCT option, but BigQuery's GROUP_CONCAT does not.
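For reference, the MySQL form of the desired query would look roughly like this. It is a sketch that assumes a hypothetical MySQL table named `items` holding the same three columns; it does not run in BigQuery.

```sql
-- MySQL, not BigQuery. `items` is a hypothetical table with columns category, id, product.
SELECT
  category,
  -- DISTINCT is applied to each GROUP_CONCAT independently
  GROUP_CONCAT(DISTINCT id ORDER BY id SEPARATOR ',') AS ids,
  GROUP_CONCAT(DISTINCT product ORDER BY product SEPARATOR ',') AS products
FROM items
GROUP BY category;
```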

Any ideas?

Comments:

Possible duplicate of Syntax to run a distinct GROUP_CONCAT in Google Bigquery

I think it's similar but not quite the same; thanks for the pointer though, @Pentium10.

Answer 1:

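Here is a solution that removes duplicates with the UNIQUE scoped aggregate function. Note that to use it, we first need to build a REPEATED field with the NEST aggregate function: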

SELECT 
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD 
FROM (
SELECT 
  category, 
  NEST(id) as ids, 
  NEST(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category
)

Comments:

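Perfect, Mosha! I had never heard of the UNIQUE function. It works perfectly! Thanks!

I don't think you need the NEST subselect.

I think this solution uses legacy SQL syntax, which is worth mentioning: it does not work with BigQuery's Standard SQL. Does anyone have a Standard SQL solution?

Answer 2: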

Removing duplicates before applying GROUP_CONCAT achieves the result you want:

    SELECT 
      category, 
      GROUP_CONCAT(id) as ids
    FROM (  
    SELECT category, id
    FROM 
     (SELECT "a" as category, "1" as id),
     (SELECT "a" as category, "2" as id),
     (SELECT "a" as category, "3" as id),
     (SELECT "b" as category, "4" as id),
     (SELECT "b" as category, "5" as id),
     (SELECT "b" as category, "6" as id),
     (SELECT "a" as category, "1" as id),
    GROUP BY 
      category, id
    )
    GROUP BY 
      category

Comments:

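Thanks Ahmed, this works for a single column, but in my real case I need distinct values for 2 different columns. I've edited the question to show the problem.

Answer 3: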

In Standard SQL (the preferred BigQuery dialect), STRING_AGG accepts DISTINCT, so the solution is:

SELECT 
    q.category,
    -- DISTINCT inside each aggregate deduplicates each column independently
    string_agg(distinct q.id, ',' order by q.id) as ids,
    string_agg(distinct q.product, ',' order by q.product) as products

FROM 
    (
        (SELECT "a" as category, "1" as id, "car" as product)
        union all
        (SELECT "a" as category, "2" as id, "car" as product)
        union all
        (SELECT "a" as category, "3" as id, "car" as product)
        union all
        (SELECT "b" as category, "4" as id, "car" as product)
        union all
        (SELECT "b" as category, "5" as id, "car" as product)
        union all
        (SELECT "b" as category, "2" as id, "bike" as product)
        union all
        (SELECT "a" as category, "1" as id, "truck" as product)
    ) as q
GROUP BY
    q.category
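An equivalent Standard SQL variant collects the distinct values into an ARRAY with ARRAY_AGG(DISTINCT ...) and then joins them with ARRAY_TO_STRING, which can be handy if you later want to keep the array itself instead of a string. This is a sketch; the CTE name `items` is illustrative only.

```sql
#standardSQL
-- Sample rows from the question, held in a CTE with the illustrative name `items`.
WITH items AS (
  SELECT "a" AS category, "1" AS id, "car" AS product UNION ALL
  SELECT "a", "2", "car" UNION ALL
  SELECT "a", "3", "car" UNION ALL
  SELECT "b", "4", "car" UNION ALL
  SELECT "b", "5", "car" UNION ALL
  SELECT "b", "2", "bike" UNION ALL
  SELECT "a", "1", "truck"
)
SELECT
  category,
  -- ARRAY_AGG(DISTINCT ...) drops duplicates; ORDER BY (on the same expression) fixes the order
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT id ORDER BY id), ",") AS ids,
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT product ORDER BY product), ",") AS products
FROM items
GROUP BY category
ORDER BY category
```

Because the values are strings and sorted, category `a` yields `1,2,3` and `car,truck`, and category `b` yields `2,4,5` and `bike,car`.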
