Aggregations of aggregations are not allowed (BigQuery)

【Posted】2019-05-07 19:57:56

【Question】:

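I am trying to query my Google BigQuery analytics table. The fields I am interested in are nested. The structure I want to retrieve is: category > subcategory > sub-subcategory.

I tried the following: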

select 
event_param1.value.string_value AS category,
event_param2.value.string_value AS action,
ARRAY_AGG(DISTINCT event_param3.value.string_value) AS label
FROM `analytics.events_20*` AS t,
UNNEST(event_params) as event_param1,
UNNEST(event_params) as event_param2,
UNNEST(event_params) as event_param3
where
parse_date('%y%m%d', _table_suffix) between DATE_sub(current_date(), interval 30 day) and DATE_sub(current_date(), interval 1 day) AND
event_param1.key = 'category' and
event_param2.key = 'action' and
event_param3.key = 'label'
group by category, action
order by category, action

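But this returns one row per category and subcategory pair, each with an array of all its sub-subcategories.

What I want is one row per category, containing all of its subcategories and, for each subcategory, all of its sub-subcategories.

Here is an example of what I get: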


[
  {
    "category": "Apple Watch",
    "action": "Apple Badge Clicked",
    "label": [
      "User Landing Page",
      "Attract",
      "Guest Landing Page",
      "Guest In Workout",
      "User In Workout"
    ]
  },
  {
    "category": "Apple Watch",
    "action": "CONNECTED",
    "label": [
      "User Landing Page",
      "Attract",
      "Guest Landing Page",
      "Guest In Workout",
      "User In Workout"
    ]
  }
]

And this is what I want:


{
    "category": "Apple Watch",
    "action": {
        "Apple Badge Clicked": {
            "label": [
                "User Landing Page",
                "Attract",
                "Guest Landing Page",
                "Guest In Workout",
                "User In Workout"
            ]
        },
        "CONNECTED": {
            "label": [
                "User Landing Page",
                "Attract",
                "Guest Landing Page",
                "Guest In Workout",
                "User In Workout"
            ]
        }
    }
}

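If I put one ARRAY_AGG inside another ARRAY_AGG, I get the error "Aggregations of aggregations are not allowed". I know that what I am asking for is not that simple, but a similar solution would also be fine.

【Question comments】:

Although I know nothing about BigQuery, it looks like you need to nest action and then nest label again (i.e. a three-level query) to get the end result. Try this query with a CTE: pastebin.com/DEacBHtV.

【Answer 1】: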

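You first need to aggregate into an array at the top level (category). After that, you can use a subquery to rearrange the data.

Here is an example that does not reproduce your desired output exactly, but handles arbitrary action types flexibly: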

WITH test AS (
  SELECT * FROM UNNEST([
    STRUCT('Apple Watch' AS category, 'Apple Badge Clicked' as action, 'User Landing Page' as label),
    ('Apple Watch','Apple Badge Clicked','Attract'),
    ('Apple Watch','Apple Badge Clicked','Guest Landing Page'),
    ('Apple Watch','CONNECTED','User Landing Page'),
    ('Apple Watch','CONNECTED','Attract'),
    ('Apple Watch','CONNECTED','User In Workout')
  ])  
),
-- first level of aggregation, prepare for fine tuning
catAgg as (
  SELECT 
    category,
    ARRAY_AGG(struct(action, label)) AS catInfo
  FROM test
  GROUP BY 1
)

SELECT 
  category,
  -- feed sub-query output into an array "action"
  array(SELECT AS STRUCT 
     action as actionType, -- re-group data within the array by field "action"
     array_agg(distinct label) as label
   FROM UNNEST(catInfo)
   GROUP BY 1
   ) as action
FROM catAgg

Hope this helps.

【Comments】:

Tried the query on one day of data. It works! But when I try the last 30 days of data, BigQuery complains: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 119% of limit. Top memory consumer(s): aggregate functions and GROUP BY clauses: 98% other/unattributed: 2%
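
One possible way to ease the memory pressure, offered as an untested sketch rather than a confirmed fix: aggregate in two separate GROUP BY steps, so that only distinct labels (not every event row) end up inside an array, and read each parameter with a scalar subquery instead of cross joining UNNEST(event_params) three times. The CTE names `flat` and `perAction` are made up for illustration, and the query assumes each event carries at most one parameter per key. Whether this stays within the memory limit depends on the data.

```sql
-- Sketch only: assumes event_params has key / value.string_value as in the
-- question, and at most one parameter per key per event.
WITH flat AS (
  -- One row per event; scalar subqueries avoid the triple UNNEST cross join.
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'category') AS category,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'action') AS action,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'label') AS label
  FROM `analytics.events_20*`
  WHERE PARSE_DATE('%y%m%d', _TABLE_SUFFIX)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
        AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),
perAction AS (
  -- First aggregation level: distinct labels per (category, action).
  SELECT category, action, ARRAY_AGG(DISTINCT label) AS label
  FROM flat
  WHERE category IS NOT NULL AND action IS NOT NULL AND label IS NOT NULL
  GROUP BY category, action
)
-- Second aggregation level: one row per category, one struct per action.
SELECT
  category,
  ARRAY_AGG(STRUCT(action AS actionType, label) ORDER BY action) AS action
FROM perAction
GROUP BY category
ORDER BY category
```

Because each ARRAY_AGG lives in its own query level, this does not trigger "Aggregations of aggregations are not allowed". To try it without touching the real table, replace `flat` with the `test` CTE from the answer above.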
