Big Query pivot and aggregate repeated fields

Posted: 2017-08-11 19:05:06

Question:

I want to pivot the "unitId" and "firebase_screen_class" fields so that each one appears in its own column. This is my current Standard SQL query:

SELECT
  event.name,
  event_param.value.string_value AS ad_unit,
  COUNT(*) AS event_count
FROM
  `app_events_20170510`, 
  UNNEST(event_dim) AS event, 
  UNNEST(event.params) as event_param
WHERE
  event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')
  and event_param.key in ('unitId', 'screen_class')
GROUP BY 1,2
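Why this query cannot yield separate columns: the comma joins with UNNEST emit one row per (event, param) pair, so an event's unitId value and its screen_class value become two separate rows that share the single ad_unit column. The sketch below simulates that flattening in plain Python. It assumes hypothetical in-memory events with params stored as (key, string_value) pairs; it does not talk to BigQuery, and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical sample events, loosely shaped like event_dim:
# each event has a name and a repeated list of (key, string_value) params.
EVENTS = [
    {"name": "Ad_requested", "params": [("unitId", "hpg"), ("screen_class", "socialFeed")]},
    {"name": "Ad_requested", "params": [("unitId", "hpg"), ("screen_class", "socialFeed")]},
    {"name": "Ad_shown", "params": [("unitId", "hpg"), ("screen_class", "chat")]},
]


def flatten(events):
    """Emulate `FROM t, UNNEST(event_dim) AS event, UNNEST(event.params) AS p`:
    one output row per (event, param) pair."""
    for event in events:
        for key, value in event["params"]:
            yield event["name"], key, value


def count_by_name_and_value(events, keys):
    """Emulate the question's GROUP BY 1, 2 over (event.name, string_value)."""
    return Counter(
        (name, value) for name, key, value in flatten(events) if key in keys
    )


result = count_by_name_and_value(EVENTS, {"unitId", "screen_class"})
for row, n in sorted(result.items()):
    print(row, n)
```

Unit IDs and screen classes end up mixed in the same second column, and each event is counted once per matching param rather than once per event.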

I also tried the following query in Legacy SQL, but it does not return the correct aggregated results:

SELECT event_name, ad_unit, count(*) FROM
(
SELECT
  event_dim.name as event_name,
  MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) WITHIN RECORD as firebase_screen_class,
  MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) WITHIN RECORD as ad_unit
FROM
  [app_events_20170510]
WHERE
  event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')
  and event_dim.params.key in ('unitId','screen_class')
)
group by 1,2

I am looking for the following output:

_________________________________________________________________________
| event_dim.name   | unitId         | screen_class         | count_events|
|__________________|________________|______________________|_____________|
| Ad_requested     | hpg            | socialFeed           |    520      |
|__________________|________________|______________________|_____________|
| Ad_shown         | hpg            | chat                 |    950      |
|__________________|________________|______________________|_____________|
| Ad_requested     | hni            | chat                 |    740      |
|__________________|________________|______________________|_____________|

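All events (Ad_requested, Ad_shown, Ad_clicked) have parameters with the same keys (unitId, screen_class), and each key takes values from the same set (e.g. unitId: hpg, hni; screen_class: socialFeed, chat, as in the table above).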

Comments on the question:

Possible duplicate of Select several event params in a single row for Firebase events stored in Google BigQuery

Answer 1:

Below is for BigQuery Standard SQL:

#standardSQL
WITH `aggregation` AS (
  SELECT
    event.name,
    event_param.key,
    COUNT(*) AS event_count
  FROM
    `app_events_20170510`, 
    UNNEST(event_dim) AS event, 
    UNNEST(event.params) AS event_param
  WHERE
    event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
    AND event_param.key IN ('unitId', 'firebase_screen_class','house')
  GROUP BY 1, 2
)
SELECT 
  name,
  MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
  MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,
  MAX(IF(key = 'house', event_count, NULL)) AS house
FROM `aggregation`
GROUP BY name  
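The query above works in two steps: the `aggregation` CTE counts rows per (event name, param key), and the outer SELECT turns each key into a column with MAX(IF(key = ..., event_count, NULL)). A minimal Python sketch of the same two steps, assuming hypothetical sample events (the "h1" house value is made up) and illustrative helper names:

```python
from collections import Counter

# Hypothetical sample events (not from the post); params are (key, string_value) pairs.
EVENTS = [
    {"name": "Ad_requested", "params": [("unitId", "hpg"), ("firebase_screen_class", "socialFeed")]},
    {"name": "Ad_requested", "params": [("unitId", "hni"), ("firebase_screen_class", "chat")]},
    {"name": "Ad_shown", "params": [("unitId", "hpg"), ("firebase_screen_class", "chat"), ("house", "h1")]},
]
KEYS = ("unitId", "firebase_screen_class", "house")


def count_by_key(events, keys):
    """Step 1, the `aggregation` CTE: COUNT(*) grouped by (event name, param key)."""
    return Counter(
        (event["name"], key)
        for event in events
        for key, _value in event["params"]
        if key in keys
    )


def pivot(counts, keys):
    """Step 2: one row per name; MAX(IF(key = k, event_count, NULL)) for each column k."""
    table = {}
    for (name, key), n in counts.items():
        row = table.setdefault(name, dict.fromkeys(keys))  # every column starts as NULL (None)
        row[key] = n if row[key] is None else max(row[key], n)
    return table


table = pivot(count_by_key(EVENTS, KEYS), KEYS)
for name, row in sorted(table.items()):
    print(name, row)
```

Each column holds how many times a key occurs, not the key's values; a key that never occurs for an event name stays None, just as MAX over only NULLs returns NULL.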

Update, based on the clarification in the comments:

#standardSQL
SELECT
  event.name,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,
  COUNT(1) AS event_count
FROM `app_events_20170510`, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1,2,3,4
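Here each correlated subquery pulls one key's string_value out of that event's own params, so every event becomes a single row with one column per key, and the GROUP BY then counts identical combinations. A Python sketch of that logic, assuming hypothetical sample events and illustrative helper names. The duplicate-key check mirrors BigQuery's error when a scalar subquery returns more than one row:

```python
from collections import Counter

KEYS = ("unitId", "firebase_screen_class", "house")
AD_EVENTS = {"Ad_requested", "Ad_clicked", "Ad_shown"}

# Hypothetical sample events (not from the post); params are (key, string_value) pairs.
EVENTS = [
    {"name": "Ad_requested", "params": [("unitId", "hpg"), ("firebase_screen_class", "socialFeed")]},
    {"name": "Ad_requested", "params": [("unitId", "hpg"), ("firebase_screen_class", "socialFeed")]},
    {"name": "Ad_shown", "params": [("unitId", "hpg"), ("firebase_screen_class", "chat")]},
    {"name": "Ad_requested", "params": [("unitId", "hni"), ("firebase_screen_class", "chat")]},
    {"name": "app_open", "params": [("unitId", "hpg")]},  # dropped by the WHERE filter
]


def param_value(params, key):
    """Like (SELECT value.string_value FROM UNNEST(event.params) WHERE key = ...):
    the single matching value, or None if the event has no such param."""
    matches = [value for k, value in params if k == key]
    if len(matches) > 1:
        # A BigQuery scalar subquery fails if it returns more than one row.
        raise ValueError(f"more than one value for {key!r}")
    return matches[0] if matches else None


def count_events(events, names=AD_EVENTS, keys=KEYS):
    """WHERE name IN (...), GROUP BY name plus one extracted value per key, COUNT(1)."""
    return Counter(
        (event["name"], *(param_value(event["params"], k) for k in keys))
        for event in events
        if event["name"] in names
    )


counts = count_events(EVENTS)
for row, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(row, n)
```

Because the values are extracted per event before grouping, each event is counted exactly once, which matches the shape of the output table in the question.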

"... out of curiosity, I tried to replicate the query using Legacy SQL ..."

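Added a version for BigQuery Legacy SQL (purely for learning purposes, in the hope of helping those considering a migration to Standard SQL, since both versions of the same task are now available here):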

#legacySQL
SELECT name, unitId, firebase_screen_class, house, COUNT(1) AS event_count
FROM (
  SELECT event_dim.name AS name,
    MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS unitId,
    MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS firebase_screen_class,
    MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS house
  FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS event
  WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
GROUP BY 1, 2, 3, 4

Comments:

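Thanks for your answer. I ran the query, but it's not what I'm trying to achieve. I'm looking for: event_name | unitId.values | house.values | firebase_screen_class.values | count_event

I see what you mean now.

Yes, sorry. I accidentally posted my comment before finishing :-)

I just realized I don't know what you mean by xxx.values. Is it the list of string_values for the respective keys, or something else? I think you should provide more details or an example of the output!

@DorianRoy - try replacing `app_events_20170510` with (SELECT * FROM `ios.app_events_20171106` UNION ALL SELECT * FROM `android.app_events_20171106`)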
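The last comment combines the iOS and Android daily tables with UNION ALL before counting. UNION ALL keeps duplicate rows, so identical events from both platforms all contribute to the count; UNION DISTINCT would collapse them and undercount. A tiny Python illustration with hypothetical (name, unitId) rows standing in for the two tables:

```python
from collections import Counter

# Hypothetical (name, unitId) rows standing in for the ios.* and android.* daily tables.
IOS_ROWS = [("Ad_shown", "hpg"), ("Ad_shown", "hpg")]
ANDROID_ROWS = [("Ad_shown", "hpg"), ("Ad_requested", "hni")]


def union_all(*tables):
    """UNION ALL: every row of every input, duplicates kept."""
    for table in tables:
        yield from table


def union_distinct(*tables):
    """UNION DISTINCT, for contrast: identical rows collapse to one."""
    return set(union_all(*tables))


counts = Counter(union_all(IOS_ROWS, ANDROID_ROWS))
print(counts)
print(len(union_distinct(IOS_ROWS, ANDROID_ROWS)))
```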
