Big Query: pivot and aggregate repeated fields
Posted: 2017-08-11 19:05:06
Question: I want to pivot the "unitId" and "firebase_screen_class" fields so that each one appears in its own column:
SELECT
event.name,
event_param.value.string_value AS ad_unit,
COUNT(*) AS event_count
FROM
`app_events_20170510`,
UNNEST(event_dim) AS event,
UNNEST(event.params) as event_param
WHERE
event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')
and event_param.key in ('unitId', 'screen_class')
GROUP BY 1,2
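To see why the query above does not pivot, it helps to look at the shape it returns: one row per (event name, param value), so unitId values and screen-class values share the single `ad_unit` column. The sketch below reproduces that shape with Python's `sqlite3` as a stand-in for BigQuery. The flattened `params` table (roughly what the two `UNNEST` joins produce), its columns, and the toy rows are all hypothetical, and it assumes the screen key is named `firebase_screen_class`.

```python
import sqlite3

# Hypothetical toy data: one row per (event, param), roughly what
# UNNEST(event_dim) + UNNEST(event.params) yields.
# Columns: (event_id, name, key, string_value).
PARAMS = [
    (1, "Ad_requested", "unitId", "hpg"),
    (1, "Ad_requested", "firebase_screen_class", "socialFeed"),
    (2, "Ad_shown", "unitId", "hpg"),
    (2, "Ad_shown", "firebase_screen_class", "chat"),
]


def long_form(rows):
    """Group by (event name, param value), as the question's query does."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE params (event_id INTEGER, name TEXT, key TEXT, string_value TEXT)"
    )
    con.executemany("INSERT INTO params VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT name, string_value AS ad_unit, COUNT(*) AS event_count
        FROM params
        WHERE key IN ('unitId', 'firebase_screen_class')
        GROUP BY 1, 2
        ORDER BY 1, 2
        """
    ).fetchall()
    con.close()
    return result


# unitId values and screen-class values end up mixed in the same ad_unit column.
for row in long_form(PARAMS):
    print(row)
```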
I also tried the following query in legacy SQL, but it does not produce the correct aggregated results:
SELECT event_name, ad_unit, count(*) FROM
(
SELECT
event_dim.name as event_name,
MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) WITHIN RECORD as firebase_screen_class,
MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) WITHIN RECORD as ad_unit
FROM
[app_events_20170510]
WHERE
event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')
and event_dim.params.key in ('unitId','screen_class')
)
group by 1,2
I'm looking for the following output:
| event_dim.name | unitId | screen_class | count_events |
|----------------|--------|--------------|--------------|
| Ad_requested   | hpg    | socialFeed   | 520          |
| Ad_shown       | hpg    | chat         | 950          |
| Ad_requested   | hni    | chat         | 740          |
The params of all three events (Ad_requested, Ad_shown and Ad_clicked) have the same keys (unitId, screen_class), and each key takes the same set of values (unitId: hpg, …; screen_class: socialFeed, chat).
Comments on the question:
Possible duplicate of Select several event params in a single row for Firebase events stored in Google BigQuery
Answer 1: Below is for BigQuery Standard SQL
#standardSQL
WITH `aggregation` AS (
SELECT
event.name,
event_param.key,
COUNT(*) AS event_count
FROM
`app_events_20170510`,
UNNEST(event_dim) AS event,
UNNEST(event.params) AS event_param
WHERE
event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
AND event_param.key IN ('unitId', 'firebase_screen_class','house')
GROUP BY 1, 2
)
SELECT
name,
MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,
MAX(IF(key = 'house', event_count, NULL)) AS house
FROM `aggregation`
GROUP BY name
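The query above is a two-step conditional-aggregation pivot: first count rows per (name, key), then use `MAX(IF(key = ..., event_count, NULL))` per key to spread those counts across columns. The sketch below replays the same two steps in SQLite via Python's `sqlite3`. It is an approximation, not BigQuery: `CASE WHEN` replaces BigQuery's `IF`, a hypothetical flattened `params` table replaces the `UNNEST` joins, and the rows are toy data.

```python
import sqlite3

# Hypothetical toy data, one row per (event, param):
# (event_id, name, key, string_value).
PARAMS = [
    (1, "Ad_requested", "unitId", "hpg"),
    (1, "Ad_requested", "firebase_screen_class", "socialFeed"),
    (2, "Ad_requested", "unitId", "hni"),
    (2, "Ad_requested", "firebase_screen_class", "chat"),
    (3, "Ad_shown", "unitId", "hpg"),
    (3, "Ad_shown", "firebase_screen_class", "chat"),
]

PIVOT_COUNTS_SQL = """
WITH aggregation AS (
    -- Step 1: count params per (event name, key).
    SELECT name, key, COUNT(*) AS event_count
    FROM params
    WHERE key IN ('unitId', 'firebase_screen_class', 'house')
    GROUP BY 1, 2
)
-- Step 2: conditional aggregation turns each key into its own column.
SELECT name,
       MAX(CASE WHEN key = 'unitId' THEN event_count END) AS unitId,
       MAX(CASE WHEN key = 'firebase_screen_class' THEN event_count END) AS firebase_screen_class,
       MAX(CASE WHEN key = 'house' THEN event_count END) AS house
FROM aggregation
GROUP BY name
ORDER BY name
"""


def pivot_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE params (event_id INTEGER, name TEXT, key TEXT, string_value TEXT)"
    )
    con.executemany("INSERT INTO params VALUES (?, ?, ?, ?)", rows)
    result = con.execute(PIVOT_COUNTS_SQL).fetchall()
    con.close()
    return result


for row in pivot_counts(PARAMS):
    print(row)
```

Note that each cell holds how often a key occurs, not the key's values, and a key that never occurs (here `house`) comes back as NULL. That matches the asker's later comment that this was not the desired result.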
Update based on the clarification in the comments:
#standardSQL
SELECT
event.name,
(SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
(SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
(SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,
COUNT(1) AS event_count
FROM `app_events_20170510`, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1,2,3,4
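The updated query pivots per event rather than per key. For each event, a correlated subquery over that event's own params pulls one value into its own column, and the outer `GROUP BY` then counts events per combination of values, which is the shape of the output table in the question. The sketch below imitates this in SQLite through Python's `sqlite3`. The `events`/`params` tables and the toy rows are hypothetical, and the per-event step is written as an explicit inner query for readability rather than grouping on subquery ordinals as the answer does.

```python
import sqlite3

# Hypothetical toy data. events: (event_id, name);
# params: (event_id, key, string_value), several params per event.
EVENTS = [(1, "Ad_requested"), (2, "Ad_requested"), (3, "Ad_requested"),
          (4, "Ad_shown"), (5, "Ad_shown")]
PARAMS = [
    (1, "unitId", "hpg"), (1, "firebase_screen_class", "socialFeed"),
    (2, "unitId", "hpg"), (2, "firebase_screen_class", "socialFeed"),
    (3, "unitId", "hni"), (3, "firebase_screen_class", "chat"),
    (4, "unitId", "hpg"), (4, "firebase_screen_class", "chat"),
    (5, "unitId", "hpg"), (5, "firebase_screen_class", "chat"),
]

PIVOT_VALUES_SQL = """
SELECT name, unitId, firebase_screen_class, COUNT(*) AS event_count
FROM (
    -- Inner query: one row per event; each correlated subquery looks only at
    -- that event's params (the role UNNEST(event.params) plays in BigQuery).
    SELECT e.name AS name,
           (SELECT p.string_value FROM params p
             WHERE p.event_id = e.event_id AND p.key = 'unitId') AS unitId,
           (SELECT p.string_value FROM params p
             WHERE p.event_id = e.event_id AND p.key = 'firebase_screen_class')
               AS firebase_screen_class
    FROM events e
    WHERE e.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
-- Outer query: count events per combination of param values.
GROUP BY name, unitId, firebase_screen_class
ORDER BY name, unitId, firebase_screen_class
"""


def pivot_values(events, params):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (event_id INTEGER, name TEXT)")
    con.execute("CREATE TABLE params (event_id INTEGER, key TEXT, string_value TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)", events)
    con.executemany("INSERT INTO params VALUES (?, ?, ?)", params)
    result = con.execute(PIVOT_VALUES_SQL).fetchall()
    con.close()
    return result


for row in pivot_values(EVENTS, PARAMS):
    print(row)
```

The sketch assumes each key appears at most once per event. If a key repeats, BigQuery rejects a scalar subquery that returns more than one row, while SQLite silently takes the first one.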
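... Out of curiosity, I tried to replicate the query in legacy SQL ...
Added a version for BigQuery Legacy SQL (purely for learning purposes, in the hope that it helps anyone considering a migration to Standard SQL, since two versions of the same task are now available here)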
#legacySQL
SELECT name, unitId, firebase_screen_class, house, COUNT(1) AS event_count
FROM (
SELECT event_dim.name AS name,
MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS unitId,
MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS firebase_screen_class,
MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS house
FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS event
WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
GROUP BY 1, 2, 3, 4
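The `FLATTEN(..., event_dim)` in the legacy version matters because, roughly speaking, `WITHIN RECORD` aggregates over the whole top-level row, and a single row of the app_events table can carry several events in `event_dim`. Flattening makes each event its own record, so the `MAX(IF(...))` pivot sees only that event's params. The plain-Python sketch below illustrates the difference on hypothetical nested rows. It models the scoping idea only and does not reproduce legacy SQL semantics exactly.

```python
from collections import Counter

# Hypothetical nested row shaped like app_events_*: one top-level row whose
# event_dim array holds two events, each with its own params array.
ROWS = [
    {"event_dim": [
        {"name": "Ad_requested",
         "params": [{"key": "unitId", "value": "hni"},
                    {"key": "firebase_screen_class", "value": "socialFeed"}]},
        {"name": "Ad_requested",
         "params": [{"key": "unitId", "value": "hpg"},
                    {"key": "firebase_screen_class", "value": "chat"}]},
    ]},
]


def param(params, key):
    """Mimic MAX(IF(key = ..., value, NULL)) over one params list."""
    values = [p["value"] for p in params if p["key"] == key]
    return max(values) if values else None


def per_event_counts(rows):
    """With FLATTEN on event_dim: the pivot is computed once per event."""
    counts = Counter()
    for row in rows:
        for event in row["event_dim"]:
            key = (event["name"],
                   param(event["params"], "unitId"),
                   param(event["params"], "firebase_screen_class"))
            counts[key] += 1
    return counts


def per_row_pivot(rows):
    """Without FLATTEN: params of all events in the row are mixed together."""
    result = []
    for row in rows:
        all_params = [p for event in row["event_dim"] for p in event["params"]]
        result.append((param(all_params, "unitId"),
                       param(all_params, "firebase_screen_class")))
    return result


print(per_event_counts(ROWS))
print(per_row_pivot(ROWS))
```

In this toy row the per-row pivot yields the pair (hpg, socialFeed), which matches neither actual event, while the per-event version keeps both real combinations.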
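Comments:
Thanks for your answer. I ran the query, but it's not what I'm trying to achieve. I'm looking for: event_name | unitId.values | house.values | firebase_screen_class.values | count_event
I see what you mean now.
Yes, sorry. I accidentally posted my comment before finishing it :-)
I just realized I don't know what you mean by xxx.values. Is it a list of the string_values for the respective keys, or something else? I think you should provide more details or an example of the expected output!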
@DorianRoy - try replacing `app_events_20170510` with (SELECT