BigQuery event analytics: join in subselect statement
Posted: 2016-07-11 19:50:08
Question: I am trying to get a query result from BigQuery that returns the number of events that occurred in each session. I have been following this article:
http://developer.streak.com/2013/11/using-google-bigquery-for-event-tracking.html
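The database schema is very simple: [sessionId, eventType, createdAt]. The result set should resemble an event workflow in Google Analytics, something like [sessionId, num_event1, num_event2, ...].
The approach is to generate one subquery per event type with its timestamp, then create additional subqueries that join the results of the per-event subqueries. I can run the step1, step2, and step3 subqueries on their own: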
SELECT COUNT(first_event_timestamp) AS number_first_events,
COUNT(second_event_timestamp) AS number_second_events,
COUNT(third_event_timestamp) AS number_third_events
FROM
(SELECT eventUid AS eventUid1,
createdAt AS timestamp1
FROM [events_table]
WHERE eventType = 'first-event') step1,
(SELECT eventUid AS eventUid2,
createdAt AS timestamp2
FROM [events_table]
WHERE eventType = 'second-event') step2,
(SELECT
eventUid as sessionId3,
createdAt as timestamp3
FROM
[events_table]
WHERE
eventType = "third_event") step3
Adding steps1_2 and steps1_2_3 is where I hit a wall: I get an error saying the table is missing a dataset name. Here is the full query:
SELECT COUNT(first_event_timestamp) AS num_first,
COUNT(second_event_timestamp) AS num_second,
COUNT(third_event_timestamp) AS num_third
FROM (SELECT
sessionId,
first_event_timestamp,
second_event_timestamp,
third_event_timestamp
FROM steps1_2_3
GROUP BY sessionId),
(SELECT
sessionId AS sessionId1,
createdAt AS timestamp1
FROM
[events_table]
WHERE
eventType = "first_event") step1, (SELECT
eventUid AS sessionId2,
createdAt AS timestamp2
FROM
[events_table]
WHERE
eventType = "second_event") step2, (SELECT
eventUid AS sessionId3,
createdAt AS timestamp3
FROM
[events_table]
WHERE
eventType = "third_Event") step3, (SELECT sessionId1,
timestamp1,
IF(timestamp1 < timestamp2, timestamp2, NULL) AS timestamp2
FROM
(SELECT sessionId1,
timestamp1,
timestamp2
FROM step1
LEFT JOIN step2
ON sessionId1 = sessionId2) ) steps1_2, (SELECT sessionId1 as sessionId,
timestamp1 as first_event_timestamp,
timestamp2 as second_event_timestamp,
IF(timestamp2 < timestamp3, timestamp3, NULL) as third_event_timestamp
FROM
(SELECT sessionId2,
timestamp2,
timestamp3
FROM steps1_2
LEFT JOIN step3
ON sessionId1 = sessionId3)
) steps1_2_3
The ideal result set looks like this:
sessionId  num_first_event  num_second_event  num_third_event
S1         1                null              null
S2         2                3                 null
S3         4                5                 6
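My first question: is it possible to join against the subqueries steps1_2 and steps1_2_3?
Second, is there an alternative way to implement this kind of event workflow in BigQuery, other than counting timestamps?
Any tips or pointers to documentation would be greatly appreciated. Thank you for your time and consideration.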
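On the first question: the "missing dataset name" error most likely arises because `FROM steps1_2_3` is read as a table reference, and a subquery alias defined elsewhere in the same FROM list is not visible as a table. One way to name intermediate steps and reuse them is a WITH clause, which BigQuery standard SQL supports (legacy SQL does not). The sketch below is not BigQuery code: it runs the same step structure against an in-memory SQLite database, with hypothetical sample rows, and uses `CASE` in place of legacy `IF`. It also assumes every step reads `sessionId`, which the question's queries mix with `eventUid`.

```python
import sqlite3

# Hypothetical sample rows in the question's schema: (sessionId, eventType, createdAt).
ROWS = [
    ("S1", "first-event", 1),
    ("S2", "first-event", 1),
    ("S2", "second-event", 2),
    ("S3", "first-event", 1),
    ("S3", "second-event", 2),
    ("S3", "third-event", 3),
]

# Each step is a named subquery, so later steps can reference earlier ones.
QUERY = """
WITH step1 AS (
    SELECT sessionId AS sessionId1, createdAt AS timestamp1
    FROM events_table WHERE eventType = 'first-event'
),
step2 AS (
    SELECT sessionId AS sessionId2, createdAt AS timestamp2
    FROM events_table WHERE eventType = 'second-event'
),
step3 AS (
    SELECT sessionId AS sessionId3, createdAt AS timestamp3
    FROM events_table WHERE eventType = 'third-event'
),
steps1_2 AS (
    -- Keep the second timestamp only if it follows the first.
    SELECT sessionId1, timestamp1,
           CASE WHEN timestamp1 < timestamp2 THEN timestamp2 END AS timestamp2
    FROM step1 LEFT JOIN step2 ON sessionId1 = sessionId2
),
steps1_2_3 AS (
    -- Keep the third timestamp only if it follows the second.
    SELECT sessionId1 AS sessionId,
           timestamp1 AS first_event_timestamp,
           timestamp2 AS second_event_timestamp,
           CASE WHEN timestamp2 < timestamp3 THEN timestamp3 END
               AS third_event_timestamp
    FROM steps1_2 LEFT JOIN step3 ON sessionId1 = sessionId3
)
SELECT COUNT(first_event_timestamp) AS num_first,
       COUNT(second_event_timestamp) AS num_second,
       COUNT(third_event_timestamp) AS num_third
FROM steps1_2_3
"""


def funnel_counts(rows):
    """Load rows into an in-memory table and return (num_first, num_second, num_third)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events_table (sessionId TEXT, eventType TEXT, createdAt INTEGER)"
    )
    conn.executemany("INSERT INTO events_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(funnel_counts(ROWS))
```

Note that the joins multiply rows when a session has several events of the same type, so the counts here are counts of joined rows, not of distinct sessions.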
Answer 1: How about this?
SELECT
sessionId,
SUM(eventType = 'first-event') AS number_first_events,
SUM(eventType = 'second-event') AS number_second_events,
SUM(eventType = 'third-event') AS number_third_events
FROM [events_table]
GROUP BY sessionId
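The query above relies on a comparison evaluating to 1 or 0, so that SUM counts the matching rows per session. As a rough way to try the idea without a BigQuery project, here is a sketch that runs the same statement against an in-memory SQLite database, where comparisons also evaluate to 1 or 0. The sample rows are hypothetical and mirror the ideal result set from the question. In BigQuery standard SQL, `COUNTIF(eventType = 'first-event')` would be the closer equivalent, since SUM does not accept a boolean there.

```python
import sqlite3


def count_events_per_session(rows):
    """Return one (sessionId, n_first, n_second, n_third) tuple per session."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events_table (sessionId TEXT, eventType TEXT, createdAt INTEGER)"
    )
    conn.executemany("INSERT INTO events_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT sessionId,
               SUM(eventType = 'first-event')  AS number_first_events,
               SUM(eventType = 'second-event') AS number_second_events,
               SUM(eventType = 'third-event')  AS number_third_events
        FROM events_table
        GROUP BY sessionId
        ORDER BY sessionId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def make_rows(session_id, n_first, n_second, n_third):
    """Build hypothetical event rows with increasing timestamps for one session."""
    types = (
        ["first-event"] * n_first
        + ["second-event"] * n_second
        + ["third-event"] * n_third
    )
    return [(session_id, event_type, ts) for ts, event_type in enumerate(types)]


if __name__ == "__main__":
    sample = make_rows("S1", 1, 0, 0) + make_rows("S2", 2, 3, 0) + make_rows("S3", 4, 5, 6)
    for row in count_events_per_session(sample):
        print(row)
```

Sessions with no events of a given type come back as 0 rather than null. If nulls are wanted, as in the question's ideal result set, wrapping each sum in `NULLIF(..., 0)` is one option.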
Comments:
Did you get a chance to get it working? Or are there still problems?
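On the question's second point, an alternative to joining one subquery per step is to collapse each session to a single row first and then compare timestamps on that row. The sketch below is one possible approach under a simplifying assumption: it compares only the earliest occurrence of each event type, so a session whose first 'second-event' precedes its 'first-event' is not counted as reaching step two even if a later one follows. It runs on in-memory SQLite with hypothetical rows; the `CASE` and `MIN` constructs used are also available in BigQuery.

```python
import sqlite3

# Hypothetical rows: (sessionId, eventType, createdAt). S4 has its events out of order.
ROWS = [
    ("S1", "first-event", 1),
    ("S2", "first-event", 1),
    ("S2", "second-event", 2),
    ("S3", "first-event", 1),
    ("S3", "second-event", 2),
    ("S3", "third-event", 3),
    ("S4", "second-event", 1),
    ("S4", "first-event", 2),
]

QUERY = """
SELECT COUNT(t1) AS reached_first,
       COUNT(CASE WHEN t1 < t2 THEN 1 END) AS reached_second,
       COUNT(CASE WHEN t1 < t2 AND t2 < t3 THEN 1 END) AS reached_third
FROM (
    -- One row per session holding the earliest timestamp of each event type.
    SELECT sessionId,
           MIN(CASE WHEN eventType = 'first-event'  THEN createdAt END) AS t1,
           MIN(CASE WHEN eventType = 'second-event' THEN createdAt END) AS t2,
           MIN(CASE WHEN eventType = 'third-event'  THEN createdAt END) AS t3
    FROM events_table
    GROUP BY sessionId
)
"""


def sessions_per_step(rows):
    """Return how many sessions reached steps one, two, and three in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events_table (sessionId TEXT, eventType TEXT, createdAt INTEGER)"
    )
    conn.executemany("INSERT INTO events_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(sessions_per_step(ROWS))
```

Because each session contributes exactly one row to the outer query, these are counts of sessions, not of events, which is closer to a funnel report than the per-type event totals.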