BigQuery (Google Analytics data): query two different 'hits.customDimensions.index' in the same 'hits.hitNumber'
【Posted】: 2020-01-09 16:55:22
【Question】:
My goal:
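Count a session as 1 if the following two hits.customDimensions.index entries and their associated hits.customDimensions.value appear in the same hits.hitNumber (as long as the main query is still nested, each row is one session):
['hits.customDimensions.index' = 43 with associated 'hits.customDimensions.value' IN ('login', 'payment', 'order', 'thankyou')] AND ['hits.customDimensions.index' = 10 with associated 'hits.customDimensions.value' = 'checkout'] [in the same hits.hitNumber]
My problem:
I don't know how to query two different hits.customDimensions.value entries within the same hits.hitNumber in a single subquery, without separate WITH tables. If it is possible, I am sure the query would be very simple and short. Because I don't know how to express this use case in a subquery, I used a workaround with 5 WITH tables in total. I hope there is an easier way to query this use case.
Explanation of the workaround query:
table1: queries everything except the "problem metric"
table2 and table3: each table queries one hits.customDimensions.index, with the associated hits.customDimensions.value filtered to the right values, plus sessionId and hitNumber
table4: left joins table2 with table3 on date, sessionId and hitNumber. Basically, if a hitNumber combined with its sessionId appears in both table2 and table3, I count 1
table5: left joins table1 with table4 to bring the data together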
#Table1 - complete data except session_atleast_loginCheckout
WITH
prepared_data AS (
SELECT
date,
SUM((SELECT 1 FROM UNNEST(hits) WHERE CAST(eCommerceAction.action_type AS INT64) BETWEEN 4 AND 6 LIMIT 1)) AS sessions_atleast_basket,
#insert in this row query for sessions_atleast_loginCheckout
SUM((SELECT 1 FROM UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd WHERE index = 43 AND value IN ('payment', 'order', 'thankyou') LIMIT 1)) AS sessions_atleast_payment,
FROM
`big-query-221916.172008714.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND totals.visits = 1
GROUP BY
date),
#Table2 - data for hits.customDimensions.index = 10 AND associated hits.customDimensions.value = 'checkout' with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index10_pagetype_data AS (
SELECT
date AS date,
CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
h.hitNumber AS hitNumber,
IF(hcd.value IS NOT NULL, 1, NULL) AS pagetype_checkout
FROM
`big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 10 AND VALUE = 'checkout' AND h.type = 'PAGE' AND totals.visits = 1),
#Table3 - data for hits.customDimensions.index = 43 AND associated hits.customDimensions.value IN ('login', 'register', 'payment', 'order','thankyou') with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index43_pagelevel1_data AS (
SELECT
date AS date,
CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
h.hitNumber AS hitNumber,
IF(hcd.value IS NOT NULL, 1, NULL) AS pagelevel1_login_to_thankyou
FROM
`big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 43 AND VALUE IN ('login', 'register', 'payment', 'order', 'thankyou') AND h.type = 'PAGE'
),
#table4 - left join table2 and table3 on date, sessionId and hitNumber to get sessions_atleast_loginCheckout
loginCheckout_output_data AS (
SELECT
a.date AS date,
COUNT(DISTINCT a.sessionId) AS sessions_atleast_loginCheckout
FROM
loginCheckout_index10_pagetype_data AS a
LEFT JOIN
loginCheckout_index43_pagelevel1_data AS b
ON
a.date = b.date AND
a.sessionId = b.sessionId AND
a.hitNumber = b.hitNumber
WHERE
#keep only hits that also have a match in table3
b.pagelevel1_login_to_thankyou IS NOT NULL
GROUP BY
a.date
)
#table5 - left join table1 with table4 to get all data together
SELECT
prep.date,
prep.sessions_atleast_basket,
log.sessions_atleast_loginCheckout,
prep.sessions_atleast_payment
FROM
prepared_data AS prep
LEFT JOIN
loginCheckout_output_data AS log
ON
prep.date = log.date
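【Comments】:
Why do you want to reduce the number of CTEs? It seems to me that you encapsulate different logic in each of them, which makes the query easier to read for your teammates (or for you in 6 months). New lines are cheap, brainpower is expensive.
To be on the safe side: I am currently working alone and do not have much experience. I also was not sure whether there is another way (one that is understandable and logical).
【Answer 1】:
This is a bit like Inception, but it may help to remember that the input of unnest() is an array and its output is table rows...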
SELECT
date,
SUM(totals.visits) AS sessions
FROM
`big-query-221916.172008714.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND -- the following two hits.customDimensions.index and associated hits.customDimensions.value appear in the same hits.hitNumber
(SELECT COUNT(1)>0 AS hitsCountMoreThanZero FROM UNNEST(hits) AS h
WHERE
-- index 43, value IN ('login', 'payment', 'order', 'thankyou')
(SELECT COUNT(1)>0 FROM UNNEST(h.customDimensions) WHERE index=43 AND value IN ('login', 'payment', 'order', 'thankyou'))
AND
-- index 10, value = 'checkout'
(SELECT COUNT(1)>0 FROM UNNEST(h.customDimensions) WHERE index=10 AND value='checkout')
)
GROUP BY
date
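The nested-subquery idea can be tried without access to the GA export table. The following is only a sketch under stated assumptions: sample_sessions and its two rows are made-up inline data that mimic the nested shape of the export (date, hits[].hitNumber, hits[].customDimensions[]), and EXISTS is swapped in for COUNT(1)>0, which should behave the same way here. With this sample, only the first session should be counted, because it is the only one where both dimensions sit on the same hit.

```sql
-- Hypothetical inline sample data; not taken from the real ga_sessions_* tables.
WITH sample_sessions AS (
  SELECT
    '20200108' AS date,
    [
      -- hit 1 carries both dimensions: this session qualifies
      STRUCT(1 AS hitNumber,
             [STRUCT(10 AS index, 'checkout' AS value),
              STRUCT(43 AS index, 'login' AS value)] AS customDimensions)
    ] AS hits
  UNION ALL
  SELECT
    '20200108' AS date,
    [
      -- the two dimensions sit on different hits: this session does not qualify
      STRUCT(1 AS hitNumber,
             [STRUCT(10 AS index, 'checkout' AS value)] AS customDimensions),
      STRUCT(2 AS hitNumber,
             [STRUCT(43 AS index, 'payment' AS value)] AS customDimensions)
    ] AS hits
)
SELECT
  date,
  -- each input row is one session; count it if at least one hit matches both conditions
  COUNTIF(EXISTS(
    SELECT 1
    FROM UNNEST(hits) AS h
    WHERE
      -- both inner subqueries look at the customDimensions of the same hit h
      EXISTS(SELECT 1 FROM UNNEST(h.customDimensions)
             WHERE index = 43 AND value IN ('login', 'payment', 'order', 'thankyou'))
      AND
      EXISTS(SELECT 1 FROM UNNEST(h.customDimensions)
             WHERE index = 10 AND value = 'checkout')
  )) AS sessions_atleast_loginCheckout
FROM sample_sessions
GROUP BY date
```

Written as an aggregate like this, the COUNTIF expression could in principle be placed in the row the question reserves for sessions_atleast_loginCheckout in prepared_data, next to the existing SUM(...) metrics, so the extra WITH tables would no longer be needed. That placement is untested against the real export.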