Unnest hits and Unnesting session scoped custom dimension BigQuery code filter
Posted: 2019-06-12 11:34:00
Question: I am trying to filter a funnel down to users who have a specific custom dimension value. Unfortunately, the custom dimension in question is session-scoped rather than hit-scoped, so I cannot use hits.customDimensions in this query. What is the best way to do this and get the desired result? Here is my progress so far:
#standardSQL
SELECT
SUM((SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page' LIMIT 1)) One_Page,
SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/two - Page' LIMIT 1)) Two_Page,
SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/three - Page' LIMIT 1)) Three_Page,
SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/four - Page' LIMIT 1)) Four_Page
FROM
`xxxxxxx.ga_sessions_*`,
UNNEST(hits) AS h,
UNNEST(customDimensions) AS cusDim
WHERE
_TABLE_SUFFIX BETWEEN '20190320' AND '20190323'
AND h.hitNumber = 1
AND cusDim.index = 6
AND cusDim.value IN ('60','70')
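Answer 1:
Segmenting with custom dimensions
You can filter sessions on a condition over their custom dimensions. Write a subquery that counts the matching entries and require that count to be greater than 0. Example using the public sample data: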
SELECT
fullvisitorid,
visitstarttime,
customdimensions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
-- there should be at least one case with index=4 and value='EMEA' ... you can use your index and desired value
-- unnest() turns customdimensions into table format, so we can apply SQL to this array
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
limit 100
Comment out the WHERE clause to see all the data.
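Applied to the question's own filter, the same idea could look like the sketch below. It assumes the index (6), the values ('60', '70'), the date range, and the `xxxxxxx.ga_sessions_*` placeholder exactly as given in the question, and it uses EXISTS in place of count(1)>0, which is an equivalent way to express "at least one match". Because customDimensions is only unnested inside the subquery, each session remains a single row and is not multiplied by the number of hits or dimensions.

```sql
#standardSQL
-- Sketch only: the table name is the placeholder from the question.
SELECT
  COUNT(*) AS matching_sessions
FROM
  `xxxxxxx.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20190320' AND '20190323'
  -- keep a session if any of its session-scoped custom dimensions matches
  AND EXISTS (
    SELECT 1
    FROM UNNEST(customDimensions)
    WHERE index = 6 AND value IN ('60', '70')
  )
```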
Funnel
First, you will probably want a rough overview of what is happening in the hits array:
SELECT
fullvisitorid,
visitstarttime,
-- get an overview over relevant hits data
-- select as struct feeds hits fields into a new array created by array()-function
ARRAY(select as struct hitnumber, page from unnest(hits) where type='PAGE') hits
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
and totals.pageviews>3
limit 100
Once you have confirmed that the data makes sense, you can build a funnel array that holds the hit numbers of the relevant steps:
SELECT
fullvisitorid,
visitstarttime,
-- create array with relevant info
-- cross join hit numbers from step pages to get all combinations so that we can check later which came after the other
ARRAY(
select as struct * from
(select hitnumber as step1 from unnest(hits) where type='PAGE' and page.pagePath='/home') left join
(select hitnumber as step2 from unnest(hits) where type='PAGE' and page.pagePath like '/google+redesign/%') on true left join
(select hitnumber as step3 from unnest(hits) where type='PAGE' and page.pagePath='/basket.html') on true
) AS funnel
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
and totals.pageviews>3
limit 100
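To see what the funnel array contains, the following sketch runs the same join pattern on two hand-made sessions instead of the GA export. The CTE name `toy`, the column `session_id`, and the flattened `pagePath` field are assumptions made for illustration; the real export nests the path under page.pagePath and also has a type field.

```sql
#standardSQL
WITH toy AS (
  -- session a visits the three step pages in funnel order
  SELECT 'a' AS session_id,
    [STRUCT(1 AS hitNumber, '/home' AS pagePath),
     STRUCT(2 AS hitNumber, '/google+redesign/bags' AS pagePath),
     STRUCT(3 AS hitNumber, '/basket.html' AS pagePath)] AS hits
  UNION ALL
  -- session b sees the basket before the home page and never reaches step 2
  SELECT 'b' AS session_id,
    [STRUCT(1 AS hitNumber, '/basket.html' AS pagePath),
     STRUCT(2 AS hitNumber, '/home' AS pagePath)] AS hits
)
SELECT
  session_id,
  ARRAY(
    SELECT AS STRUCT *
    FROM (SELECT hitNumber AS step1 FROM UNNEST(hits) WHERE pagePath = '/home')
    LEFT JOIN (SELECT hitNumber AS step2 FROM UNNEST(hits) WHERE pagePath LIKE '/google+redesign/%') ON TRUE
    LEFT JOIN (SELECT hitNumber AS step3 FROM UNNEST(hits) WHERE pagePath = '/basket.html') ON TRUE
  ) AS funnel
FROM toy
ORDER BY session_id
```

Session a yields one funnel row with step1=1, step2=2, step3=3. Session b yields one row with step1=2, step2 NULL, step3=1: the LEFT JOIN keeps the step 1 hit even though step 2 is missing, and the hit numbers show that the basket view came before the home page.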
To make this clearer, put it into a WITH clause and run your analysis by summing up the corresponding cases:
WITH f AS (
SELECT
fullvisitorid,
visitstarttime,
totals.visits,
-- create array with relevant info
-- cross join hit numbers from step pages to get all combinations so that we can check later which came after the other
ARRAY(
select as struct * from
(select hitnumber as step1 from unnest(hits) where type='PAGE' and page.pagePath='/home') left join
(select hitnumber as step2 from unnest(hits) where type='PAGE' and page.pagePath like '/google+redesign/%') on true left join
(select hitnumber as step3 from unnest(hits) where type='PAGE' and page.pagePath='/basket.html') on true
) AS funnel
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
(select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
and totals.pageviews>3
)
SELECT
COUNT(DISTINCT fullvisitorid) as users,
SUM(visits) as allSessions,
SUM( IF(array_length(funnel)>0,visits,0) ) sessionsWithFunnelPages,
SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null ) ,visits,0) ) sessionsWithStep1,
SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null and step1<step2 ) ,visits,0) ) sessionsFunnelToStep2,
SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null and step1<step2 and step2<step3 and step1<step3) ,visits,0) ) sessionsFunnelToStep3
FROM f
Please test before using.
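One way to test the aggregation step is to feed it hand-made funnel arrays for which the expected counts are known. The sketch below assumes three made-up sessions (one complete funnel, one out of order with step 2 missing, one with no funnel rows) and reuses the conditions from the query above.

```sql
#standardSQL
WITH f AS (
  -- complete funnel in order
  SELECT 1 AS visits, [STRUCT(1 AS step1, 2 AS step2, 3 AS step3)] AS funnel
  UNION ALL
  -- step 1 seen, step 2 missing, step 3 seen before step 1
  SELECT 1 AS visits, [STRUCT(2 AS step1, CAST(NULL AS INT64) AS step2, 1 AS step3)] AS funnel
  UNION ALL
  -- no step 1 hit at all, so the funnel array is empty
  SELECT 1 AS visits, ARRAY<STRUCT<step1 INT64, step2 INT64, step3 INT64>>[] AS funnel
)
SELECT
  SUM(visits) AS allSessions,
  SUM(IF((SELECT COUNT(1) > 0 FROM UNNEST(funnel) WHERE step1 IS NOT NULL), visits, 0)) AS sessionsWithStep1,
  SUM(IF((SELECT COUNT(1) > 0 FROM UNNEST(funnel) WHERE step1 IS NOT NULL AND step1 < step2), visits, 0)) AS sessionsFunnelToStep2,
  SUM(IF((SELECT COUNT(1) > 0 FROM UNNEST(funnel) WHERE step1 IS NOT NULL AND step1 < step2 AND step2 < step3), visits, 0)) AS sessionsFunnelToStep3
FROM f
```

This returns allSessions=3, sessionsWithStep1=2, sessionsFunnelToStep2=1, sessionsFunnelToStep3=1. A comparison against a NULL step evaluates to NULL, so rows with a missing step drop out of the later counts without an explicit IS NOT NULL check on every step. The extra step1<step3 condition from the original query is omitted here because it is implied by the other two.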
Comments:
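A very detailed solution, Martin. Thanks for taking the time to put it together. I have two questions: 1. The numbers returned in BQ are about 7% lower than those returned in GA, whereas I expected the BQ numbers to be higher. Any ideas? 2. If this funnel is extended to 14 steps, this part of the code: where step1 is not null and step1 [...]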