Unnest hits and Unnesting session scoped custom dimension BigQuery code filter

Posted: 2019-06-12 11:34:00

Question:

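I'm trying to filter a funnel down to users who have a specific custom dimension value. Unfortunately, the custom dimension in question is session-scoped rather than hit-scoped, so I can't use hits.customDimensions in this particular query. What is the best way to do this and get the desired result? Here is my progress so far: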

#standardSQL
SELECT
  SUM((SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page' LIMIT 1)) One_Page,
  SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/two - Page' LIMIT 1)) Two_Page,
  SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/three - Page' LIMIT 1)) Three_Page,
  SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE page.pagePath = '/one - Page') AND page.pagePath = '/four - Page' LIMIT 1)) Four_Page
FROM
  `xxxxxxx.ga_sessions_*`,
  UNNEST(hits) AS h,
  UNNEST(customDimensions) AS cusDim
WHERE
  _TABLE_SUFFIX BETWEEN '20190320' AND '20190323'
  AND h.hitNumber = 1
  AND cusDim.index = 6
  AND cusDim.value IN ('60','70')


Answer 1:

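Segmenting with custom dimensions

You can filter sessions based on conditions in the custom dimensions. Just write a subquery that counts the cases of interest and check that the count is ">0". Example using the sample data: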

SELECT
  fullvisitorid,
  visitstarttime,
  customdimensions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
  -- there should be at least one case with index=4 and value='EMEA' ... you can use your index and desired value
  -- unnest() turns customdimensions into table format, so we can apply SQL to this array
  (select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
limit 100

Comment out the WHERE clause to see all the data.
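The same session-level filter can be combined with a date range and a list of accepted values, as in the question. The following is only a sketch: it runs against the public sample dataset, so the date range, index 4, and the values 'EMEA' and 'APAC' are stand-ins (assumptions) for the question's own table, index 6, and values '60' and '70'. It uses EXISTS instead of count(1)>0, which should be equivalent here. Because customDimensions is only unnested inside the subquery and not cross joined in the FROM clause, each session stays a single row.

```sql
#standardSQL
SELECT
  fullVisitorId,
  visitStartTime
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  -- assumed date range; the question uses '20190320' to '20190323' on its own table
  _TABLE_SUFFIX BETWEEN '20170501' AND '20170505'
  -- keep the session if at least one custom dimension entry matches
  -- (assumed stand-ins: swap in cd.index = 6 and cd.value IN ('60', '70'))
  AND EXISTS (
    SELECT 1
    FROM UNNEST(customDimensions) AS cd
    WHERE cd.index = 4 AND cd.value IN ('EMEA', 'APAC')
  )
LIMIT 100
```

With this shape there should be no need to also unnest hits in the FROM clause and restrict to hitNumber = 1 just to get back to one row per session.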

Funnel

First, you may want to get a rough overview of what is happening in the hits array:

SELECT
  fullvisitorid,
  visitstarttime,
  -- get an overview over relevant hits data
  -- select as struct feeds hits fields into a new array created by array()-function
  ARRAY(select as struct hitnumber, page from unnest(hits) where type='PAGE') hits
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
  (select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
  and totals.pageviews>3
limit 100

Now that you have made sure the data makes sense, you can create a funnel array containing the hit numbers of the relevant steps:

SELECT
  fullvisitorid,
  visitstarttime,
  -- create array with relevant info
  -- cross join hit numbers from step pages to get all combinations so that we can check later which came after the other
  ARRAY(
    select as struct * from
    (select hitnumber as step1 from unnest(hits) where type='PAGE' and page.pagePath='/home') left join
    (select hitnumber as step2 from unnest(hits) where type='PAGE' and page.pagePath like '/google+redesign/%') on true left join
    (select hitnumber as step3 from unnest(hits) where type='PAGE' and page.pagePath='/basket.html') on true
    ) AS funnel
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
WHERE
  (select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
  and totals.pageviews>3
limit 100

For more clarity, put this into a WITH statement and run your analysis by summing up the respective cases:

WITH f AS (
  SELECT
    fullvisitorid,
    visitstarttime,
    totals.visits,
    -- create array with relevant info
    -- cross join hit numbers from step pages to get all combinations so that we can check later which came after the other
    ARRAY(
      select as struct * from
        (select hitnumber as step1 from unnest(hits) where type='PAGE' and page.pagePath='/home') left join
        (select hitnumber as step2 from unnest(hits) where type='PAGE' and page.pagePath like '/google+redesign/%') on true left join
        (select hitnumber as step3 from unnest(hits) where type='PAGE' and page.pagePath='/basket.html') on true
      ) AS funnel
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
  WHERE
    (select count(1)>0 from unnest(customdimensions) where index=4 and value='EMEA')
    and totals.pageviews>3
)

SELECT 
  COUNT(DISTINCT fullvisitorid) as users,
  SUM(visits) as allSessions,
  SUM( IF(array_length(funnel)>0,visits,0) ) sessionsWithFunnelPages,
  SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null ) ,visits,0) ) sessionsWithStep1,
  SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null and step1<step2 ) ,visits,0) ) sessionsFunnelToStep2,
  SUM( IF( (select count(1)>0 from unnest(funnel) where step1 is not null and step1<step2 and step2<step3 and step1<step3) ,visits,0) ) sessionsFunnelToStep3
FROM f

Please test before using.

Comments:

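Very detailed solution, Martin. Thanks for taking the time to put it together. I have some questions: 1. The numbers returned in BQ are about 7% lower than the numbers returned in GA. I expected the BQ numbers to be higher; any ideas? 2. If this funnel is extended to 14 steps, does this part of the code: where step1 is not null and step1 … have to be rewritten like this: where step1 is not null and step1 …

1. GA checks that a page is immediately followed by the next page, whereas this approach allows other pages in between the funnel pages. 2. The comparisons are the funnel: they make sure each step comes after the previous one. Maybe this can be optimized; 14 steps is a lot. 3. You need to group array_concat_agg(hits) by fullvisitorid before calculating.

Sorry to come back to this, Martin. I'm still trying to wrap my head around how the comparisons work and whether my 8-step funnel is structured correctly. If you could add two more steps to the subselect queries (i.e. sessionsFunnelToStep4, sessionsFunnelToStep5), with some explanation of how they should work as the funnel progresses, that would be very useful to me. Many thanks, Martin.

Just think about it mathematically: if you want a before b before c, then you need to make sure a is smaller than b and b is smaller than c, but also make sure a is smaller than c! Because in b …

Thanks again for the detailed reply, Martin. I completely get it now: more steps bring more comparisons, so a 12-step funnel, for example, could get a bit messy and complicated. Good work on this one, Martin.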
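As a sketch of how sessionsFunnelToStep4 and sessionsFunnelToStep5 might look, the query below extends the answer's pattern by two steps. The page paths '/yourinfo.html' and '/payment.html' are assumptions chosen for illustration and are not from the answer; replace them with your own funnel pages. Each added step needs one more joined subquery and one more comparison against the previous step. Since "<" on hit numbers is transitive, comparing only adjacent steps already implies step1<step3 and so on, so the extra cross-comparisons should be harmless but not required. A comparison involving a NULL step evaluates to NULL, so rows with a missing step drop out without an explicit "is not null" check.

```sql
#standardSQL
WITH f AS (
  SELECT
    fullVisitorId,
    visitStartTime,
    totals.visits,
    -- one row per combination of hit numbers across the five step pages
    ARRAY(
      SELECT AS STRUCT * FROM
        (SELECT hitNumber AS step1 FROM UNNEST(hits) WHERE type='PAGE' AND page.pagePath='/home') LEFT JOIN
        (SELECT hitNumber AS step2 FROM UNNEST(hits) WHERE type='PAGE' AND page.pagePath LIKE '/google+redesign/%') ON TRUE LEFT JOIN
        (SELECT hitNumber AS step3 FROM UNNEST(hits) WHERE type='PAGE' AND page.pagePath='/basket.html') ON TRUE LEFT JOIN
        -- assumed page paths for steps 4 and 5, for illustration only
        (SELECT hitNumber AS step4 FROM UNNEST(hits) WHERE type='PAGE' AND page.pagePath='/yourinfo.html') ON TRUE LEFT JOIN
        (SELECT hitNumber AS step5 FROM UNNEST(hits) WHERE type='PAGE' AND page.pagePath='/payment.html') ON TRUE
    ) AS funnel
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170505` t
  WHERE
    (SELECT COUNT(1)>0 FROM UNNEST(customDimensions) WHERE index=4 AND value='EMEA')
)

SELECT
  SUM(visits) AS allSessions,
  -- each level adds exactly one comparison between neighbouring steps
  SUM(IF((SELECT COUNT(1)>0 FROM UNNEST(funnel)
          WHERE step1<step2 AND step2<step3), visits, 0)) AS sessionsFunnelToStep3,
  SUM(IF((SELECT COUNT(1)>0 FROM UNNEST(funnel)
          WHERE step1<step2 AND step2<step3 AND step3<step4), visits, 0)) AS sessionsFunnelToStep4,
  SUM(IF((SELECT COUNT(1)>0 FROM UNNEST(funnel)
          WHERE step1<step2 AND step2<step3 AND step3<step4 AND step4<step5), visits, 0)) AS sessionsFunnelToStep5
FROM f
```

Note that the joined subqueries produce every combination of matching hit numbers, so the funnel array can grow quickly for sessions that view the step pages many times; for long funnels it may be worth testing on a single day first.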
