BigQuery (Google Analytics data): query two different 'hits.customDimensions.index' in the same 'hits.hitNumber'

Posted: 2020-01-09 16:55:22

Question:

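My goal:

If the following two hits.customDimensions.index entries and their associated hits.customDimensions.value appear in the same hits.hitNumber, count the session once (as long as the main query is still nested, each row is one session):

['hits.customDimensions.index' = 43 with associated 'hits.customDimensions.value' IN ('login', 'payment', 'order', 'thankyou')] AND ['hits.customDimensions.index' = 10 with associated 'hits.customDimensions.value' = 'checkout'], both in the same hits.hitNumber

My problem:

I don't know how to query two different hits.customDimensions.value entries within the same hits.hitNumber in a single subquery, without separate WITH tables. If it is possible, I'm sure the query would be very simple and short. Because I don't know how to express this use case in a subquery, I used a workaround with a total of 5 WITH tables. I'm hoping there is a simple way to query this use case.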

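Workaround query explained:

Table 1: queries everything except the problem metric.

Tables 2 and 3: each queries one hits.customDimensions.index, with the associated hits.customDimensions.value filtered to the relevant values, plus sessionId and hitNumber.

Table 4: left-joins table 2 to table 3 on date, sessionId and hitNumber. In effect, I count 1 whenever the same hitNumber and sessionId combination appears in both table 2 and table 3.

Table 5: left-joins table 1 and table 4 to bring the data together.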

#Table1 - complete data except session_atleast_loginCheckout
WITH
  prepared_data AS (
  SELECT
    date,
    SUM((SELECT 1 FROM UNNEST(hits) WHERE CAST(eCommerceAction.action_type AS INT64) BETWEEN 4 AND 6 LIMIT 1)) AS sessions_atleast_basket, 
    #insert in this row query for sessions_atleast_loginCheckout
    SUM((SELECT 1 FROM UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd WHERE index = 43 AND value IN ('payment', 'order', 'thankyou') LIMIT 1)) AS sessions_atleast_payment,
  FROM
    `big-query-221916.172008714.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND totals.visits = 1 
  GROUP BY
    date
  ),


#Table2 - data for hits.customDimensions.index = 10 AND associated hits.customDimensions.value = 'checkout' with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index10_pagetype_data AS (
  SELECT
    date AS date,
    CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
    h.hitNumber AS hitNumber,
    IF(hcd.value IS NOT NULL, 1, NULL) AS pagetype_checkout
  FROM
    `big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 10 AND VALUE = 'checkout'  AND h.type = 'PAGE' AND totals.visits = 1),


#Table3 - data for hits.customDimensions.index = 43 AND associated hits.customDimensions.value IN ('login', 'register', 'payment', 'order','thankyou') with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index43_pagelevel1_data AS (
  SELECT
    date AS date,
    CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
    h.hitNumber AS hitNumber,
    IF(hcd.value IS NOT NULL, 1, NULL) AS pagelevel1_login_to_thankyou
  FROM
    `big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 43 AND VALUE IN ('login', 'register', 'payment', 'order', 'thankyou') AND h.type = 'PAGE'  
),


#table4 - left join table2 and table 3 on sessionId and hitNumber to get sessions_atleast_loginCheckout
loginChackout_output_data AS(
  SELECT
    a.date AS date,
    COUNT(DISTINCT a.sessionId) AS sessions_atleast_loginCheckout 
  FROM
    loginCheckout_index10_pagetype_data AS a
  LEFT JOIN 
    loginCheckout_index43_pagelevel1_data AS b 
  ON
    a.date = b.date AND
    a.sessionId = b.sessionId AND
    a.hitNumber = b.hitNumber
  WHERE
    pagelevel1_login_to_thankyou IS NOT NULL
  GROUP BY
    date
)



#table5 - leftjoin table1 with table4 to get all data together
SELECT
  prep.date,
  prep.sessions_atleast_basket,
  log.sessions_atleast_loginCheckout,
  prep.sessions_atleast_payment
FROM
    prepared_data AS prep
  LEFT JOIN
    loginChackout_output_data as log
  ON
    prep.date = log.date


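Comments:

Why reduce the number of CTEs? It looks to me like you have encapsulated different logic in each one, which will make it easier for your teammates (or you in 6 months) to read. New lines are cheap, brainpower is expensive.

To be on the safe side: I'm currently working on my own and don't have much experience. I also wasn't sure whether there was another way (one that is understandable and logical).

Answer 1: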

This is a bit like Inception, but it may help to remember that the input to UNNEST() is an array and its output is table rows...

SELECT
  SUM(totals.visits) as sessions
FROM
  `big-query-221916.172008714.ga_sessions_*`
WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) 
  AND -- the following two hits.customDimensions.index and associated hits.customDimensions.value appear in the same hits.hitNumber
    (SELECT COUNT(1)>0 as hitsCountMoreThanZero FROM UNNEST(hits) AS h
     WHERE 
       -- index 43, value IN ('login', 'payment', 'order', 'thankyou')
       (select count(1)>0 from unnest(h.customdimensions) where index=43 and value IN ('login', 'payment', 'order', 'thankyou'))
       AND
       -- index 10, value = 'checkout'
       (select count(1)>0 from unnest(h.customdimensions) where index=10 and value='checkout')
    )
GROUP BY
  date
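
To see why both inner subqueries have to sit under the same `UNNEST(hits) AS h`, here is a minimal sketch on made-up inline data. The `sessions` CTE, the `sessionId` values and the sample hits are assumptions for illustration only, not real export rows, and only the fields used by the check are modelled. Session A carries both dimensions on one hit; session B carries them on two different hits, so only A should count. The sketch uses `COUNTIF(EXISTS(...))` instead of a `WHERE` filter so the flag can sit next to other per-date metrics, in the spirit of the `prepared_data` CTE from the question.

```sql
-- Hypothetical stand-in for ga_sessions_* rows (illustration only).
WITH sessions AS (
  -- Session A: index 10 and index 43 both appear on hit 1.
  SELECT
    '20200108' AS date,
    'A' AS sessionId,
    [
      STRUCT(
        1 AS hitNumber,
        [STRUCT(10 AS index, 'checkout' AS value),
         STRUCT(43 AS index, 'login' AS value)] AS customDimensions)
    ] AS hits
  UNION ALL
  -- Session B: index 10 is on hit 1, index 43 is on hit 2.
  SELECT
    '20200108',
    'B',
    [
      STRUCT(
        1 AS hitNumber,
        [STRUCT(10 AS index, 'checkout' AS value)] AS customDimensions),
      STRUCT(
        2 AS hitNumber,
        [STRUCT(43 AS index, 'payment' AS value)] AS customDimensions)
    ]
)
SELECT
  date,
  COUNT(*) AS sessions,
  -- A session counts when at least one hit h satisfies BOTH conditions.
  COUNTIF(EXISTS(
    SELECT 1
    FROM UNNEST(hits) AS h
    WHERE
      -- Each inner EXISTS scans the custom dimensions of the current hit only.
      EXISTS(
        SELECT 1 FROM UNNEST(h.customDimensions)
        WHERE index = 43 AND value IN ('login', 'payment', 'order', 'thankyou'))
      AND EXISTS(
        SELECT 1 FROM UNNEST(h.customDimensions)
        WHERE index = 10 AND value = 'checkout')
  )) AS sessions_atleast_loginCheckout
FROM sessions
GROUP BY date
```

With this inline data the query should return one row with 2 sessions, 1 of which has both dimensions on the same hit. The same `COUNTIF(EXISTS(...))` expression could probably take the place of the placeholder comment in the question's `prepared_data` CTE, which would make tables 2 to 5 of the workaround unnecessary.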
