Finding out user's second page after visiting homepage in BigQuery

Posted: 2019-10-14 10:15:42

【Question】:

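I want to find out, in BigQuery, which page users visit after landing on the homepage.

Below is the query I have come up with so far, but its results do not match Google Analytics when I look at Second Page (under [Behavior] --> [Site Content] --> [All Pages] --> Primary Dimension: Landing Page, Secondary Dimension: Second Page).

However, the query results do match when I look at Next Page Path (under [Site Content] --> [All Pages] --> Primary Dimension: Landing Page, Secondary Dimension: Next Page Path).

Several articles say that [Second Page] is more appropriate to use than [Next Page Path].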

-- Total pageviews by pagepath after landing homepage
#standardSQL
SELECT
  next_page.pagePath AS pagePath,
  COUNT(*) as pageviews
FROM
  (
    SELECT
      CONCAT(fullVisitorId, ".", CAST(visitId AS STRING)) AS session_id,
      hits.page.pagePath AS pagePath,
      hits.hitNumber AS hitNumber,
      hits.type AS type
    FROM
      `GA_data.ga_sessions_*`,
      UNNEST(hits) as hits
    WHERE
      _TABLE_SUFFIX BETWEEN '20190814'
      AND '20191008'
      AND hits.type = 'PAGE'
      AND hits.page.pagePath = '/***/' -- Landing page URL
      AND hits.isEntrance = TRUE
      AND totals.visits = 1
  ) AS landing_hp
  INNER JOIN (
    SELECT
      CONCAT(fullVisitorId, ".", CAST(visitId AS STRING)) AS session_id,
      hits.page.pagePath AS pagePath,
      hits.hitNumber AS hitNumber,
      hits.type AS type
    FROM
      `GA_data.ga_sessions_*`,
      UNNEST(hits) as hits
    WHERE
      _TABLE_SUFFIX BETWEEN '20190814'
      AND '20191008'
      AND hits.type = 'PAGE'
      AND hits.isEntrance IS NULL
      AND totals.visits = 1
  ) AS next_page ON landing_hp.session_id = next_page.session_id
WHERE
  landing_hp.hitNumber < next_page.hitNumber
GROUP BY
  pagePath
ORDER BY
  pageviews DESC 

Can anyone tell me why this happens and which query I should use?

【Answer 1】:

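Your second subquery doesn't necessarily return the second page, does it? It returns every non-entrance PAGE hit in the session, and the join condition landing_hp.hitNumber < next_page.hitNumber keeps all of the later ones, not just the first. Doing it with all those joins is also somewhat inefficient.

It is better to cut out the joins entirely and use subqueries: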

SELECT 
  fullvisitorid -- identify user
  ,visitstarttime -- identify session per user
  -- visitid is timestamp of pre-midnight session
  ,visitstarttime<>visitid AS isMidnightSplitSession 

  -- get hitnumber and pagepath from hits where the type is not event 
  -- limit to one while sorting by hitnumber - offset 1 to get second page
  ,(SELECT AS STRUCT hitnumber, page.pagePath 
     FROM UNNEST(hits) 
     WHERE type<>'EVENT' 
     ORDER BY hitnumber ASC 
     LIMIT 1 OFFSET 1) AS secondPage

  ,(SELECT AS STRUCT hitnumber, page.pagePath FROM UNNEST(hits) 
     WHERE type<>'EVENT' ORDER BY hitnumber ASC LIMIT 1 OFFSET 2) AS thirdPage

  -- no need to left join with all those arrays and bloat up the table
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` t 

-- check that first page = '/home'
WHERE (SELECT page.pagePath FROM UNNEST(hits) WHERE isEntrance=true) = '/home'

  and totals.pageviews>1 -- for testing purpose
LIMIT 1000

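【Comments】:

Thanks for taking a look. I want to look at the data in aggregate, so is it possible to use COUNT(*) to get total pageviews or sessions instead of using visitid? I found an article on Replicating The Google Analytics All Pages Report In BigQuery, but I'm not quite sure whether that is the behavior you would expect.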
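
One way the per-session result could be rolled up with COUNT(*) is sketched below. This is an illustration under assumptions, not a verified replica of the Google Analytics report: it assumes that only hits of type 'PAGE' count as pages, it reuses '/home' and the public sample table from the answer as placeholders, and the names session_pages, pages, secondPage and session_count are made up for this example.

```sql
#standardSQL
-- Sketch: number of sessions per second page, for sessions that landed on '/home'.
-- Assumption: only PAGE hits are treated as pages; '/home' is a placeholder path.
WITH session_pages AS (
  SELECT
    -- Page paths of the session in hit order; offset 0 is the landing page.
    ARRAY(
      SELECT page.pagePath
      FROM UNNEST(hits)
      WHERE type = 'PAGE'
      ORDER BY hitNumber
    ) AS pages
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
  WHERE totals.visits = 1
)
SELECT
  pages[SAFE_OFFSET(1)] AS secondPage,  -- NULL-safe access to the second pageview
  COUNT(*) AS session_count             -- one row per session, so this counts sessions
FROM session_pages
WHERE pages[SAFE_OFFSET(0)] = '/home'   -- landing page check
  AND ARRAY_LENGTH(pages) > 1           -- drop sessions with a single pageview
GROUP BY secondPage
ORDER BY session_count DESC
```

Because each row of session_pages is one session, COUNT(*) here counts sessions rather than pageviews. To cover a date range as in the question, the table could be swapped for a ga_sessions_* wildcard with a _TABLE_SUFFIX filter. Whether the totals line up with the Second Page dimension in Google Analytics would still need to be checked against the report.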
