Issue creating a Google Analytics "Returning Users" metric in BigQuery
【Posted】: 2018-02-14 14:25:06
【Question】:
Based on what is described at https://webmasters.stackexchange.com/a/87523 and on my own understanding, I have put together what I believe would be considered "Returning Users".
1. The first query shows users whose first "latest visit" falls within the two-year period:
SELECT
  parsedDate,
  CASE
    # return fullVisitorId when the first latest visit is between 2 years and today
    WHEN parsedDate BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR) AND CURRENT_DATE() THEN fullVisitorId
  END fullVisitorId
FROM (
  SELECT
    # convert the date field from string to date and get the latest date
    PARSE_DATE('%Y%m%d', MAX(date)) parsedDate,
    fullVisitorId
  FROM
    `project.dataset.ga_sessions_*`
  WHERE
    # only show fullVisitorId if first visit
    totals.newVisits = 1
  GROUP BY
    fullVisitorId)
2. Then a separate query selects some fields for a specific date range:
SELECT
  PARSE_DATE('%Y%m%d', date) parsedDate,
  fullVisitorId,
  visitId,
  totals.newVisits,
  totals.visits,
  totals.bounces,
  device.deviceCategory
FROM
  `project.dataset.ga_sessions_*`
WHERE
  _TABLE_SUFFIX = "20180118"
3. Joining the two queries together to find "Returning Users":
SELECT
  q1.parsedDate date,
  COUNT(DISTINCT q1.fullVisitorId) users,
  # Default way to determine New Users
  SUM(q1.newVisits) newVisits,
  # Number of "New Users" based on my queries (matches with default way above)
  COUNT(DISTINCT IF(q2.parsedDate < q1.parsedDate, NULL, q2.fullVisitorId)) newUsers,
  # Number of "Returning Users" based on my queries
  COUNT(DISTINCT IF(q2.parsedDate < q1.parsedDate, q2.fullVisitorId, NULL)) returningUsers
FROM (
  (SELECT
    PARSE_DATE('%Y%m%d', date) parsedDate,
    fullVisitorId,
    visitId,
    totals.newVisits,
    totals.visits,
    totals.bounces,
    device.deviceCategory
  FROM
    `project.dataset.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX = "20180118") q1
  LEFT JOIN (
    SELECT
      parsedDate,
      CASE
        # return fullVisitorId when the first latest visit is between 2 years and today
        WHEN parsedDate BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR) AND CURRENT_DATE() THEN fullVisitorId
      END fullVisitorId
    FROM (
      SELECT
        # convert the date field from string to date and get the latest date
        PARSE_DATE('%Y%m%d', MAX(date)) parsedDate,
        fullVisitorId
      FROM
        `project.dataset.ga_sessions_*`
      WHERE
        # only show fullVisitorId if first visit
        totals.newVisits = 1
      GROUP BY
        fullVisitorId)) q2
  ON q1.fullVisitorId = q2.fullVisitorId)
GROUP BY
  date
Results in BQ
Unsampled New vs Returning Visitors report by users in GA for the same period
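Questions/Issues:
Given that newVisits (the default field) and newUsers (my calculation) give the same result, and that this result agrees with the New Visitor users reported by GA, why do GA's Returning Users and the returningUsers I calculated in BQ not match? Are the two even comparable, and what am I missing?
Is my approach the most efficient and concise way to do this?
Is there a better way to get at the data I am missing?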
Solution
Based on Martin's answer below, I managed to create the "Returning Users" metric/field in the context of the query I was running:
SELECT
  date,
  deviceCategory,
  # newUsers - SUM result if it's a new user
  SUM(IF(userType="New Visitor", 1, 0)) newUsers,
  # returningUsers - COUNT DISTINCT fullvisitorId if it's a returning user
  COUNT(DISTINCT IF(userType="Returning Visitor", fullvisitorid, NULL)) returningUsers,
  COUNT(DISTINCT fullvisitorid) users,
  SUM(visits) sessions
FROM (
  SELECT
    date,
    fullVisitorId,
    visitId,
    totals.visits,
    device.deviceCategory,
    IF(totals.newVisits IS NOT NULL, "New Visitor", "Returning Visitor") userType
  FROM
    `project.dataset.ga_sessions_20180118` )
GROUP BY
  deviceCategory,
  date
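【Comments】:
【Answer 1】: Google Analytics uses an approximation for Users (fullvisitorid), even when the report says "based on 100%". You get more accurate user numbers when you use an unsampled report.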
Another thing to mention: a fullvisitorid is also taken into account when totals.visits != 1, whereas sessions are only counted where totals.visits = 1.
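To make that difference concrete, here is a minimal Python sketch on made-up rows (the visitor IDs and values are hypothetical, not taken from a real export). It assumes that totals.visits is NULL for a session without interaction events, so such a session still contributes its fullVisitorId to the distinct count but adds nothing to the session sum.

```python
# Hypothetical rows: (fullVisitorId, totals.visits); None stands for SQL NULL.
rows = [
    ("A", 1),
    ("B", 1),
    ("C", None),  # assumed: session without interaction events, totals.visits is NULL
]

# COUNT(DISTINCT fullVisitorId) looks at every row, whatever totals.visits is
visitor_count = len({visitor_id for visitor_id, _ in rows})

# SUM(totals.visits) skips NULLs, so only rows with totals.visits = 1 add up
session_count = sum(visits for _, visits in rows if visits is not None)

print(visitor_count, session_count)
```

In this toy data there are more visitors than sessions, which is the kind of gap that can show up when comparing the two metrics.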
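Users are also double counted if they were new and then returned in the same period: they appear once in each group. That means this should give you the correct numbers: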
SELECT
  totals.newVisits IS NOT NULL AS isNew,
  COUNT(DISTINCT fullvisitorid) AS visitors,
  SUM(totals.visits) AS sessions
FROM
  `project.dataset.ga_sessions_20180214`
GROUP BY
  1
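As a rough illustration of the double counting, the following Python sketch mimics that grouping on a tiny made-up table. The rows and the helper name visitors_by_is_new are hypothetical; None stands in for a NULL totals.newVisits.

```python
from collections import defaultdict

# Hypothetical one-day export: (fullVisitorId, totals.newVisits, totals.visits).
# newVisits is 1 on a visitor's first session and None (NULL) otherwise.
sessions = [
    ("A", 1, 1),     # A's first session
    ("A", None, 1),  # A comes back later the same day
    ("B", None, 1),  # B is a returning visitor
    ("C", 1, 1),     # C is new and visits once
]

def visitors_by_is_new(rows):
    """Mimic GROUP BY (newVisits IS NOT NULL) with COUNT(DISTINCT fullVisitorId)."""
    groups = defaultdict(set)
    for visitor_id, new_visits, _visits in rows:
        groups[new_visits is not None].add(visitor_id)
    return {is_new: len(ids) for is_new, ids in groups.items()}

counts = visitors_by_is_new(sessions)
total_users = len({visitor_id for visitor_id, _, _ in sessions})

print(counts)       # visitors per isNew group
print(total_users)  # distinct visitors overall
```

Visitor A lands in both groups, so the per-group counts add up to more than the number of distinct visitors. This is why new plus returning users need not equal total users.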
If you want to avoid the double counting, you can use this instead, which counts a user as new even if they also returned:
WITH
  visitors AS (
    SELECT
      fullvisitorid,
      -- check if any visit of this visitor was new - will be used for grouping later
      MAX(totals.newVisits) isNew,
      SUM(totals.visits) AS sessions
    FROM
      `project.dataset.ga_sessions_20180214`
    GROUP BY 1
  )
SELECT
  isNew IS NOT NULL AS isNew,
  COUNT(1) AS visitors,
  SUM(sessions) AS sessions
FROM
  visitors
GROUP BY 1
Of course, these numbers only match GA in the totals.
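The same idea can be sketched in Python on made-up rows: collapse to one row per visitor first, then group. The data and the helper name dedup_by_visitor are hypothetical and only meant to show the shape of the logic, not to reproduce GA's numbers.

```python
# Hypothetical one-day export: (fullVisitorId, totals.newVisits, totals.visits).
# None stands for SQL NULL.
sessions = [
    ("A", 1, 1),     # A's first session
    ("A", None, 1),  # A returns the same day
    ("B", None, 1),  # B is a returning visitor
    ("C", 1, 1),     # C is new and visits once
]

def dedup_by_visitor(rows):
    """Mimic the WITH query: one row per visitor, then group by isNew."""
    per_visitor = {}
    for visitor_id, new_visits, visits in rows:
        was_new, total = per_visitor.get(visitor_id, (False, 0))
        # MAX(totals.newVisits) is non-NULL if any session of the visitor was new
        was_new = was_new or new_visits is not None
        per_visitor[visitor_id] = (was_new, total + (visits or 0))

    summary = {}
    for was_new, total in per_visitor.values():
        group = summary.setdefault(was_new, {"visitors": 0, "sessions": 0})
        group["visitors"] += 1
        group["sessions"] += total
    return summary

summary = dedup_by_visitor(sessions)
print(summary)
```

Here visitor A is counted only once, as new, and both of A's sessions are attributed to the new group, so the visitor counts across groups add up to the number of distinct visitors.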
【Comments】:
Thanks Martin, that is a detailed explanation. These calculations will come in handy! I was hoping to have a field that returns the number of returning users. That said, based on your explanation and calculations, I managed to do it. I will update my post with this as well.