Calculating current user return rate in BigQuery

【Posted】: 2019-03-14 13:44:33 【Question】:

I'm trying to calculate CURR (current user return rate, see https://lloydmelnick.com/2019/02/05/lifetime-value-part-26-my-most-valuable-retention-kpis/) using data imported from Firebase into BigQuery.

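I tried creating three columns that flag whether a user was active two weeks ago, one week ago, and this week, but it doesn't seem to work. I want to see the users who are active this week and were also active 2 and 3 weeks ago.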

Here is the query I tried:

SELECT
  COUNT(DISTINCT user_pseudo_id)
FROM(SELECT
  user_pseudo_id,
  IF( days_from_today >13 AND days_from_today <21, 1, 0) AS prev_week,
  IF( days_from_today >6 AND days_from_today <14, 1, 0) AS last_week,
  IF( days_from_today <7, 1, 0) AS this_week
FROM(
SELECT
    DATE_DIFF(CURRENT_DATE(), DATE(TIMESTAMP_MICROS(event_timestamp)), day) AS days_from_today,
    user_pseudo_id
  FROM
    `dataset.events_2019*`
  WHERE
    event_name = 'user_engagement'
  GROUP BY
    days_from_today,
    user_pseudo_id))
    WHERE prev_week=1
    GROUP BY prev_week, last_week, this_week

【Answer 1】:

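With a few modifications to your query (the three flags are aggregated per user with MAX, and the grouping is on user_pseudo_id alone), this works for me: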

SELECT
  user_pseudo_id
FROM (
  SELECT
    user_pseudo_id,
    MAX(IF( days_from_today >13 AND days_from_today < 21, 1, 0)) AS prev_week,
    MAX(IF( days_from_today >6 AND days_from_today <14, 1, 0)) AS last_week,
    MAX(IF( days_from_today <7, 1, 0)) AS this_week
  FROM (
    SELECT
      DATE_DIFF(CURRENT_DATE(), DATE(TIMESTAMP_MICROS(event_timestamp)), day) AS days_from_today,
      user_pseudo_id
    FROM
      test_table
    WHERE
      event_name = 'user_engagement'
    GROUP BY
      days_from_today,
      user_pseudo_id)
  GROUP BY
    user_pseudo_id)
WHERE
  prev_week = 1

Playing with some dummy data:

WITH test_table as (
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1552208299000000 as event_timestamp union all
  select 2 as user_pseudo_id, 'user_engagement' as event_name, 1552079299000000 as event_timestamp union all
  select 3 as user_pseudo_id, 'user_engagement' as event_name, 1552186299000000 as event_timestamp union all
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1551024899000000 as event_timestamp union all
  select 2 as user_pseudo_id, 'user_engagement' as event_name, 1551024899000000 as event_timestamp union all
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1551523899000000 as event_timestamp union all
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1552024899000000 as event_timestamp
)
SELECT
  DATE_DIFF(CURRENT_DATE(), DATE(TIMESTAMP_MICROS(event_timestamp)), day) AS days_from_today,
  user_pseudo_id
FROM
  test_table
WHERE
  event_name = 'user_engagement'
GROUP BY
  days_from_today,
  user_pseudo_id
ORDER BY 2, 1

This gives the following dataset:

Row   days_from_today   user_pseudo_id
1     4                 1
2     6                 1
3     12                1
4     18                1
5     6                 2
6     18                2
7     4                 3

Here, the users who were active in the previous week (prev_week, 14 to 20 days ago) are 1 and 2.
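The day offsets in the dataset above are relative to the day the query is run, because it uses CURRENT_DATE(); run on a later date, the same dummy rows fall outside all three weekly windows. A minimal sketch, assuming a pinned reference date of 2019-03-14 (the date those offsets correspond to) and a hypothetical params CTE with a ref_date column, neither of which is part of the original answer, keeps the dummy data reproducible:

```sql
-- Sketch only: params and ref_date are illustrative names, not from the answer.
WITH params AS (
  -- Assumed reference date; swap in CURRENT_DATE() for live data.
  SELECT DATE '2019-03-14' AS ref_date
),
test_table AS (
  SELECT 1 AS user_pseudo_id, 'user_engagement' AS event_name, 1552208299000000 AS event_timestamp UNION ALL
  SELECT 2, 'user_engagement', 1552079299000000 UNION ALL
  SELECT 3, 'user_engagement', 1552186299000000 UNION ALL
  SELECT 1, 'user_engagement', 1551024899000000 UNION ALL
  SELECT 2, 'user_engagement', 1551024899000000 UNION ALL
  SELECT 1, 'user_engagement', 1551523899000000 UNION ALL
  SELECT 1, 'user_engagement', 1552024899000000
)
SELECT DISTINCT
  -- Whole days between the pinned date and the event's UTC date.
  DATE_DIFF(p.ref_date, DATE(TIMESTAMP_MICROS(t.event_timestamp)), DAY) AS days_from_today,
  t.user_pseudo_id
FROM test_table AS t
CROSS JOIN params AS p
WHERE t.event_name = 'user_engagement'
ORDER BY 2, 1
```

SELECT DISTINCT stands in for the GROUP BY on both columns used above; the two forms should return the same rows here.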

Running the query against the dummy dataset:

WITH test_table as (
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1552208299000000 as event_timestamp union all
  select 2 as user_pseudo_id, 'user_engagement' as event_name, 1552079299000000 as event_timestamp union all
  select 3 as user_pseudo_id, 'user_engagement' as event_name, 1552186299000000 as event_timestamp union all
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1551024899000000 as event_timestamp union all
  select 2 as user_pseudo_id, 'user_engagement' as event_name, 1551024899000000 as event_timestamp union all
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1551523899000000 as event_timestamp union all
  select 1 as user_pseudo_id, 'user_engagement' as event_name, 1552024899000000 as event_timestamp
)
SELECT
  user_pseudo_id
FROM (
  SELECT
    user_pseudo_id,
    MAX(IF( days_from_today >13 AND days_from_today < 21, 1, 0)) AS prev_week,
    MAX(IF( days_from_today >6 AND days_from_today <14, 1, 0)) AS last_week,
    MAX(IF( days_from_today <7, 1, 0)) AS this_week
  FROM (
    SELECT
      DATE_DIFF(CURRENT_DATE(), DATE(TIMESTAMP_MICROS(event_timestamp)), day) AS days_from_today,
      user_pseudo_id
    FROM
      test_table
    WHERE
      event_name = 'user_engagement'
    GROUP BY
      days_from_today,
      user_pseudo_id)
  GROUP BY
    user_pseudo_id)
WHERE
  prev_week = 1

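This returns users 1 and 2 as the result, which should be what you want. You can use this query as a basis for building the other analyses you need.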
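As one possible extension, the per-user flags can be turned into a single rate. The sketch below assumes CURR is taken to mean the share of users active in both of the two preceding weeks who are also active this week; that reading should be checked against the linked article before relying on it. The weekly_activity CTE name, the output column names, and the pinned date 2019-03-14 (used so the dummy data stays inside the windows) are illustrative choices, not part of the original answer.

```sql
-- Sketch only: the CURR definition and the pinned date are assumptions.
WITH test_table AS (
  SELECT 1 AS user_pseudo_id, 'user_engagement' AS event_name, 1552208299000000 AS event_timestamp UNION ALL
  SELECT 2, 'user_engagement', 1552079299000000 UNION ALL
  SELECT 3, 'user_engagement', 1552186299000000 UNION ALL
  SELECT 1, 'user_engagement', 1551024899000000 UNION ALL
  SELECT 2, 'user_engagement', 1551024899000000 UNION ALL
  SELECT 1, 'user_engagement', 1551523899000000 UNION ALL
  SELECT 1, 'user_engagement', 1552024899000000
),
weekly_activity AS (
  -- One row per user, with a 0/1 flag for each weekly window (same bounds as the answer).
  SELECT
    user_pseudo_id,
    MAX(IF(days_from_today > 13 AND days_from_today < 21, 1, 0)) AS prev_week,
    MAX(IF(days_from_today > 6 AND days_from_today < 14, 1, 0)) AS last_week,
    MAX(IF(days_from_today < 7, 1, 0)) AS this_week
  FROM (
    SELECT
      user_pseudo_id,
      -- Replace the date literal with CURRENT_DATE() for live data.
      DATE_DIFF(DATE '2019-03-14', DATE(TIMESTAMP_MICROS(event_timestamp)), DAY) AS days_from_today
    FROM test_table
    WHERE event_name = 'user_engagement')
  GROUP BY user_pseudo_id
)
SELECT
  -- Users active in both prior weeks (the denominator).
  COUNTIF(prev_week = 1 AND last_week = 1) AS current_users,
  -- Of those, users who are also active this week (the numerator).
  COUNTIF(prev_week = 1 AND last_week = 1 AND this_week = 1) AS returned_users,
  -- SAFE_DIVIDE returns NULL instead of failing when the denominator is 0.
  SAFE_DIVIDE(
    COUNTIF(prev_week = 1 AND last_week = 1 AND this_week = 1),
    COUNTIF(prev_week = 1 AND last_week = 1)) AS curr
FROM weekly_activity
```

With the dummy rows and that pinned date, only user 1 is active in both prior weeks, and that user is also active this week, so current_users and returned_users are both 1 and curr is 1.0. User 2 has no activity in last_week and user 3 appears only in this_week, so neither enters the denominator.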
