How to calculate DAU/MAU with BigQuery (engagement)


Posted 2015-10-20 01:26:38

Question:

DAU and MAU (daily active users and monthly active users) are established ways to measure user engagement.

How can I get these numbers using SQL and Google BigQuery?

Question comments:

Answer 1:

2019 update for standard SQL:

https://***.com/a/49866033/132438

(For the usefulness of DAU/MAU, see articles like http://blog.compariscope.wefi.com/mobile-app-usage-dau-mau)

Let's look at the reddit comments data stored in BigQuery. We want to find the dau/mau ratio for the "AskReddit" subreddit on a daily rolling basis during September:

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM (
    SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit') a
  JOIN (
    SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
    FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
    CROSS JOIN (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
      FROM [fh-bigquery:reddit_comments.2015_09]
      GROUP BY 1
    ) b
    WHERE subreddit='AskReddit'
    AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
    GROUP BY 1
  ) b
  ON a.day=b.stopday
  GROUP BY 1
)
ORDER BY 1

This query gets the DAU for each day of September, and looks at the August data to get the MAU for each 30-day period ending on each DAU day. That takes a lot of processing (30x). We can get almost the same results if we calculate only one MAU for September and keep using that value as the denominator:

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  CROSS JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit'
  ) b
  WHERE subreddit='AskReddit'
  GROUP BY 1
)
ORDER BY 1

This is a much simpler query that gets us almost the same results, much faster.
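The shape of that simpler query (per-day distinct counts cross-joined with a single month-wide distinct count used as a fixed denominator) can be illustrated on toy data. The sketch below uses SQLite via Python rather than BigQuery legacy SQL; the table name and sample rows are invented for illustration:

```python
import sqlite3

# Invented toy comment log: (day, author) pairs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE comments (day TEXT, author TEXT)")
con.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [("2015-09-01", "alice"), ("2015-09-01", "bob"),
     ("2015-09-02", "alice"), ("2015-09-02", "carol"),
     ("2015-09-03", "alice")],
)

# Same structure as the BigQuery query above: daily DAU cross-joined
# with one month-wide MAU that serves as a fixed denominator.
sql = """
SELECT day, dau, mau, ROUND(100.0 * dau / mau, 1) AS daumau
FROM (SELECT day, COUNT(DISTINCT author) AS dau FROM comments GROUP BY day)
CROSS JOIN (SELECT COUNT(DISTINCT author) AS mau FROM comments)
ORDER BY day
"""
for row in con.execute(sql):
    print(row)
```

Note the fixed denominator: `mau` is 3 (alice, bob, carol) on every row, while `dau` varies per day.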

Now to get this subreddit's average for the month:

SELECT ROUND(100*AVG(dau/mau), 2) daumau
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  CROSS JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit'
  ) b
  WHERE subreddit='AskReddit'
  GROUP BY 1
)

This tells us that "AskReddit" had 8.95% engagement during September.

Last stop: how to compare engagement across subreddits:

SELECT ROUND(100*AVG(dau)/MAX(mau), 2) avg_daumau, MAX(mau) mau, subreddit
FROM (
  SELECT a.subreddit, DATE(SEC_TO_TIMESTAMP(created_utc)) day,
         EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau, subreddit
    FROM [fh-bigquery:reddit_comments.2015_09]
    GROUP BY 2
  ) b
  ON a.subreddit=b.subreddit
  WHERE mau>50000
  GROUP BY 1, 2
)
GROUP BY subreddit
ORDER BY 1
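The join-by-subreddit pattern in that comparison can be sketched the same way on toy data (again SQLite via Python with invented rows; the `mau>50000` filter is omitted since the sample is tiny):

```python
import sqlite3

# Invented sample rows: (day, subreddit, author) triples.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE comments (day TEXT, subreddit TEXT, author TEXT)")
con.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [("2015-09-01", "AskReddit", "alice"), ("2015-09-01", "AskReddit", "bob"),
     ("2015-09-02", "AskReddit", "alice"),
     ("2015-09-01", "funny", "alice")],
)

# Daily DAU per subreddit joined to each subreddit's single MAU,
# then averaged -- the same shape as the BigQuery comparison query.
sql = """
SELECT a.subreddit, ROUND(100.0 * AVG(dau) / MAX(mau), 1) AS avg_daumau, MAX(mau) AS mau
FROM (SELECT subreddit, day, COUNT(DISTINCT author) AS dau
      FROM comments GROUP BY subreddit, day) a
JOIN (SELECT subreddit, COUNT(DISTINCT author) AS mau
      FROM comments GROUP BY subreddit) b
  ON a.subreddit = b.subreddit
GROUP BY a.subreddit
ORDER BY avg_daumau DESC
"""
for row in con.execute(sql):
    print(row)
```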

Comments:

Answer 2:

To analyze trends without waiting for a "full month", every day has to be looked at together with the 30 days preceding it... I'm afraid the proposed solution (by Felipe Hoffa) changes the question rather than just the data-retrieval query.

Below you can find my take on this question. I'm not sure how it behaves performance-wise, and it isn't very fast (much slower than Felipe's...), but it covers the business need as I understand it. Still, it would be great if you could suggest a way to optimize this approach.

Please note: it uses no joins and no sub-aggregations, only split, grouping, and date manipulations.

SELECT
  *,
  DAU/WAU AS DAW_WAU,
  DAU/MAU AS DAW_MAU,
FROM (
  SELECT
    COALESCE(DAUDate,WAUDate,MAUDate) AS ReportDate,
    subreddit,
    EXACT_COUNT_DISTINCT(IF(DAUDate IS NOT NULL,author,NULL)) AS DAU,
    EXACT_COUNT_DISTINCT(IF(WAUDate IS NOT NULL,author,NULL)) AS WAU,
    EXACT_COUNT_DISTINCT(IF(MAUDate IS NOT NULL,author,NULL)) AS MAU,
  FROM (
    SELECT
      DDate,
      subreddit,
      author,
      Ind,
      DATE(IF(Ind=0,DDate,NULL)) AS DAUDate,
      DATE(IF(Ind<7,DATE_ADD(DDate,Ind,"Day"),NULL)) AS WAUDate,
      DATE(IF(Ind<30,DATE_ADD(DDate,Ind,"Day"),NULL)) AS MAUDate
    FROM (
      SELECT
        DATE(SEC_TO_TIMESTAMP(created_utc)) AS DDate,
        subreddit,
        author,
        INTEGER(SPLIT("0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30",",")) AS Ind
      FROM
        [fh-bigquery:reddit_comments.2015_09],
        [fh-bigquery:reddit_comments.2015_08] ))
  WHERE
    COALESCE(DAUDate,WAUDate,MAUDate)<DATE(TIMESTAMP("2015-10-01")/*Current_Timestamp()*/)
  GROUP EACH BY
    1,
    2)
HAVING
  MAU>50000
ORDER BY
  2,
  1 DESC
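The trick in that query, replicating every comment to each of the report dates it should count toward so that one GROUP BY yields rolling distinct counts, can be sketched in plain Python (invented toy events, not reddit data; only the 30-day MAU window is shown, the 7-day WAU works the same way):

```python
from collections import defaultdict
from datetime import date, timedelta

# Invented toy events: (comment_day, author).
events = [(date(2015, 9, 1), "alice"), (date(2015, 9, 1), "bob"),
          (date(2015, 9, 2), "alice"), (date(2015, 9, 5), "carol")]

dau = defaultdict(set)   # report date -> authors active on that day
mau = defaultdict(set)   # report date -> authors active in the 30 days ending on it

for day, author in events:
    dau[day].add(author)
    # Fan each event out to the 30 report dates whose trailing window
    # contains it, mirroring the SPLIT(...) / DATE_ADD(DDate, Ind, "Day")
    # expansion in the query above.
    for offset in range(30):
        mau[day + timedelta(days=offset)].add(author)

report = date(2015, 9, 5)
print(len(dau[report]), len(mau[report]))
```

On 2015-09-05 only carol commented, so the DAU is 1, while the trailing 30-day MAU covers alice, bob, and carol.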

Comments:
