Rolling weekly/monthly active users on BigQuery
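[Title]: Rolling weekly/monthly active users on BigQuery [Posted]: 2017-04-18 20:51:07 [Question]: I'm looking for rolling weekly/monthly active users on BigQuery. I have tried the approach from a previous post, but using CROSS JOIN exceeded BigQuery's limits.
I'm using the following as a very basic form of the desired output, but I need similar output for every day, not just for month-end dates.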
SELECT
  EXACT_COUNT_DISTINCT(id) AS uniqueInstalls,
  STRFTIME_UTC_USEC(date, '%Y-%m') AS calendarYM
FROM Analytics.EventsTable2
GROUP BY calendarYM
Any help would be greatly appreciated!
Cheers!
[Answer 1]: "I need similar output for every day"
Try the following with BigQuery Standard SQL
#standardSQL
WITH calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(DATE) FROM `Analytics.EventsTable2`),
    (SELECT MAX(DATE) FROM `Analytics.EventsTable2`),
    INTERVAL 1 DAY)
  ) AS day
)
SELECT
  c.day AS day,
  COUNT(DISTINCT id) AS uniqueInstalls
FROM calendar AS c
JOIN `Analytics.EventsTable2` AS t
  ON t.date BETWEEN DATE_TRUNC(c.day, MONTH) AND c.day
GROUP BY day
ORDER BY day
You can test or play with the above using dummy data, as below
#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, DATE("2017-04-01") AS DATE UNION ALL
  SELECT 1, DATE("2017-04-02") UNION ALL
  SELECT 2, DATE("2017-04-02") UNION ALL
  SELECT 1, DATE("2017-04-03") UNION ALL
  SELECT 1, DATE("2017-04-04") UNION ALL
  SELECT 2, DATE("2017-04-04") UNION ALL
  SELECT 3, DATE("2017-04-04") UNION ALL
  SELECT 4, DATE("2017-04-05") UNION ALL
  SELECT 1, DATE("2017-03-02") UNION ALL
  SELECT 2, DATE("2017-03-02") UNION ALL
  SELECT 1, DATE("2017-03-03") UNION ALL
  SELECT 1, DATE("2017-03-04") UNION ALL
  SELECT 2, DATE("2017-03-04") UNION ALL
  SELECT 3, DATE("2017-03-04") UNION ALL
  SELECT 4, DATE("2017-03-05")
),
calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(DATE) FROM yourTable),
    (SELECT MAX(DATE) FROM yourTable),
    INTERVAL 1 DAY)
  ) AS day
)
SELECT
  c.day AS day,
  COUNT(DISTINCT id) AS uniqueInstalls
FROM calendar AS c
JOIN yourTable AS t
  ON t.date BETWEEN DATE_TRUNC(c.day, MONTH) AND c.day
GROUP BY day
ORDER BY day
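The first query returns the following error: Error: No matching signature for function GENERATE_DATE_ARRAY
GENERATE_DATE_ARRAY expects DATE arguments, so below is a version for input data where the date column is a TIMESTAMP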
#standardSQL
WITH calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(DATE(date)) FROM `Analytics.EventsTable2`),
    (SELECT MAX(DATE(date)) FROM `Analytics.EventsTable2`),
    INTERVAL 1 DAY)
  ) AS day
)
SELECT
  c.day AS day,
  COUNT(DISTINCT id) AS uniqueInstalls
FROM calendar AS c
JOIN `Analytics.EventsTable2` AS t
  ON DATE(t.date) BETWEEN DATE_TRUNC(c.day, MONTH) AND c.day
GROUP BY day
ORDER BY day
[Discussion]:
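Thanks Mikhail. However, I don't have Standard SQL, and BigQuery does not accept the query you provided. If you have another query, please let me know. Thanks a lot for your help :)

@StevePereira - BigQuery has two SQL dialects: legacy and standard. So if you do "have" BigQuery, you "have" both dialects, including the standard one. Just run the above query as is in your Web UI (including the first line, #standardSQL). Try it and let me know the result!

I appreciate the lesson! Here is the output:
Row | day | uniqueInstalls
1 | 2017-03-02 | 2
2 | 2017-03-03 | 2
3 | 2017-03-04 | 3
4 | 2017-03-05 | 4
5 | 2017-03-06 | 4
6 | 2017-03-07 | 4
7 | 2017-03-08 | 4
8 | 2017-03-09 | 4
9 | 2017-03-10 | 4
10 | 2017-03-11 | 4
11 | 2017-03-12 | 4
12 | 2017-03-13 | 4
13 | 2017-03-14 | 4
14 | 2017-03-15 | 4
15 | 2017-03-16 | 4
16 | 2017-03-17 | 4
17 | 2017-03-18 | 4
18 | 2017-03-19 | 4
19 | 2017-03-20 | 4
20 | 2017-03-21 | 4

@StevePereira - So the output is as expected. You should technically adapt it to your specific use case/data :o) The first query should be a good start for you! The second query is just there to show how it works with the dummy example/data. Does it make sense now?

@StevePereira - Oh, great! At least we are making progress now and it works. To make it a rolling 30 days, just replace BETWEEN DATE_TRUNC(c.day, MONTH) AND c.day
with BETWEEN DATE_SUB(c.day, INTERVAL 30 DAY) AND c.day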
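As a hedged sketch of the rolling-window replacement described in the last comment, the query below computes a 7-day and a 30-day distinct user count per day in a single pass. The table `yourTable`, its sample rows, and the output column names `weeklyActive` and `monthlyActive` are illustrative assumptions and do not come from the thread. Because BETWEEN is inclusive on both ends, the sketch subtracts 29 days (and 6 days) so that each window spans exactly 30 (and 7) calendar days; subtracting 30 would cover 31.

```sql
#standardSQL
-- Hypothetical sample data standing in for Analytics.EventsTable2 (date is a DATE here).
WITH yourTable AS (
  SELECT 1 AS id, DATE '2017-03-02' AS date UNION ALL
  SELECT 2, DATE '2017-03-02' UNION ALL
  SELECT 3, DATE '2017-03-04' UNION ALL
  SELECT 4, DATE '2017-03-05' UNION ALL
  SELECT 1, DATE '2017-04-01' UNION ALL
  SELECT 2, DATE '2017-04-02' UNION ALL
  SELECT 4, DATE '2017-04-05'
),
-- One row per calendar day between the first and last event.
calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(date) FROM yourTable),
    (SELECT MAX(date) FROM yourTable),
    INTERVAL 1 DAY)
  ) AS day
)
SELECT
  c.day AS day,
  -- Only events from the last 7 days (c.day and the 6 days before it) count here.
  COUNT(DISTINCT IF(t.date >= DATE_SUB(c.day, INTERVAL 6 DAY), t.id, NULL)) AS weeklyActive,
  -- Every joined event falls in the 30-day window, so a plain distinct count is enough.
  COUNT(DISTINCT t.id) AS monthlyActive
FROM calendar AS c
JOIN yourTable AS t
  ON t.date BETWEEN DATE_SUB(c.day, INTERVAL 29 DAY) AND c.day
GROUP BY day
ORDER BY day
```

For a TIMESTAMP column, wrap it as DATE(t.date), as in the answer's last query. Since this is an inner join, a calendar day with no events in the preceding 30 days would not appear in the output at all.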