Select distinct users grouped by time range - Google BigQuery SQL
【Posted】: 2019-09-20 04:51:10
【Question】: Select distinct users group by time range
How can I do what the question linked above does in Google BigQuery's SQL dialect?
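Update with details:
I have a table with the following columns:
| day | user_id |
For a given date, I want to count the number of distinct user_id values:
- on that day
- in the week up to that date (week to date)
- in the month up to that date (month to date)
Example input table:
| day | user_id |
| 2013-01-01 | 1 |
| 2013-01-03 | 3 |
| 2013-01-06 | 4 |
| 2013-01-07 | 4 |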
Expected output:
| day | time_series | cnt |
| 2013-01-01 | D | 1 |
| 2013-01-01 | W | 1 |
| 2013-01-01 | M | 1 |
| 2013-01-03 | D | 1 |
| 2013-01-03 | W | 2 |
| 2013-01-03 | M | 2 |
| 2013-01-06 | D | 1 |
| 2013-01-06 | W | 1 |
| 2013-01-06 | M | 3 |
| 2013-01-07 | D | 1 |
| 2013-01-07 | W | 1 |
| 2013-01-07 | M | 3 |
P.S. The similar question was asked for PostgreSQL, but I need a BigQuery version.
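【Comments】:
So is it bigquery or sql-server? Totally different things!
@MikhailBerlyant Right.. my bad, I meant the BigQuery version of SQL*
I would also recommend stating your specific question in the post itself rather than referring to someone else's question.
I have the same question, just with Google BigQuery..
【Answer 1】:
Below is for BigQuery Standard SQL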
Option #1
#standardSQL
WITH `project.dataset.table` AS (
  -- sample data from the question
  SELECT DATE '2013-01-01' day, 1 user_id UNION ALL
  SELECT '2013-01-03', 3 UNION ALL
  SELECT '2013-01-06', 4 UNION ALL
  SELECT '2013-01-07', 4
)
-- D: distinct users on each day
SELECT day, 'D' series, COUNT(DISTINCT user_id) users
FROM `project.dataset.table`
GROUP BY day
UNION ALL
-- W: distinct users from the start of the week through each day
SELECT DISTINCT day, 'W', (SELECT COUNT(DISTINCT id) FROM UNNEST(users) id)
FROM (
  SELECT day, ARRAY_AGG(user_id) OVER(PARTITION BY DATE_TRUNC(day, WEEK) ORDER BY day) users
  FROM `project.dataset.table`
)
UNION ALL
-- M: distinct users from the start of the month through each day
SELECT DISTINCT day, 'M', (SELECT COUNT(DISTINCT id) FROM UNNEST(users) id)
FROM (
  SELECT day, ARRAY_AGG(user_id) OVER(PARTITION BY DATE_TRUNC(day, MONTH) ORDER BY day) users
  FROM `project.dataset.table`
)
ORDER BY day, CASE series WHEN 'D' THEN 1 WHEN 'W' THEN 2 ELSE 3 END
Result
Row day series users
1 2013-01-01 D 1
2 2013-01-01 W 1
3 2013-01-01 M 1
4 2013-01-03 D 1
5 2013-01-03 W 2
6 2013-01-03 M 2
7 2013-01-06 D 1
8 2013-01-06 W 1
9 2013-01-06 M 3
10 2013-01-07 D 1
11 2013-01-07 W 1
12 2013-01-07 M 3
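To see why the W and M rows come out this way, it can help to look at the intermediate arrays before they are counted. The sketch below is not part of the original answer: it reuses the sample data and only exposes the running array, and the aliases `week_start` and `week_users_so_far` are made up for illustration. Since the window has an ORDER BY and no explicit frame, it should cover every row from the start of the week partition through the current day, so each row carries all user_id values seen so far in that week. As far as I know, the detour through an array is needed because BigQuery does not accept DISTINCT in an analytic function whose window has an ORDER BY.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- sample data from the question
  SELECT DATE '2013-01-01' AS day, 1 AS user_id UNION ALL
  SELECT DATE '2013-01-03', 3 UNION ALL
  SELECT DATE '2013-01-06', 4 UNION ALL
  SELECT DATE '2013-01-07', 4
)
SELECT
  day,
  user_id,
  -- DATE_TRUNC(day, WEEK) treats Sunday as the first day of the week
  DATE_TRUNC(day, WEEK) AS week_start,
  -- all user_ids from the start of the week through the current day
  ARRAY_AGG(user_id) OVER(
    PARTITION BY DATE_TRUNC(day, WEEK)
    ORDER BY day
  ) AS week_users_so_far
FROM `project.dataset.table`
ORDER BY day
```

In the sample data 2013-01-06 is a Sunday, so it opens a new week partition, which is why its W count drops back to 1. If weeks should start on Monday instead, `DATE_TRUNC(day, WEEK(MONDAY))` can be substituted, but that changes the W numbers.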
Option #2 - based on the version above, but with the three queries combined into one
#standardSQL
SELECT DISTINCT day, d_users,
  (SELECT COUNT(DISTINCT id) FROM UNNEST(w_users) id) w_users,
  (SELECT COUNT(DISTINCT id) FROM UNNEST(m_users) id) m_users
FROM (
  SELECT day,
    COUNT(DISTINCT user_id) OVER(PARTITION BY day) d_users,
    ARRAY_AGG(user_id) OVER(PARTITION BY DATE_TRUNC(day, WEEK) ORDER BY day) w_users,
    ARRAY_AGG(user_id) OVER(PARTITION BY DATE_TRUNC(day, MONTH) ORDER BY day) m_users
  FROM `project.dataset.table`
)
ORDER BY day
Applied to the same data, the result is
Row day d_users w_users m_users
1 2013-01-01 1 1 1
2 2013-01-03 1 2 2
3 2013-01-06 1 1 3
4 2013-01-07 1 1 3
Option #3 - if for some reason you need to unpivot/flatten the result of Option #2
#standardSQL
SELECT day, series, users
FROM (
  SELECT DISTINCT day, d_users,
    (SELECT COUNT(DISTINCT id) FROM UNNEST(w_users) id) w_users,
    (SELECT COUNT(DISTINCT id) FROM UNNEST(m_users) id) m_users
  FROM (
    SELECT day,
      COUNT(DISTINCT user_id) OVER(PARTITION BY day) d_users,
      ARRAY_AGG(user_id) OVER(PARTITION BY DATE_TRUNC(day, WEEK) ORDER BY day) w_users,
      ARRAY_AGG(user_id) OVER(PARTITION BY DATE_TRUNC(day, MONTH) ORDER BY day) m_users
    FROM `project.dataset.table`
  )
), UNNEST([STRUCT('D' AS series, d_users AS users), ('W', w_users), ('M', m_users)])
ORDER BY day
which gives the same result as Option #1
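For comparison, the same to-date counts can also be written as a self-join instead of window arrays. This is a sketch under assumptions, not the answerer's method: the CTE name `days` and the aliases `cur` and `hist` are made up, and it assumes the table is small enough that joining each day to all earlier rows of its week and month is acceptable. Rows outside a given range are turned into NULL by `IF(...)`, and `COUNT(DISTINCT ...)` ignores NULLs. `LEAST` is used in the join condition because a week can start in the previous month.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- sample data from the question
  SELECT DATE '2013-01-01' AS day, 1 AS user_id UNION ALL
  SELECT DATE '2013-01-03', 3 UNION ALL
  SELECT DATE '2013-01-06', 4 UNION ALL
  SELECT DATE '2013-01-07', 4
),
days AS (
  -- one row per reporting day
  SELECT DISTINCT day FROM `project.dataset.table`
)
SELECT
  cur.day,
  -- users seen on the reporting day itself
  COUNT(DISTINCT IF(hist.day = cur.day, hist.user_id, NULL)) AS d_users,
  -- users seen since the start of the (Sunday-based) week
  COUNT(DISTINCT IF(hist.day >= DATE_TRUNC(cur.day, WEEK), hist.user_id, NULL)) AS w_users,
  -- users seen since the start of the month
  COUNT(DISTINCT IF(hist.day >= DATE_TRUNC(cur.day, MONTH), hist.user_id, NULL)) AS m_users
FROM days AS cur
JOIN `project.dataset.table` AS hist
  -- keep history back to whichever period starts earlier, week or month
  ON hist.day BETWEEN LEAST(DATE_TRUNC(cur.day, WEEK), DATE_TRUNC(cur.day, MONTH))
                  AND cur.day
GROUP BY cur.day
ORDER BY cur.day
```

On the sample data this should return the same four rows as Option #2. The window version is likely the better fit for large tables, since the join compares each day against every earlier row in its range.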