Get weekly breakdown of subscriptions from a time range in Google BigQuery
【Posted】: 2017-11-29 10:20:09
【Question】: If I know the period during which each subscription was active, is there a way to get weekly data on active subscriptions? I have a table in BigQuery that contains a list of subscriptions:
+-----------------+---------+---------------+--------------+
| subscription_id | user_id | subscribed_at | cancelled_at |
+-----------------+---------+---------------+--------------+
| 1               | 2       | 2017-01-05    | 2017-06-03   |
| 2               | 3       | 2017-01-07    | 2017-09-15   |
| 3               | 4       | 2017-01-09    | NULL         |
| 4               | 1       | 2017-01-11    | 2017-05-27   |
| 5               | 3       | 2017-01-15    | NULL         |
+-----------------+---------+---------------+--------------+
I need one record for each unique subscription_id + active_week combination. Something like this:
+-----------------+---------+---------------+--------------+-------------+
| subscription_id | user_id | subscribed_at | cancelled_at | active_week |
+-----------------+---------+---------------+--------------+-------------+
| 1               | 2       | 2017-01-05    | 2017-06-03   | 201701      |
| 2               | 3       | 2017-01-07    | 2017-09-15   | 201701      |
| 1               | 2       | 2017-01-05    | 2017-06-03   | 201702      |
| 2               | 3       | 2017-01-07    | 2017-09-15   | 201702      |
| 3               | 4       | 2017-01-09    | NULL         | 201702      |
| 4               | 1       | 2017-01-11    | 2017-05-27   | 201702      |
| 1               | 2       | 2017-01-05    | 2017-06-03   | 201703      |
| 2               | 3       | 2017-01-07    | 2017-09-15   | 201703      |
| 3               | 4       | 2017-01-09    | NULL         | 201703      |
| 4               | 1       | 2017-01-11    | 2017-05-27   | 201703      |
| 5               | 3       | 2017-01-15    | NULL         | 201703      |
| ...             | ...     | ...           | ...          | ...         |
+-----------------+---------+---------------+--------------+-------------+
I tried starting from this, but without success:
SELECT
  SPLIT(RPAD('', 1 + DATEDIFF(sub.ended_date, sub.started_date), '.'),'') AS weeks,
  sub.subscription_Id,
  sub.customer_id
FROM (
  SELECT
    subscribed_at AS started_date,
    CASE
      WHEN cancelled_at IS NULL THEN TIMESTAMP(CURRENT_DATE())
      ELSE TIMESTAMP(cancelled_at)
    END AS ended_date,
    subscription_id,
    customer_id
  FROM
    [subscriptions]) AS sub
Thanks a lot for your help!
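【Comments】:
Welcome to Stack Overflow! Pure code-writing requests are off-topic on Stack Overflow; we expect questions here to relate to specific programming problems, but we will happily help you write it yourself! Tell us what you've tried and where you are stuck. This will also help us answer your question better.
This should help you get started: cloud.google.com/bigquery/docs/reference/standard-sql/…
Just show what you have tried and the problem you ran into, so we can pick it up from there :o)
@MikhailBerlyant I tried using the query from here and modifying it to get the data from my table, but with no luck.
So show it then!
【Answer 1】: Below is for BigQuery Standard SQL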
#standardSQL
WITH temp AS (
  SELECT subscription_id, user_id,
    PARSE_DATE('%Y-%m-%d', subscribed_at) subscribed_at,
    PARSE_DATE('%Y-%m-%d', cancelled_at) cancelled_at
  FROM `project.dataset.subscriptions`
), weeks AS (
  SELECT
    wk week_start,
    DATE_ADD(wk, INTERVAL 6 DAY) week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
      DATE_TRUNC(MIN(subscribed_at), WEEK),
      CURRENT_DATE(), INTERVAL 1 WEEK) weeks
    FROM temp
  ), UNNEST(weeks) wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN temp
  ON subscribed_at <= week_end
  AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start
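Please note: I assume your subscribed_at and cancelled_at fields are of STRING data type, which is why there is an extra temp subquery that converts them to DATE. If those fields in your original table are already of DATE data type, you should drop temp and use `project.dataset.subscriptions` directly.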
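For the DATE case mentioned in the note, a minimal sketch might look like the query below. It is an illustration rather than a tested drop-in: the `subscriptions` CTE and its two sample rows are stand-ins for your real table (assumed to have DATE columns), and the `week_starts` alias is only introduced here for readability.

```sql
#standardSQL
-- Assumption: subscribed_at and cancelled_at are already DATE columns,
-- so no PARSE_DATE step (the temp CTE) is needed.
-- `subscriptions` is a hypothetical stand-in for `project.dataset.subscriptions`.
WITH subscriptions AS (
  SELECT 1 AS subscription_id, 2 AS user_id,
    DATE '2017-01-05' AS subscribed_at, DATE '2017-06-03' AS cancelled_at UNION ALL
  SELECT 3, 4, DATE '2017-01-09', CAST(NULL AS DATE)
), weeks AS (
  SELECT
    wk AS week_start,
    DATE_ADD(wk, INTERVAL 6 DAY) AS week_end
  FROM (
    -- One array element per week, from the earliest subscription until today.
    SELECT GENERATE_DATE_ARRAY(
      DATE_TRUNC(MIN(subscribed_at), WEEK),
      CURRENT_DATE(), INTERVAL 1 WEEK) AS week_starts
    FROM subscriptions
  ), UNNEST(week_starts) AS wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN subscriptions
  ON subscribed_at <= week_end
  AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start
ORDER BY week_start, subscription_id
```

To run it against real data, remove the `subscriptions` CTE and reference your table in both FROM clauses.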
You can test and play with the query at the top of this answer using the dummy data from your question, as follows:
#standardSQL
WITH `project.dataset.subscriptions` AS (
  SELECT 1 subscription_id, 2 user_id, '2017-01-05' subscribed_at, '2017-06-03' cancelled_at UNION ALL
  SELECT 2, 3, '2017-01-07', '2017-09-15' UNION ALL
  SELECT 3, 4, '2017-01-09', NULL UNION ALL
  SELECT 4, 1, '2017-01-11', '2017-05-27' UNION ALL
  SELECT 5, 3, '2017-01-15', NULL
), temp AS (
  SELECT subscription_id, user_id,
    PARSE_DATE('%Y-%m-%d', subscribed_at) subscribed_at,
    PARSE_DATE('%Y-%m-%d', cancelled_at) cancelled_at
  FROM `project.dataset.subscriptions`
), weeks AS (
  SELECT
    wk week_start,
    DATE_ADD(wk, INTERVAL 6 DAY) week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
      DATE_TRUNC(MIN(subscribed_at), WEEK),
      CURRENT_DATE(), INTERVAL 1 WEEK) weeks
    FROM temp
  ), UNNEST(weeks) wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN temp
  ON subscribed_at <= week_end
  AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start
ORDER BY week_start, subscription_id
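The question asked for an active_week column in YYYYWW form, whereas the queries here return week_start and week_end. One possible way to derive such a label, sketched under assumptions, is FORMAT_DATE with the %U element (week number of the year, with Sunday as the first day of the week). The `subscriptions` CTE with DATE literals and the `week_starts` alias are illustrative stand-ins, not part of the original answer.

```sql
#standardSQL
-- Sketch only: `subscriptions` is a hypothetical CTE built from the question's sample rows,
-- with the dates already typed as DATE.
WITH subscriptions AS (
  SELECT 1 AS subscription_id, 2 AS user_id,
    DATE '2017-01-05' AS subscribed_at, DATE '2017-06-03' AS cancelled_at UNION ALL
  SELECT 2, 3, DATE '2017-01-07', DATE '2017-09-15' UNION ALL
  SELECT 3, 4, DATE '2017-01-09', CAST(NULL AS DATE) UNION ALL
  SELECT 4, 1, DATE '2017-01-11', DATE '2017-05-27' UNION ALL
  SELECT 5, 3, DATE '2017-01-15', CAST(NULL AS DATE)
), weeks AS (
  SELECT
    wk AS week_start,
    DATE_ADD(wk, INTERVAL 6 DAY) AS week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
      DATE_TRUNC(MIN(subscribed_at), WEEK),
      CURRENT_DATE(), INTERVAL 1 WEEK) AS week_starts
    FROM subscriptions
  ), UNNEST(week_starts) AS wk
)
SELECT
  s.subscription_id, s.user_id, s.subscribed_at, s.cancelled_at,
  -- %Y%U gives a STRING such as '201701'; the label is taken from the week's Sunday.
  FORMAT_DATE('%Y%U', w.week_start) AS active_week
FROM weeks AS w
JOIN subscriptions AS s
  ON s.subscribed_at <= w.week_end
  AND IFNULL(s.cancelled_at, CURRENT_DATE()) > w.week_start
ORDER BY w.week_start, s.subscription_id
```

Because the label is computed from week_start, a week that straddles New Year is labelled with the year and week number of its Sunday. If you need a numeric column, wrapping the expression in CAST(... AS INT64) should work.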
Also note: in the above query, weeks start on Sunday.
If you want weeks to start on Monday, adjust week_start and week_end slightly in the weeks CTE, as below:
DATE_ADD(wk, INTERVAL 1 DAY) week_start,
DATE_ADD(wk, INTERVAL 7 DAY) week_end
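As an alternative sketch for Monday-based weeks, DATE_TRUNC also accepts a WEEK(MONDAY) date part, which truncates to the preceding Monday, so the generated dates are already week starts and no one-day shift is needed. This is a variation, not the original answer's approach; the `subscriptions` CTE and the `week_starts` alias are hypothetical stand-ins.

```sql
#standardSQL
-- Sketch only: `subscriptions` stands in for the real table, with DATE columns assumed.
WITH subscriptions AS (
  SELECT 1 AS subscription_id, 2 AS user_id,
    DATE '2017-01-05' AS subscribed_at, DATE '2017-06-03' AS cancelled_at UNION ALL
  SELECT 5, 3, DATE '2017-01-15', CAST(NULL AS DATE)
), weeks AS (
  SELECT
    wk AS week_start,                          -- always a Monday
    DATE_ADD(wk, INTERVAL 6 DAY) AS week_end   -- the following Sunday
  FROM (
    SELECT GENERATE_DATE_ARRAY(
      DATE_TRUNC(MIN(subscribed_at), WEEK(MONDAY)),
      CURRENT_DATE(), INTERVAL 1 WEEK) AS week_starts
    FROM subscriptions
  ), UNNEST(week_starts) AS wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN subscriptions
  ON subscribed_at <= week_end
  AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start
ORDER BY week_start, subscription_id
```

Truncating to Monday up front also means the first generated week always contains the earliest subscribed_at, even when that date falls on a Sunday.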
【Comments】:
Thanks Mikhail! It works. Can you recommend a good book on standard SQL?
I would recommend this: cloud.google.com/bigquery/docs/reference/standard-sql