Get weekly breakdown of subscriptions from a time range in Google BigQuery


Posted: 2017-11-29 10:20:09

Question:

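If I have the period during which each subscription was active, is there a way to get weekly data on active subscriptions? I have a table in BigQuery with a list of subscriptions: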

+-----------------+---------+---------------+--------------+
| subscription_id | user_id | subscribed_at | cancelled_at |
+-----------------+---------+---------------+--------------+
|               1 |       2 | 2017-01-05    | 2017-06-03   |
|               2 |       3 | 2017-01-07    | 2017-09-15   |
|               3 |       4 | 2017-01-09    | NULL         |
|               4 |       1 | 2017-01-11    | 2017-05-27   |
|               5 |       3 | 2017-01-15    | NULL         |
+-----------------+---------+---------------+--------------+

I need one record for each unique subscription_id + active_week combination. Something like this:

+-----------------+---------+---------------+--------------+-------------+
| subscription_id | user_id | subscribed_at | cancelled_at | active_week |
+-----------------+---------+---------------+--------------+-------------+
|               1 |       2 | 2017-01-05    | 2017-06-03   |      201701 |
|               2 |       3 | 2017-01-07    | 2017-09-15   |      201701 |
|               1 |       2 | 2017-01-05    | 2017-06-03   |      201702 |
|               2 |       3 | 2017-01-07    | 2017-09-15   |      201702 |
|               3 |       4 | 2017-01-09    | NULL         |      201702 |
|               4 |       1 | 2017-01-11    | 2017-05-27   |      201702 |
|               1 |       2 | 2017-01-05    | 2017-06-03   |      201703 |
|               2 |       3 | 2017-01-07    | 2017-09-15   |      201703 |
|               3 |       4 | 2017-01-09    | NULL         |      201703 |
|               4 |       1 | 2017-01-11    | 2017-05-27   |      201703 |
|               5 |       3 | 2017-01-15    | NULL         |      201703 |
|             ... |      ...| ...           |...           |         ... |
+-----------------+---------+---------------+--------------+-------------+
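The question does not say how active_week is numbered. The sample values are consistent with "year + Sunday-based week number": 2017-01-01 was a Sunday, so 2017-01-05 and 2017-01-07 fall in week 01, 2017-01-09 in week 02, and 2017-01-15 in week 03.

Here is a minimal Python sketch of that labeling. It assumes the strftime `%U` convention. `active_week_label` is a hypothetical helper, not part of any library.

```python
from datetime import date


def active_week_label(d: date) -> int:
    """Label a date as YYYYWW using Sunday-based week numbers (strftime %U)."""
    # %U = week of the year (00-53) with weeks starting on Sunday;
    # days before the year's first Sunday fall in week 00
    return int(d.strftime("%Y%U"))


# subscribed_at values from the question's table
for d in [date(2017, 1, 5), date(2017, 1, 7), date(2017, 1, 9),
          date(2017, 1, 11), date(2017, 1, 15)]:
    print(d, active_week_label(d))
```

If this convention is the intended one, BigQuery's FORMAT_DATE('%Y%U', some_date) should produce the same label as a STRING. One consequence to keep in mind: dates before a year's first Sunday get week 00. For example, 2018-01-01 (a Monday) gives 201800.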

I tried building on this, but had no luck:

SELECT
  SPLIT(RPAD('', 1 + DATEDIFF(sub.ended_date, sub.started_date), '.'),'') AS weeks,
  sub.subscription_Id,
  sub.customer_id
FROM (
  SELECT
    subscribed_at AS started_date,
    CASE
      WHEN cancelled_at IS NULL THEN TIMESTAMP(CURRENT_DATE())
      ELSE TIMESTAMP(cancelled_at)
    END AS ended_date,
    subscription_id,
    customer_id
  FROM
    [subscriptions]) AS sub

Many thanks for your help!

Comments:

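Welcome to Stack Overflow! Pure code-writing requests are off-topic on Stack Overflow; we expect questions here to relate to a specific programming problem, but we are happy to help you write it yourself! Tell us what you've tried and where you are stuck. This will also help us answer your question better.

This should get you started: cloud.google.com/bigquery/docs/reference/standard-sql/…

Just show what you have tried and what problem you ran into, so we have something to work from :o)

@MikhailBerlyant I tried using the query from here and modifying it to get data from my table, but no luck.

Then show it!

Answer 1: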

Below is for BigQuery Standard SQL

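#standardSQL
-- Convert the STRING date columns to DATE (see the note below)
WITH temp AS (
  SELECT subscription_id, user_id, 
    PARSE_DATE('%Y-%m-%d', subscribed_at) subscribed_at, 
    PARSE_DATE('%Y-%m-%d', cancelled_at) cancelled_at
  FROM `project.dataset.subscriptions`
), weeks AS (
  -- One row per Sunday-to-Saturday week, from the week of the earliest subscription up to today
  SELECT 
    wk week_start, 
    DATE_ADD(wk, INTERVAL 6 DAY) week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
        DATE_TRUNC(MIN(subscribed_at), WEEK), 
        CURRENT_DATE(), INTERVAL 1 WEEK) weeks
    FROM temp
  ), UNNEST(weeks) wk
)
-- A subscription is active in a week if it started on or before the week's end
-- and its cancellation date (today, if still active) falls after the week's start
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN temp
ON subscribed_at <= week_end
AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start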
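To see why the query produces the rows the question asks for, its logic can be modeled in Python on the question's sample data. This is a sketch under stated assumptions, not BigQuery itself:

- `week_trunc_sunday` and `weekly_breakdown` are hypothetical helpers that mimic DATE_TRUNC(…, WEEK) and the weeks/JOIN step.
- A fixed `today` (the question's posting date) stands in for CURRENT_DATE() so the output is reproducible.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Row = Tuple[int, int, date, Optional[date]]

# Sample rows from the question: (subscription_id, user_id, subscribed_at, cancelled_at)
SUBSCRIPTIONS: List[Row] = [
    (1, 2, date(2017, 1, 5), date(2017, 6, 3)),
    (2, 3, date(2017, 1, 7), date(2017, 9, 15)),
    (3, 4, date(2017, 1, 9), None),
    (4, 1, date(2017, 1, 11), date(2017, 5, 27)),
    (5, 3, date(2017, 1, 15), None),
]


def week_trunc_sunday(d: date) -> date:
    """Mimic DATE_TRUNC(d, WEEK): step back to the same or preceding Sunday."""
    # weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7 = days since Sunday
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_breakdown(subs: List[Row], today: date):
    """Return (subscription_id, user_id, week_start, week_end) for each active week."""
    # Like GENERATE_DATE_ARRAY(DATE_TRUNC(MIN(subscribed_at), WEEK), today, INTERVAL 1 WEEK)
    wk = week_trunc_sunday(min(s[2] for s in subs))
    weeks = []
    while wk <= today:
        weeks.append((wk, wk + timedelta(days=6)))
        wk += timedelta(weeks=1)

    out = []
    for week_start, week_end in weeks:
        for sub_id, user_id, subscribed, cancelled in subs:
            # IFNULL(cancelled_at, CURRENT_DATE()): open subscriptions run until today
            ended = cancelled if cancelled is not None else today
            if subscribed <= week_end and ended > week_start:
                out.append((sub_id, user_id, week_start, week_end))
    return out


# Fixed "today" instead of CURRENT_DATE() so the result is reproducible
rows = weekly_breakdown(SUBSCRIPTIONS, today=date(2017, 11, 29))
for r in rows[:11]:  # the first three weeks, as in the question's expected output
    print(r)
```

Because the comparison is a strict `>`, a subscription cancelled exactly on a week's Sunday is not counted for that week, which mirrors the SQL condition.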

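Please note: I assumed your subscribed_at and cancelled_at fields are of STRING data type, which is why there is an extra temp subquery that converts them to DATE. If these fields in your original table are already of DATE type, you should drop temp and use `project.dataset.subscriptions` directly in both places where temp is referenced.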
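The job of the temp CTE is to turn 'YYYY-MM-DD' strings into dates while letting a NULL cancellation stay NULL. That behavior can be modeled in Python as below. `parse_date` is a hypothetical helper, not a BigQuery API; None plays the role of SQL NULL.

In BigQuery, PARSE_DATE returns NULL for a NULL input. The IFNULL(cancelled_at, CURRENT_DATE()) in the join relies on that.

```python
from datetime import date, datetime
from typing import Optional


def parse_date(s: Optional[str]) -> Optional[date]:
    """Mimic PARSE_DATE('%Y-%m-%d', s); None (SQL NULL) passes through as None."""
    if s is None:
        return None
    return datetime.strptime(s, "%Y-%m-%d").date()


# Two rows from the question: one still active (NULL cancelled_at), one cancelled
raw = [(3, 4, "2017-01-09", None), (4, 1, "2017-01-11", "2017-05-27")]
parsed = [(sid, uid, parse_date(a), parse_date(c)) for sid, uid, a, c in raw]
print(parsed)
```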

You can test / play with the above query using the dummy data from your question, as below

#standardSQL
WITH `project.dataset.subscriptions` AS (
  SELECT 1 subscription_id, 2 user_id, '2017-01-05' subscribed_at, '2017-06-03' cancelled_at UNION ALL
  SELECT 2, 3, '2017-01-07', '2017-09-15' UNION ALL
  SELECT 3, 4, '2017-01-09', NULL UNION ALL
  SELECT 4, 1, '2017-01-11', '2017-05-27' UNION ALL
  SELECT 5, 3, '2017-01-15', NULL
), temp AS (
  SELECT subscription_id, user_id, 
    PARSE_DATE('%Y-%m-%d', subscribed_at) subscribed_at, 
    PARSE_DATE('%Y-%m-%d', cancelled_at) cancelled_at
  FROM `project.dataset.subscriptions`
), weeks AS (
  SELECT 
    wk week_start, 
    DATE_ADD(wk, INTERVAL 6 DAY) week_end
  FROM (
    SELECT GENERATE_DATE_ARRAY(
        DATE_TRUNC(MIN(subscribed_at), WEEK), 
        CURRENT_DATE(), INTERVAL 1 WEEK) weeks
    FROM temp
  ), UNNEST(weeks) wk
)
SELECT subscription_id, user_id, subscribed_at, cancelled_at, week_start, week_end
FROM weeks
JOIN temp
ON subscribed_at <= week_end
AND IFNULL(cancelled_at, CURRENT_DATE()) > week_start
ORDER BY week_start, subscription_id

Also note: in the above query, weeks start on Sunday. If you want weeks to start on Monday, adjust week_start and week_end in the weeks CTE slightly, as below

    DATE_ADD(wk, INTERVAL 1 DAY) week_start, 
    DATE_ADD(wk, INTERVAL 7 DAY) week_end
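The Monday adjustment simply shifts each Sunday-aligned wk forward by one day for the start and seven days for the end. Here is a small Python model of that arithmetic. `monday_week` is a hypothetical helper for illustration only.

```python
from datetime import date, timedelta
from typing import Tuple


def monday_week(sunday_wk: date) -> Tuple[date, date]:
    """Turn a Sunday-aligned wk into a Monday-to-Sunday (week_start, week_end) pair."""
    week_start = sunday_wk + timedelta(days=1)  # DATE_ADD(wk, INTERVAL 1 DAY)
    week_end = sunday_wk + timedelta(days=7)    # DATE_ADD(wk, INTERVAL 7 DAY)
    return week_start, week_end


start, end = monday_week(date(2017, 1, 1))  # 2017-01-01 was a Sunday
print(start, end)  # 2017-01-02 2017-01-08
```

One subtlety: the generated array still starts at the Sunday returned by DATE_TRUNC. If the earliest subscribed_at is itself a Sunday, the Monday-based week containing that Sunday is never generated. That subscription's first row is therefore the following week.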

Comments:

Thanks Mikhail! It works. Can you recommend a good book on Standard SQL?

I would recommend this: cloud.google.com/bigquery/docs/reference/standard-sql
