BigQuery Count Instances between timeperiod Group By

Posted: 2016-03-04 16:40:18

Question:

I have an orders table uploaded to BigQuery with the following columns:

ConsumerID, TransactionDate, Revenue, OrderID

ConsumerID and OrderID are integers; TransactionDate is a timestamp.

The data is structured as follows:

ConsumerId   || TransactionDate          || Revenue   ||  OrderID
1            || 2014-10-27 00:00:00 UTC  || 55        ||  653745
1            || 2015-02-27 00:00:00 UTC  || 65        ||  767833
1            || 2015-12-27 00:00:00 UTC  || 456       ||  5676324
2            || 2014-10-27 00:00:00 UTC  || 56        ||  435261
2            || 2016-02-27 00:00:00 UTC  || 43        ||  5632436724

So my expected output is:

ConsumerId   || Count Of Orders In Last 12 months
    1        || 2
    2        || 1

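I want to count the number of orders each customer placed within the first 12 months, starting from the date of their first order.

In BigQuery I wrote the following: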

SELECT
  ConsumerId,
  COUNT(OrderNumber BETWEEN MIN(TransactionDate)AND DATE_ADD(MIN(TransactionDate),11,"MONTH")) AS CountOfOrdersTwelve,
FROM
  [ordertable.orders]
GROUP BY
  1,
  2
ORDER BY
  ConsumerId ;

But this gives the following error:

Error: (L3:157): Cannot group by an aggregate.

Does anyone know a way to do this in BigQuery?


Answer 1:

A quick option for you to consider (assuming the input below):

      (SELECT 1 AS ConsumerID, '2014-01-01' AS TransactionDate, 1 AS OrderID),
      (SELECT 1 AS ConsumerID, '2014-05-01' AS TransactionDate, 2 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-05-01' AS TransactionDate, 6 AS OrderID),

      (SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 1 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 2 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
      (SELECT 2 AS ConsumerID, '2016-05-01' AS TransactionDate, 6 AS OrderID),

      (SELECT 3 AS ConsumerID, '2015-04-01' AS TransactionDate, 1 AS OrderID),
      (SELECT 3 AS ConsumerID, '2015-05-01' AS TransactionDate, 2 AS OrderID)

Your data may differ in its data types, so you will need to adjust the query accordingly.

SELECT ConsumerID, MAX(CountOfOrders) AS CountOfOrdersTwelve
FROM (
  SELECT ConsumerID, CountOfOrders
  FROM (
    SELECT
      ConsumerID, TransactionDate,
      COUNT(1) OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS CountOfOrders,
      FIRST_VALUE(TransactionDate) 
        OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS firstTransactionDate
    FROM [ordertable.orders]
  ) HAVING DATEDIFF(TransactionDate, firstTransactionDate) <= 365
) GROUP BY ConsumerID ORDER BY ConsumerID
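
To make the result of the query above easier to check by hand, here is a minimal plain-Python sketch that computes the same figure more directly on the sample input: for each consumer it finds the first transaction date and counts the rows whose day difference from it is at most 365. It is not BigQuery code; the function name `count_orders_first_365_days` and the tuple layout are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date

# Sample input from the answer: (ConsumerID, TransactionDate, OrderID)
SAMPLE_ORDERS = [
    (1, "2014-01-01", 1), (1, "2014-05-01", 2), (1, "2015-01-01", 3),
    (1, "2015-03-01", 4), (1, "2015-04-01", 5), (1, "2015-05-01", 6),
    (2, "2015-01-01", 1), (2, "2015-01-01", 2), (2, "2015-01-01", 3),
    (2, "2015-03-01", 4), (2, "2015-04-01", 5), (2, "2016-05-01", 6),
    (3, "2015-04-01", 1), (3, "2015-05-01", 2),
]


def count_orders_first_365_days(orders):
    """Return {ConsumerID: number of orders within 365 days of the first order}."""
    dates_by_consumer = defaultdict(list)
    for consumer_id, transaction_date, _order_id in orders:
        dates_by_consumer[consumer_id].append(date.fromisoformat(transaction_date))

    result = {}
    for consumer_id, dates in dates_by_consumer.items():
        # Plays the role of FIRST_VALUE(TransactionDate) in the query
        first = min(dates)
        # Same condition as DATEDIFF(TransactionDate, firstTransactionDate) <= 365
        result[consumer_id] = sum(1 for d in dates if (d - first).days <= 365)
    return result


if __name__ == "__main__":
    print(count_orders_first_365_days(SAMPLE_ORDERS))
```

On the sample input this gives 3, 5, and 2 orders for consumers 1, 2, and 3. The order dated 2015-01-01 for consumer 1 is exactly 365 days after the first one, so the `<= 365` comparison includes it.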

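Compact version

Note: this version works when TransactionDate is a STRING (as in the sample input for the first solution above) as well as when it is a TIMESTAMP (as in your updated question).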

SELECT 
  ConsumerID, CountOfOrdersTwelve
FROM (
  SELECT 
    ConsumerID,
    TIMESTAMP_TO_SEC(TIMESTAMP(TransactionDate)) AS ts,
    COUNT(ts) OVER (PARTITION BY ConsumerID ORDER BY ts 
      RANGE BETWEEN CURRENT ROW AND 365*24*3600 FOLLOWING) AS CountOfOrdersTwelve,
    ROW_NUMBER() OVER(PARTITION BY ConsumerID ORDER BY ts) AS pos
  FROM [ordertable.orders]
)
WHERE pos = 1
ORDER BY ConsumerID
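
The window in the compact version can be read as follows: for each row, count the rows of the same consumer whose timestamp lies between that row's timestamp and 365*24*3600 seconds later, then keep only the earliest row per consumer. A rough plain-Python sketch of that reading, run on the sample rows from the question, is given here. The helper names `to_epoch_seconds` and `count_in_first_window` are hypothetical, and this is not BigQuery code.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 365 * 24 * 3600  # same constant as in the RANGE clause

# Rows from the question: (ConsumerId, TransactionDate in UTC, OrderID)
QUESTION_ORDERS = [
    (1, "2014-10-27 00:00:00", 653745),
    (1, "2015-02-27 00:00:00", 767833),
    (1, "2015-12-27 00:00:00", 5676324),
    (2, "2014-10-27 00:00:00", 435261),
    (2, "2016-02-27 00:00:00", 5632436724),
]


def to_epoch_seconds(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' UTC string into epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def count_in_first_window(orders, window=WINDOW_SECONDS):
    """Return {ConsumerId: orders within `window` seconds of the first order}."""
    seconds_by_consumer = {}
    for consumer_id, transaction_date, _order_id in orders:
        seconds_by_consumer.setdefault(consumer_id, []).append(
            to_epoch_seconds(transaction_date))

    result = {}
    for consumer_id, seconds in seconds_by_consumer.items():
        first = min(seconds)  # the row that would get pos = 1
        # RANGE BETWEEN CURRENT ROW AND window FOLLOWING, evaluated at that row
        result[consumer_id] = sum(
            1 for s in seconds if first <= s <= first + window)
    return result


if __name__ == "__main__":
    print(count_in_first_window(QUESTION_ORDERS))
```

For the rows in the question this yields 2 for consumer 1 and 1 for consumer 2, matching the expected output. Note that a fixed 365-day window is an approximation of 12 calendar months: when the span includes February 29, it ends one day before the calendar anniversary of the first order.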

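Comments:

Thanks for the reply. The query runs without errors, but unfortunately it returns 0 results, which can't be right, since people obviously must have placed orders. Do the data types matter? ConsumerId and OrderNumber are integers. I should also clarify that the order number is just a database auto-increment value, not a count of the customer's orders. I'd really appreciate any help you can provide.

The above query does work. The problem is probably the data types in your specific data. Provide a brief sample of your data and I will adjust the query accordingly.

Thanks, I've added a clarification to the original question.

Many thanks, the compact version works perfectly... and it is very fast. Thank you very much for your help.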
