BigQuery Count Instances between timeperiod Group By
Posted: 2016-03-04 16:40:18
Question: I have an orders table uploaded to BigQuery with the following headers:
ConsumerID, TransactionDate, Revenue, OrderID
ConsumerID and OrderID are integers; TransactionDate is a timestamp.
The data is structured as follows:
ConsumerId || TransactionDate || Revenue || OrderID
1 || 2014-10-27 00:00:00 UTC || 55 || 653745
1 || 2015-02-27 00:00:00 UTC || 65 || 767833
1 || 2015-12-27 00:00:00 UTC || 456 || 5676324
2 || 2014-10-27 00:00:00 UTC || 56 || 435261
2 || 2016-02-27 00:00:00 UTC || 43 || 5632436724
So my expected output is:
ConsumerId || Count Of Orders In First 12 Months
1 || 2
2 || 1
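I want to count the number of orders each customer placed within the first 12 months from the date of their first order.
In BigQuery I wrote the following: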
SELECT
ConsumerId,
COUNT(OrderNumber BETWEEN MIN(TransactionDate)AND DATE_ADD(MIN(TransactionDate),11,"MONTH")) AS CountOfOrdersTwelve,
FROM
[ordertable.orders]
GROUP BY
1,
2
ORDER BY
ConsumerId ;
But this gives the following error:
Error: (L3:157): Cannot group by an aggregate.
Does anyone know a way to do this in BigQuery?
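The error comes from nesting one aggregate (MIN) inside another (COUNT) and grouping by the computed column: each consumer's first-order date has to be known before that consumer's orders can be filtered and counted. A minimal Python sketch of that two-pass logic on the sample rows above, assuming a 365-day window (the question's DATE_ADD uses 11 months, so the exact window length is an assumption); `parse_ts` and `first_year_counts` are hypothetical helper names, not BigQuery functions:

```python
from datetime import datetime, timedelta

# Sample rows from the question: (ConsumerId, TransactionDate, Revenue, OrderID)
ORDERS = [
    (1, "2014-10-27 00:00:00 UTC", 55, 653745),
    (1, "2015-02-27 00:00:00 UTC", 65, 767833),
    (1, "2015-12-27 00:00:00 UTC", 456, 5676324),
    (2, "2014-10-27 00:00:00 UTC", 56, 435261),
    (2, "2016-02-27 00:00:00 UTC", 43, 5632436724),
]

def parse_ts(value: str) -> datetime:
    # Drop the trailing " UTC" label and parse the rest (assumed format)
    return datetime.strptime(value.replace(" UTC", ""), "%Y-%m-%d %H:%M:%S")

def first_year_counts(orders, window=timedelta(days=365)):
    # Pass 1: earliest order per consumer (the MIN that cannot sit inside COUNT)
    first = {}
    for consumer, ts, _revenue, _order in orders:
        t = parse_ts(ts)
        if consumer not in first or t < first[consumer]:
            first[consumer] = t
    # Pass 2: count only the orders that fall inside the window after that date
    counts = {}
    for consumer, ts, _revenue, _order in orders:
        if parse_ts(ts) - first[consumer] <= window:
            counts[consumer] = counts.get(consumer, 0) + 1
    return counts

print(first_year_counts(ORDERS))  # {1: 2, 2: 1}
```

In SQL the same shape would typically be a subquery that computes MIN(TransactionDate) per ConsumerId, joined back to the orders table before counting.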
Answer 1: A quick option for your consideration (assuming input like the following):
(SELECT 1 AS ConsumerID, '2014-01-01' AS TransactionDate, 1 AS OrderID),
(SELECT 1 AS ConsumerID, '2014-05-01' AS TransactionDate, 2 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-05-01' AS TransactionDate, 6 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 1 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 2 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
(SELECT 2 AS ConsumerID, '2016-05-01' AS TransactionDate, 6 AS OrderID),
(SELECT 3 AS ConsumerID, '2015-04-01' AS TransactionDate, 1 AS OrderID),
(SELECT 3 AS ConsumerID, '2015-05-01' AS TransactionDate, 2 AS OrderID)
Your data types may differ, so you will need to adjust accordingly:
SELECT ConsumerID, MAX(CountOfOrders) AS CountOfOrdersTwelve
FROM (
SELECT ConsumerID, CountOfOrders
FROM (
SELECT
ConsumerID, TransactionDate,
COUNT(1) OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS CountOfOrders,
FIRST_VALUE(TransactionDate)
OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS firstTransactionDate
FROM [ordertable.orders]
) HAVING DATEDIFF(TransactionDate, firstTransactionDate) <= 365
) GROUP BY ConsumerID ORDER BY ConsumerID
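To see what the window functions in this query compute, here is a rough Python emulation on the sample input above. It assumes the running `COUNT(1) OVER (... ORDER BY TransactionDate)` includes rows tied on the same date (a RANGE-style frame); for this sample the maximum comes out the same either way. `window_max_counts` is a hypothetical name used only for illustration:

```python
from datetime import date

# Sample input from the answer: (ConsumerID, TransactionDate, OrderID)
SAMPLE = [
    (1, "2014-01-01", 1), (1, "2014-05-01", 2), (1, "2015-01-01", 3),
    (1, "2015-03-01", 4), (1, "2015-04-01", 5), (1, "2015-05-01", 6),
    (2, "2015-01-01", 1), (2, "2015-01-01", 2), (2, "2015-01-01", 3),
    (2, "2015-03-01", 4), (2, "2015-04-01", 5), (2, "2016-05-01", 6),
    (3, "2015-04-01", 1), (3, "2015-05-01", 2),
]

def window_max_counts(rows, max_days=365):
    by_consumer = {}
    for consumer, d, _order in rows:
        by_consumer.setdefault(consumer, []).append(date.fromisoformat(d))
    result = {}
    for consumer in sorted(by_consumer):
        dates = sorted(by_consumer[consumer])
        first = dates[0]  # FIRST_VALUE(TransactionDate)
        best = 0
        for d in dates:
            # Running count, including ties on the same date
            running = sum(1 for x in dates if x <= d)
            # HAVING DATEDIFF(TransactionDate, firstTransactionDate) <= 365
            if (d - first).days <= max_days:
                best = max(best, running)  # outer MAX(CountOfOrders)
        result[consumer] = best
    return result

print(window_max_counts(SAMPLE))  # {1: 3, 2: 5, 3: 2}
```

Consumer 1's order on 2015-01-01 is exactly 365 days after 2014-01-01, so the `<= 365` condition keeps it.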
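Compact version
Note: this version works with both STRING (as in the sample input for the first solution above) and TIMESTAMP (as in your updated question) data types for TransactionDate.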
SELECT
ConsumerID, CountOfOrdersTwelve
FROM (
SELECT
ConsumerID,
TIMESTAMP_TO_SEC(TIMESTAMP(TransactionDate)) AS ts,
COUNT(ts) OVER (PARTITION BY ConsumerID ORDER BY ts
RANGE BETWEEN CURRENT ROW AND 365*24*3600 FOLLOWING) AS CountOfOrdersTwelve,
ROW_NUMBER() OVER(PARTITION BY ConsumerID ORDER BY ts) AS pos
FROM [ordertable.orders]
)
WHERE pos = 1
ORDER BY ConsumerID
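The compact version keeps only each consumer's earliest row (pos = 1) and counts the rows whose timestamp lies within 365*24*3600 seconds after it. A rough Python emulation using sorted epoch seconds and `bisect`, assuming TransactionDate arrives either as a 'YYYY-MM-DD' string or as a 'YYYY-MM-DD HH:MM:SS UTC' string (mirroring the STRING/TIMESTAMP note); `to_seconds` and `first_window_counts` are hypothetical helpers, not BigQuery functions:

```python
from bisect import bisect_right
from datetime import datetime, timezone

WINDOW = 365 * 24 * 3600  # RANGE BETWEEN CURRENT ROW AND 365*24*3600 FOLLOWING

def to_seconds(value: str) -> int:
    # Rough stand-in for TIMESTAMP_TO_SEC(TIMESTAMP(TransactionDate))
    text = value.replace(" UTC", "")
    fmt = "%Y-%m-%d %H:%M:%S" if " " in text else "%Y-%m-%d"
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def first_window_counts(rows):
    # rows: iterable of (consumer_id, transaction_date_string)
    per_consumer = {}
    for consumer, value in rows:
        per_consumer.setdefault(consumer, []).append(to_seconds(value))
    counts = {}
    for consumer in sorted(per_consumer):
        ts = sorted(per_consumer[consumer])
        start = ts[0]  # the row with pos = 1
        # Rows in [start, start + WINDOW]; bisect_right keeps the end inclusive
        counts[consumer] = bisect_right(ts, start + WINDOW)
    return counts

rows = [
    (1, "2014-10-27 00:00:00 UTC"), (1, "2015-02-27 00:00:00 UTC"),
    (1, "2015-12-27 00:00:00 UTC"), (2, "2014-10-27 00:00:00 UTC"),
    (2, "2016-02-27 00:00:00 UTC"),
]
print(first_window_counts(rows))  # {1: 2, 2: 1}
```

Because the window starts at the earliest timestamp, the count is simply the number of sorted timestamps up to `start + WINDOW`, which a single binary search gives without scanning every row.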
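Comments:
Thanks for your reply. The query runs without errors, but unfortunately it returns 0 results, which can't be right, because people have obviously placed orders. Do the data types matter? ConsumerId and OrderNumber are integers. I should also clarify that the order number is just an incrementing database number; it does not indicate how many orders a customer has placed. I'd really appreciate any help you can provide.
The query above does work; the problem is probably the data types in your specific data. Provide a short sample of your data and I will adjust the query accordingly.
Thanks, I've added a clarification to the original question.
Thanks so much, the Compact version works perfectly... and it's very fast. Thank you very much for your help.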