Difference in seconds between the current row and the previous row, stored in a separate column, using Google BigQuery
【Posted】: 2016-05-31 19:31:35
【Question】: I have a table that stores timestamps, as shown below:
Date                       Order ID
2016-05-31 11:46:54 UTC    14567
2016-05-31 11:46:43 UTC    876
2016-05-31 11:46:24 UTC    1345
2016-05-31 11:46:04 UTC    7345
I would like to get:
Date                       Order ID    Difference In Seconds
2016-05-31 11:46:54 UTC    14567       0
2016-05-31 11:46:43 UTC    876         11
2016-05-31 11:46:24 UTC    1345        19
2016-05-31 11:46:04 UTC    7345        42
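【Comments】:
I'm not sure how you calculated this. I would have expected (0, 11, 19, 20) for the difference from the next row.
Based on the question, I would actually expect (11, 19, 20, 0).
【Answer 1】: The query below assumes your DATE field is of STRING data type. If it is already of TIMESTAMP data type, you should remove TIMESTAMP() from the query below.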
SELECT
  DATE, id,
  IFNULL(TIMESTAMP_TO_SEC(TIMESTAMP(DATE)) -
         TIMESTAMP_TO_SEC(TIMESTAMP(prev_date)), 0) AS Difference_In_Seconds
FROM (
  SELECT
    DATE, id,
    LEAD(DATE) OVER (ORDER BY DATE DESC) AS prev_date
  FROM
    (SELECT '2016-05-31 11:46:54 UTC' AS DATE, 14567 AS id),
    (SELECT '2016-05-31 11:46:43 UTC' AS DATE, 876 AS id),
    (SELECT '2016-05-31 11:46:24 UTC' AS DATE, 1345 AS id),
    (SELECT '2016-05-31 11:46:04 UTC' AS DATE, 7345 AS id)
)
ORDER BY DATE DESC
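The query above is written in BigQuery legacy SQL (TIMESTAMP_TO_SEC and the comma-separated subqueries are legacy constructs). To check the arithmetic without running BigQuery, the same logic can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `to_epoch_seconds` and `diff_to_earlier_row` are made up for this sketch and are not part of any BigQuery API.

```python
from datetime import datetime, timezone

# Sample rows copied from the question: (timestamp string, order id).
rows = [
    ("2016-05-31 11:46:54 UTC", 14567),
    ("2016-05-31 11:46:43 UTC", 876),
    ("2016-05-31 11:46:24 UTC", 1345),
    ("2016-05-31 11:46:04 UTC", 7345),
]


def to_epoch_seconds(text):
    """Parse 'YYYY-MM-DD HH:MM:SS UTC' into whole seconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S UTC")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def diff_to_earlier_row(rows):
    """Mimic LEAD(DATE) OVER (ORDER BY DATE DESC) combined with IFNULL(..., 0)."""
    ordered = sorted(rows, key=lambda r: to_epoch_seconds(r[0]), reverse=True)
    result = []
    for i, (date, order_id) in enumerate(ordered):
        if i + 1 < len(ordered):
            # The next row in descending order holds the earlier timestamp.
            prev_date = ordered[i + 1][0]
            diff = to_epoch_seconds(date) - to_epoch_seconds(prev_date)
        else:
            # No earlier row: LEAD yields NULL, which IFNULL turns into 0.
            diff = 0
        result.append((date, order_id, diff))
    return result


for row in diff_to_earlier_row(rows):
    print(row)
```

For the four sample rows this gives differences of 11, 19, 20 and 0, which agrees with the (11, 19, 20, 0) expectation raised in the comments rather than the 42 shown in the question.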
Added to address the case when the DATE field is of TIMESTAMP data type.
To keep things simple, try the following :o)
SELECT
  DATE, id,
  IFNULL(TIMESTAMP_TO_SEC(TIMESTAMP(DATE)) -
         TIMESTAMP_TO_SEC(TIMESTAMP(prev_date)), 0) AS Difference_In_Seconds
FROM (
  SELECT
    DATE, id,
    LEAD(DATE) OVER (ORDER BY DATE DESC) AS prev_date
  FROM
    (SELECT STRING(DATE) AS DATE, id FROM [test:product.tab1])
)
ORDER BY DATE DESC
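【Comments】:
Thank you very much, I appreciate your help. But when I run a slightly modified query, difference_in_seconds is written as "0" for all rows. Can you help me understand where I went wrong? SELECT DATE, prev_date, id, IFNULL(TIMESTAMP_TO_SEC(TIMESTAMP(DATE)) - TIMESTAMP_TO_SEC(TIMESTAMP(prev_date)), 0) AS Difference_In_Seconds FROM (SELECT DATE, id, LEAD(DATE) OVER (ORDER BY DATE DESC) AS prev_date FROM [test:product.tab1]) ORDER BY DATE DESC
Please clarify: what is the data type of the date field in your tab1 table?
The date field is a TIMESTAMP. If I remove TIMESTAMP() and execute the query, I get the following error: "Type mismatch. Expected TIMESTAMP, actual unknown."
One more question. If I want the order ID to match the previous row as well, how can I extend the same query? For example:
Date                       Order ID    Difference In Seconds
2016-05-31 11:46:54 UTC    14567       11
2016-05-31 11:46:43 UTC    14567       19
2016-05-31 11:46:24 UTC    14567       20
2016-05-31 11:46:04 UTC    14567       4
2016-05-31 11:46:54 UTC    22455       11
2016-05-31 11:46:43 UTC    24567       0
2016-05-31 11:46:00 UTC    14567       0
The comment format is not suitable for another question and answer. Please post it as a new question, and I or someone else will be happy to answer it.
【Answer 2】: You can use timestamp_diff() and lag():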
select t.*,
       coalesce(timestamp_diff(lag(date) over (order by date), date, second),
                0) as diff_in_seconds
from t;
Your expected output may actually be the difference between the current row and the next row. For that, use lead():
select t.*,
       coalesce(timestamp_diff(lead(date) over (order by date), date, second),
                0) as diff_in_seconds
from t;
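One detail worth checking in the two queries above is the sign of the result. TIMESTAMP_DIFF(a, b, SECOND) returns a minus b, so passing lag(date) first subtracts the current row from an earlier one. The Python sketch below mimics both variants on the sample timestamps from the question. It is a rough illustration, not BigQuery code: `timestamp_diff_seconds` and `window_diffs` are hypothetical helpers, and the timestamps are treated as naive UTC values.

```python
from datetime import datetime

# Sample timestamps from the question (UTC, kept naive for simplicity).
timestamps = [
    datetime(2016, 5, 31, 11, 46, 54),
    datetime(2016, 5, 31, 11, 46, 43),
    datetime(2016, 5, 31, 11, 46, 24),
    datetime(2016, 5, 31, 11, 46, 4),
]


def timestamp_diff_seconds(a, b):
    """Return a - b in whole seconds, the argument order of TIMESTAMP_DIFF(a, b, SECOND)."""
    return int((a - b).total_seconds())


def window_diffs(values, offset):
    """Mimic coalesce(timestamp_diff(<neighbour>, date, second), 0) over ORDER BY date.

    offset=-1 plays the role of lag(), offset=+1 the role of lead().
    """
    ordered = sorted(values)
    diffs = []
    for i, current in enumerate(ordered):
        j = i + offset
        if 0 <= j < len(ordered):
            diffs.append(timestamp_diff_seconds(ordered[j], current))
        else:
            diffs.append(0)  # no neighbour: NULL replaced by 0 via coalesce
    return diffs


print("lag :", window_diffs(timestamps, -1))
print("lead:", window_diffs(timestamps, +1))
```

In ascending date order the lag() form yields 0, -20, -19, -11 and the lead() form yields 20, 19, 11, 0. If positive values are wanted with lag(), swapping the first two arguments of timestamp_diff() should flip the sign.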