Aggregation of time-series data on multiple columns

Posted 2023-03-11


[Title] Aggregation of time-series data on multiple columns [Posted] 2021-04-02 08:09:16 [Question]:

                     rand_val  new_val           copy_time
2020-10-15 00:00:00         7       26 2020-10-15 00:00:00
2020-10-15 00:00:10         8       29 2020-10-15 00:00:10
2020-10-15 00:00:20         1       53 2020-10-15 00:00:20
2020-10-15 00:03:50         6       69 2020-10-15 00:03:50
2020-10-15 00:04:00         3       19 2020-10-15 00:04:00

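I am downsampling a time series with the resample method. I found that I cannot refer to a specific column when applying a function to the resampled data.

Suppose I want to do something that involves referring to a column by name: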

df.resample("1min").apply(lambda x: sum(x.rand_val) if len(x)>1 else 0)

I get an error:

AttributeError: 'Series' object has no attribute 'rand_val'

This would work if I had used groupby on some other variable. I guess resample behaves differently. Any ideas?

[Comments]:

Try df.resample('1min', on='copy_time').apply(...)

[Answer 1]:

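Good question! When we groupby on some columns, each chunk of data is handed to the function as a pandas DataFrame, so columns can be accessed as usual. With resample in this case, however, the function receives a Series (one column at a time, as the error message shows), so x.rand_val does not exist.

One way to compute this for rand_val only is to select that column first, so the function is given that Series directly: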

df.resample("1min")['rand_val'].apply(lambda x: sum(x) if len(x)>1 else 0)
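As a self-contained sketch of the line above, the following rebuilds the sample frame from the question and applies the same rule to rand_val. The helper name sum_if_many is made up for illustration. The dict form at the end is an addition that is not part of the original answer: it assumes you want a different aggregation per column, and "sum" for new_val is just an arbitrary choice.

```python
import pandas as pd

# Sample data copied from the question; the index is a DatetimeIndex.
idx = pd.to_datetime([
    "2020-10-15 00:00:00",
    "2020-10-15 00:00:10",
    "2020-10-15 00:00:20",
    "2020-10-15 00:03:50",
    "2020-10-15 00:04:00",
])
df = pd.DataFrame(
    {"rand_val": [7, 8, 1, 6, 3], "new_val": [26, 29, 53, 69, 19]},
    index=idx,
)
df["copy_time"] = df.index


def sum_if_many(s):
    # s is the Series of values that fall into one 1-minute bin
    # (hypothetical helper name, same rule as the lambda in the question).
    return s.sum() if len(s) > 1 else 0


# Select the column first, so the function receives that Series.
one_col = df.resample("1min")["rand_val"].apply(sum_if_many)
print(one_col)

# Assumption: one aggregation per column, keyed by column name.
multi = df.resample("1min").agg({"rand_val": sum_if_many, "new_val": "sum"})
print(multi)
```

In both forms the function only ever sees a single column, so there is no need to look a column up by name inside it.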

I am assuming your index is already in datetime format. If it is not, convert it with pd.to_datetime as follows:

df.index=pd.to_datetime(df.index)
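A minimal sketch of that conversion, assuming a hypothetical frame named raw whose index holds the timestamps as plain strings (resample raises a TypeError on such an index until it is converted):

```python
import pandas as pd

# Hypothetical frame: timestamps stored as strings in the index.
raw = pd.DataFrame(
    {"rand_val": [7, 8, 1]},
    index=["2020-10-15 00:00:00", "2020-10-15 00:00:10", "2020-10-15 00:00:20"],
)

# Convert the string index into a DatetimeIndex so resample can bin it.
raw.index = pd.to_datetime(raw.index)

per_minute = raw.resample("1min")["rand_val"].sum()
print(per_minute)
```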


[Answer 2]:

With on='copy_time', I get the output shown below.

a = df.resample('1min', on='copy_time').apply(lambda x: sum(x.rand_val) if len(x) > 1 else 0)
print(a)

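resample expects an object with a datetime-like index. I don't see one in your example. Passing copy_time through on= makes resample bin on that datetime column instead.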

             org_time  rand_val  new_val           copy_time
0 2020-10-15 00:00:00         7       26 2020-10-15 00:00:00
1 2020-10-15 00:00:10         8       29 2020-10-15 00:00:10
2 2020-10-15 00:00:20         1       53 2020-10-15 00:00:20
3 2020-10-15 00:03:50         6       69 2020-10-15 00:03:50
4 2020-10-15 00:04:00         3       19 2020-10-15 00:04:00


copy_time
2020-10-15 00:00:00    16
2020-10-15 00:01:00     0
2020-10-15 00:02:00     0
2020-10-15 00:03:00     0
2020-10-15 00:04:00     0
Freq: T, dtype: int64
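Whether a function passed to apply on the whole frame receives a DataFrame or a Series seems to depend on the pandas version, so here is a hedged variant that avoids the question: it bins on copy_time via on= and selects rand_val before applying the function. The frame name df2 and its default integer index are assumptions made to mirror the input shown above (org_time and new_val are left out for brevity).

```python
import pandas as pd

# Frame with a default integer index; the timestamps live in a column.
df2 = pd.DataFrame({
    "copy_time": pd.to_datetime([
        "2020-10-15 00:00:00",
        "2020-10-15 00:00:10",
        "2020-10-15 00:00:20",
        "2020-10-15 00:03:50",
        "2020-10-15 00:04:00",
    ]),
    "rand_val": [7, 8, 1, 6, 3],
})

# on="copy_time" bins by that column; selecting rand_val first means
# the lambda is always called with a Series of that column's values.
res = df2.resample("1min", on="copy_time")["rand_val"].apply(
    lambda s: s.sum() if len(s) > 1 else 0
)
print(res)
```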

[Comments]:

My index is a datetime object. By default, resample appears to downsample on the index.
