How to do backward resampling on time series data starting from the last row?

Posted 2023-03-12

【Title】: How to do backward resampling on time series data starting from the last row? 【Posted】: 2019-11-27 08:20:39 【Question】:

I have rows of data (one per second) that I have been resampling into two-hour bins. In short, I apply a different aggregation to each feature:

data = data.resample('2H').agg({'id': 'first', 'x1': np.sum,
                                'x2': np.mean}).dropna()

Since each file contains one day of records, this produces about 12 rows, starting at 00:00 each day.

datetime            id      x1      x2      
2019/05/03 0:00     5603    1324    4600
2019/05/03 2:00     5603    1276    2836
2019/05/03 4:00     5603    184     258
2019/05/03 6:00     5603    546     929
2019/05/03 8:00     5603    2       1
2019/05/03 10:00    5603    6       3
2019/05/03 12:00    5603    8       5
2019/05/03 14:00    5603    835     1798
2019/05/03 16:00    5603    14      7
2019/05/03 18:00    5603    690     1518
2019/05/03 20:00    5603    823     1636
2019/05/03 22:00    5603    972     2547
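For reference, the forward resampling described above can be reproduced with a minimal sketch. The per-second data here is synthetic (the values are made up and are not the asker's), and only the column names `id`, `x1`, and `x2` come from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one day's file: one row per second (values are invented).
idx = pd.date_range("2019-05-03 00:00:00", "2019-05-03 23:59:59", freq="s")
data = pd.DataFrame(
    {"id": 5603, "x1": 1, "x2": np.arange(len(idx)) % 10},
    index=idx,
)

# Forward resampling: bins are anchored at midnight, so labels fall on 00:00, 02:00, ...
daily = data.resample("2h").agg({"id": "first", "x1": "sum", "x2": "mean"}).dropna()

print(len(daily))
print(daily.index[0], daily.index[-1])
```

Because the default bins are anchored at the start of the day, the labels do not depend on where the data ends, which is exactly what the question wants to change.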

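My question is: how can I resample backward from the last row in two-hour steps (or any other time span)? For each CSV file, the last row should be the "starting point" from which I resample backward, for example: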

If my last row is at 2019/05/03 09:27:00, then I need to resample the data from 2019/05/03 07:27:00 to 2019/05/03 09:27:00, then the two hours before that, and so on.
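To make the desired windows concrete, here is a small sketch that lists the bin edges counting back from the example timestamp. The number of edges (`periods=4`) is an arbitrary choice for illustration.

```python
import pandas as pd

# Last row of the file, taken from the example in the question.
last = pd.Timestamp("2019-05-03 09:27:00")

# Four bin edges, two hours apart, ending exactly at the last timestamp.
edges = pd.date_range(end=last, periods=4, freq="2h")

print(edges)
```

Each consecutive pair of edges is one two-hour window, with the final window ending at the last row.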

I searched for the same problem and found this: Pandas resample time series counting backwards (or reverse resample), but I could not get it to work for my case.

【Comments on the question】:

【Answer 1】:

You can do this by transforming the timestamps, resampling on the transformed index, and then undoing the transformation.

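end_time = data.index[-1]  # last timestamp; the index must be a DatetimeIndex
data['time to end'] = end_time - data.index  # time remaining until the last row
data.set_index('time to end', inplace=True)

data = data.resample('2h').mean() # Or your function

data['datetime'] = end_time - data.index  # map each bin label back to a timestamp
data.set_index('datetime', inplace=True)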
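The same idea can be written as a self-contained sketch. The data is synthetic, the variable names `shifted` and `result` are illustrative, and the explicit sorting steps are an addition for clarity rather than part of the answer. It assumes the frame has a DatetimeIndex whose last entry is a valid timestamp (not NaT).

```python
import numpy as np
import pandas as pd

# Synthetic per-second data ending at 09:27:00 (values are invented).
idx = pd.date_range("2019-05-03 00:00:00", "2019-05-03 09:27:00", freq="s")
data = pd.DataFrame({"x1": 1.0, "x2": np.arange(len(idx), dtype=float)}, index=idx)

end_time = data.index[-1]

# Step 1: replace each timestamp with its distance to the last row (a TimedeltaIndex).
shifted = data.copy()
shifted.index = end_time - data.index
shifted = shifted.sort_index()  # a distance of 0 (the last row) now comes first

# Step 2: resample on the distances; bins start at 0, i.e. at the last row.
result = shifted.resample("2h").agg({"x1": "sum", "x2": "mean"})

# Step 3: convert the bin labels back to timestamps and restore chronological order.
result.index = end_time - result.index
result = result.sort_index()

print(result)
```

With this transformation each label is the right-hand end of its window, so the newest window is labelled with the last timestamp and the oldest window may be only partially filled.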

【Comments】:

I get an error here: ValueError: Shape of passed values is (1, 2), indices imply (1, 0), and end_time shows NaT. Any idea why?

【Answer 2】:

This drove me crazy too. I kept feeling that resample ought to do what I wanted. Eventually I got it working by using the origin argument.

import pandas as pd

periods = pd.date_range("2020-10-17 15:53:03", "2020-10-17 15:56:56", freq="1s")
ts = pd.Series(range(len(periods)), index=periods)
# Anchor the bins at the last timestamp; close and label each bin on its right edge.
resampled = ts.resample('60s', origin=ts.index[-1], closed='right', label='right')
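The snippet above only builds the resampler object; an aggregation still has to be applied to get values. The sketch below adds `count()` so the bin boundaries are visible. The choice of `count()` is illustrative, and the `origin` argument requires pandas 1.1 or later.

```python
import pandas as pd

periods = pd.date_range("2020-10-17 15:53:03", "2020-10-17 15:56:56", freq="1s")
ts = pd.Series(range(len(periods)), index=periods)

# Bins are 60 s wide and aligned so that one bin ends exactly at the last timestamp.
counts = ts.resample(
    "60s", origin=ts.index[-1], closed="right", label="right"
).count()

print(counts)
```

Only the earliest bin is partial, because the series does not span a whole number of minutes. On pandas 1.3 or later, the documentation also lists `origin='end'` as a way to anchor at the last timestamp; it is worth checking that against your installed version.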

【Comments】:
