Python: aggregate a time series using a complex function that depends on the value from another column

Posted 2023-03-29

[Title] Python aggregate time series using a complex function that depends on the value from another column [Posted] 2021-09-22 20:27:58 [Question]:

My time series looks like this:

TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69

I can aggregate by summing all of the Volume values:

res = df.resample('min').agg({'Volume': 'sum'})

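But I want to aggregate the Volume and Type columns based on the values of both: add the volume when the type is S, otherwise subtract it. If the total volume after aggregation is negative, the type is S; otherwise the type is B.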

In the example above, after aggregating the volume, the total volume becomes

200 - 253 + 300 + 200 = 447

and the type is B, because 447 > 0.

Result:

Time,Volume,Type
09:25:00,447,B

[Answer 1]:

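The simplest way is to multiply Volume by 1 or -1, depending on the value in Type, using map. Then assign the Type column according to the sign of the resulting sum.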

import numpy as np

res = (
    # sign each trade: sells (S) count as negative, buys (B) as positive
    (df['Volume'] * df['Type'].map({'S': -1, 'B': 1}))
      .groupby(df['Time']).sum()  # resample should work here too,
                                  # but your input is not in the right format for it
      .reset_index(name='Volume')
      .assign(Type=lambda x: np.where(x['Volume'] > 0, 'B', 'S'))
)

print(res)
       Time  Volume Type
0  09:25:00     147    B # you used 2 columns to calculate your result volume 447?
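
The answer notes that resample should also work once the input is in the right format. A minimal sketch of that variant follows, assuming all trades fall on a single day. The date 2021-09-22 is an arbitrary placeholder that is not part of the original data, and the names csv_text and signed are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# The four sample rows from the question.
csv_text = """TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69
"""
df = pd.read_csv(io.StringIO(csv_text))

# resample needs a datetime-like index. Assumption: every trade happened
# on the same day, so an arbitrary date is prepended to the time strings.
df.index = pd.to_datetime("2021-09-22 " + df["Time"])

# Sign each trade (S negative, B positive), then sum per one-minute bin.
signed = df["Volume"] * df["Type"].map({"S": -1, "B": 1})
res = (
    signed.resample("min").sum()
    .rename_axis("Time")
    .reset_index(name="Volume")
    .assign(Type=lambda x: np.where(x["Volume"] > 0, "B", "S"))
)
print(res)
```

Unlike groupby, resample also emits a row for every empty minute between the first and last trade. Those rows get a Volume of 0 and are therefore labelled S by the np.where condition, so you may want to filter them out.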
