Using pandas' groupby with shifting

Posted 2023-03-11

[Posted]: 2017-08-25 18:47:44

[Question]:

I would like to use pd.rolling_mean inside a groupby operation. For each group, I want a rolling mean of the previous elements within that same group. Here is an example:

Grouped by id, it should be transformed into:

id    val
0     nan
0     1
0     1.5
1     nan
1     4
2     nan

[Answer 1]:

I think you need groupby combined with shift and rolling. The window size can be set to a scalar:

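# group_keys=False keeps the original row index (newer pandas would otherwise prepend the group key)
df['val'] = df.groupby('id', group_keys=False)['val'].apply(lambda x: x.shift().rolling(2, min_periods=1).mean())
print(df)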
   id  val
0   0  NaN
1   0  1.0
2   0  1.5
3   1  NaN
4   1  4.0
5   2  NaN
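
For readers who want to run this end to end, here is a minimal self-contained sketch. The input values are an assumption, since the question's input frame is not shown in this copy; they were picked so that the result matches the output above. The sketch also swaps `apply` for `transform`, which returns a result aligned with the original row index.

```python
import pandas as pd

# Hypothetical input: values picked so the result matches the output shown above.
df = pd.DataFrame({"id": [0, 0, 0, 1, 1, 2],
                   "val": [1, 2, 3, 4, 5, 6]})

# Within each id: shift by one row so a row never sees its own value,
# then average the (up to) two previous values.
prev_mean = df.groupby("id")["val"].transform(
    lambda s: s.shift().rolling(2, min_periods=1).mean()
)

print(df.assign(val=prev_mean))
```

The `shift()` comes before `rolling()` so that the first row of every group has no previous value and stays NaN.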

Thanks to 3novak for the comment: you can set the window size to the length of the largest group:

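# Window = size of the largest group (value_counts sorts counts in descending order)
max_len = df['id'].value_counts().iloc[0]
f = lambda x: x.shift().rolling(max_len, min_periods=1).mean()
df['val'] = df.groupby('id', group_keys=False)['val'].apply(f)
print(df)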
   id  val
0   0  NaN
1   0  1.0
2   0  1.5
3   1  NaN
4   1  4.0
5   2  NaN
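
The two window sizes only differ once a group has more than three rows. The sketch below illustrates this on hypothetical data (not from the question) with a four-row group; `prev_mean` is a helper name introduced here for illustration.

```python
import pandas as pd

# Hypothetical data: one group with four rows, so a window of 2 is too small.
df = pd.DataFrame({"id": [0, 0, 0, 0, 1],
                   "val": [1, 2, 3, 4, 5]})

# Size of the largest group (4 here).
max_len = int(df["id"].value_counts().max())

def prev_mean(window):
    # Mean of up to `window` previous values within each id.
    return df.groupby("id")["val"].transform(
        lambda s: s.shift().rolling(window, min_periods=1).mean()
    )

small = prev_mean(2)        # last row of id 0 averages only 2 and 3
full = prev_mean(max_len)   # last row of id 0 averages 1, 2 and 3

print(pd.DataFrame({"id": df["id"], "window_2": small, "window_max": full}))
```

With a window of 2, the fourth row of group 0 gets 2.5; with the window set to the largest group size it gets 2.0, the mean of all previous values.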

[Comments]:

I think the OP's question may call for a larger window size. 2 is enough for this dataset, but in general it should be set to df['id'].value_counts().iloc[0].

[Answer 2]:

I believe you want pd.Series.expanding:

# group_keys=False keeps the original row index in the result
df.groupby('id', group_keys=False).val.apply(lambda x: x.expanding().mean().shift())

0    NaN
1    1.0
2    1.5
3    NaN
4    4.0
5    NaN
Name: val, dtype: float64
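
As a rough check, the expanding approach should agree with a shifted rolling mean whose window covers the largest group, without needing to pick a window size. The sketch below compares the two on hypothetical data (not from the question), using `transform` so both results stay aligned with the original index.

```python
import pandas as pd

# Hypothetical data with groups of different sizes.
df = pd.DataFrame({"id": [0, 0, 0, 0, 1],
                   "val": [1, 2, 3, 4, 5]})
grouped = df.groupby("id")["val"]

# Expanding mean of all earlier values in the group; no window size to choose.
by_expanding = grouped.transform(lambda s: s.expanding().mean().shift())

# Shifted rolling mean with the window set to the largest group size.
max_len = int(df["id"].value_counts().max())
by_rolling = grouped.transform(
    lambda s: s.shift().rolling(max_len, min_periods=1).mean()
)

# Raises AssertionError if the two results differ beyond floating-point tolerance.
pd.testing.assert_series_equal(by_expanding, by_rolling)
print(df.assign(prev_mean=by_expanding))
```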
