How can I fill a column with values that are computed between two dates in pandas, if I have repeating dates?

Posted 2023-03-11


【Posted】: 2021-12-26 23:23:36 【Question】:

This question is a variant of this one; the only difference is that dates can repeat across rows of the DataFrame. The example would therefore be:

Date	Position	TrainerID	Win%
2017-09-03	4	1788	0 (0 wins, 1 race)
2017-09-16	5	1788	0 (0 wins, 2 races)
2017-10-14	1	1788	33 (1 win, 3 races)
2017-10-14	3	1788	25 (1 win, 4 races)

Is it possible to compute Win% over the past 1000 days under these conditions? If so, how?


【Answer 1】:

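The logic in the other answer is still correct; the problem is that groupby + rolling destroys the original index, so aligning the result back to the original DataFrame becomes problematic.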
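To see the alignment problem concretely, here is a minimal sketch using the question's four rows. The column name `win` and the integer cast are choices made for this illustration, and the exact shape of the resulting index can differ between pandas versions (recent versions key it by group and date, which repeats here).

```python
import pandas as pd

# The four rows from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-09-03", "2017-09-16",
                            "2017-10-14", "2017-10-14"]),
    "Position": [4, 5, 1, 3],
    "TrainerID": [1788, 1788, 1788, 1788],
})
df["win"] = df["Position"].eq(1).astype(int)

# Time-based rolling mean per trainer
rolled = df.groupby("TrainerID").rolling("1000D", on="Date")["win"].mean()

# One value per input row, but the original 0..3 row labels are gone
print(rolled.index.names)
print(rolled.index.equals(df.index))
```

Because the result is no longer labelled by the original rows, `df['Win %'] = rolled` cannot line the values up by itself.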

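In this case you can .reset_index and take the max of the resulting index column (assuming a RangeIndex) to carry the original index along. Each window ends at the current row and row numbers only increase, so the max inside a window is the current row's own number. That lets you aggregate and then align the result back to the original rows.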

I added a row at the end to show you how it enforces the 1000-day window.

# If your DataFrame doesn't have a RangeIndex this is required for the logic
#df = df.reset_index(drop=True)

df['win'] = df['Position'].eq(1) 

s = (df.reset_index().groupby('TrainerID')
       .rolling('1000D', on='Date')
       .agg({'win': 'mean', 'index': 'max'})
       .reset_index(drop=True)
       .set_index('index')
       .mul(100))  
#              win
#index            
#0.0      0.000000
#1.0      0.000000
#2.0     33.333333
#3.0     25.000000
#4.0    100.000000

df['Win %'] = s

print(df)
        Date  Position  TrainerID    win       Win %
0 2017-09-03         4       1788  False    0.000000
1 2017-09-16         5       1788  False    0.000000
2 2017-10-14         1       1788   True   33.333333
3 2017-10-14         3       1788  False   25.000000
4 2027-10-14         1       1788   True  100.000000
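For readers who want to run the idea end to end, below is a self-contained sketch of the same technique. It is a variation, not the answer's exact code: it selects the two columns separately instead of passing a dict to `.agg`, casts `win` to integers, and assumes the rows are already sorted by date within each trainer. The names `roll`, `win_rate`, `row_id` and `pct` are made up for this example.

```python
import pandas as pd

# Sample data from the question, plus the far-future row used in the answer
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-09-03", "2017-09-16", "2017-10-14",
                            "2017-10-14", "2027-10-14"]),
    "Position": [4, 5, 1, 3, 1],
    "TrainerID": [1788] * 5,
})
df = df.reset_index(drop=True)            # make sure the index is 0..n-1
df["win"] = df["Position"].eq(1).astype(int)

# One rolling object, two column selections that share the same row order
roll = df.reset_index().groupby("TrainerID").rolling("1000D", on="Date")
win_rate = roll["win"].mean()             # share of wins inside each window
row_id = roll["index"].max()              # row number of each window's last row

# Rebuild a Series keyed by original row number, then assign by alignment
pct = pd.Series(win_rate.to_numpy() * 100,
                index=row_id.to_numpy().astype(int))
df["Win %"] = pct
print(df)
```

The two rows dated 2017-10-14 get different values (one win in three races, then one win in four) because each window ends at its own row, and the 2027 row sees only itself because everything earlier falls outside the 1000 days.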

【Comments】:

What if I want to compute it over the past 1000 days?

@BogdanDoicin Got it, it should be fixed now.

Exactly what I wanted. Thanks!
