Is there an efficient way to compute column values in Pandas using values from previous rows based on conditional values from other columns?

Posted 2023-03-12

【Published】: 2022-01-10 18:19:50 【Question】:

Consider the following loop over my DataFrame:

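import pandas as pd

df = pd.DataFrame({
    'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400],
    'Count': [0, 0, 0, 0, 0, 0, 0, 0, 0]
})

# Walk the rows in order (relies on the default 0..n-1 index).
# While Price stays above 1500, extend the previous row's count by one.
for idx in df.index:
    if df.loc[idx, 'Price'] > 1500:
        if idx > 0:
            # Single .loc assignment instead of chained df['Count'].iloc[idx] = ...
            df.loc[idx, 'Count'] = df.loc[idx - 1, 'Count'] + 1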

This results in:

	Price	Count
0	1000	0
1	1000	0
2	1000	0
3	2000	1
4	2000	2
5	2000	3
6	2000	4
7	1400	0
8	1400	0

Is there a more efficient way to do this?

【Answer 1】:

Use Series.cumsum to create pseudo-groups, then use groupby.cumcount to generate the count within each group:

groups = df.Price.le(1500).cumsum()
df['Count'] = df.Price.gt(1500).groupby(groups).cumcount()

#    Price  Count
# 0   1000      0
# 1   1000      0
# 2   1000      0
# 3   2000      1
# 4   2000      2
# 5   2000      3
# 6   2000      4
# 7   1400      0
# 8   1400      0
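
To see why this works, it may help to print the intermediate `groups` Series. The following is a minimal sketch on the question's data, not part of the original answer; it drops the `.gt(1500)` step on the assumption that `cumcount` only looks at group membership and ignores the values being grouped.

```python
import pandas as pd

df = pd.DataFrame({'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400]})

# Every row with Price <= 1500 bumps the running total, so it opens a new
# pseudo-group; rows with Price > 1500 inherit the label of the last
# low-price row before them.
groups = df['Price'].le(1500).cumsum()
print(groups.tolist())  # [1, 2, 3, 3, 3, 3, 3, 4, 5]

# cumcount numbers the rows of each group 0, 1, 2, ...; the low-price row
# that opens a group takes 0, so the high-price rows after it start at 1.
df['Count'] = df['Price'].groupby(groups).cumcount()
print(df['Count'].tolist())  # [0, 0, 0, 1, 2, 3, 4, 0, 0]
```

Because a new group starts at every low-price row, the count resets on its own whenever Price drops to 1500 or below.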

【Answer 2】:

Use mask to hide the rows whose Price is 1500 or less, then use cumsum to build the counter:

df['Count'] = df.mask(df['Price'] <= 1500)['Count'].add(1).cumsum().fillna(0).astype(int)
print(df)

# Output:
   Price  Count
0   1000      0
1   1000      0
2   1000      0
3   2000      1
4   2000      2
5   2000      3
6   2000      4
7   1400      0
8   1400      0
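
The one-liner can be hard to follow, so here is a sketch that splits it into steps. The intermediate names (`masked`, `ones`, `running`) are introduced here for illustration only, and the sketch assumes, as in the question, that `Count` holds all zeros beforehand.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400],
    'Count': [0] * 9,
})

# mask replaces the values in rows where the condition holds with NaN.
masked = df.mask(df['Price'] <= 1500)['Count']

# Surviving rows hold 0, so add(1) turns each into 1; NaN stays NaN.
ones = masked.add(1)

# cumsum skips NaN by default and leaves NaN in those positions.
running = ones.cumsum()

# Low-price rows become 0 again and the column goes back to integers.
df['Count'] = running.fillna(0).astype(int)
print(df['Count'].tolist())  # [0, 0, 0, 1, 2, 3, 4, 0, 0]
```

Note that `running` is one cumulative sum over the whole column, so nothing in this chain resets it between separate runs of high prices.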

【Comments】:

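Thanks. Your solution works for my example. However, if you append two more rows to my DataFrame with prices 3000 and 3000, your solution keeps counting with 5, 6. I need the count to restart at 1, 2.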
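
The scenario in this comment can be reproduced with a short sketch. It appends the two 3000 rows to the question's data and runs both answers side by side; the names `mask_count` and `group_count` are made up for this comparison and do not appear in either answer.

```python
import pandas as pd

# The question's prices plus the two extra rows described in the comment.
prices = [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400, 3000, 3000]
df = pd.DataFrame({'Price': prices, 'Count': [0] * len(prices)})

# Answer 2: a single cumsum over the whole column never resets.
mask_count = (
    df.mask(df['Price'] <= 1500)['Count'].add(1).cumsum().fillna(0).astype(int)
)
print(mask_count.tolist())   # [0, 0, 0, 1, 2, 3, 4, 0, 0, 5, 6]

# Answer 1: each low-price row opens a new group, so the count restarts.
groups = df['Price'].le(1500).cumsum()
group_count = df['Price'].groupby(groups).cumcount()
print(group_count.tolist())  # [0, 0, 0, 1, 2, 3, 4, 0, 0, 1, 2]
```

On this input the grouped version from Answer 1 gives the 1, 2 restart the comment asks for, while the mask version carries on with 5, 6.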