Pandas Python Groupby Cumulative Sum Reverse
Posted: 2018-02-28 10:10:15

Question: I found Pandas groupby cumulative sum and found it very useful. However, I'd like to work out how to calculate a reverse cumulative sum.
The link suggests the following:
df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()
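For comparison, here is what that forward cumulative sum produces. This is a minimal sketch. The input frame is an assumption reconstructed from the desired-output table (columns `name`, `day`, `no`), because the question does not show it:

```python
import pandas as pd

# Hypothetical input, reconstructed from the desired-output table in this question
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'no': [10, 30, 50, 40, 40],
})

# Aggregate per (name, day), then take a running total within each name (index level 0)
forward = df.groupby(by=['name', 'day']).sum().groupby(level=[0]).cumsum()
print(forward)
```

The running total grows from the top of each group (10, 40, 90 for Jack). The reverse version should grow from the bottom instead.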
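To get the reverse sum, I tried slicing the data, but it failed. A `GroupBy` object has no `.ix` indexer, and `.ix` itself was removed in pandas 1.0: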
df.groupby(by=['name','day']).ix[::-1, 'no'].sum().groupby(level=[0]).cumsum()
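The desired output, where the last column is the reverse cumulative sum of `no` within each `name`:
name | day | no | reverse cumsum
Jack | Monday | 10 | 90
Jack | Tuesday | 30 | 80
Jack | Wednesday | 50 | 50
Jill | Monday | 40 | 80
Jill | Wednesday | 40 | 40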
Edit: Based on the feedback, I tried implementing the code and made the DataFrame bigger:
import pandas as pd
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'surname': ['Jones', 'Jones', 'Jones', 'Smith', 'Smith'],
    'car': ['VW', 'Mazda', 'VW', 'Merc', 'Merc'],
    'country': ['UK', 'US', 'UK', 'EU', 'EU'],
    'year': [1980, 1980, 1980, 1980, 1980],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'date': ['2016-02-31', '2016-01-31', '2016-01-31', '2016-01-31', '2016-01-31'],
    'no': [10, 30, 50, 40, 40],
    'qty': [100, 500, 200, 433, 222]})
Then I tried grouping by multiple columns, but the grouping is not applied the way I expected:
df = df.groupby(by=['name','surname','car','country','year','day','date']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1].reset_index()
Why does this happen? I want Jack Jones driving the Mazda to have a cumulative quantity separate from Jack Jones driving the VW.
Comments:
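@BradSolomon, unfortunately the link you referenced doesn't show how to include a group by. Please see my updated post and let me know if it is clearer; I can't seem to get the grouping to work.

Answer 1:

You can use a double `iloc`: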
df = df.groupby(by=['name','day']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]
print (df)
no
name day
Jack Monday 90
Tuesday 80
Wednesday 50
Jill Monday 80
Wednesday 40
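Why does the double `iloc` give the desired column? Reversing the rows makes `cumsum` accumulate from the last row of each group upward, and the second `iloc[::-1]` restores the original order. Equivalently, within each `name` the reverse cumulative sum equals the group total, minus the forward cumulative sum, plus the current value. The sketch below checks that identity against the answer's approach. It assumes the same reconstructed `name`/`day`/`no` input; the variable names are only illustrative:

```python
import pandas as pd

# Hypothetical input, reconstructed from the desired-output table
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'no': [10, 30, 50, 40, 40],
})

agg = df.groupby(['name', 'day'])['no'].sum()  # Series indexed by (name, day)
g = agg.groupby(level=0)                       # groups by the 'name' level

# Reverse cumulative sum = group total - forward cumulative sum + current value
via_identity = g.transform('sum') - g.cumsum() + agg

# The double-iloc approach from this answer, on the same Series
via_double_iloc = agg.iloc[::-1].groupby(level=0).cumsum().iloc[::-1]

print(via_identity.tolist())
```

For Jack the total is 90, so the rows are 90 - 10 + 10 = 90, 90 - 40 + 30 = 80 and 90 - 90 + 50 = 50.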
To add the result as a new column instead, the solution simplifies to:
df = df.groupby(by=['name','day']).sum()
df['new'] = df.iloc[::-1].groupby(level=[0])['no'].cumsum()
print (df)
no new
name day
Jack Monday 10 90
Tuesday 30 80
Wednesday 50 50
Jill Monday 40 80
Wednesday 40 40
Edit:
The problem is in the second `groupby`: it needs more levels, `level=[0,1,2]`, which means grouping by the first (`name`), second (`surname`) and third (`car`) levels.
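The sketch below shows what the extra levels change. It compares `level=[0]`, as used in the edited question, with `level=[0,1,2]` on the edited frame. `reverse_cumsum` is a hypothetical helper name introduced only for this illustration:

```python
import pandas as pd

# The edited frame from the question
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'surname': ['Jones', 'Jones', 'Jones', 'Smith', 'Smith'],
    'car': ['VW', 'Mazda', 'VW', 'Merc', 'Merc'],
    'country': ['UK', 'US', 'UK', 'EU', 'EU'],
    'year': [1980, 1980, 1980, 1980, 1980],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'date': ['2016-02-31', '2016-01-31', '2016-01-31', '2016-01-31', '2016-01-31'],
    'no': [10, 30, 50, 40, 40],
    'qty': [100, 500, 200, 433, 222],
})
keys = ['name', 'surname', 'car', 'country', 'year', 'day', 'date']
agg = df.groupby(by=keys).sum()

def reverse_cumsum(frame, levels):
    """Hypothetical helper: reverse cumulative sum within groups of the given index levels."""
    return frame.iloc[::-1].groupby(level=levels).cumsum().iloc[::-1]

by_name = reverse_cumsum(agg, [0])        # one running total per name (Mazda and VW mixed)
by_car = reverse_cumsum(agg, [0, 1, 2])   # separate running total per name/surname/car

print(by_name['no'].tolist())
print(by_car['no'].tolist())
```

With only `level=[0]`, Jack's Mazda row (first in sorted order) accumulates his VW rows too. With three levels, it keeps its own value of 30.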
df1 = (df.groupby(by=['name','surname','car','country','year','day','date'])
.sum())
print (df1)
no qty
name surname car country year day date
Jack Jones Mazda US 1980 Tuesday 2016-01-31 30 500
VW UK 1980 Monday 2016-02-31 10 100
Wednesday 2016-01-31 50 200
Jill Smith Merc EU 1980 Monday 2016-01-31 40 433
Wednesday 2016-01-31 40 222
df2 = (df.groupby(by=['name','surname','car','country','year','day','date'])
.sum()
.iloc[::-1]
.groupby(level=[0,1,2])
.cumsum()
.iloc[::-1]
.reset_index())
print (df2)
name surname car country year day date no qty
0 Jack Jones Mazda US 1980 Tuesday 2016-01-31 30 500
1 Jack Jones VW UK 1980 Monday 2016-02-31 60 300
2 Jack Jones VW UK 1980 Wednesday 2016-01-31 50 200
3 Jill Smith Merc EU 1980 Monday 2016-01-31 80 655
4 Jill Smith Merc EU 1980 Wednesday 2016-01-31 40 222
Alternatively, you can select the levels by name; see groupby enhancements in 0.20.1+:
df2 = (df.groupby(by=['name','surname','car','country','year','day','date'])
.sum()
.iloc[::-1]
.groupby(['name','surname','car'])
.cumsum()
.iloc[::-1]
.reset_index())
print (df2)
name surname car country year day date no qty
0 Jack Jones Mazda US 1980 Tuesday 2016-01-31 30 500
1 Jack Jones VW UK 1980 Monday 2016-02-31 60 300
2 Jack Jones VW UK 1980 Wednesday 2016-01-31 50 200
3 Jill Smith Merc EU 1980 Monday 2016-01-31 80 655
4 Jill Smith Merc EU 1980 Wednesday 2016-01-31 40 222
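If each row already represents one observation, the aggregation step can be skipped and the reverse totals can be attached to the original frame. This is a minimal sketch, not part of the answer above. It assumes the running totals should follow the frame's current row order, so sort the frame first if a chronological order is needed. The `_rev` suffix is an arbitrary choice:

```python
import pandas as pd

# The edited frame from the question
df = pd.DataFrame({
    'name': ['Jack', 'Jack', 'Jack', 'Jill', 'Jill'],
    'surname': ['Jones', 'Jones', 'Jones', 'Smith', 'Smith'],
    'car': ['VW', 'Mazda', 'VW', 'Merc', 'Merc'],
    'country': ['UK', 'US', 'UK', 'EU', 'EU'],
    'year': [1980, 1980, 1980, 1980, 1980],
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Monday', 'Wednesday'],
    'date': ['2016-02-31', '2016-01-31', '2016-01-31', '2016-01-31', '2016-01-31'],
    'no': [10, 30, 50, 40, 40],
    'qty': [100, 500, 200, 433, 222],
})

# Reverse the rows and take running totals per (name, surname, car).
# cumsum keeps the original row labels, so join() realigns the result to df.
rev = (df.iloc[::-1]
         .groupby(['name', 'surname', 'car'])[['no', 'qty']]
         .cumsum()
         .add_suffix('_rev'))
out = df.join(rev)
print(out[['name', 'car', 'day', 'no_rev', 'qty_rev']])
```

Because `join` aligns on the index, the reversed intermediate order does not need to be undone with a second `iloc[::-1]`.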
Comments:
Thanks @jezarel, this helped, but I have another question. Could you take a look at my edited post?