[Python Cookbook] Pandas Groupby

Posted Sherrrry

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了[Python Cookbook] Pandas Groupby相关的知识,希望对你有一定的参考价值。

Groupby Count

# Party’s Frequency of donations
nyc.groupby(’Party’)[’contb receipt amt’].count()

The command returns a series where the index is the name of a Party and the value is the count of that Party. Note that the series is ordered by the name of Party alphabetically.

 

Multiple Variables

# Party’s Frequency of donations by Date
nyc.groupby([’Party’, ’Date’])[’contb receipt amt’].count()

 

Groupby Sum

# Party’s Sum of donations
nyc.groupby(’Party’)[’contb receipt amt’].sum()

# Define the format of float
pd.options.display.float format = ’{:,.2f}’.format 
nyc.groupby(’Party’)[’contb receipt amt’].sum()

 

Groupby Order

# Top 5 Donors, by Occupation
df7 = nyc.groupby(’contbr occupation’)[’contb receipt amt’]. sum(). reset  index ()
df7.sort_values(’contb receipt amt’, ascending=False, inplace =True)
df7.head(5)
#or
df7.nlargest(5,’contb receipt amt’)

# Bottom 5 Donors, by Occupation
df8 = nyc.groupby(’contbr occupation’)[’contb receipt amt’]. sum() . reset   index ()
df8 . sort_values (by=’ contb receipt amt ’ , inplace=True) df8.head(5)
# OR
df7.tail(5)
#OR
df8.nsmallest(5,’contb receipt amt’)

Get rid of negative values:

df8 [ df8 . contb receipt amt >0].head(5)

The following commands give an example to find the Top 5 occupations that donated to each cadidate. Note that we need to sort the table based on two variables, firtly sorted by candidate name alphabetically and then sorted by contribution amount in a descending order. Finally, we hope to show the Top 5 occupations for each candidate. 

# Top 5 Occupations that donated to Each Candidate
df10 = nyc.groupby ([ ’cand_nm’ , ’contbr_occupation’ ]) [ ’contb_receipt_amt’ ].sum().reset_index ()
df10.sort_values ([ ’cand_nm’ , ’contb_receipt_amt’ ] , ascending =[True , False ], inplace=True)
df10.groupby(’cand_nm’).head(5)

 

Groupby Plot 

#Top 5 Fundraising Candidates Line Graph
df11 = nyc.groupby(’cand_nm’)[’contb_receipt_amt’].sum(). reset_index ()
df11_p = df11.nlargest(5,’contb_receipt_amt’)
df11_g = nyc[nyc.cand_nm.isin(df11_p.cand_nm)][[ ’cand_nm’,’Date’,’contb_receipt_amt’]]
dfpiv=pd.pivot table(df11_g , values=’contb_receipt_amt’, index=[’Date’],columns=[’cand_nm’], aggfunc=np.sum)
dfpiv.loc[\'2016-01-01\':\'2016−01−30\'].plot.line()

 

以上是关于[Python Cookbook] Pandas Groupby的主要内容,如果未能解决你的问题,请参考以下文章

[Python Cookbook] Pandas: 3 Ways to define a DataFrame

Pandas Cookbook -- 06索引对齐

《Pandas CookBook》---- DataFrame基础操作

《Pandas Cookbook》第02章 DataFrame基本操作

Pandas Cookbook -- 08数据清理

《Pandas CookBook》---- 第五章 布尔索引