Groupby sum and count on multiple columns in Python

Posted 2023-03-11

【Title】: Groupby sum and count on multiple columns in python 【Posted】: 2018-07-23 22:22:03 【Question】:

I have a pandas DataFrame that looks like this:

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5

I want to group by country and month, count the IDs in each country/month group, and sum revenue, profit, and ebit. For the data above, the output would be:

 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201410     20        10      5      1
   Canada  201411     15        10      5      1

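I have tried different variations of pandas' groupby, sum, and count functions, but I can't figure out how to apply the groupby sum and the count together to get the result shown. Please share any ideas you may have. Thanks!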

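【Comments】:

Could you also post the code you tried? Which approaches have you attempted? If an answer helps, don't forget to accept it: click the check mark (v) next to the answer to toggle it from grey to filled. Thanks.

【Answer 1】: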

You can use pivot_table like this:

>>> import numpy as np
>>> import pandas as pd
>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename(columns={'ID': 'count'})
>>> df2
                count
country month        
Canada  201411      1
UK      201410      1
USA     201409      2

>>> pd.concat([df1,df2],axis=1)

                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2

Update

This can be done in a single pivot_table call by passing aggfunc a dictionary that maps each column to the function to apply:

pd.pivot_table(
   df,
   index=['country','month'],
   aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})

                count  ebit  profit  revenue
country month                               
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
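
As an alternative to the pivot_table call above, the same result can be sketched with groupby and named aggregation. This is not part of the original answer; it assumes pandas 0.25 or later (where named aggregation is available) and rebuilds the question's sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Named aggregation: each keyword becomes an output column, and each
# value is an (input column, aggregation) pair.
result = (
    df.groupby(["country", "month"], as_index=False)
      .agg(
          revenue=("revenue", "sum"),
          profit=("profit", "sum"),
          ebit=("ebit", "sum"),
          count=("ID", "count"),  # counts non-null IDs per group
      )
)
print(result)
```

Because the output names are given explicitly, no rename step is needed and the column order follows the keyword order.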

【Comments】:

Awesome! This definitely works! Thanks for your help! Would you mind looking into this one? It gets even trickier! ***.com/questions/48785833/…

【Answer 2】:

The following solution seems to be the simplest.

Group by country and month:

grouped_df = df.groupby(['country', 'month'])

Apply sum to the columns of interest (revenue, profit, ebit):

final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')

Assign the group sizes from grouped_df to a new column in 'final':

final['count'] = grouped_df.size()
print(final)

Out[256]: 
                revenue  profit  ebit  count
country month                               
Canada  201411       15      10     5      1
UK      201410       20      10     5      1
USA     201409       19      12     5      2

All done!
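
For anyone who wants to run these steps end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption based on the table in the question; the rest follows the three steps of this answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Step 1: group by country and month.
grouped_df = df.groupby(["country", "month"])

# Step 2: sum the numeric columns of interest.
final = grouped_df[["revenue", "profit", "ebit"]].agg("sum")

# Step 3: size() returns a Series indexed by (country, month), so the
# assignment aligns on the same MultiIndex as 'final'.
final["count"] = grouped_df.size()
print(final)
```

Note that size() counts all rows in each group, including rows where ID is missing, whereas count() on the ID column would skip them.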

【Comments】:

【Answer 3】:

You can do the groupby and then map each country's count onto a new column.

g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum().reset_index()
g['count'] = g['country'].map(df['country'].value_counts())
g

Out[3]:


    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   USA      201409  19       12      5     2

Edit

To get the count for each country and month, you can do a second groupby and then join the two DataFrames together.

g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum()
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()

Out[6]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   UK       201411  10       5       2     1
3   USA      201409  19       12      5     2

I added another record for the UK with a different date. Note that the merged DataFrame now has two UK entries, each with the appropriate count.
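
To make the difference between the two versions of this answer concrete, the sketch below compares the country-only count with the per-(country, month) count. The extra UK row mirrors the one described above, but its ID (999) is a made-up value, since the answer does not show it.

```python
import pandas as pd

# Question's sample data plus one extra UK row for a different month.
# The ID 999 is hypothetical; the other values follow the output above.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129, 999],
    "country": ["USA", "USA", "UK", "Canada", "UK"],
    "month": [201409, 201409, 201410, 201411, 201411],
    "revenue": [10, 9, 20, 15, 10],
    "profit": [5, 7, 10, 10, 5],
    "ebit": [3, 2, 5, 5, 2],
})

keys = ["country", "month"]
sums = df.groupby(keys)[["revenue", "profit", "ebit"]].sum()

# Country-only count: every UK row gets the total number of UK records.
per_country = sums.reset_index()
per_country["count"] = per_country["country"].map(df["country"].value_counts())

# Per-(country, month) count: size() shares the index of 'sums',
# so a plain join lines the two up.
counts = df.groupby(keys).size().rename("count")
per_pair = sums.join(counts).reset_index()

print(per_country)
print(per_pair)
```

With this data, both UK rows in per_country carry a count of 2, while per_pair reports 1 for each UK month, which is what the question asks for.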

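【Comments】:

Thanks for your help, Ben. But this solution does not take the month into account. I need the count of all IDs for each unique combination of country and month. Awesome! Thank you!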
