Aggregating a dataframe by a String column in pandas [duplicate]

Posted 2023-03-29


【Published】: 2021-11-05 00:56:05 【Question】:

I have a dataframe that looks like this:

dfB
name        value        country
benzene     spice        Australia
benzene     spice        Australia
benzene     spice        Australia
benzene     herbs        Australia
benzene     herbs        Americas
benzene     anise        Poland
methyl      herbs
methyl      herbs        Americas        
methyl      spice        Americas
alcohol     spice        Germany
alcohol     spice        Germany

I want to create a different dataframe that aggregates over the country column, like this:

dfB
name        value        country        count
benzene     spice        Australia      3
benzene     herbs        Australia      1
benzene     herbs        Americas       1
benzene     anise        Poland         1
methyl      herbs                       1
methyl      herbs        Americas       1 
methyl      spice        Americas       1
alcohol     spice        Germany        2

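The idea is to aggregate the country column and produce a count of country for each unique combination of "name" and "value". A blank or NaN country should also be treated as its own distinct value.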
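To make this reproducible, here is a minimal sketch that rebuilds the sample data, assuming the blank country cell is a missing value (NaN) rather than an empty string. It also shows why the missing value matters: by default, groupby drops rows whose grouping key is NaN. The names `default_counts` and `with_nan` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; the blank country is assumed to be NaN.
dfB = pd.DataFrame({
    "name": ["benzene"] * 6 + ["methyl"] * 3 + ["alcohol"] * 2,
    "value": ["spice", "spice", "spice", "herbs", "herbs", "anise",
              "herbs", "herbs", "spice", "spice", "spice"],
    "country": ["Australia", "Australia", "Australia", "Australia",
                "Americas", "Poland", np.nan, "Americas", "Americas",
                "Germany", "Germany"],
})

# Default behaviour: rows whose grouping key is NaN are dropped.
default_counts = dfB.groupby(["name", "value", "country"]).size()

# dropna=False (pandas 1.1 or later) keeps NaN as a group of its own.
with_nan = dfB.groupby(["name", "value", "country"], dropna=False).size()

print(len(default_counts), len(with_nan))
```

The second result has one more group than the first, which is the methyl/herbs row with no country.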

I tried using groupby:

grouped = dfB.groupby(["name", "value", "country"]).agg({"country": "count"})

But it doesn't seem to create the dataframe the way I intended. How can I do this?

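【Comments】:

Remove "country" from the groupby, or use nunique instead of agg.

"Blanks or NaN should also be treated as distinct." If there are, say, 3 NaN, should they count as 3 or as 1?

Check the answer in the second duplicate.

If the same "name"/"value" combination has 3 NaN/blanks, they should count as 1.

Use dfB.groupby(["name", "value", "country"]).size().reset_index(name='count')

【Answer 1】: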

Use value_counts, or use groupby if you want to keep the original row order:

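# Option 1: value_counts; dropna=False keeps the NaN country as its own group
out = dfB.value_counts(["name", "value", "country"], sort=False, dropna=False) \
         .rename('count').reset_index()
# Per the comments, NaN/blank countries count as 1 per name/value combination
out.loc[out['country'].isna(), 'count'] = 1

# Option 2: groupby; sort=False keeps groups in order of first appearance
out1 = dfB.groupby(["name", "value", "country"], sort=False, dropna=False) \
         .size().reset_index(name='count')
out1.loc[out1['country'].isna(), 'count'] = 1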

>>> out
      name  value    country  count
0  alcohol  spice    Germany      2
1  benzene  anise     Poland      1
2  benzene  herbs   Americas      1
3  benzene  herbs  Australia      1
4  benzene  spice  Australia      3
5   methyl  herbs   Americas      1
6   methyl  herbs        NaN      1
7   methyl  spice   Americas      1

>>> out1
      name  value    country  count
0  benzene  spice  Australia      3
1  benzene  herbs  Australia      1
2  benzene  herbs   Americas      1
3  benzene  anise     Poland      1
4   methyl  herbs        NaN      1
5   methyl  herbs   Americas      1
6   methyl  spice   Americas      1
7  alcohol  spice    Germany      2
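The question also mentions blanks, not only NaN. The sketch below is one way to cover both, assuming a blank means an empty or whitespace-only string: it turns those into NaN first and then applies the same groupby as above. The helper name `count_by_country` and the small sample frame are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd

def count_by_country(df):
    """Count rows per (name, value, country); blank/NaN countries count as 1.

    Hypothetical helper, not part of pandas.
    """
    cleaned = df.copy()
    # Keep values that are not blank after stripping; blanks become NaN.
    # Existing NaN values stay NaN either way.
    not_blank = cleaned["country"].str.strip().ne("")
    cleaned["country"] = cleaned["country"].where(not_blank)

    out = (
        cleaned.groupby(["name", "value", "country"], sort=False, dropna=False)
        .size()
        .reset_index(name="count")
    )
    # Collapse any number of missing countries in a group to a count of 1.
    out.loc[out["country"].isna(), "count"] = 1
    return out

# Small made-up sample: one empty string and one NaN in the same group.
sample = pd.DataFrame({
    "name": ["benzene", "benzene", "methyl", "methyl", "methyl"],
    "value": ["spice", "spice", "herbs", "herbs", "herbs"],
    "country": ["Australia", "Australia", "", np.nan, "Americas"],
})

result = count_by_country(sample)
print(result)
```

Here the empty string and the NaN in the methyl/herbs group end up in the same missing-country row with a count of 1, which matches the clarification given in the comments. If blanks should instead be kept separate from NaN, skip the normalization step.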
