Index disappears after grouping a pandas dataframe

Posted: 2020-09-01 14:54:51

Question:

I have the following pandas Series:

Reducedset['% Renewable']

This gives me:

Asia           China                 19.7549
               Japan                 10.2328
               India                 14.9691
               South Korea           2.27935
               Iran                  5.70772
North America  United States          11.571
               Canada                61.9454
Europe         United Kingdom        10.6005
               Russian Federation    17.2887
               Germany               17.9015
               France                17.0203
               Italy                 33.6672
               Spain                 37.9686
Australia      Australia             11.8108
South America  Brazil                 69.648
Name: % Renewable, dtype: object

I then cut this series into 5 bins:

binning = pd.cut(Reducedset['% Renewable'], 5)

This gives me:

Asia           China                 (15.753, 29.227]
               Japan                  (2.212, 15.753]
               India                  (2.212, 15.753]
               South Korea            (2.212, 15.753]
               Iran                   (2.212, 15.753]
North America  United States          (2.212, 15.753]
               Canada                (56.174, 69.648]
Europe         United Kingdom         (2.212, 15.753]
               Russian Federation    (15.753, 29.227]
               Germany               (15.753, 29.227]
               France                (15.753, 29.227]
               Italy                 (29.227, 42.701]
               Spain                 (29.227, 42.701]
Australia      Australia              (2.212, 15.753]
South America  Brazil                (56.174, 69.648]
Name: % Renewable, dtype: category
Categories (5, interval[float64]): [(2.212, 15.753] < (15.753, 29.227] < (29.227, 42.701] <
                                    (42.701, 56.174] < (56.174, 69.648]]
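
For readers wondering where these interval edges come from: with an integer `bins` argument, `pd.cut` splits the range between the minimum and the maximum into equal-width bins and moves the lowest edge down slightly so that the minimum falls inside the first bin. A minimal sketch, assuming the values are plain floats (the Series above reports dtype object) and using illustrative variable names:

```python
import pandas as pd

# The 15 "% Renewable" values from the Series shown above, as floats.
values = pd.Series([19.7549, 10.2328, 14.9691, 2.27935, 5.70772,
                    11.571, 61.9454, 10.6005, 17.2887, 17.9015,
                    17.0203, 33.6672, 37.9686, 11.8108, 69.648])

# retbins=True additionally returns the array of bin edges.
binned, edges = pd.cut(values, 5, retbins=True)

# Number of values per bin, in bin order (empty bins are kept).
per_bin = binned.value_counts().sort_index()

print(edges.round(3))
print(per_bin)
```

The rounded edges match the interval labels in the output above, and the fourth bin is empty because no value lies between 42.701 and 56.174.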

I then group by these bins to count the number of countries in each bin:

Reduced = Reducedset.groupby(binning)['% Renewable'].agg(['count'])

This gives me:

% Renewable
(2.212, 15.753]     7
(15.753, 29.227]    4
(29.227, 42.701]    2
(42.701, 56.174]    0
(56.174, 69.648]    2
Name: count, dtype: int64
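
The continent level is gone because `groupby` builds the result index from the grouping keys only, and here the only key is the bin. A small sketch with made-up numbers shows the effect; the level names `Continent` and `Country`, the values, and the explicit `observed=False` are assumptions for illustration:

```python
import pandas as pd

# Hypothetical miniature of the data: two index levels, four countries.
idx = pd.MultiIndex.from_tuples(
    [("Asia", "China"), ("Asia", "Japan"),
     ("Europe", "Spain"), ("Europe", "Italy")],
    names=["Continent", "Country"],
)
small = pd.Series([19.8, 10.2, 38.0, 33.7], index=idx, name="% Renewable")

# Grouping by the bins alone: the bins become the only index level.
by_bin = small.groupby(pd.cut(small, 2), observed=False).count()

print(by_bin.index.nlevels)  # a single level, holding the intervals
print(by_bin)
```

Any index level that should survive the aggregation therefore has to be passed as a grouping key as well.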

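However, the index has disappeared, and I would still like to keep the "Continent" level (the outer index).

So, to the left of the (% Renewable) column, it should say: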

Asia
North America 
Europe
Australia
South America 

When I try this:

print(Reducedset['% Renewable'].groupby([Reducedset['% Renewable'].index.get_level_values(0),pd.cut(Reducedset['% Renewable'],5)]).count())

It works!

Problem solved!
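
For reference, a self-contained sketch of that call on the figures shown earlier. The level names `Continent` and `Country`, the float values, and the explicit `observed=False` (which keeps empty bins regardless of the pandas version) are assumptions, not part of the original post:

```python
import pandas as pd

continents = (["Asia"] * 5 + ["North America"] * 2 + ["Europe"] * 6
              + ["Australia", "South America"])
countries = ["China", "Japan", "India", "South Korea", "Iran",
             "United States", "Canada", "United Kingdom",
             "Russian Federation", "Germany", "France", "Italy", "Spain",
             "Australia", "Brazil"]
renewable = pd.Series(
    [19.7549, 10.2328, 14.9691, 2.27935, 5.70772, 11.571, 61.9454,
     10.6005, 17.2887, 17.9015, 17.0203, 33.6672, 37.9686, 11.8108, 69.648],
    index=pd.MultiIndex.from_arrays([continents, countries],
                                    names=["Continent", "Country"]),
    name="% Renewable",
)

# Two grouping keys: the outer index level and the bin of each value.
counts = renewable.groupby(
    [renewable.index.get_level_values(0), pd.cut(renewable, 5)],
    observed=False,
).count()

print(counts)
```

The result is indexed by continent and bin, so each continent keeps its own row block.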

Comments:

@Ben.T Actually, I want this output:

                  count
binning
(2.212, 15.753]       7
(15.753, 29.227]      4
(29.227, 42.701]      2
(56.174, 69.648]      2

but with the continent included in the index.

Answer 1:

Let's assume the following data:

import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.randint(0,10, 16), 
              index=pd.MultiIndex.from_arrays([list('aaaabbccdddddeee'), 
                                               list('abcdefghijklmnop')]))

Then, IIUC, what you are looking for is:

print(s.groupby([s.index.get_level_values(0), #that is the continent for you
                 pd.cut(s, 5)]) #that is the binning you created
       .count())
a  (-0.009, 1.8]    0
   (1.8, 3.6]       0
   (3.6, 5.4]       2
   (5.4, 7.2]       0
   (7.2, 9.0]       2
b  (-0.009, 1.8]    2
   (1.8, 3.6]       0
   (3.6, 5.4]       0
   (5.4, 7.2]       0
   (7.2, 9.0]       0
c  (-0.009, 1.8]    1
   (1.8, 3.6]       0
   (3.6, 5.4]       0
   (5.4, 7.2]       1
   (7.2, 9.0]       0
d  (-0.009, 1.8]    0
   (1.8, 3.6]       1
   (3.6, 5.4]       2
   (5.4, 7.2]       1
   (7.2, 9.0]       1
e  (-0.009, 1.8]    0
   (1.8, 3.6]       2
   (3.6, 5.4]       1
   (5.4, 7.2]       0
   (7.2, 9.0]       0
dtype: int64
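
If the zero-count rows are not wanted, one option is the `observed` parameter of `groupby`, which controls whether unobserved categories of a categorical key are shown. A sketch on the same sample data; passing `observed` explicitly is a suggestion added here, not part of the original answer, and the names `keys`, `all_bins` and `seen_bins` are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.randint(0, 10, 16),
              index=pd.MultiIndex.from_arrays([list('aaaabbccdddddeee'),
                                               list('abcdefghijklmnop')]))

keys = [s.index.get_level_values(0), pd.cut(s, 5)]

# observed=False keeps every (group, bin) pair, including empty ones.
all_bins = s.groupby(keys, observed=False).count()

# observed=True keeps only the pairs that actually occur in the data.
seen_bins = s.groupby(keys, observed=True).count()

print(len(all_bins), len(seen_bins))
```

Because the default value of `observed` has changed between pandas versions, stating it explicitly keeps the output stable.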

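Comments:

When I try: print(Reducedset['% Renewable'].groupby([Reducedset['% Renewable'].index.get_level_values(0),pd.cut(Reducedset['% Renewable'],5)])), I get:

@Ben.T Ah, yes! Forgot that! Silly me!

My only problem now is that the count column has no column label. How can I add the label "count" to that column?

Thanks. Actually, the reason I wanted to index the counts is that I want to drop the bins where count = 0. Is there another way to do that without converting to a dataframe?

@Caledonian26 Assuming the series with the zeros is called s_count, you can do s_count[s_count>0]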
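
Both follow-up points can be handled while staying with a Series. The sketch below uses small made-up data; the level names `group` and `item` and the values are hypothetical. `Series.rename` with a scalar sets the name that is printed as the label, and the boolean mask suggested in the last comment drops the empty bins:

```python
import pandas as pd

# Hypothetical data: two groups, five values.
values = pd.Series(
    [1.0, 2.0, 3.0, 9.0, 10.0],
    index=pd.MultiIndex.from_arrays([list('aabbb'), list('vwxyz')],
                                    names=['group', 'item']),
)

s_count = values.groupby(
    [values.index.get_level_values(0), pd.cut(values, 3)],
    observed=False,
).count()

# Give the Series the label "count".
s_count = s_count.rename('count')

# Keep only the (group, bin) pairs with at least one value.
nonzero = s_count[s_count > 0]

print(nonzero)
```

If a real column is needed later, `nonzero.to_frame()` would turn the named Series into a one-column dataframe.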
