Index disappears after grouping a pandas dataframe
【Posted】: 2020-09-01 14:54:51
【Question】: I have the following pandas Series:
Reducedset['% Renewable']
This gives me:
Asia China 19.7549
Japan 10.2328
India 14.9691
South Korea 2.27935
Iran 5.70772
North America United States 11.571
Canada 61.9454
Europe United Kingdom 10.6005
Russian Federation 17.2887
Germany 17.9015
France 17.0203
Italy 33.6672
Spain 37.9686
Australia Australia 11.8108
South America Brazil 69.648
Name: % Renewable, dtype: object
Then I cut this Series into 5 bins:
binning = pd.cut(Top15['% Renewable'],5)
This gives me:
Asia China (15.753, 29.227]
Japan (2.212, 15.753]
India (2.212, 15.753]
South Korea (2.212, 15.753]
Iran (2.212, 15.753]
North America United States (2.212, 15.753]
Canada (56.174, 69.648]
Europe United Kingdom (2.212, 15.753]
Russian Federation (15.753, 29.227]
Germany (15.753, 29.227]
France (15.753, 29.227]
Italy (29.227, 42.701]
Spain (29.227, 42.701]
Australia Australia (2.212, 15.753]
South America Brazil (56.174, 69.648]
Name: % Renewable, dtype: category
Categories (5, interval[float64]): [(2.212, 15.753] < (15.753, 29.227] < (29.227, 42.701] <
(42.701, 56.174] < (56.174, 69.648]]
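For anyone who wants to reproduce the binning step, here is a minimal sketch. It rebuilds the Series by hand from the values printed above; the variable names `rows`, `s` and `counts` and the index level names `Continent` and `Country` are my own assumptions, not taken from the original notebook. With an integer `bins` argument, `pd.cut` splits the range of the data into equal-width intervals and nudges the lowest edge down slightly so that the minimum value falls inside the first bin.

```python
import pandas as pd

# Values copied from the printed Series above (index level names are assumed).
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772),
    ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454),
    ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887),
    ("Europe", "Germany", 17.9015), ("Europe", "France", 17.0203),
    ("Europe", "Italy", 33.6672), ("Europe", "Spain", 37.9686),
    ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
index = pd.MultiIndex.from_tuples(
    [(continent, country) for continent, country, _ in rows],
    names=["Continent", "Country"],
)
s = pd.Series([value for _, _, value in rows], index=index, name="% Renewable")

# Five equal-width bins; the result keeps the (Continent, Country) index of s.
binning = pd.cut(s, 5)

# Number of countries per bin, in bin order (empty bins are included).
counts = binning.value_counts().sort_index()
print(counts)
```

Note that `binning` still carries the original two-level index at this point; the continent only gets lost in the later `groupby` step.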
Then I group by these bins to count the number of countries in each bin:
Reduced = Reducedset.groupby(binning)['% Renewable'].agg(['count'])
This gives me:
% Renewable
(2.212, 15.753] 7
(15.753, 29.227] 4
(29.227, 42.701] 2
(42.701, 56.174] 0
(56.174, 69.648] 2
Name: count, dtype: int64
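However, the index has disappeared, and I still want to keep the "Continent" index (the outer index level).
So, at the far left of the (% Renewable) column, it should say: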
Asia
North America
Europe
Australia
South America
When I try this:
print(Reducedset['% Renewable'].groupby([Reducedset['% Renewable'].index.get_level_values(0),pd.cut(Reducedset['% Renewable'],5)]).count())
It works!
Problem solved!
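A short sketch of why this works, under the assumption that the data looks like the Series printed at the top (rebuilt by hand here; `rows`, `s`, `by_bin` and `by_continent_bin` are illustrative names, not from the original code). The result of a `groupby` is indexed by the grouping keys, so grouping by the bins alone leaves only the bins in the index. Passing the outer index level as an additional key keeps the continent.

```python
import pandas as pd

# Values copied from the printed Series in the question (level names are assumed).
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772),
    ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454),
    ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887),
    ("Europe", "Germany", 17.9015), ("Europe", "France", 17.0203),
    ("Europe", "Italy", 33.6672), ("Europe", "Spain", 37.9686),
    ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
index = pd.MultiIndex.from_tuples(
    [(continent, country) for continent, country, _ in rows],
    names=["Continent", "Country"],
)
s = pd.Series([value for _, _, value in rows], index=index, name="% Renewable")
binning = pd.cut(s, 5)

# Grouping by the bins only: the result is indexed by bin, the continent is gone.
# observed=False is passed explicitly so that empty bins are kept.
by_bin = s.groupby(binning, observed=False).count()

# Grouping by continent (index level 0) AND bin: both end up in the result index.
by_continent_bin = s.groupby(
    [s.index.get_level_values(0), binning], observed=False
).count()
print(by_continent_bin)
```

The second result has one row per (continent, bin) pair, including pairs with a count of 0, because the bins are a categorical grouper.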
【Comments】:
@Ben.T Actually, I want this output:
count
binning
(2.212, 15.753] 7
(15.753, 29.227] 4
(29.227, 42.701] 2
(56.174, 69.648] 2
but with the continent included in the index.
【Answer 1】: Let's assume the following data:
import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.randint(0, 10, 16),
              index=pd.MultiIndex.from_arrays([list('aaaabbccdddddeee'),
                                               list('abcdefghijklmnop')]))
Then, IIUC, what you are looking for is:
print(s.groupby([s.index.get_level_values(0), #that is the continent for you
pd.cut(s, 5)]) #that is the binning you created
.count())
a (-0.009, 1.8] 0
(1.8, 3.6] 0
(3.6, 5.4] 2
(5.4, 7.2] 0
(7.2, 9.0] 2
b (-0.009, 1.8] 2
(1.8, 3.6] 0
(3.6, 5.4] 0
(5.4, 7.2] 0
(7.2, 9.0] 0
c (-0.009, 1.8] 1
(1.8, 3.6] 0
(3.6, 5.4] 0
(5.4, 7.2] 1
(7.2, 9.0] 0
d (-0.009, 1.8] 0
(1.8, 3.6] 1
(3.6, 5.4] 2
(5.4, 7.2] 1
(7.2, 9.0] 1
e (-0.009, 1.8] 0
(1.8, 3.6] 2
(3.6, 5.4] 1
(5.4, 7.2] 0
(7.2, 9.0] 0
dtype: int64
【Comments】:
When I try: print(Reducedset['% Renewable'].groupby([Reducedset['% Renewable'].index.get_level_values(0),pd.cut(Reducedset['% Renewable'],5)])), I get: s_count
, then you can do s_count[s_count>0]
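The filtering suggested in the comment above can be sketched as follows, using the sample data from the answer. I am assuming here that `s_count` is simply the name given to the result of the grouped `.count()`; `nonzero` is an illustrative name of my own.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer.
np.random.seed(1)
s = pd.Series(
    np.random.randint(0, 10, 16),
    index=pd.MultiIndex.from_arrays(
        [list("aaaabbccdddddeee"), list("abcdefghijklmnop")]
    ),
)

# Count per (outer index level, bin); observed=False keeps the empty bins.
s_count = s.groupby(
    [s.index.get_level_values(0), pd.cut(s, 5)], observed=False
).count()

# Boolean mask: keep only the (group, bin) pairs that actually contain rows.
nonzero = s_count[s_count > 0]
print(nonzero)
```

Passing `observed=True` to `groupby` instead should give a similar result without the explicit filter, since only bins that actually occur within each group are then reported.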