Maintaining pandas df index with selection & groupby (python)

Posted 2023-03-11


Question posted: 2022-01-09 08:34:27

Question:

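After sub-selecting part of a df, I'm having trouble getting back the rows' original df indices for a given groupby condition. It's easier to explain with code, so let's start with a toy dataframe: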

import pandas as pd

headers = ['a','b']
nrows = 8
df = pd.DataFrame(columns = headers)
df['a'] = [0]*(nrows//2) + [1]*(nrows//2)
df['b'] = [2]*(nrows//4) + [4]*(nrows//4) + [2]*(nrows//4) + [4]*(nrows//4)
print(df)
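For reference, the same toy frame can be built in one step by passing a dict of columns to the DataFrame constructor. This is only an equivalent sketch of the data above; the comments show what print(df) displays, which makes the two "b" groups inside each "a" group easy to see.

```python
import pandas as pd

nrows = 8
# Same toy data as above, built directly from a dict of columns
df = pd.DataFrame({
    "a": [0] * (nrows // 2) + [1] * (nrows // 2),
    "b": ([2] * (nrows // 4) + [4] * (nrows // 4)) * 2,
})
print(df)
#    a  b
# 0  0  2
# 1  0  2
# 2  0  4
# 3  0  4
# 4  1  2
# 5  1  2
# 6  1  4
# 7  1  4
```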

Then I select the subset of data I'm interested in and check that the index is preserved:

sub_df = df[df['a']==1]    ## selects for only group 1 (indices 4-7)
print(sub_df.index)        ## looks good so far

sub_df.index returns

Int64Index([4, 5, 6, 7], dtype='int64')
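Boolean selection keeps the original row labels, but every DataFrame also has row positions that always run 0, 1, 2, ... within that frame. The minimal sketch below, using the same toy data, shows that label 4 and position 0 refer to the same row of sub_df. (In pandas 2.x the index is displayed as Index([4, 5, 6, 7], dtype='int64'), since Int64Index was removed in pandas 2.0.)

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 0, 0, 1, 1, 1, 1],
                   "b": [2, 2, 4, 4, 2, 2, 4, 4]})
sub_df = df[df["a"] == 1]

# Boolean selection keeps the original row labels...
print(sub_df.index.tolist())   # [4, 5, 6, 7]

# ...but positions inside sub_df start again at 0
print(sub_df.loc[4].tolist())    # row with label 4     -> [1, 2]
print(sub_df.iloc[0].tolist())   # row at position 0   -> [1, 2], the same row
```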

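This looks great! Now I want to group the data in that subset and extract the original df indices, and this is where the problem happens. For example: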

g_df = sub_df.groupby('b')
g_df_idx = g_df.indices 
print(g_df_idx)          ## bad!
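The "bad" result comes from the fact that GroupBy.indices maps each group key to row positions within the grouped frame (here sub_df), not to its index labels, so the values run from 0 to 3 rather than 4 to 7. A minimal sketch, assuming the toy data above: the positions can be looked up in sub_df.index to recover the original labels. The dict comprehensions convert keys and arrays to plain Python values only so the printed output is unambiguous.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 0, 0, 1, 1, 1, 1],
                   "b": [2, 2, 4, 4, 2, 2, 4, 4]})
sub_df = df[df["a"] == 1]          # labels 4..7
g_df = sub_df.groupby("b")

# .indices holds positions (0..len(sub_df)-1), not labels
positions = {int(k): v.tolist() for k, v in g_df.indices.items()}
print(positions)                   # {2: [0, 1], 4: [2, 3]}

# Indexing sub_df.index with those positions gives back the original labels
labels = {k: sub_df.index[v].to_numpy() for k, v in g_df.indices.items()}
print({int(k): v.tolist() for k, v in labels.items()})   # {2: [4, 5], 4: [6, 7]}
```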

When I print(g_df_idx), I want it to return:

{2: array([4, 5]), 4: array([6, 7])}

Because of the way I'll be using this code, I can't just use groupby(['a','b']).

This is driving me crazy. Here are a few of the many solutions I've tried:


## 1 
e1_idx = sub_df.groupby('b').indices
# print(e1_idx)                          ## issue persists

## 2
e2 = sub_df.groupby('b', as_index = True) ## also tried as_index = False 
e2_idx = e2.indices 
# print(e2_idx)                          ## issue persists

## 3
e3 = sub_df.reset_index()
e3_idx = e3.groupby('b').indices
# print(e3_idx)                          ## issue persists
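Attempt 3 is closer than it looks: reset_index() renumbers the rows 0..3, so .indices still returns positions, but the old labels are not lost. They are moved into a new column, which pandas names "index" when the original index has no name. A minimal sketch, assuming the toy data above, that groups on "b" and collects that column instead of asking for .indices:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 0, 0, 1, 1, 1, 1],
                   "b": [2, 2, 4, 4, 2, 2, 4, 4]})
sub_df = df[df["a"] == 1]

# reset_index() moves the old labels (4..7) into a column named "index"
e3 = sub_df.reset_index()
print(e3.columns.tolist())          # ['index', 'a', 'b']

# Collect the saved labels per group instead of using .indices
orig_idx = e3.groupby("b")["index"].apply(list).to_dict()
print(orig_idx)                     # {2: [4, 5], 4: [6, 7]}
```

This returns lists rather than arrays; wrap each value in numpy.asarray if arrays are needed.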

I'm sure there must be some simple solution I'm just overlooking. Thanks a lot for any suggestions.


Answer 1:

You can do it like this:

g_df_idx = g_df.apply(lambda x: x.index).to_dict()
print(g_df_idx)
# {2: Int64Index([4, 5], dtype='int64'), 4: Int64Index([6, 7], dtype='int64')}
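A commonly used alternative, not part of the answer above, is GroupBy.groups. Unlike .indices, it maps each group key to an Index of the grouped frame's labels, and because the boolean selection preserved the original df labels, those are exactly the values the question asks for. A minimal sketch, assuming the toy data above; to_numpy() is only there to match the array form requested in the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 0, 0, 1, 1, 1, 1],
                   "b": [2, 2, 4, 4, 2, 2, 4, 4]})
sub_df = df[df["a"] == 1]
g_df = sub_df.groupby("b")

# .groups maps each key to the labels of sub_df.index (the original df labels)
g_df_idx = {k: idx.to_numpy() for k, idx in g_df.groups.items()}
print({int(k): v.tolist() for k, v in g_df_idx.items()})   # {2: [4, 5], 4: [6, 7]}
```

This avoids running a Python function per group, since .groups is computed directly from the grouping.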

