Stacking columns from a MultiIndex to a single index


【Title】: stacking columns from Multi-index to Single index 【Posted】: 2020-02-18 18:38:17 【Question】:

I have this table, read from Excel, with a MultiIndex in the columns:
  Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0   2017                        2018                     
              Person           Material            Country DEMAND HITS MISSES FILLRATE DEMAND HITS MISSES FILLRATE
0            Person1          Material1              Spain      0    0      0        0      0    0      0        0
1            Person1          Material1             France      0    0      0        0      0    0      0        0
2            Person1          Material1              India      0    0      0        0      5    5      0        1
3            Person1          Material1              China      0    0      0        0      0    0      0        0
4            Person2          Material2              Spain      0    0      0        0      0    0      0        0
5            Person2          Material2             France      0    0      0        0      0    0      0        0
6            Person2          Material2              India      0    0      0        0      5    5      0        1
7            Person2          Material2              China      0    0      0        0      0    0      0        0

I have replicated the structure below, in case it helps with a possible solution:

import pandas as pd

# Column Multi-Index: (top level, second level) pairs
col_idx_arr = list(zip(['', '', '', '2017', '2017', '2017', '2017', '2018', '2018', '2018', '2018'],
                       ['Person', 'Material', 'Country', 'DEMAND', 'HITS', 'MISSES', 'FILLRATE', 'DEMAND', 'HITS', 'MISSES', 'FILLRATE']))

col_idx = pd.MultiIndex.from_tuples(col_idx_arr)

# Create the DataFrame (every cell holds the placeholder '-')
df = pd.DataFrame('-', index=range(10), columns=col_idx)

I am trying to convert it into the following, "partially" stacking some of the columns:

      Date   Person   Material Country  DEMAND  HITS  MISSES  FILLRATE
0   2017.0  Person1  Material1   Spain       0     0       0         0
1      NaN  Person1  Material1  France       0     0       0         0
2      NaN  Person1  Material1   India       0     0       0         0
3      NaN  Person1  Material1   China       0     0       0         0
4      NaN  Person2  Material2   Spain       0     0       0         0
5      NaN  Person2  Material2  France       0     0       0         0
6      NaN  Person2  Material2   India       0     0       0         0
7      NaN  Person2  Material2   China       0     0       0         0
8   2018.0  Person1  Material1   Spain       0     0       0         0
9      NaN  Person1  Material1  France       0     0       0         0
10     NaN  Person1  Material1   India       5     5       0         1
11     NaN  Person1  Material1   China       0     0       0         0
12     NaN  Person2  Material2   Spain       0     0       0         0
13     NaN  Person2  Material2  France       0     0       0         0
14     NaN  Person2  Material2   India       5     5       0         1
15     NaN  Person2  Material2   China       0     0       0         0


【Answer 1】:

IIUC, this might work:

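(df.set_index(list(df.columns[:3]))    # keep Person/Material/Country as the row index
   .stack(level=0)                     # move the year level from the columns into the rows
   .reset_index()
   .rename(columns=lambda x: x[1] if x[0]=='' else x)   # flatten ('', 'Person') to 'Person'
   .sort_values('level_3')             # 'level_3' holds the year
)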

Output:

     Person   Material Country level_3  DEMAND  FILLRATE  HITS  MISSES
0   Person1  Material1   Spain    2017       0         0     0       0
2   Person1  Material1  France    2017       0         0     0       0
4   Person1  Material1   India    2017       0         0     0       0
6   Person1  Material1   China    2017       0         0     0       0
8   Person2  Material2   Spain    2017       0         0     0       0
10  Person2  Material2  France    2017       0         0     0       0
12  Person2  Material2   India    2017       0         0     0       0
14  Person2  Material2   China    2017       0         0     0       0
1   Person1  Material1   Spain    2018       0         0     0       0
3   Person1  Material1  France    2018       0         0     0       0
5   Person1  Material1   India    2018       5         1     5       0
7   Person1  Material1   China    2018       0         0     0       0
9   Person2  Material2   Spain    2018       0         0     0       0
11  Person2  Material2  France    2018       0         0     0       0
13  Person2  Material2   India    2018       5         1     5       0
15  Person2  Material2   China    2018       0         0     0       0
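For comparison, here is a minimal sketch of a different route to the same shape. It does not use `stack`; it slices out each year's block and concatenates the blocks, which also keeps the second-level columns in their original order and gives the year column the name `Date` directly. The helper name `stack_years`, its `n_id_cols` parameter, and the small two-row sample frame are assumptions made for illustration, not part of the original question or answer. It also assumes the identifier columns share one top-level label, as in the replicated structure in the question.

```python
import pandas as pd

# Small sample with the same layout as the question: identifier columns
# under an empty top-level label, then one block of metrics per year.
columns = pd.MultiIndex.from_tuples(
    [("", "Person"), ("", "Country"),
     ("2017", "DEMAND"), ("2017", "HITS"),
     ("2018", "DEMAND"), ("2018", "HITS")]
)
df = pd.DataFrame(
    [["Person1", "Spain", 0, 0, 0, 0],
     ["Person1", "India", 0, 0, 5, 5]],
    columns=columns,
)


def stack_years(frame, n_id_cols):
    """Hypothetical helper: turn per-year column blocks into rows."""
    # Identifier columns, with the top column level dropped.
    ids = frame.iloc[:, :n_id_cols].droplevel(0, axis=1)
    # Distinct year labels, in the order they appear in the columns.
    years = frame.columns.get_level_values(0)[n_id_cols:].unique()

    parts = []
    for year in years:
        # frame[year] selects that year's block with single-level columns.
        part = pd.concat([ids, frame[year]], axis=1)
        part.insert(0, "Date", year)
        parts.append(part)

    # Blocks are appended year by year, so rows come out grouped by year.
    return pd.concat(parts, ignore_index=True)


result = stack_years(df, n_id_cols=2)
print(result)
```

On the replicated frame from the question, the equivalent call would be `stack_years(df, n_id_cols=3)`. The year labels stay strings here; convert them afterwards if a numeric `Date` column is needed.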

【Comments】:

This definitely works, and I see what I was missing. Thanks!
