pandas: merging two multi-level Series

[Title]: pandas merging two multi-level series [Posted]: 2016-11-24 13:01:13 [Question]:

I have two multi-level Series and want to merge them on both index levels. The first Series looks like this:

                                              # of restaurants    
BORO           CUISINE      
BRONX          American                                425
               Chinese                                 330
               Pizza                                   206 
***LYN       American                               1254
               Chinese                                 750
               Cafe/Coffee/Tea                         350

The second one has more rows and looks like this:

                                                # of votes    
BORO           CUISINE      
BRONX          American                                2425
               Caribbean                               320
               Chinese                                 3130
               Pizza                                   3336 
***LYN       American                               21254
               Caribbean                               2320
               Chinese                                 7250
               Cafe/Coffee/Tea                         3350
               Pizza                                   13336 

[Answer 1]:

Setup:

import pandas as pd

# Dict keys are (BORO, CUISINE) tuples, so each Series gets a two-level MultiIndex
s1 = pd.Series({('BRONX', 'American'): 425, ('***LYN', 'Chinese'): 750, ('***LYN', 'Cafe/Coffee/Tea'): 350, ('BRONX', 'Pizza'): 206, ('***LYN', 'American'): 1254, ('BRONX', 'Chinese'): 330})
s2 = pd.Series({('BRONX', 'Caribbean'): 320, ('BRONX', 'American'): 2425, ('***LYN', 'Chinese'): 7250, ('***LYN', 'Cafe/Coffee/Tea'): 3350, ('BRONX', 'Pizza'): 3336, ('***LYN', 'American'): 21254, ('***LYN', 'Pizza'): 13336, ('BRONX', 'Chinese'): 3130, ('***LYN', 'Caribbean'): 2320})
# Name the index levels and the Series; the Series names become column names after merging
s1 = s1.rename_axis(['BORO','CUISINE']).rename('restaurants')
s2 = s2.rename_axis(['BORO','CUISINE']).rename('votes')


print (s1)
BORO      CUISINE        
BRONX     American            425
          Chinese             330
          Pizza               206
***LYN  American           1254
          Chinese             750
          Cafe/Coffee/Tea     350
Name: restaurants, dtype: int64

print (s2)
BORO      CUISINE        
BRONX     American            2425
          Caribbean            320
          Chinese             3130
          Pizza               3336
***LYN  American           21254
          Caribbean           2320
          Chinese             7250
          Cafe/Coffee/Tea     3350
          Pizza              13336
Name: votes, dtype: int64
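
Before combining the two Series, it can help to confirm that both really carry a two-level index with the same level names, since both the concat and merge approaches rely on that. The following is a minimal sketch using a reduced subset of the data above (the variable `same_names` is only illustrative):

```python
import pandas as pd

# Reduced sample of the data above; keys are (BORO, CUISINE) tuples
s1 = pd.Series({('BRONX', 'American'): 425, ('BRONX', 'Pizza'): 206})
s2 = pd.Series({('BRONX', 'American'): 2425, ('BRONX', 'Caribbean'): 320,
                ('BRONX', 'Pizza'): 3336})
s1 = s1.rename_axis(['BORO', 'CUISINE']).rename('restaurants')
s2 = s2.rename_axis(['BORO', 'CUISINE']).rename('votes')

# Both Series should have a MultiIndex with two levels
print(type(s1.index).__name__, s1.index.nlevels)

# The level names must match for the later on=['BORO', 'CUISINE'] merge to work
same_names = list(s1.index.names) == list(s2.index.names)
print(same_names)
```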

If you need an inner join, use concat with the join parameter:

print (pd.concat([s1,s2], axis=1, join='inner'))
                          restaurants  votes
BORO     CUISINE                            
BRONX    American                 425   2425
         Chinese                  330   3130
         Pizza                    206   3336
***LYN American                1254  21254
         Cafe/Coffee/Tea          350   3350
         Chinese                  750   7250
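
One way to sanity-check the inner join is to compare it with the intersection of the two indexes: the result should contain exactly the (BORO, CUISINE) pairs present in both Series. A small sketch on a reduced subset of the data (the names `inner` and `common` are illustrative):

```python
import pandas as pd

# Reduced sample; s2 has one pair (Caribbean) that s1 lacks
s1 = pd.Series({('BRONX', 'American'): 425, ('BRONX', 'Pizza'): 206},
               name='restaurants').rename_axis(['BORO', 'CUISINE'])
s2 = pd.Series({('BRONX', 'American'): 2425, ('BRONX', 'Caribbean'): 320,
                ('BRONX', 'Pizza'): 3336},
               name='votes').rename_axis(['BORO', 'CUISINE'])

inner = pd.concat([s1, s2], axis=1, join='inner')

# Pairs present in both indexes
common = s1.index.intersection(s2.index)

print(len(inner) == len(common))
# No missing values are introduced, so the integer dtype is preserved
print(inner.dtypes)
```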

# join='outer' is the default, so it can be omitted
print (pd.concat([s1,s2], axis=1))
                          restaurants  votes
BORO     CUISINE                            
BRONX    American               425.0   2425
         Caribbean                NaN    320
         Chinese                330.0   3130
         Pizza                  206.0   3336
***LYN American              1254.0  21254
         Cafe/Coffee/Tea        350.0   3350
         Caribbean                NaN   2320
         Chinese                750.0   7250
         Pizza                    NaN  13336
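
Note that in the outer join the restaurants column becomes float (425.0 and so on), because NaN cannot be stored in an int64 column. If a missing restaurant count can reasonably be treated as zero (an assumption about the data, not something stated in the question), the NaN values can be filled and the column cast back to integers. A minimal sketch on a reduced subset of the data:

```python
import pandas as pd

s1 = pd.Series({('BRONX', 'American'): 425, ('BRONX', 'Pizza'): 206},
               name='restaurants').rename_axis(['BORO', 'CUISINE'])
s2 = pd.Series({('BRONX', 'American'): 2425, ('BRONX', 'Caribbean'): 320,
                ('BRONX', 'Pizza'): 3336},
               name='votes').rename_axis(['BORO', 'CUISINE'])

# Outer join: pairs missing from s1 get NaN, which upcasts the column to float
outer = pd.concat([s1, s2], axis=1)
print(outer.dtypes)

# Assumption: a missing count means zero restaurants
filled = outer.fillna(0).astype(int)
print(filled)
```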

Another solution is to use merge together with reset_index, which turns the index levels into ordinary columns to join on:

# how='inner' is the default, so it can be omitted
print (pd.merge(s1.reset_index(), s2.reset_index(), on=['BORO','CUISINE']))
       BORO          CUISINE  restaurants  votes
0     BRONX         American          425   2425
1     BRONX          Chinese          330   3130
2     BRONX            Pizza          206   3336
3  ***LYN         American         1254  21254
4  ***LYN          Chinese          750   7250
5  ***LYN  Cafe/Coffee/Tea          350   3350

#outer join
print (pd.merge(s1.reset_index(), s2.reset_index(), on=['BORO','CUISINE'], how='outer'))
       BORO          CUISINE  restaurants  votes
0     BRONX         American        425.0   2425
1     BRONX          Chinese        330.0   3130
2     BRONX            Pizza        206.0   3336
3  ***LYN         American       1254.0  21254
4  ***LYN          Chinese        750.0   7250
5  ***LYN  Cafe/Coffee/Tea        350.0   3350
6     BRONX        Caribbean          NaN    320
7  ***LYN        Caribbean          NaN   2320
8  ***LYN            Pizza          NaN  13336
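
Unlike concat, the merge approach returns a frame with a plain integer index. If the two-level index is wanted back, set_index can restore it, and the indicator=True option of pd.merge adds a `_merge` column showing which side each row came from. The following sketch uses a reduced subset of the data; the names `merged` and `restored` are illustrative:

```python
import pandas as pd

s1 = pd.Series({('BRONX', 'American'): 425, ('BRONX', 'Pizza'): 206},
               name='restaurants').rename_axis(['BORO', 'CUISINE'])
s2 = pd.Series({('BRONX', 'American'): 2425, ('BRONX', 'Caribbean'): 320,
                ('BRONX', 'Pizza'): 3336},
               name='votes').rename_axis(['BORO', 'CUISINE'])

# Outer merge on both key columns; _merge is 'both', 'left_only' or 'right_only'
merged = pd.merge(s1.reset_index(), s2.reset_index(),
                  on=['BORO', 'CUISINE'], how='outer', indicator=True)

# Move the key columns back into a two-level index
restored = merged.set_index(['BORO', 'CUISINE'])
print(restored)
```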
