pandas combine two dfs: leave none if either df has null value at that index

Posted 2023-03-11

【Title】: pandas combine two dfs: leave none if either df has null value at that index 【Posted】: 2017-06-08 06:35:41 【Question】:

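I want to combine two DataFrames so that if the value at a given index is null in either df, the result keeps the null at that position. For example, in the df_combine output below, column yyy at index 0 should remain np.nan.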

df1 = pd.DataFrame.from_dict({'xxx': ['ind', np.nan, 'ind'], 'yyy': [np.nan, 'pin', 'din']}, orient='columns')
    >>df1
       xxx  yyy
    0  ind  NaN
    1  NaN  pin
    2  ind  din

df2 = pd.DataFrame.from_dict({'xxx': ['0.12', '0.15', '0.18', '8.1'], 'yyy': ['9.2', '1.1', np.nan, '7.1']}, orient='columns')
    >>df2
        xxx  yyy
    0  0.12  9.2
    1  0.15  1.1
    2  0.18  NaN
    3   8.1  7.1

Desired output:

 >>df_combine
             xxx       yyy
    0    ind||0.12   np.nan
    1     np.nan     pin||1.1
    2    ind||0.18     din

【Question comments】:

Please see my edit. I have added my desired output. 【Answer 1】:

IIUC, you can do it like this:

In [92]: df1.add('||').add(df2.values)
Out[92]:
         xxx       yyy
0  ind||0.12       NaN
1        NaN  pin||1.1
2  ind||0.18       NaN

Setup:

In [86]: df1 = pd.DataFrame.from_dict({'xxx': ['ind', np.nan, 'ind'], 'yyy': [np.nan, 'pin', 'din']}, orient='columns')

In [87]: df2 = pd.DataFrame.from_dict({'xxx': ['0.12', '0.15', '0.18'], 'yyy': ['9.2', '1.1', np.nan]}, orient='columns')

In [88]: df1
Out[88]:
   xxx  yyy
0  ind  NaN
1  NaN  pin
2  ind  din

In [89]: df2
Out[89]:
    xxx  yyy
0  0.12  9.2
1  0.15  1.1
2  0.18  NaN

Update (for a df2 that has more rows than df1):

In [126]: df1.add('||').add(df2.iloc[:len(df1)].values)
Out[126]:
         xxx       yyy
0  ind||0.12       NaN
1        NaN  pin||1.1
2  ind||0.18       NaN
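The update above can be run end to end as the following minimal sketch. It uses the question's data, including the extra row in df2. Passing `dtype=object` is an assumption added here so that the columns are plain object columns regardless of the pandas version's default string type; it is not part of the original answer. The `either_null` mask is also an addition, used only to check that a cell is null in the result wherever it is null in either input.

```python
import numpy as np
import pandas as pd

# Data from the question; dtype=object is an assumption made for this sketch.
df1 = pd.DataFrame(
    {'xxx': ['ind', np.nan, 'ind'], 'yyy': [np.nan, 'pin', 'din']},
    dtype=object,
)
df2 = pd.DataFrame(
    {'xxx': ['0.12', '0.15', '0.18', '8.1'], 'yyy': ['9.2', '1.1', np.nan, '7.1']},
    dtype=object,
)

# Trim df2 to df1's length by position, then add its raw values.
# Using .values bypasses index alignment, so rows are matched by position only.
df2_head = df2.iloc[:len(df1)]
combined = df1.add('||').add(df2_head.values)

# Mask of cells that are null in either input (positional comparison).
either_null = df1.isna() | df2_head.isna().values

print(combined)
print((combined.isna() == either_null).all().all())
```

Because the match is positional, this only gives the intended pairing when both frames list their rows in the same order.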

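【Comments】:

Brilliant solution. I didn't know about add. I would have accepted this answer if only it worked on two dfs with different dimensions, which is the case with my real data. If I add one more row to df2, it does not work. I have modified my original question.

@Rtut, what is your desired data set then? Is it still the same?

In case it is useful to someone: for my actual data, I did something similar that includes an index-based lookup. I am still testing how well it works.
df2_subset = df2.loc[df1.index]
df1.add('||').add(df2_subset.values)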
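As a possible alternative to the index-based lookup in the last comment, here is a minimal sketch that lets pandas align the rows by label instead of passing raw values. It is not from the original answer: it swaps in `Series.str.cat`, which aligns the other Series on the index when `join='left'` is given and returns a missing value when either side is missing (as long as `na_rep` is left unset). The helper name `combine_aligned` is hypothetical, and `dtype=object` is again an assumption made for this sketch.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'xxx': ['ind', np.nan, 'ind'], 'yyy': [np.nan, 'pin', 'din']},
    dtype=object,
)
df2 = pd.DataFrame(
    {'xxx': ['0.12', '0.15', '0.18', '8.1'], 'yyy': ['9.2', '1.1', np.nan, '7.1']},
    dtype=object,
)


def combine_aligned(left, right, sep='||'):
    """Hypothetical helper: join matching columns cell by cell, aligned on the index.

    Rows of `right` whose labels are not in `left` are dropped (join='left'),
    and a missing value on either side yields a missing value in the result.
    """
    return pd.DataFrame(
        {col: left[col].str.cat(right[col], sep=sep, join='left')
         for col in left.columns}
    )


df_combine = combine_aligned(df1, df2)
print(df_combine)
```

Unlike the `.values` approach, this keeps df1's index and does not depend on the two frames having the same row order or length.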
