How to return NaN if an index is not found in a MultiIndex Series?

Posted 2023-03-12

Original title: How return NaN if not found indice in multiindex series?

Asked: 2018-02-09 17:48:00

Question:

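I have two DataFrames, df1 and df2, each with many columns:

df1 - [2756003 rows x 44 columns]

df2 - [22035 rows x 11 columns]

I need to add a new column to df2 holding the mean of a target column from df1, computed per group, where the grouping columns exist in both df1 and df2: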

t1 = df1.groupby(['category', 'manufacturer'])
t2=t1[c1].mean()
str1='_'.join(col)
df2[c1+'_'+str1+'_mean']=t2[df2[['category','manufacturer']].as_matrix()].values

IndexError: arrays used as indices must be of integer (or boolean) type

t2 holds a Series with a MultiIndex, for example:

category  manufacturer
1         2                0.000000
          4                8.796840
          10               2.312407
          19               1.135094
          24               4.355000

If I use an index that exists, I get the expected result:

In [302]: t2[1, 2]
Out[302]: 0.0

But if I call t2[410, 332], where 332 is a manufacturer id that appears in df2 but not in df1, I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Instead, I would like to get NaN, as happens with

df2['manufacturer'].map(t2)

when there is only one key column.
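For reference, the single-column behaviour mentioned above can be reproduced with a small sketch. The toy data and the names `means` and `mapped` are made up for illustration and are not from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: df1 has manufacturers 2 and 4 only.
df1 = pd.DataFrame({"manufacturer": [2, 2, 4], "col": [0.0, 2.0, 8.0]})
# df2 contains manufacturer 332, which never occurs in df1.
df2 = pd.DataFrame({"manufacturer": [2, 4, 332]})

# Mean of `col` per manufacturer: a Series with a plain (single-level) index.
means = df1.groupby("manufacturer")["col"].mean()

# Series.map looks each value up in `means`; keys that are missing become NaN.
mapped = df2["manufacturer"].map(means)
print(mapped)
```

The last row comes out as NaN rather than raising an error, which is the behaviour wanted for the two-key case.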

Answer 1:

Use pd.merge to merge df2 and t2:

df2 = pd.merge(df2, t2.reset_index(), on=['category','manufacturer'], how='left')

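By default, pd.merge joins on all columns the two DataFrames share. So if 'category' and 'manufacturer' are the only columns shared by df2 and t2.reset_index(), the line above can be simplified to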

df2 = pd.merge(df2, t2.reset_index(), how='left')

import numpy as np
import pandas as pd
np.random.seed(2017)

df1 = pd.DataFrame(np.random.randint(4, size=(100,3)), columns=['category', 'manufacturer', 'col'])

df2 = pd.DataFrame(np.random.randint(1, 5, size=(100,3)), columns=['category', 'manufacturer', 'col2'])

t1 = df1.groupby(['category', 'manufacturer'])
c1 = 'col'
t2 = t1[c1].mean()
col = ['foo', 'bar']
str1='_'.join(col)
t2.name = c1+'_'+str1+'_mean'
df2 = pd.merge(df2, t2.reset_index(), on=['category','manufacturer'], how='left')
print(df2.head())

prints

   category  manufacturer  col2  col_foo_bar_mean
0         1             1     2          1.333333
1         3             4     3               NaN
2         4             4     2               NaN
3         3             3     1          1.000000
4         3             2     1          1.777778

Because this is a left join, rows of df2 that have no corresponding row in t2 get NaN in the new column.
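As an alternative to the merge (not the approach given in this answer), the same NaN-for-missing-keys result can probably be obtained by reindexing t2 with a MultiIndex built from df2's key columns. This is a minimal sketch on made-up toy data; the column name `col_mean` is hypothetical, and it assumes pandas 0.24 or later for `MultiIndex.from_frame` and `to_numpy`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; (410, 332) exists in df2 but not in df1.
df1 = pd.DataFrame({
    "category":     [1, 1, 1, 2],
    "manufacturer": [2, 2, 4, 4],
    "col":          [0.0, 2.0, 8.0, 3.0],
})
df2 = pd.DataFrame({
    "category":     [1, 1, 410, 2],
    "manufacturer": [2, 4, 332, 4],
})

# Group means: a Series indexed by (category, manufacturer).
t2 = df1.groupby(["category", "manufacturer"])["col"].mean()

# Build one lookup key per row of df2, then reindex t2 by those keys.
# Keys that are absent from t2 yield NaN instead of raising.
keys = pd.MultiIndex.from_frame(df2[["category", "manufacturer"]])
df2["col_mean"] = t2.reindex(keys).to_numpy()
print(df2)
```

Unlike the merge, this assigns the column in place and keeps df2's existing index, since only the values of the reindexed Series are used.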

Comments:

After df2 = pd.merge(df2, t2.to_frame(), left_on=['category','manufacturer'], right_index=True, how='left') I got AttributeError: 'CategoricalIndex' object has no attribute 'is_dtype_equal', so I changed that part to

df2 = pd.merge(df2, t2.reset_index(), left_on=['category','manufacturer'], right_on=['category','manufacturer'], how='left')

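and it worked! Thanks.

Good, thanks for the correction. Since left_on and right_on specify the same column names, you can write it more simply as on=['category','manufacturer']. If those are the only columns the two DataFrames share, you can even omit it entirely.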
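The equivalence described in the last comment can be checked with a small sketch. The frames `left` and `right` and their values are made up for illustration:

```python
import pandas as pd

# Hypothetical toy frames that share only the two key columns.
left = pd.DataFrame({
    "category":     [1, 1, 410],
    "manufacturer": [2, 4, 332],
    "col2":         [10, 20, 30],
})
right = pd.DataFrame({
    "category":     [1, 1],
    "manufacturer": [2, 4],
    "col_mean":     [1.0, 8.0],
})
keys = ["category", "manufacturer"]

# Three spellings of the same left join.
a = pd.merge(left, right, left_on=keys, right_on=keys, how="left")
b = pd.merge(left, right, on=keys, how="left")
c = pd.merge(left, right, how="left")  # joins on all shared columns

print(a.equals(b), b.equals(c))
```

Omitting `on` is only safe while the key columns are the only shared ones. If the frames later gain another column with the same name, it silently becomes part of the join key, so spelling out `on=` is the more defensive choice.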
