Numpy isnan() fails on an array of floats (from pandas dataframe apply)

Posted 2023-02-23

【Title】: Numpy isnan() fails on an array of floats (from pandas dataframe apply) 【Posted】: 2016-06-30 06:51:12 【Question】:

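I have an array of floats (some normal numbers, some NaNs) that comes out of an apply on a pandas dataframe.

For some reason, numpy.isnan fails on this array. Yet, as shown below, every element is a float, numpy.isnan runs correctly on each individual element, and the variable is definitely a numpy array.

What's going on?!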

set([type(x) for x in tester])
Out[59]: {float}

tester
Out[60]: 
array([-0.7000000000000001, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan], dtype=object)

set([type(x) for x in tester])
Out[61]: {float}

np.isnan(tester)
Traceback (most recent call last):

File "<ipython-input-62-e3638605b43c>", line 1, in <module>
np.isnan(tester)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

set([np.isnan(x) for x in tester])
Out[65]: {False, True}

type(tester)
Out[66]: numpy.ndarray

【Comments】:

【Answer 1】:

np.isnan can be applied to NumPy arrays of a native dtype (such as np.float64):

In [99]: np.isnan(np.array([np.nan, 0], dtype=np.float64))
Out[99]: array([ True, False], dtype=bool)

but it raises a TypeError when applied to an object array:

In [96]: np.isnan(np.array([np.nan, 0], dtype=object))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Since you have Pandas, you can use pd.isnull instead; it accepts NumPy arrays of either object or native dtypes:

In [97]: pd.isnull(np.array([np.nan, 0], dtype=float))
Out[97]: array([ True, False], dtype=bool)

In [98]: pd.isnull(np.array([np.nan, 0], dtype=object))
Out[98]: array([ True, False], dtype=bool)

Note that None is also treated as a null value in object arrays.
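The contrast described in this answer can be reproduced with a minimal sketch. The array contents here are made up for illustration (the name `tester` is borrowed from the question), and the result names are hypothetical:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the question's array: Python floats and a None
# stored in an object-dtype array.
tester = np.array([-0.7, np.nan, None], dtype=object)

# np.isnan has no loop for object dtype, so this raises TypeError.
try:
    np.isnan(tester)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.isnull works on the object array and also flags None as null.
null_mask = pd.isnull(tester)

# On a native float64 array np.isnan works as usual.
float_mask = np.isnan(np.array([-0.7, np.nan], dtype=np.float64))
```

If None should not count as missing in your data, pd.isnull is not a drop-in replacement for np.isnan, so check which values the object array can actually contain.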

【Comments】:

Thanks - using pd.isnull(). There doesn't seem to be any performance hit either.

【Answer 2】:

A good alternative to np.isnan() and pd.isnull() is

for i in range(a.shape[0]):
    if a[i] != a[i]:
        # a[i] is nan; do something here
        pass

because nan is the only value that is not equal to itself.
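As a rough illustration of the self-inequality trick, the sketch below applies it both element by element and to a whole float64 array at once. The array `a` and the result names are hypothetical:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, np.nan])

# Element-by-element, as in the loop above: collect indices where the
# value is not equal to itself.
nan_positions = [i for i in range(a.shape[0]) if a[i] != a[i]]

# The same comparison applied to the whole array yields a boolean mask.
nan_mask = a != a
```

Note that the comparison has to be used element-wise or as a mask; writing `if a != a:` on a multi-element array is what triggers the ambiguous-truth-value ValueError mentioned in the comments.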

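【Comments】:

This may not work for arrays, because it raises the well-known "ValueError: The truth value of a xxx is ambiguous".

@MSeifert Are you talking about Python? I just used this approach for some machine learning work; why didn't I run into that well-known error?

Yes. It seems you haven't used numpy or pandas before. Just use import numpy as np; a = np.array([1,2,3, np.nan]) and run your code.

@MSeifert Er, I'm new to numpy, but the code runs fine with no error. In [1]: import numpy as np In [2]: a=np.array([1,2,3,np.nan]) In [3]: print a [ 1. 2. 3. nan] In [4]: print a[3]==a[3] False

【Answer 3】:

Building on @unutbu's answer, you can coerce the pandas/numpy object array to a native (float64) type, along the lines of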

import pandas as pd
pd.to_numeric(df['tester'], errors='coerce')

Specify errors='coerce' to force strings that cannot be parsed as numbers to become NaN. The resulting column has dtype: float64, and the isnan check should then work.
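A small sketch of that conversion, assuming a dataframe with an object-dtype column named `tester` (the data, including the unparseable string, is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an object column holding floats and one stray string.
df = pd.DataFrame({"tester": np.array([-0.7, np.nan, "oops", 2.0], dtype=object)})

# errors="coerce" turns anything that cannot be parsed as a number into NaN
# and returns a float64 Series.
converted = pd.to_numeric(df["tester"], errors="coerce")

# np.isnan now works because the underlying array has a native dtype.
nan_mask = np.isnan(converted.to_numpy())
```

Be aware that coercion silently discards the unparseable values, so it is worth checking how many NaNs were introduced if the column was expected to be clean.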

【Comments】:

His name seems to be unutbu ;)

@Dr_Zaszuś Thanks, fixed.

【Answer 4】:

Make sure you import the csv file using Pandas

import pandas as pd

condition = pd.isnull(data[i][j])

【Comments】:

【Answer 5】:

I'm answering this as a reminder to myself. It took me a whole day to solve. After digging into the code, I found this in _encodepy.py:

if values.dtype.kind in 'UO':
    # correct branch
    ...
else:
    # wrong branch: any data that reaches this branch produces the error
    if np.isnan(known_values).any():  # this is the problematic line
        ...

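So the fix is simple: just astype your data to np.object (the np.object alias was removed in NumPy 1.24; on newer versions pass the built-in object instead).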
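To see why the cast changes which branch is taken, the sketch below only inspects `dtype.kind`, the attribute tested in the excerpt above. The arrays are illustrative and no code from the library in question is reproduced:

```python
import numpy as np

floats = np.array([1.0, np.nan])
as_objects = floats.astype(object)  # built-in object, valid on all NumPy versions
strings = np.array(["a", "b"])

# dtype.kind is a one-character code: 'f' for floats, 'O' for object,
# 'U' for unicode strings.
kinds = (floats.dtype.kind, as_objects.dtype.kind, strings.dtype.kind)

# Only the object and unicode arrays satisfy the `in 'UO'` check.
takes_uo_branch = [k in "UO" for k in kinds]
```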

【Comments】: