TypeError: argument must be a string or number on column with strings that are numbers

Posted 2023-03-12

【Title】: TypeError: argument must be a string or number on column with strings that are numbers
【Posted】: 2020-10-27 13:15:40
【Question】:

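I have a dataset with categorical features. Column 4 holds two values, "two" and "four", both of which are strings. Do you know why I get this error and how to fix it?

TypeError: argument must be a string or number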

Traceback (most recent call last):

  File "C:..".py", line 112, in _encode
    res = _encode_python(values, uniques, encode)

  File "C:...py", line 60, in _encode_python
    uniques = sorted(set(values))

TypeError: '<' not supported between instances of 'str' and 'float'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "C...".py", line 35, in <module>
    X[:, 4] = labelencoder_X4.fit_transform(X[:, 4])

  File "C:...py", line 252, in fit_transform
    self.classes_, y = _encode(y, encode=True)

  File "C:....py", line 114, in _encode
    raise TypeError("argument must be a string or number")

TypeError: argument must be a string or number

Code:

import numpy as np #mathematical tools
import matplotlib.pyplot as plt #plot nice charts
import pandas as pd #import and manage data sets

# Making a list of missing value types
missing_values = ["?"]
df= pd.read_csv('D:\\data.csv',na_values = missing_values)

#print the new table with the missing values 
# print (df)
# print (df.isnull())


X = df.iloc[:, :-1].values #Matrix - independent variables (features)
y = df.iloc[:, 24].values #dependent variables vectors


from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X2 = LabelEncoder()
X[:, 2] = labelencoder_X2.fit_transform(X[:, 2]) #gas=0, fuel=1 

labelencoder_X3 = LabelEncoder()
X[:, 3] = labelencoder_X3.fit_transform(X[:, 3])

# I get an error here
labelencoder_X4 = LabelEncoder()
X[:, 4] = labelencoder_X4.fit_transform(X[:, 4])

labelencoder_X5 = LabelEncoder()
X[:, 5] = labelencoder_X5.fit_transform(X[:,5])

labelencoder_X6 = LabelEncoder()
X[:, 6] = labelencoder_X6.fit_transform(X[:, 6])

labelencoder_X7 = LabelEncoder()
X[:, 7] = labelencoder_X7.fit_transform(X[:, 7])

labelencoder_X13 = LabelEncoder()
X[:, 13] = labelencoder_X13.fit_transform(X[:, 13])

labelencoder_X14 = LabelEncoder()
X[:, 14] = labelencoder_X14.fit_transform(X[:, 14])

labelencoder_X16 = LabelEncoder()  # own encoder for column 16, so labelencoder_X14 keeps its fit
X[:, 16] = labelencoder_X16.fit_transform(X[:, 16])

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')  # np.nan, not the string "NaN"
imputer.fit(X[:, 1:24])  
X[:, 1:24]=imputer.transform(X[:, 1:24])

Thanks for your help!

【Comments on the question】:

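What is the dtype of this column?

@Siddhant Ranjan A string, as shown. The problem is the NaN values in the column; I don't know how to impute a column of strings.

You have to fill the NaN values in the column first and then do the label encoding. The column can also be of object dtype; it isn't always what it looks like.

【Answer 1】: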

This error usually occurs when a column of strings also contains NaN values. NaN is of type float, which is why you get:

TypeError: '<' not supported between instances of 'str' and 'float'
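The failure can be reproduced without scikit-learn. The traceback in the question shows `_encode_python` calling `sorted(set(values))`, so the minimal sketch below does the same on a small made-up list (the values are an assumption for illustration, not the real data):

```python
# A column of strings in which one entry is missing.
# float("nan") is what pandas stores for a missing value in an object column.
values = ["two", "four", float("nan"), "two"]

# The column mixes two Python types: str and float.
type_names = {type(v).__name__ for v in values}
print(type_names)

# Sorting needs "<" between elements, which is undefined for str vs. float.
try:
    sorted(set(values))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message names `str` and `float` in whichever order the comparison happened to run, so the order can differ from the traceback above.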

You should replace the missing values first. One way to do it:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Making a list of missing value types
missing_values = ["?"]
df = pd.read_csv('D:\\data.csv', na_values=missing_values)


X = df.iloc[:, :-1].copy()  # copy, so the assignments below do not write into a slice of df
y = df.iloc[:, 24]

X.iloc[:, 4] = X.iloc[:, 4].fillna('NaN') # <-- add this line

X.iloc[:, 4] = LabelEncoder().fit_transform(X.iloc[:, 4])

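Label encoding should no longer cause any problems now. You have to apply the same replacement to every column that contains strings.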
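Applying the fix to every string column can be done in a loop instead of one line per column. The following is only a sketch: the small DataFrame, its column names and the `"missing"` placeholder are assumptions standing in for the CSV from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the data loaded from the CSV.
df = pd.DataFrame({
    "fuel": ["gas", "diesel", np.nan, "gas"],
    "doors": ["two", "four", np.nan, "two"],
    "price": [13495.0, 16500.0, 13950.0, 17450.0],
})

encoders = {}  # keep one fitted encoder per column for later inverse_transform
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        continue  # numeric columns need no label encoding
    # Replace NaN with a placeholder string so every value is a str.
    filled = df[col].fillna("missing").to_numpy(dtype=object)
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(filled)
    encoders[col] = encoder

print(df)
print({col: list(enc.classes_) for col, enc in encoders.items()})
```

Keeping the encoders in a dictionary makes it possible to map the integer codes back to the original labels later. Note that the placeholder becomes a category of its own, which may or may not be what you want.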

【Comments】:

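Thanks! Another way is to impute with the most_frequent strategy. It works for strings too!

Yes, there are many ways to deal with missing values. As stated in the answer, this is just one of them ;) The advantage here is that you don't need any tools beyond the ones you already have. Otherwise there are of course options like SimpleImputer.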
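For comparison, the `most_frequent` route mentioned in the comments might look roughly like this. It is a sketch on a made-up single-column frame (the column name and values are assumptions), not code taken from the thread:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Hypothetical string column with one missing value, stored as object dtype.
doors = pd.DataFrame({"doors": ["two", "four", np.nan, "two"]}, dtype=object)

# most_frequent replaces NaN with the mode of the column and works on strings.
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
doors_filled = imputer.fit_transform(doors).ravel()

# With no NaN left, label encoding no longer mixes str and float.
codes = LabelEncoder().fit_transform(doors_filled)

print(list(doors_filled))
print(list(codes))
```

Unlike the `fillna` placeholder, this does not create an extra category; the missing entry is simply assigned the most common label.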
