ValueError: X has 231 features per sample; expecting 1228
Here is the top of the script that trains the model (I am using logistic regression):
data_raw = pd.read_sql(sql,cnxn)
pd.Series(data_raw.columns)
pd.Series(data_raw.dtypes)
data_raw.describe(include='all')
data_raw['collision_type'] = data_raw.loc[0:, 'collision_type'].replace('?', 'Unknown')
data_raw['property_damage'] = data_raw.loc[0:, 'property_damage'].replace('?', 'Unknown')
data_raw.isnull().sum()
dropping_columns = ['months_as_customer', 'policy_bind_date', 'age', 'policy_number', 'policy_annual_premium', 'insured_zip',
'capital_gains', 'capital_loss', 'total_claim_amount', 'injury_claim', 'property_claim', 'vehicle_claim',
'auto_year']
data_cleaned = data_raw.drop(dropping_columns, axis=1)
data_preprocessed = pd.get_dummies(data_cleaned, drop_first=True)
targets = data_preprocessed['fraud_reported_Y']
features = data_preprocessed.drop(['fraud_reported_Y'], axis=1)
x_train, x_test, y_train, y_test = train_test_split(features, targets, test_size=0.2, random_state=420)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)
Now I am trying to run predictions on the test input (a test dataset imported from a SQL table):
test = df['TestTable']
test = test[0]
sql = 'SELECT * FROM '+ test
test_raw = pd.read_sql(sql,cnxn)
#sample_rows = test_raw.sample(n=5)
test_raw.describe(include='all')
test_raw['collision_type'] = data_raw.loc[0:, 'collision_type'].replace('?', 'Unknown')
test_raw['property_damage'] = data_raw.loc[0:, 'property_damage'].replace('?', 'Unknown')
test_raw.isnull().sum()
print(test_raw.shape)
test_dropped = test_raw.drop(dropping_columns, axis=1)
test_preprocessed = pd.get_dummies(test_dropped, drop_first=True)
logreg = LogisticRegression()
logreg.fit(x_train, y_train)
test_predicted = logreg.predict(test_preprocessed)
This is the error I get:
Traceback (most recent call last):
File "<ipython-input-149-e6d470e94433>", line 1, in <module>
runfile('C:/Users/BusinessUser/Downloads/insurance_claim_fraud_detection-master/insurance_claim_fraud_detection.py', wdir='C:/Users/BusinessUser/Downloads/insurance_claim_fraud_detection-master')
File "C:\Users\BusinessUser\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\BusinessUser\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/BusinessUser/Downloads/insurance_claim_fraud_detection-master/insurance_claim_fraud_detection.py", line 402, in <module>
test_predicted = logreg.predict(test_preprocessed)
File "C:\Users\BusinessUser\Anaconda3\lib\site-packages\sklearn\linear_model\base.py", line 289, in predict
scores = self.decision_function(X)
File "C:\Users\BusinessUser\Anaconda3\lib\site-packages\sklearn\linear_model\base.py", line 270, in decision_function
% (X.shape[1], n_features))
ValueError: X has 231 features per sample; expecting 1228
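My training dataset has 999 rows and includes the final prediction-result column, while the test dataset has 50 rows and no prediction-result column. The other columns are essentially the same.
I am a beginner, and I am fairly sure there is something basic about this kind of model training that I do not understand yet. Thank you very much for your help.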
Answer
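Check that the data you pass in for prediction (test_preprocessed) has the same number of columns (features) as the data used for training and testing (x_train, x_test). You can compare them with shape or, for example, len(test_preprocessed.columns).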
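One likely reason the counts differ is that pd.get_dummies creates one column per category value it actually sees, so a 50-row test table usually yields far fewer dummy columns than a 999-row training table. The sketch below uses small made-up DataFrames (the `incident_hour` column and all values are hypothetical stand-ins, not the asker's data) to show the column-count check and one possible way to align the test columns with the training columns using `DataFrame.reindex`.

```python
import pandas as pd

# Hypothetical toy data standing in for the training and test tables.
train = pd.DataFrame({
    "collision_type": ["Front", "Rear", "Side", "Unknown"],
    "incident_hour": [3, 14, 9, 21],
})
test = pd.DataFrame({
    "collision_type": ["Rear", "Front"],
    "incident_hour": [7, 18],
})

# Encoding each table separately produces different column sets,
# because the test table contains fewer distinct categories.
train_encoded = pd.get_dummies(train, drop_first=True)
test_encoded = pd.get_dummies(test, drop_first=True)

# The check suggested in the answer: compare the feature counts.
print(train_encoded.shape[1], len(test_encoded.columns))

# Align the test columns with the training columns: dummy columns missing
# from the test set are added and filled with 0, columns unknown to the
# training set are dropped, and the column order matches training.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(list(test_aligned.columns) == list(train_encoded.columns))
```

In the original script, the equivalent step would be reindexing test_preprocessed against x_train.columns before calling logreg.predict. Note that this alignment is only a sketch: with drop_first=True, the category that gets dropped can differ between the two tables, so fixing the category list from the training data before encoding may be more robust.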