Loss is always nan when training a deep learning model from tabular data
【Posted】: 2021-07-18 20:37:58
【Question】: I am trying to train a model on a dataset of a few thousand entries with 51 numeric features and one label column. Example:
When training the model to predict 3 labels (candidate, false positive, confirmed), the loss is always nan and the accuracy plateaus very quickly at a fixed value. Code:
import tensorflow as tf
import numpy as np
import pandas as pd
import sklearn.preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler, RobustScaler
from sklearn.preprocessing import OrdinalEncoder
from tensorflow import optimizers
from tensorflow.python.keras.layers import Dense, Dropout, Normalization
from tensorflow.python.keras.models import Sequential, Model
def load_dataset(data_folder_csv):
    # load the dataset as a pandas DataFrame
    data = pd.read_csv(data_folder_csv, header=0)
    # retrieve numpy array
    dataset = data.values
    # split into input (X) and output (y) variables
    X = dataset[:, :-1]
    y = dataset[:, -1]
    print(y)
    # format all fields as floats
    X = X.astype(np.float)
    # reshape the output variable to be one column (e.g. a 2D shape)
    y = y.reshape((len(y), 1))
    return X, y
# prepare input data using min/max scaler.
def prepare_inputs(X_train, X_test):
    oe = RobustScaler().fit_transform(X_train)
    X_train_enc = oe.transform(X_train)
    X_test_enc = oe.transform(X_test)
    return X_train_enc, X_test_enc
# prepare target
def prepare_targets(y_train, y_test):
    le = LabelEncoder()
    ohe = OneHotEncoder()
    le.fit(y_train)
    le.fit(y_test)
    y_train_enc = ohe.fit_transform(y_train).toarray()
    y_test_enc = ohe.fit_transform(y_test).toarray()
    return y_train_enc, y_test_enc
X, y = load_dataset("csv_ready.csv")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
print('Train', X_train.shape, y_train.shape)
print('Test', X_test.shape, y_test.shape)
X_train_enc, X_test_enc = X_train, X_test
print('Finished preparing inputs.')
# prepare output data
y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
norm_layer = Normalization()
norm_layer.adapt(X)
model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation="tanh", kernel_initializer='he_normal'))
model.add(Dropout(0.2))
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(3, activation='sigmoid'))
opt = optimizers.Adam(lr=0.01, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train_enc, epochs=20, batch_size=128, verbose=1, use_multiprocessing=True)
_, accuracy = model.evaluate(X_test, y_test_enc, verbose=0)
print('Accuracy: %.2f' % (accuracy * 100))
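I tried increasing and decreasing the learning rate, changing the optimizer, reducing and increasing the number of neurons and layers, and playing with the batch size, but nothing seems to get the model to good results. I think I am missing something here but can't put my finger on it. Example of results:
EDIT: more rows from the csv:
EDIT2: I also tried L2 regularization, but it made no difference.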
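【Comments】:
This is an exploding-gradient problem. Double-check your model and your loss metric.
【Answer 1】:
One possible cause: check whether your dataset contains NaN values. NaN values can cause problems for the model during training.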
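The NaN check suggested above could look roughly like the following. This is a minimal sketch on a small made-up matrix, not the asker's data; the mean imputation at the end is one possible remedy chosen for illustration, not something the thread prescribes. With a pandas DataFrame, `DataFrame.isna()` gives the same kind of per-column count.

```python
import numpy as np

# Hypothetical 4x3 feature matrix standing in for the real data; one value is missing.
X = np.array([
    [1.0, 200.0, 0.5],
    [2.0, np.nan, 0.7],
    [3.0, 180.0, 0.1],
    [4.0, 220.0, 0.9],
])

# Count NaN entries per column to see which features are affected.
nan_per_column = np.isnan(X).sum(axis=0)
print(nan_per_column)

# A single NaN is enough to turn a weighted sum (and hence the loss) into NaN.
weights = np.ones(X.shape[1])
row_sums = X @ weights
print(row_sums)

# One possible remedy (an assumption, not the asker's method):
# replace each NaN with the mean of its column, ignoring NaNs.
column_means = np.nanmean(X, axis=0)
X_clean = np.where(np.isnan(X), column_means, X)
print(np.isnan(X_clean).any())
```

Dropping the affected rows is another option when only a few of them contain missing values.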
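Some major mistakes in your code:
You use a sigmoid activation instead of softmax for the 3-neuron output layer.
You use the encoders incorrectly, fitting them on both the training and the test set. Use fit_transform on the training data and only transform on the test set.
You also pass an input dimension to every layer; only the first layer should receive the input shape.
You forgot to call the prepare_inputs function on X_train and X_test.
Your model should be fit on X_train_enc, not X_train.
Use this instead: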
import tensorflow as tf
import numpy as np
import pandas as pd
import sklearn.preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.preprocessing import OrdinalEncoder
# use the public tensorflow.keras API rather than the private tensorflow.python path
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
def load_dataset(data_folder_csv):
    # load the dataset as a pandas DataFrame
    data = pd.read_csv(data_folder_csv, header=0)
    # retrieve numpy array
    dataset = data.values
    # split into input (X) and output (y) variables
    X = dataset[:, :-1]
    y = dataset[:, -1]
    print(y)
    # format all fields as floats (np.float was removed from recent NumPy releases)
    X = X.astype(float)
    # reshape the output variable to be one column (e.g. a 2D shape)
    y = y.reshape((len(y), 1))
    return X, y
# prepare input data using min/max scaler.
def prepare_inputs(X_train, X_test):
    oe = MinMaxScaler()
    # fit the scaler on the training data only, then reuse it for the test data
    X_train_enc = oe.fit_transform(X_train)
    X_test_enc = oe.transform(X_test)
    return X_train_enc, X_test_enc
# prepare target
def prepare_targets(y_train, y_test):
    le = LabelEncoder()
    ohe = OneHotEncoder()
    # LabelEncoder expects 1D input; OneHotEncoder expects a 2D column
    y_train = le.fit_transform(y_train.ravel()).reshape(-1, 1)
    y_test = le.transform(y_test.ravel()).reshape(-1, 1)
    y_train_enc = ohe.fit_transform(y_train).toarray()
    y_test_enc = ohe.transform(y_test).toarray()
    return y_train_enc, y_test_enc
X, y = load_dataset("csv_ready.csv")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
print('Train', X_train.shape, y_train.shape)
print('Test', X_test.shape, y_test.shape)
# the call to prepare_inputs was missing in the original code
X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
print('Finished preparing inputs.')
# prepare output data
y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='softmax'))
#opt = optimizers.Adam(lr=0.01, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train_enc, y_train_enc, epochs=20, batch_size=32, verbose=1)
_, accuracy = model.evaluate(X_test_enc, y_test_enc, verbose=0)
print('Accuracy: %.2f' % (accuracy * 100))
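What the fit_transform / transform split in prepare_inputs does can be sketched by hand. The snippet below is only an illustration in plain NumPy with made-up numbers; it mimics min/max scaling to the range 0 to 1 and is not a replacement for MinMaxScaler (it does not guard against constant features, for example).

```python
import numpy as np

# Hypothetical data: 3 training rows and 2 test rows with 2 features each.
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test = np.array([[2.5, 40.0], [12.0, 15.0]])

# "fit": learn the per-feature minimum and range from the training rows only.
train_min = X_train.min(axis=0)
train_range = X_train.max(axis=0) - train_min

# "transform": apply the same training statistics to both splits.
X_train_scaled = (X_train - train_min) / train_range
X_test_scaled = (X_test - train_min) / train_range

print(X_train_scaled)
print(X_test_scaled)
```

Test values may fall outside the 0 to 1 range. That is expected, because the statistics come from the training set alone, so nothing about the test set leaks into preprocessing.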
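【Comments】:
I tried the code below but still get a 'nan' loss and an accuracy of 0.2444, same as before.
@Zedler Can you try this code with MinMaxScaler? Check my updated answer.
Still exactly the same result.
@Zedler Does your dataset contain NaN values after normalization? That may be one of the causes. Check all of your training and test data for NaN. See this ***.com/questions/55431690/…
That worked, some values were NaN and had to be normalized, thank you very much!
【Answer 2】:
You want to change your model definition to: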
model = Sequential()
model.add(Dense(128, input_shape=X_train.shape[1:], activation="tanh", kernel_initializer='he_normal'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
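You only need to define the input shape for the first layer; Keras infers the appropriate shapes for the subsequent layers automatically. The batch size, which is the first dimension, is omitted when defining input_shape, hence input_shape=X_train.shape[1:].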
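To make the slicing concrete, here is a small sketch with a placeholder array. The 1000 rows are an assumption; the 51 columns match the feature count given in the question.

```python
import numpy as np

# Hypothetical batch: 1000 samples with 51 features each.
X_train = np.zeros((1000, 51))

# shape is (batch, features); slicing off the first axis leaves the per-sample shape.
per_sample_shape = X_train.shape[1:]
print(per_sample_shape)

# input_dim, as used in the question, carries the same information as a plain integer.
input_dim = X_train.shape[1]
print(input_dim)
```

For flat feature vectors the two spellings describe the same thing: a one-element tuple for input_shape, or a single integer for input_dim.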
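A sigmoid activation would technically work (its outputs range between 0 and 1), but what you really want is a softmax activation. It ensures that all outputs sum to 1, which is what probabilities require: the probability that something happens is 100%, not the 120% that sigmoid may end up giving you.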
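The difference is easy to check numerically. The following is a minimal NumPy sketch with arbitrary example logits; the two helper functions are written out here for illustration and are not taken from Keras.

```python
import numpy as np

def sigmoid(z):
    # squashes each value independently into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtracting the maximum keeps the exponentials numerically stable
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

# Arbitrary example logits for the 3 output neurons.
logits = np.array([2.0, 1.0, 0.5])

sigmoid_total = sigmoid(logits).sum()
softmax_total = softmax(logits).sum()

print(sigmoid_total)  # greater than 1: not a probability distribution
print(softmax_total)  # sums to 1 up to floating-point rounding
```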
Also, you are not using your LabelEncoder anywhere. I think you meant something like this:
def prepare_targets(y_train, y_test):
    le = LabelEncoder()
    ohe = OneHotEncoder()
    # teach the label encoder our labels (it expects 1D input)
    le.fit(y_train.ravel())
    # turn our strings into integers, reshaped to a 2D column for OneHotEncoder
    y_train_transformed = le.transform(y_train.ravel()).reshape(-1, 1)
    y_test_transformed = le.transform(y_test.ravel()).reshape(-1, 1)
    # turn our integers into one-hot-encoded arrays
    y_train_enc = ohe.fit_transform(y_train_transformed).toarray()
    y_test_enc = ohe.transform(y_test_transformed).toarray()
    return y_train_enc, y_test_enc
【Comments】:
It still gives me the same 'nan' and accuracy value, but yes, softmax is the way to go.
Could you add the first few rows of data_ready.csv to your question? I am out of ideas until I see the input data.
Edited the question with more rows from the csv :]