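I used pandas to load a file into a DataFrame, and now I want to train a deep learning model with TensorFlow. I can't get the model to train: after splitting into training and test sets, when I compile and fit the model it tells me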

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

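I think the problem is that the NumPy arrays have different sizes, but even after padding (so that all the arrays within the column have the same length), the problem is not solved. Below is an example of a column from my dataset: if I want to convert it into a tensor, what should I do?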

df = pd.read_parquet('example.parquet')

0                            [0, 1, 1, 1, 0, 1, 0, 1, 0]
1          [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
2          [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
3                      [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]
4                   [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
115                          [0, 1, 0, 0, 1, 1, 1, 1, 1]
116    [0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, ...
117     [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
118    [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, ...
119                    [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1]
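A quick way to see what Keras will receive is to check the dtype and shape of the underlying NumPy array. The sketch below uses a hypothetical Series `s` with two made-up rows standing in for `df['column1']` (it assumes each cell holds a list or 1-D array, as the printout suggests). The result is a 1-D array of Python objects, and padding the rows to equal length does not change that.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['column1']: one variable-length sequence per row
s = pd.Series([[0, 1, 1, 1, 0], [0, 1, 0]], name="column1")

raw = s.to_numpy()
print(raw.dtype, raw.shape)  # object (2,)

# Left-pad every row to the same length, as in the question
padded = s.apply(lambda v: np.pad(v, (5 - len(v), 0)))
padded_raw = padded.to_numpy()

# Still a 1-D object array whose elements are themselves arrays
print(padded_raw.dtype, padded_raw.shape)  # object (2,)
print(type(padded_raw[0]))                 # <class 'numpy.ndarray'>
```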



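from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Dense

label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)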
#create model
model = Sequential()

#add model layers
model.add(Dense(20, activation='softmax', input_shape=(X_train.shape)))

# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)


ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray). 
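The error is about the element type, not only the lengths: an object array whose elements are arrays cannot be cast element by element to a number. A minimal reproduction with made-up rows, assuming only NumPy. Depending on the NumPy version, the failure surfaces as a ValueError ("setting an array element with a sequence") chained to a TypeError, so the sketch catches both.

```python
import numpy as np

# A 1-D object array whose elements are equal-length arrays,
# which is what a pandas column of padded sequences turns into
obj = np.empty(2, dtype=object)
obj[0] = np.array([0, 1, 1])
obj[1] = np.array([1, 0, 0])

try:
    obj.astype(np.float32)  # each element would have to become one float
    cast_failed = False
except (TypeError, ValueError) as exc:
    cast_failed = True
    print(type(exc).__name__, exc)

# Stacking the rows first gives a real 2-D numeric array that can be cast
stacked = np.stack(obj).astype(np.float32)
print(stacked.shape)  # (2, 3)
```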


Comment: Please update the question with the full traceback.
Comment: I updated the first message with the full traceback.
Comment: Show the shape of x_train. It seems the data type doesn't match for some objects; try X.astype(np.float32).
Comment: I got this error: ValueError: setting an array element with a sequence. The traceback (most recent call last) starts with TypeError: only size-1 arrays can be converted to Python scalars.
Comment: The shape of my X is (120,3) and the shape of my X_train is (84,3).

Answer 1:


import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
from sklearn.preprocessing import LabelEncoder
#Import Dataset
df = pd.read_parquet('toydataset.parquet')
#Selected only one column of the dataset
X = df['column1']
Y= df['label']


0                            [0, 1, 1, 1, 0, 1, 0, 1, 0]
1          [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
2          [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
3                      [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]
4                   [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
115                          [0, 1, 0, 0, 1, 1, 1, 1, 1]
116    [0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, ...
117     [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
118    [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, ...
119                    [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1]
Name: column1, Length: 120, dtype: object


#First encode the label
label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)
#Padding the column
for i in range (0, len(df['column1'])):
    pad_size = 13040-len(df['column1'][i])
    df['column1'][i] = np.pad(df['column1'][i], (pad_size, 0))
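Padding each cell in place still leaves a column of separate arrays. The chained assignment `df['column1'][i] = ...` can also trigger pandas warnings and, with copy-on-write enabled, does not modify `df` at all. An alternative sketch writes the left-padded rows straight into one preallocated 2-D NumPy array. The helper name `left_pad_to_matrix` is hypothetical, and it assumes that keeping only the last `max_len` values of an over-long row is acceptable. Keras also provides `pad_sequences`, which by default pads and truncates at the start in the same way.

```python
import numpy as np

def left_pad_to_matrix(seqs, max_len=None, dtype=np.float32):
    """Left-pad variable-length sequences with zeros into a (n_rows, max_len) array.

    Hypothetical helper: rows longer than max_len keep only their last max_len values.
    """
    seqs = [np.asarray(s) for s in seqs]
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.zeros((len(seqs), max_len), dtype=dtype)
    for row, s in enumerate(seqs):
        if len(s) > max_len:
            s = s[len(s) - max_len:]  # truncate from the front
        out[row, max_len - len(s):] = s  # zeros stay on the left, like np.pad(s, (pad, 0))
    return out

rows = [[0, 1, 1], [0, 1, 0, 0, 1], [1]]
X = left_pad_to_matrix(rows)
print(X.shape)  # (3, 5)
print(X[0])     # [0. 0. 0. 1. 1.]
```

With the question's data this would be called as `left_pad_to_matrix(df['column1'], max_len=13040)`, which returns an array of shape `(120, 13040)` that can go directly into `train_test_split`.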


0      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
4      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
115    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
116    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
117    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
118    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
119    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
Name: column1, Length: 120, dtype: object


X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)

Here is also the output for X_train:

30     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
53     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
118    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
9      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
33     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
106    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
14     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
92     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
51     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
102    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
Name: column1, Length: 84, dtype: object


model = Sequential()
model.add(Dense(20, activation='softmax', input_shape=(84,)))
#I tried with input shape (84,) but also with other numbers but the result is always the same
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, batch_size=32)


ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).


import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Conv1D, Flatten
import numpy as np
from sklearn.preprocessing import LabelEncoder
df = pd.read_parquet('toydataset.parquet')
Y= df['label']
label_encoder = LabelEncoder()
Y = label_encoder.fit_transform(Y)
#New Padding
for i in range (0, len(df['column1'])):
    pad_size = 13040 - len(df['column1'][i])
    df['column1'][i] = np.pad(df['column1'][i], (pad_size, 0))

final_array = np.array([np.array(i) for i in df['column1']])
X_train, X_test, y_train, y_test = train_test_split(final_array, Y, test_size = 0.3, random_state = 42)
#create model
model = Sequential()

#add model layers
model.add(Dense(20, activation='softmax', input_shape=(13040,)))

# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)


ValueError: Shapes (None, 1) and (None, 20) are incompatible
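This error most likely comes from the labels rather than the inputs. `LabelEncoder` returns one integer per sample (shape `(n,)`, which shows up as `(None, 1)` in the error), while `categorical_crossentropy` with a 20-unit softmax expects one-hot targets of shape `(n, 20)`. There are two common fixes:

- Compile with `loss='sparse_categorical_crossentropy'` and keep the integer labels.
- One-hot encode the labels, for example with `keras.utils.to_categorical`.

In both cases the last `Dense` layer should have as many units as there are classes, i.e. `len(label_encoder.classes_)`, rather than a hard-coded 20. The NumPy-only sketch below shows the one-hot conversion with made-up labels.

```python
import numpy as np

# Made-up integer labels, as LabelEncoder.fit_transform would return them
y = np.array([0, 2, 1, 2, 0])
num_classes = int(y.max()) + 1  # with a fitted encoder: len(label_encoder.classes_)

# Row i of the identity matrix is the one-hot vector for class i
y_onehot = np.eye(num_classes, dtype=np.float32)[y]
print(y.shape, y_onehot.shape)  # (5,) (5, 3)
print(y_onehot[1])              # [0. 0. 1.]
```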


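Comment: Thank you very much. I have solved the tensor conversion problem. When I create the model, what should I pass as input_shape? X_train.shape is (84, 13040), X_test.shape is (36, 13040), y_train.shape is (84, _ ), and y_test.shape is (36, _ ). I tried input_shape = (84, _ ) and (84, 13040), but it gives me an error in the first epoch.
Comment: model.add(Dense(20, activation='softmax', input_shape=(13040,))). Also, you won't get any good results, because your input is very large and you don't have many samples.
Comment: Thank you very much for your help. Obviously this is just an example to understand how training works; I know I need more samples. There is still a problem in the first epoch; I updated my previous answer.

Answer 2: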

I assume you are currently working with the padded data. After padding the data, you should do scaling. Once that is done, X has shape (120,3) and X_train has shape (84,3).
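The answer doesn't say which scaler to use, so this sketch assumes plain standardization. It computes per-feature mean and standard deviation on the training split only and applies the same statistics to the test split, so no information leaks from the test set. sklearn's `StandardScaler` (`fit` on `X_train`, then `transform` on both splits) does the same; `standardize` below is a hypothetical NumPy equivalent, and the data is random placeholder data with the reported shapes.

```python
import numpy as np

def standardize(train, test):
    """Scale features using statistics from the training split only (hypothetical helper)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns (e.g. all-zero padding)
    return (train - mean) / std, (test - mean) / std

# Random 0/1 placeholder data with the shapes mentioned in the thread
rng = np.random.default_rng(42)
X_train = rng.integers(0, 2, size=(84, 3)).astype(np.float32)
X_test = rng.integers(0, 2, size=(36, 3)).astype(np.float32)

X_train_s, X_test_s = standardize(X_train, X_test)
print(X_train_s.shape, X_test_s.shape)  # (84, 3) (36, 3)
```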


model.add(Dense(20, activation='softmax', input_shape=(X_train.shape)))

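You should not include the batch dimension in input_shape. To put it more simply: if you were feeding images to the model, what would you write in input_shape for a 1-channel image? Something like this: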

height = 224
width = 224
model.add(Dense(20, activation='softmax', input_shape=(height, width)))

# In your case you have written
model.add(Dense(20, activation='softmax', input_shape=(120, 3)))

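This tells the model that each input has shape (120,3) and comes with its own label, which is not the case. You should pass only the dimensions of the features, like this: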

model.add(Dense(20, activation='softmax', input_shape=(3,)))
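In other words, `input_shape` is the per-sample shape, which is `X_train.shape[1:]` once `X_train` is a proper 2-D array. Here is a small sketch with placeholder arrays of the shapes reported in the comments (84 training rows and 36 test rows, 13040 padded features). The Keras line is left as a comment, and `num_classes` in it is a hypothetical name.

```python
import numpy as np

# Placeholder arrays with the shapes reported in the comments
X_train = np.zeros((84, 13040), dtype=np.float32)
X_test = np.zeros((36, 13040), dtype=np.float32)

input_shape = X_train.shape[1:]  # drop the batch (sample) axis
print(input_shape)               # (13040,)

# Every split must agree on the per-sample shape
assert X_test.shape[1:] == input_shape

# With Keras this would then be used as, for example:
#   model.add(Dense(num_classes, activation='softmax', input_shape=input_shape))
```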

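After this, the error should go away. Also, I don't see you passing the batch_size argument to model.fit. You should set it explicitly; if it is omitted, Keras uses a default batch size of 32.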
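To make the batch_size remark concrete: with 84 training samples and the default batch size of 32, each epoch is split into three batches, and the last one is smaller. A sketch of that arithmetic, using a hypothetical helper `batch_sizes`. It only computes sizes; with `shuffle=True` (the `fit` default), which samples land in each batch changes every epoch.

```python
import math

def batch_sizes(n_samples, batch_size=32):
    """Hypothetical helper: sizes of the consecutive mini-batches in one epoch."""
    return [min(batch_size, n_samples - start) for start in range(0, n_samples, batch_size)]

print(batch_sizes(84))                # [32, 32, 20]
print(math.ceil(84 / 32))             # 3 steps per epoch
print(batch_sizes(84, batch_size=8))  # ten batches of 8, then one of 4
```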


from keras.models import Sequential
from keras.layers import BatchNormalization, Dense

#create model
model = Sequential()
#add model layers
model.add(BatchNormalization()) # RED FLAG
model.add(Dense(20, activation='softmax', input_shape=(X_train.shape)))

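You should not use BatchNormalization on the input. The main reason to use BatchNormalization is to speed up training, and even then it is not applied to the input. It is also important to note that BatchNormalization normalizes over each training batch, not over the whole dataset. It is therefore of little use unless your batch size is large enough to represent the whole population.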
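The point about batch statistics can be checked numerically: the mean of a small batch is a much noisier estimate of the dataset mean than the mean of a large batch. A sketch on synthetic, normally distributed data; the distribution, the seed, the batch sizes and the helper name `mean_batch_error` are arbitrary, hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

def mean_batch_error(values, batch_size):
    """Hypothetical helper: average |batch mean - dataset mean| over consecutive, non-overlapping batches."""
    n_full = (len(values) // batch_size) * batch_size  # drop the incomplete last batch
    batch_means = values[:n_full].reshape(-1, batch_size).mean(axis=1)
    return float(np.abs(batch_means - values.mean()).mean())

for batch_size in (4, 32, 256):
    print(batch_size, round(mean_batch_error(data, batch_size), 3))
```

The error shrinks roughly with the square root of the batch size, which is why normalizing with per-batch statistics is unreliable for small batches.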

Update: your padding does not give you a 2-D array. After padding, the output of X.shape should be ( _ , _ ), not ( _ , ). So do the following:

import numpy as np
import pandas as pd

# Creating some random data: row j is [0, 1, ..., j], so lengths run from 1 to 20
random_array = []
for j in range(20):
    random_array.append(list(range(j + 1)))

df = pd.DataFrame()
df['values'] = random_array

# Left-pad every row with zeros to length 21; reassigning the whole column
# avoids chained assignment such as df['values'][i] = ...
df['values'] = df['values'].apply(lambda v: np.pad(v, (21 - len(v), 0)))

final_array = np.array([np.array(i) for i in df['values']])
print(final_array.shape) # This will give (20, 21) and not (20,)
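A design note on the last conversion step: `np.array` applied to rows of unequal length either builds an object array (older NumPy versions, with a deprecation warning) or raises (NumPy 1.24 and later). `np.stack`, by contrast, always rejects rows of different shapes, so using it makes a missed padding step fail loudly. A minimal sketch with made-up rows:

```python
import numpy as np

padded_rows = [np.pad(np.arange(n), (4 - n, 0)) for n in (1, 2, 3, 4)]
ragged_rows = [np.arange(n) for n in (1, 2, 3, 4)]

final_array = np.stack(padded_rows)
print(final_array.shape)  # (4, 4)

try:
    np.stack(ragged_rows)
    stack_failed = False
except ValueError as exc:
    stack_failed = True
    print("np.stack rejected ragged rows:", exc)
```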



