Python Keras ValueError: Data cardinality is ambiguous

【Title】Python Keras ValueError: Data cardinality is ambiguous 【Posted】2021-12-20 09:15:03 【Question】:

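I am trying to build a model that predicts whether a credit card transaction is fraudulent. My dataset is available on Kaggle. Everything works until I fit the model, at which point I get this error: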

ValueError: Data cardinality is ambiguous:
  x sizes: 7433462
  y sizes: 284807
Make sure all arrays contain the same number of samples.

Can someone help me figure out what is wrong?

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

data = pd.read_csv("creditcard.csv")
trainLabels = data['Class']
labels = ['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',  'V12', 'V13', 'V14', 'V15', 'V16',  'V17', 'V18', 'V19', 'V20', 'V21',  'V22',  'V23',  'V24',  'V25',  'V26',  'V27',  'V28',  'Amount']
trainSamples = data[labels]

trainLabels = np.array(trainLabels)
trainSamples = np.array(trainSamples)

trainLabels = shuffle(trainLabels)
trainSamples = shuffle(trainSamples)

scaler = MinMaxScaler(feature_range = (0, 1))
scaledTrainSample = scaler.fit_transform(trainSamples.reshape(-1,1))

model = Sequential([
    Dense(units = 16, input_shape = (1, ), activation = 'relu'),
    Dense(units = 32, activation = 'relu'),    
    Dense(units = 2, activation = 'softmax')
])

model.compile(optimizer = Adam(learning_rate = 0.0001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
model.fit(x = scaledTrainSample, y = trainLabels, validation_split = 0.1, batch_size = 10, epochs = 300, verbose = 2)

【Answer 1】:

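The main problem with your code is that the model's input shape should be 30 rather than 1, because you have 30 features, and the output shape should be 1 rather than 2, because you have a single binary label (that is, only two classes, 0 or 1). The corrected script further down also fixes a few other mistakes.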
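To see where the mismatched sample counts come from, here is a minimal sketch that uses a small synthetic array in place of the Kaggle data (the 5-row size and the variable names are assumptions for illustration only). Flattening a 2-D feature matrix with `reshape(-1, 1)`, as the question's code does before scaling, turns every feature value into its own row, so `x` ends up with many more rows than `y`.

```python
import numpy as np

# Synthetic stand-in for the dataset: 5 rows, 30 feature columns.
# The sizes are illustrative assumptions, not the real data.
n_samples, n_features = 5, 30
X = np.arange(n_samples * n_features, dtype=float).reshape(n_samples, n_features)
y = np.array([0, 1, 0, 0, 1])

# reshape(-1, 1) stacks every feature value into a single column,
# so each value is treated as a separate "sample".
flattened = X.reshape(-1, 1)

print(X.shape)          # (5, 30)
print(flattened.shape)  # (150, 1)
print(y.shape)          # (5,)

# The number of rows in x and y has to agree before training.
rows_match_before = X.shape[0] == y.shape[0]
rows_match_after = flattened.shape[0] == y.shape[0]
print(rows_match_before, rows_match_after)  # True False
```

Keeping the feature matrix two-dimensional, with one row per transaction, is what lets the first dimension of `x` line up with `y`.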

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler
tf.random.set_seed(0)

# import the data
df = pd.read_csv('creditcard.csv')

# extract the features and target
X = df.drop(labels=['Class'], axis=1).values
y = df['Class'].values

# count the number of classes
print(np.unique(y))
# [0 1]

# shuffle the data
X, y = shuffle(X, y, random_state=42)

# scale the features
scaler = MinMaxScaler(feature_range=(0, 1))
X = scaler.fit_transform(X)

# build the model
model = Sequential([
    Dense(units=16, activation='relu', input_shape=(X.shape[1], )),
    Dense(units=32, activation='relu'),
    Dense(units=1, activation='sigmoid')
])

# fit the model
model.compile(optimizer=Adam(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x=X, y=y, validation_split=0.1, batch_size=256, epochs=3)
# Epoch 1/3
# 1002/1002 [==============================] - 1s 761us/step - loss: 0.1787 - accuracy: 0.9983 - val_loss: 0.0193 - val_accuracy: 0.9981
# Epoch 2/3
# 1002/1002 [==============================] - 1s 684us/step - loss: 0.0136 - accuracy: 0.9983 - val_loss: 0.0130 - val_accuracy: 0.9981
# Epoch 3/3
# 1002/1002 [==============================] - 1s 680us/step - loss: 0.0119 - accuracy: 0.9983 - val_loss: 0.0127 - val_accuracy: 0.9981
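One of the other corrections is easy to miss: the question shuffles the labels and the samples in two separate calls, which reorders each array independently and breaks the pairing between a row and its label, whereas the answer shuffles both in a single call. The sketch below illustrates the same idea with plain NumPy on toy data (the arrays, the seed, and the variable names are assumptions for illustration): one permutation is drawn and applied to both arrays.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Toy data: row i is [2*i, 2*i + 1] and its label is i,
# so the pairing can be checked after shuffling.
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.arange(6)

# Draw ONE permutation and index both arrays with it.
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]

# Every shuffled row still sits next to its own label.
pairs_intact = bool(np.all(X_shuffled[:, 0] == 2 * y_shuffled))
print(pairs_intact)  # True
```

Drawing a second, independent permutation for `y` would generally leave the rows and labels misaligned, and the model would then be trained on labels that no longer belong to their features.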
