Multiclass text classification TypeError: Input must be a SparseTensor
Posted: 2021-07-24 07:32:28
Question: I am trying to build a deep learning model for text classification. However, when I run the script below, I get this error:
InvalidArgumentError: indices[2] = [0,398] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy.
However, when I try to use tf.sparse.reorder, I get a different error: TypeError: Input must be a SparseTensor.
These are the shapes of the inputs:
X_train_cv1.shape, y_train.shape, X_validation_cv1.shape, y_validation.shape
((13435, 675), (13435, 3), (3359, 675), (3359, 3))
Is there any way to solve this?
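# Imports assumed from context; the original post omitted them
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from keras.utils import np_utils  # available in older Keras releases
# Split the data into training and test sets
from sklearn.model_selection import train_test_split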
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.2, random_state=42)
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_y_train = encoder.transform(y_train)
# convert integers to dummy variables (i.e. one hot encoded)
y_train= np_utils.to_categorical(encoded_y_train)
# encode validation labels with the encoder already fitted on the training labels,
# so both sets share the same class-to-integer mapping
encoded_y_validation = encoder.transform(y_validation)
# convert integers to dummy variables (i.e. one hot encoded)
y_validation= np_utils.to_categorical(encoded_y_validation)
# The first document-term matrix has default Count Vectorizer values - counts of bigrams
from sklearn.feature_extraction.text import CountVectorizer
cv1 = CountVectorizer(analyzer='char',ngram_range=(2, 2))
X_train_cv1 = cv1.fit_transform(X_train)
X_validation_cv1 = cv1.transform(X_validation)
input_dim = X_train_cv1.shape[1] # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
X_train_cv1 = tf.sparse.reorder(X_train_cv1)
y_train = tf.sparse.reorder(y_train)
X_validation_cv1 = tf.sparse.reorder(X_validation_cv1)
y_validation = tf.sparse.reorder(y_validation)
history = model.fit(X_train_cv1, y_train, epochs=100, verbose=True, validation_data=(X_validation_cv1, y_validation), batch_size=10)
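A note on the TypeError: `tf.sparse.reorder` accepts only a `tf.SparseTensor`. In the script above, `CountVectorizer` returns a SciPy sparse matrix and `to_categorical` returns a dense NumPy array, so neither input qualifies. The sketch below shows one way to check which container type a variable holds before passing it on. The helper `describe` and the toy arrays are hypothetical stand-ins, not part of the original post.

```python
import numpy as np
from scipy import sparse


def describe(x):
    """Return a short label for the container type of x (hypothetical helper)."""
    if sparse.issparse(x):
        return "scipy sparse matrix"
    if isinstance(x, np.ndarray):
        return "dense numpy array"
    return type(x).__name__


# Toy stand-ins: a CSR matrix like CountVectorizer output, and one-hot labels.
X_toy = sparse.csr_matrix(np.eye(3))
y_toy = np.eye(3)

print(describe(X_toy))
print(describe(y_toy))
```

Neither label is a TensorFlow sparse tensor, which is consistent with the "Input must be a SparseTensor" message. The labels in particular are already dense, so reordering them is not needed.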
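Answer 1: OK, I managed to find the answer. Apparently Keras does not handle sparse arrays well, so I converted the vectorizer output to dense arrays by adding .toarray() to these lines of my code: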
X_train_cv1 = cv1.fit_transform(X_train).toarray()
X_validation_cv1 = cv1.transform(X_validation).toarray()
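To illustrate the fix on a small scale, the sketch below uses a made-up SciPy matrix (not the asker's data) in place of the `CountVectorizer` output. It shows the dense conversion from the answer, followed by one possible alternative: extracting COO triplets sorted in row-major order, which is the ordering the original InvalidArgumentError asks for. All variable names and values here are assumptions for illustration.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for the CountVectorizer output (a SciPy CSR matrix).
X_sparse = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]]))

# Option 1 (the fix in the answer): convert to a dense NumPy array.
X_dense = X_sparse.toarray()

# Option 2 (alternative sketch): keep the data sparse and pull out
# COO triplets sorted by row first, then by column.
coo = X_sparse.tocoo()
order = np.lexsort((coo.col, coo.row))  # last key (row) is the primary key
indices = np.stack([coo.row[order], coo.col[order]], axis=1)
values = coo.data[order]
dense_shape = coo.shape

print(X_dense)
print(indices, values, dense_shape)
```

The sorted `indices`, `values`, and `dense_shape` are the three pieces that `tf.sparse.SparseTensor(indices, values, dense_shape)` takes, should a sparse input be preferred. For the shapes in the question (13435 x 675, about 9 million entries), the dense conversion is likely the simpler choice; with a much larger vocabulary, staying sparse may save memory.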