Is it possible to use a collection of hyperspectral 1x1 pixels in a CNN model purposed for more conventional datasets (CIFAR-10/MNIST)?

Posted 2023-02-16


Asked: 2022-01-10 13:10:00

Question:

I have built a working CNN model in Keras/TensorFlow and have successfully tested it on the CIFAR-10 and MNIST datasets. The working code is below:

import keras
from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Conv2D, Flatten, MaxPooling2D
from keras.layers import BatchNormalization

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

#reshape data to fit model
X_train = X_train.reshape(50000,32,32,3)
X_test = X_test.reshape(10000,32,32,3)

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)


# Building the model
model = Sequential()

#1st Convolutional Layer
model.add(Conv2D(filters=64, input_shape=(32,32,3), kernel_size=(11,11), strides=(4,4), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

#2nd Convolutional Layer
model.add(Conv2D(filters=224, kernel_size=(5, 5), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

#3rd Convolutional Layer
model.add(Conv2D(filters=288, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

#4th Convolutional Layer
model.add(Conv2D(filters=288, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

#5th Convolutional Layer
model.add(Conv2D(filters=160, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))

model.add(Flatten())

# 1st Fully Connected Layer
model.add(Dense(4096))  # input_shape is only needed on the first layer
model.add(BatchNormalization())
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))

#2nd Fully Connected Layer
model.add(Dense(4096))
model.add(BatchNormalization())
model.add(Activation('relu'))
#Add Dropout
model.add(Dropout(0.4))

#3rd Fully Connected Layer
model.add(Dense(1000))
model.add(BatchNormalization())
model.add(Activation('relu'))
#Add Dropout
model.add(Dropout(0.4))

#Output Layer
model.add(Dense(10))
model.add(BatchNormalization())
model.add(Activation('softmax'))


#compile model using accuracy to measure model performance
opt = keras.optimizers.Adam(learning_rate = 0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', 
              metrics=['accuracy'])


#train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)

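Having used the datasets above, I wanted to go a step further and use a dataset with more channels than grayscale or RGB, hence a hyperspectral dataset. While looking for hyperspectral datasets I came across this one (Indian Pines, as loaded in the code below).

The problem at this stage was realizing that this hyperspectral dataset is a single image, with each value in the ground truth corresponding to one pixel. At this point I reformatted the data into a collection of hyperspectral data points/pixels.

Code that reformats the corrected dataset into x_train and x_test: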

import keras
import scipy
import numpy as np
import matplotlib.pyplot as plt
from keras.utils import to_categorical
from scipy import io

mydict = scipy.io.loadmat('Indian_pines_corrected.mat')
dataset = np.array(mydict.get('indian_pines_corrected'))


# Split the original cube along the first (row) axis into train and test parts.
# x_train will have a shape of (121, 145, 200)
# x_test will have a shape of (24, 145, 200)
# (The np.zeros preallocation was redundant and used np.int, which newer NumPy releases removed.)
xtemp = np.array_split(dataset, [121])
x_train = np.array(xtemp[0])
x_test = np.array(xtemp[1])

# x_train will have a shape of (17545, 200) 
# x_test will have a shape of (3480, 200)
x_train = x_train.reshape(-1, x_train.shape[-1])
x_test = x_test.reshape(-1, x_test.shape[-1])

Code that reformats the ground-truth dataset into Y_train and Y_test:

truthDataset = scipy.io.loadmat('Indian_pines_gt.mat')
gTruth = truthDataset.get('indian_pines_gt')

# Split the ground truth along the first (row) axis, in the same way as the data cube.
# Y_train will have a shape of (121, 145)
# Y_test will have a shape of (24, 145)
# (The np.zeros preallocation was redundant and used np.int, which newer NumPy releases removed.)
ytemp = np.array_split(gTruth, [121])
Y_train = np.array(ytemp[0])
Y_test = np.array(ytemp[1])

# Y_train will have a shape of (17545) 
# Y_test will have a shape of (3480)
Y_train = Y_train.reshape(-1)
Y_test = Y_test.reshape(-1)


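# 17 categories, labelled 0-16

# One-hot encode the Y_train target column; num_classes is fixed so that
# Y_train and Y_test get the same width even if a class is missing from one split
Y_train = to_categorical(Y_train, num_classes = 17)

# One-hot encode the Y_test target column
Y_test = to_categorical(Y_test, num_classes = 17)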

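My thinking is that, although the original image is broken down into 1x1 patches, the large number of channels each patch has, and their respective values, will help in classifying the dataset.

Essentially I want to feed this reformatted data into my model (see the first code snippet in this post), but given my lack of experience in this area I am not sure whether I am taking the wrong approach. I was hoping to use an input shape of (1,1,200), i.e. x_train and x_test would have shapes (17545,1,1,200) and (3480,1,1,200) respectively.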
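For reference, the reshape described in the question is mechanically simple. The following is a minimal sketch that uses a zero-filled stand-in array instead of the real data; the name `x_train_flat` is made up for illustration. It only shows how to obtain the (N, 1, 1, 200) shape and says nothing about whether a CNN trained on such input is a good idea.

```python
import numpy as np

# Synthetic stand-in for the flattened pixels; the real array would come
# from Indian_pines_corrected.mat as in the snippets above.
n_pixels, n_bands = 17545, 200
x_train_flat = np.zeros((n_pixels, n_bands), dtype=np.float32)

# Insert two singleton spatial axes: (N, 200) -> (N, 1, 1, 200).
x_train_4d = x_train_flat.reshape(-1, 1, 1, n_bands)
# Equivalent: x_train_flat[:, np.newaxis, np.newaxis, :]

print(x_train_4d.shape)  # (17545, 1, 1, 200)
```

The first Conv2D layer would then need input_shape=(1,1,200) instead of (32,32,3), and the output layer 17 units instead of 10.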

Answer 1:

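If the hyperspectral dataset is given to you as one large image with many channels, I would assume that the classification of each pixel should depend on the pixels around it (otherwise I would not format the data as an image, i.e. with a grid structure). Given this assumption, breaking the input picture up into 1x1 parts is not a good idea, because you lose the grid structure.

I further assume that the order of the channels is arbitrary, which implies that convolution over the channels is probably not meaningful (which you were not planning to do anyway).

Instead of reformatting the data the way you did, you may want to create a model that takes an image as input and also outputs an "image" containing the classification for each pixel. I.e. if you have 10 classes and take a (145, 145, 200) image as input, your model would output a (145, 145, 10) image. In that architecture you would not have any fully connected layers. Your output layer would also be a convolutional layer.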
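A minimal sketch of such a fully convolutional model might look as follows. The number of layers, the filter counts and the kernel sizes are arbitrary assumptions chosen for illustration and are not taken from the answer; the input is random stand-in data.

```python
import numpy as np
import keras
from keras import layers

n_bands, n_classes = 200, 10  # 10 classes as in the example above

# Fully convolutional: no Flatten/Dense, and padding="same" with stride 1
# keeps the spatial size, so the output has one class vector per pixel.
model = keras.Sequential([
    keras.Input(shape=(None, None, n_bands)),
    layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    # 1x1 convolution as output layer; softmax is applied over the channel axis.
    layers.Conv2D(n_classes, kernel_size=1, padding="same", activation="softmax"),
])

# Random stand-in for a single (145, 145, 200) hyperspectral image.
image = np.random.rand(1, 145, 145, n_bands).astype("float32")
per_pixel_probs = model.predict(image, verbose=0)
print(per_pixel_probs.shape)  # (1, 145, 145, 10)
```

Such a model can be trained with categorical_crossentropy against one-hot labels of shape (145, 145, 10), since the loss is computed over the last axis for every pixel.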

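That however means that you will not be able to keep your current architecture. That is because the tasks for MNIST/CIFAR10 and your hyperspectral dataset are not the same. For MNIST/CIFAR10 you want to classify an image in its entirety, while for the other dataset you want to assign a class to each pixel (while most likely also using the pixels around each pixel).

Some further ideas:

If you want to turn the pixel classification task on the hyperspectral dataset into a classification task for an entire image, maybe you can reformulate the task as "classifying a hyperspectral image as the class of its center (or top-left, or bottom-right, or (21st, 104th), or whatever) pixel". To obtain the data from your single hyperspectral image, for each pixel I would shift the image so that the target pixel is at the desired location (e.g. the center). All pixels that "fall off" the border can be inserted at the other side of the image.

If you want to stick with a pixel classification task but need more data, you could split the single hyperspectral image you have into many smaller images (e.g. 10x10x200). You may even want to use images of many different sizes. If your model only has convolutional and pooling layers, and you make sure to maintain the size of the image, that should work out.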
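The first idea (one sample per pixel, with the target pixel at the center and wrap-around at the borders) could be sketched like this. The helper name `centered_patches` and the patch size are assumptions for illustration. Instead of shifting the whole image, the sketch cuts a small window around each pixel, which is a cheaper variant of the same idea.

```python
import numpy as np

def centered_patches(cube, patch_size):
    """Return one (patch_size, patch_size, bands) patch per pixel, centered on it.

    patch_size is assumed to be odd. Neighbors that fall off one edge of the
    image wrap around to the opposite edge.
    """
    half = patch_size // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="wrap")
    rows, cols, bands = cube.shape
    patches = np.empty((rows * cols, patch_size, patch_size, bands), dtype=cube.dtype)
    for r in range(rows):
        for c in range(cols):
            # Pixel (r, c) sits at (r + half, c + half) in the padded cube,
            # so this window has it exactly in the middle.
            patches[r * cols + c] = padded[r:r + patch_size, c:c + patch_size]
    return patches

# Tiny stand-in cube: 4 rows, 5 columns, 3 bands.
cube = np.arange(4 * 5 * 3).reshape(4, 5, 3)
patches = centered_patches(cube, patch_size=3)
print(patches.shape)  # (20, 3, 3, 3)
```

The patches are in row-major order, so they line up with labels obtained from the ground truth via reshape(-1). Note that for the full 145x145x200 cube the patch array grows by a factor of patch_size squared, so memory use should be checked before picking a large patch size.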

Answer 2:

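First, I assume the hyperspectral image you are using is meant for a semantic segmentation problem rather than a classification problem.

If we look at what a convolutional layer in a neural network is, this is unlikely to work very well. It might work, but there are probably better approaches.

Let's look at this animation of a 2D convolution (by Michael Plotke, licensed under CC-BY-SA 3.0):

We can see that, at its core, the 2D convolution operation applies a filter of a certain size to a region of the image, and then repeats this for all regions of the image. 2D convolutions are typically used in neural networks when trying to learn/find spatial features, i.e. relationships between neighboring pixels.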
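The sliding-window operation described above can be written out in a few lines of NumPy. This is only an illustrative sketch for a single-channel image without padding and with stride 1; the function name `conv2d_valid` is made up here, and real libraries use much faster implementations.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over a single-channel image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped, so
    strictly speaking this is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Apply the filter to one region, then move on to the next.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # simple averaging filter
result = conv2d_valid(image, kernel)
print(result.shape)  # (3, 3)
```

Every output value mixes a 3x3 neighborhood of the input, which is exactly the spatial information that disappears when each sample is a single pixel.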

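From CS231n - Convolutional Networks:

As we slide the filter over the width and height of the input volume we will produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when they see some type of visual feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network.

By using patches of size 1x1, you basically strip the data of its spatial dimensions. Applying 2D convolutions in that case does not make much sense (especially given the size of the filters used in that architecture, e.g. 11x11 in the first layer).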
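To make this concrete, the spatial size after each convolution and pooling layer of the model in the question can be traced with a little arithmetic. The sketch below assumes the usual rule for padding='same', where the output size is ceil(input size / stride); the list `layers_spec` is transcribed by hand from the question's code.

```python
import math

def same_output_size(n, stride):
    """Spatial output size of a conv/pool layer with padding='same'."""
    return math.ceil(n / stride)

# (kernel, stride) along one axis for the conv and pooling layers
# of the model in the question, in order.
layers_spec = [(11, 4), (2, 2), (5, 1), (2, 2), (3, 1), (3, 1), (3, 1), (2, 2)]

def trace(n):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [n]
    for _, stride in layers_spec:
        n = same_output_size(n, stride)
        sizes.append(n)
    return sizes

print(trace(32))  # [32, 8, 4, 4, 2, 2, 2, 2, 1]
print(trace(1))   # [1, 1, 1, 1, 1, 1, 1, 1, 1]
```

With a 1x1 input every feature map stays 1x1. In the first layer only the center weight of the 11x11 kernel overlaps the real pixel and the other 120 positions multiply zero padding, so each convolution effectively acts like a dense layer over the channels.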

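Suggested approaches:

Look for a bigger dataset that contains multiple images for classification: this is probably the way to go. In data-driven problems, the most important part is the data.

If classifying regions of that image is important to you, you could use a simpler network architecture and/or machine learning techniques on the spectral data of the pixels. This might work, but you would still lose the spatial relationship between neighboring pixels.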
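As an illustration of the second suggestion, a small fully connected network that classifies each pixel from its 200-band spectrum alone might look like the sketch below. The layer sizes and dropout rate are arbitrary assumptions rather than something the answer specifies, and the input is random stand-in data.

```python
import numpy as np
import keras
from keras import layers

n_bands, n_classes = 200, 17  # 200 bands, labels 0-16 as in the question

# Plain multilayer perceptron: each sample is one pixel's spectrum.
model = keras.Sequential([
    keras.Input(shape=(n_bands,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Random stand-in for a batch of 32 pixel spectra.
x_demo = np.random.rand(32, n_bands).astype("float32")
probs = model.predict(x_demo, verbose=0)
print(probs.shape)  # (32, 17)
```

The (17545, 200) and (3480, 200) arrays from the question, together with the one-hot labels, could be passed to model.fit directly, without adding any 1x1 spatial axes. Scaling the band values first would probably help, though that is an assumption and not something stated in the thread.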

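Comments:

I appreciate the detailed reply, including the idea of semantic segmentation and the theoretical breakdown of 2D convolution, which helped me understand issues I had not been aware of. Of your suggested approaches, I am more inclined to go with the first one. I previously tried and failed to find a larger hyperspectral dataset, and I need to revisit that, since I would like to keep the proposed architecture in my model.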
