How to reshape a 3-channel dataset for input to a neural network


I am trying to feed the KTH action dataset to a CNN and am having difficulty reshaping the data. I created an array of shape (99, 75, 120, 160), type = uint8, i.e. 99 videos belonging to one class, each video with 75 frames, and each frame of dimensions 120x160.

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'), 
                          input_shape=())) 
###need to reshape data in input_shape

Should I specify a Dense layer first?

Here is my code:

from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          LSTM, Reshape, UpSampling2D)

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'), 
                          input_shape=(75,120,160)))
###need to reshape data in input_shape

model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True))

model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(1, (3,3), padding='same')))

model.compile(optimizer='adam', loss='mse')

import numpy as np
from sklearn.model_selection import train_test_split

data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
print (data.shape)
(x_train,x_test) = train_test_split(data)


x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.




print (x_train.shape)
print (x_test.shape)


model.fit(x_train, x_train,
                epochs=100,
                batch_size=1,
                shuffle=False,
                validation_data=(x_test, x_test))

The variables are: x_test (25, 75, 120, 160), type = float32, and x_train (74, 75, 120, 160), type = float32.

The full error (from the comments) is:

runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')
(99, 75, 120, 160)
(74, 75, 120, 160)
(25, 75, 120, 160)
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/shj_k/Desktop/Project/cnn_lstm.py', wdir='C:/Users/shj_k/Desktop/Project')

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/shj_k/Desktop/Project/cnn_lstm.py", line 63, in <module>
    validation_data=(x_test, x_test))

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training.py", line 751, in _standardize_user_data
    exception_prefix='input')

  File "C:\Users\shj_k\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))

ValueError: Error when checking input: expected time_distributed_403_input to have 5 dimensions, but got array with shape (74, 75, 120, 160)

Thank you for the reply.

Answer

A couple of things:

The TimeDistributed layer in Keras needs a time dimension, so for video image processing that would likely be 75 here (the number of frames).

It also expects the images to be passed with shape (120, 160, 3), so the input_shape of the TimeDistributed layer should be (75, 120, 160, 3). The 3 stands for the RGB channels. If you have grayscale images, the last dimension should be 1.

input_shape always leaves out the "rows" dimension of the examples, which in your case is 99.
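
For the grayscale array from the question, a minimal sketch of that reshaping (reusing the .npy path from the question) could look like this:

import numpy as np

# (99, 75, 120, 160) -> (99, 75, 120, 160, 1): add a trailing channel axis
data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")
data = np.expand_dims(data, axis=-1)

# The sample axis (99) is left out of input_shape:
# model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same'),
#                           input_shape=(75, 120, 160, 1)))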

To check the output shape produced by each layer of the model, put model.summary() after compiling.

See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed

You can use keras.preprocessing.image to convert an image into a numpy array with shape (X, Y, 3).

from keras.preprocessing import image

# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
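
Building on that, if each video is stored as individual frame files on disk, a sketch along the following lines (the frame_paths variable and the helper name are hypothetical, not from the question) stacks the frames into the 5-D tensor the TimeDistributed layers expect:

import numpy as np
from keras.preprocessing import image

def video_to_tensor(frame_paths, size=(120, 160)):
    # one (120, 160, 3) array per frame, stacked into (num_frames, 120, 160, 3)
    frames = [image.img_to_array(image.load_img(p, target_size=size)) for p in frame_paths]
    return np.stack(frames)

# stacking several videos then gives (num_videos, 75, 120, 160, 3):
# x = np.stack([video_to_tensor(paths) for paths in videos_frame_paths])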

Update: It seems the reason you have to make all images square (128, 128, 1) is that in model.fit() the training examples (x_train) and the labels (normally y_train) are the same set. If you look at the model summary below, everything becomes square after the Flatten layer, so the labels are expected to be square as well. That makes sense: predicting with this model turns a (120, 160, 1) image into something of shape (128, 128, 1). So changing the model training to the following code should work:

from numpy import random

x_train = random.random((90, 5, 120, 160, 1)) # training data
y_train = random.random((90, 5, 128, 128, 1)) # labels
model.fit(x_train, y_train)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64)   320       
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64)     0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32)     18464     
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32)     0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16)     4624      
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16)     0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 5, 64)             1245440   
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1)        0         
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1)      0         
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16)     160       
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16)     0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32)     4640      
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32)     0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64)     18496     
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64)   0         
_________________________________________________________________
time_distributed_16 (TimeDis (None, 5, 128, 128, 1)    577       
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0

Update 2: To make it work with non-square images without changing y, set LSTM(300), Reshape((15, 20, 1)), and remove one of the Conv2D + UpSampling2D layer pairs. Then images of shape (120, 160) work even in the autoencoder. The trick is to look at the model summary and make sure that after the LSTM you start with the right shape, so that once all the other layers are added the final result has shape (120, 160).

from numpy import random
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          LSTM, Reshape, UpSampling2D)

model = Sequential()
model.add(
    TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"),
                    input_shape=(5, 120, 160, 1)))

model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))

model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))


model.compile(optimizer='adam', loss='mse')

model.summary()

x_train = random.random((90, 5, 120, 160, 1))
y_train = random.random((90, 5, 120, 160, 1))

model.fit(x_train, y_train)
Another answer

Thanks to Mr. Kai Aeberli for his help. I was able to run the model after resizing the images to 128x128. The size of the dataset may crash the system if you have no GPU, so reduce it as needed. If you have doubts, please refer to the whole comment section. You can find the code on github here.
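
A minimal sketch of that 128x128 resizing step, assuming OpenCV (cv2) is available (the answer does not say which library was actually used):

import numpy as np
import cv2  # assumption: OpenCV for resizing; any image library would do

data = np.load(r"C:\Users\shj_k\Desktop\Project\handclapping.npy")  # (99, 75, 120, 160)

# resize every frame to 128x128 and add a channel axis -> (99, 75, 128, 128, 1)
resized = np.array([[cv2.resize(frame, (128, 128)) for frame in video] for video in data])
resized = np.expand_dims(resized, axis=-1).astype('float32') / 255.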
