Expected input batch_size (18) to match target batch_size (6)
Posted: 2021-04-03 20:43:33
Question: Do RNNs for image classification only work with grayscale images? The following program works for grayscale image classification.
With RGB images, I get the error "Expected input batch_size (18) to match target batch_size (6)" on the line loss = criterion(outputs, labels).
My training, validation and test data are loaded as follows:
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
from torch.utils.data.dataloader import DataLoader
input_size = 300
inputH = 300
inputW = 300
# Data transforms (normalization & data augmentation)
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_resize_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
                                tt.ToTensor(),
                                tt.Normalize(*stats)])
train_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
                         tt.RandomHorizontalFlip(),
                         tt.ToTensor(),
                         tt.Normalize(*stats)])
valid_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
                         tt.ToTensor(),
                         tt.Normalize(*stats)])
test_tfms = tt.Compose([tt.Resize((inputH, inputW), interpolation=2),
                        tt.ToTensor(),
                        tt.Normalize(*stats)])
# Create datasets
train_ds = ImageFolder('./data/train', train_tfms)
valid_ds = ImageFolder('./data/valid', valid_tfms)
test_ds = ImageFolder('./data/test', test_tfms)
batch_size = 6
# Training data loader
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=8, pin_memory=True)
# Validation data loader
valid_dl = DataLoader(valid_ds, batch_size, shuffle=True, num_workers=8, pin_memory=True)
# Test data loader
test_dl = DataLoader(test_ds, 1, shuffle=False, num_workers=1, pin_memory=True)
My model is as follows:
import torch
import torch.nn as nn
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
num_steps = 300
hidden_size = 256  # size of hidden layers
num_classes = 5
num_epochs = 20
learning_rate = 0.001
# Stacked RNN followed by a fully connected layer
num_layers = 2  # 2 RNN layers are stacked
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # batch_first=True: the batch must be the first dimension,
        # so our input needs shape x -> (batch_size, seq, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, dropout=0.2)
        # This fc layer comes after the RNN, so its input is the RNN's last hidden size
        self.fc = nn.Linear(hidden_size, num_classes)
    def forward(self, x):
        # According to the PyTorch RNN documentation, the RNN takes the input and h_0
        # (the initial hidden state) of shape (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        # The RNN returns two tensors: the first holds the output features of the last
        # layer for all time steps, the second is the final hidden state
        out, _ = self.rnn(x, h0)
        # out: (batch_size, num_steps, hidden_size), e.g. (N, 300, 256)
        # We only decode the hidden state of the last time step: (N, 256)
        out = out[:, -1, :]  # -1 selects the last time step; keep all of N and hidden_size
        out = self.fc(out)
        return out
stacked_rnn_model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()  # CrossEntropyLoss applies log-softmax internally
#optimizer = torch.optim.Adam(stacked_rnn_model.parameters(), lr=learning_rate)
optimizer = torch.optim.SGD(stacked_rnn_model.parameters(), lr=learning_rate)
# Train the model
n_total_steps = len(train_dl)
for epoch in range(num_epochs):
    t_losses = []
    for i, (images, labels) in enumerate(train_dl):
        # origin shape: [6, 3, 300, 300]
        # resized: [6, 300, 300]
        images = images.reshape(-1, num_steps, input_size).to(device)
        print('images shape')
        print(images.shape)
        labels = labels.to(device)
        # Forward pass
        outputs = stacked_rnn_model(images)
        print('outputs shape')
        print(outputs.shape)
        loss = criterion(outputs, labels)
        t_losses.append(loss.item())  # store a float rather than the tensor and its graph
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
The printed image and output shapes are:
images shape
torch.Size([18, 300, 300])
outputs shape
torch.Size([18, 5])
What is wrong?
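A minimal sketch that reproduces the error in isolation, assuming only that outputs has 18 rows while labels has 6 entries (random tensors stand in for the real ones). nn.CrossEntropyLoss needs exactly one target per row of its input, so the mismatch alone triggers the message from the title:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(18, 5)        # 18 rows after the faulty reshape, 5 classes
labels = torch.randint(0, 5, (6,))  # only 6 real labels per batch

try:
    criterion(outputs, labels)
    error_message = None
except (ValueError, RuntimeError) as exc:  # catch both to be safe across PyTorch versions
    error_message = str(exc)

print(error_message)
```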
Comments on the question:
What is the value of input_size, and where is stacked_rnn_model defined?
Answer 1:
TL;DR: you are flattening the first two axes, i.e. batch and channels.
I'm not sure you are taking the right approach, but I'll focus on the layer itself.
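Anyway, let's look at the problem you are facing. You have a data loader that yields tensors of shape (6, 3, 300, 300), i.e. batches of six three-channel 300x300 images. By the looks of it, you want to reshape each batch element (3, 300, 300) into (step_size=300, -1).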
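For reference, a small sketch of the shape contract of nn.RNN with batch_first=True: the input is (batch, seq_len, input_size), the output is (batch, seq_len, hidden_size), and the final hidden state is (num_layers, batch, hidden_size), because batch_first does not apply to hidden states. The sizes are assumptions chosen to match the RGB case (300 steps of 900 features); the random input is a placeholder:

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_layers = 6, 300, 900, 256, 2

rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch, seq_len, input_size)       # (N, L, H_in) because batch_first=True
h0 = torch.zeros(num_layers, batch, hidden_size)  # h_0 is NOT batch-first

out, h_n = rnn(x, h0)
print(out.shape)        # torch.Size([6, 300, 256])
print(h_n.shape)        # torch.Size([2, 6, 256])
last_step = out[:, -1, :]  # the last time step, as fed to the fc layer in the question
print(last_step.shape)  # torch.Size([6, 256])
```

The batch dimension passes through unchanged, so whatever size dim 0 has going in is the number of rows the loss will see.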
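However, with images.reshape(-1, num_steps, input_size) you are affecting the first axis, which you shouldn't do. This happens to work with single-channel images, because there dim=1 is a channel axis of size 1 and does not change the count. In your case you have 3 channels, so the resulting shape is (6*3*300*300//300//300, 300, 300), i.e. (18, 300, 300), since num_steps=300 and input_size=300. As a result, you are left with 18 batch elements instead of 6.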
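To see the arithmetic concretely, a short sketch with zero tensors standing in for one RGB batch and one grayscale batch:

```python
import torch

num_steps, input_size = 300, 300

images = torch.zeros(6, 3, 300, 300)  # one RGB batch from the DataLoader: (N, C, H, W)
wrong = images.reshape(-1, num_steps, input_size)
print(wrong.shape)  # torch.Size([18, 300, 300]): 6 images * 3 channels

gray = torch.zeros(6, 1, 300, 300)    # single-channel batch for comparison
print(gray.reshape(-1, num_steps, input_size).shape)  # torch.Size([6, 300, 300])
```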
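What you want is to reshape with (batch_size, num_steps, -1), leaving the size of the last axis to be inferred (this is the per-step feature size, i.e. the RNN's input_size, not the sequence length). This yields the shape (6, 300, 900).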
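A sketch of that reshape as a helper; to_sequence is a hypothetical name. Using images.size(0) instead of the fixed batch_size is an added safeguard: the last batch of an epoch can be smaller (for example, 100 samples with batch_size=6 leave a final batch of 4), and a hard-coded 6 would make the reshape fail there.

```python
import torch

def to_sequence(images: torch.Tensor, num_steps: int) -> torch.Tensor:
    """Reshape (N, C, H, W) into (N, num_steps, C*H*W // num_steps), keeping N intact."""
    return images.reshape(images.size(0), num_steps, -1)

num_steps = 300
full = to_sequence(torch.zeros(6, 3, 300, 300), num_steps)
last = to_sequence(torch.zeros(4, 3, 300, 300), num_steps)  # smaller final batch
print(full.shape)  # torch.Size([6, 300, 900])
print(last.shape)  # torch.Size([4, 300, 900])
```

The model must then be built with input_size=900 to match the last axis.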
Here is a corrected and simplified snippet:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
batch_size = 6
channels = 3
inputH, inputW = 300, 300
num_classes = 5
# Random stand-in data: 100 RGB images with integer class labels
train_ds = TensorDataset(torch.rand(100, channels, inputH, inputW), torch.randint(0, num_classes, (100,)))
train_dl = DataLoader(train_ds, batch_size)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        # (batch_size, seq, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # (batch_size, hidden_size)
        self.fc = nn.Linear(hidden_size, num_classes)
        # (batch_size, num_classes)
    def forward(self, x):
        out, _ = self.rnn(x)
        out = out[:, -1, :]
        out = self.fc(out)
        return out
num_steps = 300
input_size = inputH*inputW*channels//num_steps  # 900
hidden_size = 256
num_layers = 2
rnn = RNN(input_size, hidden_size, num_layers, num_classes)
for x, y in train_dl:
    print(x.shape, y.shape)
    # Keep dim 0 as the batch; x.size(0) also handles a smaller final batch
    images = x.reshape(x.size(0), num_steps, -1)
    print(images.shape)
    outputs = rnn(images)
    print(outputs.shape)
    break
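The snippet above stops before computing the loss. A sketch of the full training step with the fix applied, under these assumptions: random tensors replace ImageFolder, the labels are integer class indices (what nn.CrossEntropyLoss expects), and the 20-sample dataset size is arbitrary. With batch_size=6 this yields batches of 6, 6, 6 and 2, and the loss no longer complains because dim 0 is never mixed with the channels:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
channels, inputH, inputW = 3, 300, 300
num_steps, hidden_size, num_layers, num_classes = 300, 256, 2, 5
input_size = channels * inputH * inputW // num_steps  # 900 features per step

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.rnn(x)           # h_0 defaults to zeros
        return self.fc(out[:, -1, :])  # classify from the last time step

# Random stand-in data: 20 RGB images with integer class labels
train_ds = TensorDataset(torch.rand(20, channels, inputH, inputW),
                         torch.randint(0, num_classes, (20,)))
train_dl = DataLoader(train_ds, batch_size=6, shuffle=True)

model = RNN(input_size, hidden_size, num_layers, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

losses = []
for images, labels in train_dl:
    images = images.reshape(images.size(0), num_steps, -1)  # (N, 300, 900), N unchanged
    outputs = model(images)                                  # (N, 5)
    loss = criterion(outputs, labels)                        # batch sizes now agree
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(len(losses), losses)
```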
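As I said at the beginning, I'm a bit wary of this approach, since you are effectively feeding an RGB 300x300 image to the RNN as a sequence of 300 flattened vectors... I can't say whether this makes sense in terms of training, or whether the model will be able to learn from it. I could be wrong!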
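Whether the model can learn from this also depends on what each of the 300 flattened vectors actually contains. Because reshape follows the contiguous (C, H, W) memory order, a quick check is worthwhile. The sketch below uses a synthetic image whose channels are filled with 0, 1 and 2 (arbitrary marker values). The permute variant is only one possible way to get row-wise steps, not something proposed in the answer:

```python
import torch

C, H, W, num_steps = 3, 300, 300, 300
# Fill each channel with its own index so we can see which channel a step came from
img = torch.arange(C).view(1, C, 1, 1).expand(1, C, H, W).float()

plain = img.reshape(1, num_steps, -1)  # (1, 300, 900)
steps_channels = [plain[0, k].unique().tolist() for k in (0, 99, 100, 299)]
print(steps_channels)  # [[0.0], [0.0], [1.0], [2.0]]: each step holds one channel only

# Row-wise alternative: move H before C, then flatten (C, W) for each row
rowwise = img.permute(0, 2, 1, 3).reshape(1, H, C * W)  # (1, 300, 900)
print(rowwise[0, 0].unique().tolist())  # [0.0, 1.0, 2.0]: step 0 is row 0 of all channels
```

Both tensors have shape (1, 300, 900); only the element order differs. With the plain reshape, steps 0-99 come from the first channel, 100-199 from the second and 200-299 from the third, each step being three consecutive rows of a single channel.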
Comments:
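Would this make sense if the images were grayscale?
Maybe it does, if you think of it as processing the image row by row, as a sequence of 300 vectors of size 300. I'm not sure about RGB: you have to check where the channels get mixed in your sequence, i.e. whether they are contiguous. In the code above the input size is 3*300, but because reshape follows the (C, H, W) memory order, each step holds three consecutive rows of a single channel rather than one pixel "line" across all three channels; permuting to (N, H, C, W) before reshaping would give row-wise three-channel lines. Either way, it could end up actually working... you need to try training and see whether you get results. I hope this helps!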
Yes, it helps a lot, but why do you get 6*3*300*300//300//300 for the first dimension? Could you explain again?
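Sure. In your code you called reshape(-1, num_steps, input_size) on a tensor of size (6, 3, 300, 300) (total number of elements: 6*3*300*300). The resulting tensor has num_steps (i.e. 300) on dim=1 and input_size (i.e. 300) on dim=2. On dim=0 you have -1, which means "infer this size from the remaining elements". That size is total_size / product([size_dim for each other dim]) (pseudocode), which corresponds to 6*3*300*300/(300*300) = 18. Hence dim=0 has length 18.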
Yes, that's why the input size of 900 didn't make sense to me. Now I understand, thanks.
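The inference rule from the comment above, as a plain-Python sketch; infer_shape is a hypothetical helper that handles a single -1 and is not PyTorch's implementation:

```python
from math import prod

def infer_shape(total_size, shape):
    """Resolve a single -1 in `shape`: total elements divided by the product of the known dims."""
    known = prod(d for d in shape if d != -1)
    if total_size % known != 0:
        raise ValueError(f"cannot reshape {total_size} elements into {shape}")
    return tuple(total_size // known if d == -1 else d for d in shape)

total = 6 * 3 * 300 * 300                  # elements in one (6, 3, 300, 300) batch
print(infer_shape(total, (-1, 300, 300)))  # (18, 300, 300)
print(infer_shape(total, (6, 300, -1)))    # (6, 300, 900)
```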