Pytorch vs. Keras: Pytorch model overfits heavily
[Posted]: 2018-10-09 07:44:16

[Question]: For several days now I have been trying to replicate my Keras training results in PyTorch. No matter what I do, the PyTorch model overfits the validation set far earlier and more heavily than the Keras model does. For PyTorch I am using the same Xception code from https://github.com/Cadene/pretrained-models.pytorch.
Data loading, augmentation, validation, the training schedule, etc. are all equivalent. Am I missing something obvious? There must be a general problem somewhere. I have tried thousands of different configurations, but nothing comes even close to the Keras training. Can somebody help?
Keras model: validation accuracy > 90%
# imports implied by the snippet
from keras import applications, optimizers
from keras.layers import GlobalMaxPooling2D, Dense, Dropout
from keras.models import Model

# base model
base_model = applications.Xception(weights='imagenet', include_top=False,
                                   input_shape=(img_width, img_height, 3))

# top model
x = base_model.output
x = GlobalMaxPooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(4, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# compile the model
adam = optimizers.Adam(lr=0.0001)
model.compile(loss='categorical_crossentropy',
              optimizer=adam, metrics=['accuracy'])

# ReduceLROnPlateau etc. with settings equivalent to the PyTorch scheduler below
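The learning-rate callback itself is not shown in the question. A minimal sketch of what "equivalent settings" could look like on the Keras side, mirroring the PyTorch scheduler used further down (factor=0.2, patience=5, cooldown=5); the fit call is only indicative:

from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', mode='min',
                              factor=0.2, patience=5, cooldown=5)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           callbacks=[reduce_lr], ...)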
PyTorch model: validation accuracy ~81%
import torch
import torch.nn as nn
import torch.nn.functional as F
from xception import xception  # from https://github.com/Cadene/pretrained-models.pytorch

# modified from https://github.com/Cadene/pretrained-models.pytorch
class XCeption(nn.Module):
    def __init__(self, num_classes):
        super(XCeption, self).__init__()
        original_model = xception(pretrained="imagenet")
        # keep every child module except the original classifier
        self.features = nn.Sequential(*list(original_model.children())[:-1])
        self.last_linear = nn.Sequential(
            nn.Linear(original_model.last_linear.in_features, 512),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(512, num_classes)
        )

    def logits(self, features):
        x = F.relu(features)
        x = F.adaptive_max_pool2d(x, (1, 1))
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x = self.features(input)
        x = self.logits(x)
        return x
import torch.optim as optim
from torch.optim import lr_scheduler

device = torch.device("cuda")
model = XCeption(len(class_names))
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
model.to(device)

# size_average=False sums the loss over the batch instead of averaging it
# (deprecated spelling; the equivalent today is reduction='sum')
criterion = nn.CrossEntropyLoss(size_average=False)
optimizer = optim.Adam(model.parameters(), lr=0.0001)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.2,
                                           patience=5, cooldown=5)
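One detail worth flagging in the snippet above: with the loss summed rather than averaged, gradient magnitudes scale with batch size, which alone changes the effective learning rate relative to Keras. A minimal sketch of the difference, written with the modern reduction= spelling (the tensor shapes are illustrative):

import torch
import torch.nn as nn

logits = torch.randn(8, 4)              # batch of 8, 4 classes
targets = torch.randint(0, 4, (8,))

sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)    # what size_average=False computed
mean_loss = nn.CrossEntropyLoss(reduction='mean')(logits, targets)  # the default
print(sum_loss.item(), mean_loss.item() * 8)  # equal: sum = mean * batch_size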
Thanks a lot!
Update: the settings were:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.2,
                                           patience=5, cooldown=5)

model = train_model(model, train_loader, val_loader,
                    criterion, optimizer, scheduler,
                    batch_size, trainmult=8, valmult=10,
                    num_epochs=200, epochs_top=0)
Cleaned-up training function:
def train_model(model, train_loader, val_loader, criterion, optimizer, scheduler,
                batch_size, trainmult=1, valmult=1, num_epochs=None, epochs_top=0):
    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            running_loss = 0.0
            running_acc = 0
            total = 0
            if phase == "train":
                model.train(True)  # set model to training mode
                # trainmult > 1 repeats the whole loader within one "epoch"
                for i in range(trainmult):
                    for data in train_loader:
                        # get the inputs
                        inputs, labels = data
                        inputs = inputs.to(torch.device("cuda"))
                        labels = labels.to(torch.device("cuda"))
                        # zero the parameter gradients
                        optimizer.zero_grad()
                        # forward (single output head, no Inception-style aux logits)
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
                        # backward + optimize only in the training phase
                        loss.backward()
                        optimizer.step()
                        # statistics
                        total += labels.size(0)
                        running_loss += loss.item() * labels.size(0)
                        running_acc += torch.sum(preds == labels)
                train_loss = running_loss / total
                train_acc = running_acc.double() / total
            else:
                model.train(False)  # set model to evaluation mode
                with torch.no_grad():
                    for i in range(valmult):
                        for data in val_loader:
                            # get the inputs
                            inputs, labels = data
                            inputs = inputs.to(torch.device("cuda"))
                            labels = labels.to(torch.device("cuda"))
                            # forward only; no gradient bookkeeping needed here
                            outputs = model(inputs)
                            _, preds = torch.max(outputs, 1)
                            loss = criterion(outputs, labels)
                            # statistics
                            total += labels.size(0)
                            running_loss += loss.item() * labels.size(0)
                            running_acc += torch.sum(preds == labels)
                val_loss = running_loss / total
                val_acc = running_acc.double() / total
                scheduler.step(val_loss)
    return model
[Question discussion]:

- Just a guess: have you checked the initialization methods? I am not sure which init Keras uses, but with the default init in PyTorch the weight values can get quite large, which could lead to faster learning. Can you check the training logs for differences?
- Dropout in PyTorch comes in variants, namely Dropout vs. Dropout2d. Dropout2d drops entire channels, whereas Dropout drops individual activations. Could that be the problem?
- Maybe the PyTorch data loader is not shuffling the training batches while the Keras data loader is?
- I have only ever seen model.eval(), never model.train(False). Also, Kevinj22 has a good point: are you passing shuffle=True to your train loader?
- I think Adam's default settings differ between Keras and PyTorch!
- If you train neither model at all, do you get the same loss during validation? Different losses would hint at different architectures, in which case the problem has nothing to do with training.
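Two of those suggestions are cheap to verify. A small sketch of both checks; the optimizer construction reuses names from the question, train_dataset stands in for the question's dataset object, and the quoted defaults are from the framework docs:

import torch
import torch.optim as optim

# 1) Adam defaults differ slightly: Keras uses epsilon=1e-7, PyTorch eps=1e-8.
#    Pinning the hyperparameters explicitly removes the ambiguity:
optimizer = optim.Adam(model.parameters(), lr=0.0001,
                       betas=(0.9, 0.999), eps=1e-7)

# 2) Make sure the training loader actually shuffles:
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)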
[Answer 1]:

self.features = nn.Sequential(*list(original_model.children())[:-1])

Are you sure this line re-instantiates your model in exactly the same way? You are using nn.Sequential instead of the original Xception model's forward function. If anything in that forward function differs from simply applying the children in order, it will not reproduce the same performance.
Instead of wrapping the model in a Sequential, you can change it like this:
# pretrained="imagenet" loads the weights at construction time,
# i.e. before the architecture is changed
my_model = xception(pretrained="imagenet")

# overwrite the original last_linear with your own head
my_model.last_linear = nn.Sequential(
    nn.Linear(my_model.last_linear.in_features, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, num_classes)
)
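A quick way to test this answer's claim, assuming Cadene's Xception (whose forward pass is split into features() and logits() methods): push the same tensor through the model's own feature path and through the nn.Sequential wrapper and compare the outputs:

import torch
import torch.nn as nn
from xception import xception

original_model = xception(pretrained="imagenet").eval()
wrapped = nn.Sequential(*list(original_model.children())[:-1]).eval()

x = torch.randn(1, 3, 299, 299)  # Xception's nominal input size
with torch.no_grad():
    a = original_model.features(x)
    b = wrapped(x)
print(torch.allclose(a, b))  # False would confirm the two paths differ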
[Answer 2]:

This could be down to the type of weight initialization you are using; otherwise this should not happen. Try using the same initializer in both models.
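Since the backbone weights are pretrained in both frameworks, only the newly added head can differ in initialization: Keras's Dense defaults to glorot_uniform weights with zero bias, while PyTorch's nn.Linear uses a Kaiming-style uniform init. A sketch of forcing the Keras-style init onto the new head (model and last_linear are the question's own names):

import torch.nn as nn

def init_like_keras(m):
    # match Keras's Dense defaults: Glorot-uniform weights, zero bias
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.last_linear.apply(init_like_keras)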