How to modify ResNet-50 for a 4-channel input using pre-trained weights in PyTorch?

Posted 2020-10-19 02:01:29

Question:

I want to change resnet50 so that I can switch to a 4-channel input, using the same pretrained weights for the RGB channels and initializing the last channel from a normal distribution with mean 0 and std 0.01.

Here is my code:

import torch.nn as nn
import torch
from torchvision import models

from misc.layer import Conv2d, FC

import torch.nn.functional as F
from misc.utils import *

import pdb

class Res50(nn.Module):
    def __init__(self,  pretrained=True):
        super(Res50, self).__init__()

        self.de_pred = nn.Sequential(Conv2d(1024, 128, 1, same_padding=True, NL='relu'),
                                     Conv2d(128, 1, 1, same_padding=True, NL='relu'))
        
        self._initialize_weights()

        res = models.resnet50(pretrained=pretrained)
        pretrained_weights = res.conv1.weight

        res.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,bias=False)

        res.conv1.weight[:,:3,:,:] = pretrained_weights
        res.conv1.weight[:,3,:,:].data.normal_(0.0, std=0.01)
        
        self.frontend = nn.Sequential(
            res.conv1, res.bn1, res.relu, res.maxpool, res.layer1, res.layer2
        )
        
        self.own_reslayer_3 = make_res_layer(Bottleneck, 256, 6, stride=1)        
        self.own_reslayer_3.load_state_dict(res.layer3.state_dict())

        
    def forward(self,x):
        x = self.frontend(x)
        x = self.own_reslayer_3(x)
        x = self.de_pred(x)
        x = F.upsample(x,scale_factor=8)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                m.weight.data.normal_(0.0, std=0.01)
                if m.bias is not None:
                    m.bias.data.fill_(0)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.fill_(0)

But it produces the following error. Does anyone have any suggestions?

/usr/local/lib/python3.6/dist-packages/torch/tensor.py:746: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "
Traceback (most recent call last):
  File "train.py", line 62, in <module>
    cc_trainer = Trainer(loading_data,cfg_data,pwd)
  File "/content/drive/My Drive/Folder/Code/trainer.py", line 28, in __init__
    self.optimizer = optim.Adam(self.net.CCN.parameters(), lr=cfg.LR, weight_decay=1e-4) #remenber was 1e-4
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/adam.py", line 44, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/optimizer.py", line 51, in __init__
    self.add_param_group(param_group)
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/optimizer.py", line 206, in add_param_group
    raise ValueError("can't optimize a non-leaf Tensor")
ValueError: can't optimize a non-leaf Tensor

Comments:

Answer 1:

Try going through .data for the first (RGB) channels as well:

res.conv1.weight[:,:3,:,:].data[...] = pretrained_weights
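
In the question's __init__, that means routing both channel initializations through .data, so neither in-place write is recorded by autograd and conv1.weight stays a leaf tensor the optimizer will accept (a sketch against the question's own variables):

res.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
# .data writes mutate the values without building an autograd graph
res.conv1.weight[:,:3,:,:].data[...] = pretrained_weights
res.conv1.weight[:,3,:,:].data.normal_(0.0, std=0.01)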

Comments:

Answer 2:

ResNet normally accepts a 3-channel input. To make it work with a 4-channel input, you have to replace the first layer (a 2D conv) with one that takes 4 channels, so that its output still fits the rest of the ResNet architecture.

Steps:

1. Copy the weights of the first conv layer:

    weight = model.conv1.weight.clone()

2. Replace conv1 with a 2D conv that accepts a 4-channel input:

    model.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)  # here 4 indicates 4-channel input

    You could add ReLU and BatchNorm on top of this new conv2d; this example does not.

3. Copy the saved weights into the new conv2d, connecting it with the rest of the ResNet model:

    model.conv1.weight[:, :3] = weight
    model.conv1.weight[:, 3] = model.conv1.weight[:, 0]

4. Done.

Sorry, I didn't modify your code directly; you can adapt these changes into it yourself. A consolidated sketch of the steps follows.
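
Putting the steps together, a minimal self-contained sketch (the helper name make_resnet50_4ch is ours, not from the answer; the copies are wrapped in torch.no_grad() so the in-place writes are not tracked by autograd and conv1.weight remains a leaf tensor, which avoids the optimizer error from the question):

import torch
import torch.nn as nn
from torchvision import models

def make_resnet50_4ch(pretrained=True):
    # Step 1: load the pretrained model and keep a copy of conv1's weights.
    model = models.resnet50(pretrained=pretrained)
    weight = model.conv1.weight.clone()  # shape (64, 3, 7, 7)

    # Step 2: replace conv1 with a 4-channel version.
    model.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)

    # Step 3: copy the pretrained weights back; under torch.no_grad() the
    # in-place writes are not recorded, so conv1.weight stays a leaf.
    with torch.no_grad():
        model.conv1.weight[:, :3] = weight       # RGB channels: pretrained
        model.conv1.weight[:, 3] = weight[:, 0]  # 4th channel: copy of channel 0

    return model

model = make_resnet50_4ch()
out = model(torch.randn(1, 4, 224, 224))  # sanity check with a 4-channel input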

Comments:

Thanks, replacing ...conv1.weight with ...conv1.weight.data worked for me. conv1.weight is an nn.Parameter, while conv1.weight.data is a plain tensor; we only want to copy the tensor values and stay clear of the parameter wrapper with its gradient-related machinery.
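
A small illustration of that distinction (our own sketch, not from the thread): writing through .data, or inside torch.no_grad(), changes the values without touching the autograd graph, so the weight remains a leaf nn.Parameter that an optimizer will accept.

import torch
import torch.nn as nn

conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
w = torch.randn(64, 3, 7, 7)

conv.weight.data[:, :3] = w      # bypasses autograd entirely
with torch.no_grad():
    conv.weight[:, 3] = w[:, 0]  # equivalent and more idiomatic

print(conv.weight.is_leaf)                    # True: the optimizer accepts it
print(isinstance(conv.weight, nn.Parameter))  # True: still a Parameter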

Answer 3:

I think I have solved it, but I don't understand why. Can anyone explain to me what nn.Parameter does, and why this works?

class Res50(nn.Module):
    def __init__(self,  pretrained=True):
        super(Res50, self).__init__()

        self.de_pred = nn.Sequential(Conv2d(1024, 128, 1, same_padding=True, NL='relu'),
                                     Conv2d(128, 1, 1, same_padding=True, NL='relu'))
        
        initialize_weights(self.modules())
        res = models.resnet50(pretrained=pretrained)

        pretrained_weights = res.conv1.weight.clone()

        res.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,bias=False)

        res.conv1.weight[:,:3,:,:] = torch.nn.Parameter(pretrained_weights)
        res.conv1.weight[:,3,:,:] = torch.nn.Parameter(pretrained_weights[:,1,:,:])
        
        self.frontend = nn.Sequential(
            res.conv1, res.bn1, res.relu, res.maxpool, res.layer1, res.layer2
        )
        
        self.own_reslayer_3 = make_res_layer(Bottleneck, 256, 6, stride=1)        
        self.own_reslayer_3.load_state_dict(res.layer3.state_dict())

Comments:
