[PyTorch Series - 50] Convolutional Neural Networks: A Unified Fine-Tuning Workflow and Software Architecture, Implemented in PyTorch

Posted by 文火冰糖的硅基工坊

tags:


Author's homepage (文火冰糖的硅基工坊): 文火冰糖(王文兵)的博客_文火冰糖的硅基工坊_CSDN博客

Article URL: https://blog.csdn.net/HiWangWenBing/article/details/121363706


Table of Contents

Chapter 1: Overview of Fine-Tuning and Transfer Learning

1.1 Theoretical Background

1.2 Key Steps in Transfer Learning

1.3 Scope of This Article

1.4 Training Environment

Chapter 2: Input Dataset

2.1 Defining Data Transforms for Dataset Loading

2.2 Loading the Dataset

2.3 Defining the Batch data_loader

2.4 Displaying a Batch of Images

Chapter 3: Defining the Forward Network

3.1 Helper Functions for Setting and Inspecting Parameter Trainability

3.2 Defining the Network-Creation Function

3.3 Creating and Displaying the Network

3.4 Displaying the Parameters to Be Trained

3.5 Re-enabling Trainable Parameters => Convolution + Fully Connected Layers (new)

3.6 Loading a Previous Checkpoint (new)

Chapter 4: Model Training

4.1 Defining the Training Workflow and Strategy (important, important, important)

4.2 Specifying the Loss Function for Backpropagation

4.3 Specifying the Optimizer/Algorithm for Backpropagation

4.4 Preparing for Training

4.5 Starting Training

Chapter 5: Model Evaluation

5.1 Visualizing the Loss Curve

5.2 Visualizing the Accuracy Curve

5.3 Visualizing the Best-Accuracy Curve

5.4 Defining the Evaluation Function

5.5 Evaluating on the Training Set

5.6 Evaluating on the Test Set

Chapter 6: Saving the Model

Chapter 7: Author's Reflections




Chapter 1: Overview of Fine-Tuning and Transfer Learning

1.1 Theoretical Background

[AI - Deep Learning - 46]: Theoretical Foundations and In-Depth Analysis of FineTuning and Transfer Learning: https://blog.csdn.net/HiWangWenBing/article/details/121312417

The common engineering needs addressed there:

(1) Limited data: a personal dataset is too small to rival something like ImageNet, yet we still want to reuse a good model trained on ImageNet. That is, starting from a model pre-trained on a well-known dataset, we continue training it to fit our own application scenario instead of training from scratch.

(2) Varying class counts: our particular application classifies a different number of categories than ImageNet (1000 classes) and similar well-known datasets; we want to retrain the pre-trained model just enough to support our own number of classes, e.g. 100.

(3) Preventing over-training, i.e. over…

1.2 Key Steps in Transfer Learning

(1) Step 1: initial training of the fully connected layer

Start from a predefined model and its pre-trained parameters provided by a third party (e.g. the official torchvision models), lock the feature-extraction layers, replace the fully connected layer, and retrain only that layer on your own dataset, so that the third-party network adapts to your own dataset and image-classification needs.

The basic training strategy for this step:

  • Train on the training set while validating on the validation set.
  • Select, as the final model, the model and optimizer parameters with the highest accuracy over the entire validation set, not over a single validation batch.
  • Using the entire validation set rather than one batch improves generalization to the test set.
  • Selecting by highest validation accuracy guards against overfitting the training set.
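The selection strategy above can be sketched in plain Python. This is a minimal illustration, not the article's training loop; `select_best_model` and `evaluate_on_full_validation_set` are hypothetical names, the latter standing in for a real pass over the whole validation set.

```python
import copy

def select_best_model(model_states, evaluate_on_full_validation_set):
    # Track the parameters whose accuracy over the ENTIRE validation set
    # (not a single batch) is highest.
    best_acc = 0.0
    best_state = None
    for state in model_states:
        acc = evaluate_on_full_validation_set(state)
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(state)  # snapshot the best parameters
    return best_state, best_acc
```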

(2) Step 2: optimization training of the whole network

Relative to step 1, step 2 mainly does the following:

  • Loads the model trained in step 1 and continues training from it.
  • Unlocks the entire network, both the feature-extraction layers and the fully connected layer.
  • Reduces the learning rate by a factor of 100 so that training proceeds at a finer granularity.
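The three points above can be sketched as follows. This is a toy illustration, not the article's actual step-2 code: the tiny `nn.Sequential` model stands in for resnet101, and the 1e-2 step-1 learning rate is an assumption.

```python
import torch
import torch.nn as nn

# Toy stand-in for the step-1 network (the real code uses resnet101);
# the layer sizes here are arbitrary.
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

# Step 2: unlock the whole network, feature-extraction layers included.
for param in model.parameters():
    param.requires_grad = True

# Reduce the step-1 learning rate by a factor of 100 for finer-grained training.
step1_lr = 1e-2  # assumed step-1 learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=step1_lr / 100)
```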

1.3 Scope of This Article

This article focuses on the PyTorch implementation of step 2.

Roughly 90% of the code is shared between step 2 and step 1; the differences amount to about 10%.

The main differences are:

  • How the network's parameter values are initialized.
  • How the trainable attributes of the network are set.

1.4 Training Environment

This article uses ResNet + CIFAR-100 + GPU as the example.

If no GPU is available, the example can be switched to AlexNet + CIFAR-10 + CPU with only a few lines of code changed, without affecting the overall workflow or software architecture.
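The CPU/GPU half of that switch can even be made automatic. A minimal sketch (the `device` variable is an assumption, not code from this article):

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU;
# the model and tensors would then be moved with .to(device).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```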

Chapter 2: Input Dataset

2.1 Defining Data Transforms for Dataset Loading

#2-1 Prepare the datasets
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.utils.data as data_utils
import torchvision.datasets as dataset
import torchvision.models as models
import torchvision.transforms as transforms
import torchvision.utils as utils

# Transforms applied when loading the datasets
transform_train = transforms.Compose(
    [transforms.Resize(256),           # formerly transforms.Scale(256), now deprecated
     transforms.CenterCrop(224), 
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])


transform_test = transforms.Compose(
    [transforms.Resize(256),
     transforms.CenterCrop(224), 
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

2.2 Loading the Dataset

Load the datasets from local files; if they are not present, they are downloaded automatically from the official source.

# Training dataset
train_data = dataset.CIFAR100(root = "../datasets/cifar100",
                           train = True,
                           transform = transform_train,
                           download = True)

# Test dataset
test_data = dataset.CIFAR100(root = "../datasets/cifar100",
                           train = False,
                           transform = transform_test,
                           download = True)

print(train_data)
print("size=", len(train_data))
print("")
print(test_data)
print("size=", len(test_data))
Files already downloaded and verified
Files already downloaded and verified
Dataset CIFAR100
    Number of datapoints: 50000
    Root location: ../datasets/cifar100
    Split: Train
    StandardTransform
Transform: Compose(
               Resize(size=256, interpolation=bilinear, max_size=None, antialias=None)
               CenterCrop(size=(224, 224))
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
size= 50000

Dataset CIFAR100
    Number of datapoints: 10000
    Root location: ../datasets/cifar100
    Split: Test
    StandardTransform
Transform: Compose(
               Resize(size=256, interpolation=bilinear, max_size=None, antialias=None)
               CenterCrop(size=(224, 224))
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
size= 10000

2.3 Defining the Batch data_loader

# Batched data loading
batch_size = 32

train_loader = data_utils.DataLoader(dataset = train_data,  # training data
                                  batch_size = batch_size,  # images read per batch
                                  shuffle = True)           # shuffle the data when reading

test_loader = data_utils.DataLoader(dataset = test_data,    # test dataset
                                  batch_size = batch_size,
                                  shuffle = True)

print(train_loader)
print(test_loader)
print(len(train_data), len(train_data)/batch_size)
print(len(test_data),  len(test_data)/batch_size)
<torch.utils.data.dataloader.DataLoader object at 0x0000015158CCF4C0>
<torch.utils.data.dataloader.DataLoader object at 0x000001516F052640>
50000 1562.5
10000 312.5

Note:

The feasible batch size depends on GPU memory and image size; for an 8 GB GPU, a batch size of 32 is a reasonable choice.
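A back-of-the-envelope check of that guideline. This arithmetic is an illustration added here, not from the original article:

```python
# One batch of inputs after the transforms above:
# 32 images x 3 channels x 224 x 224 pixels x 4 bytes (float32)
batch_size = 32
bytes_per_batch = batch_size * 3 * 224 * 224 * 4
print(round(bytes_per_batch / 2**20, 1))  # ≈ 18.4 MiB for the inputs alone
```

The inputs themselves are small; the intermediate activations and gradients of a deep network dominate GPU memory, which is why an 8 GB GPU still caps the batch size around 32.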

2.4 Displaying a Batch of Images

(1) Define the display functions

def img_show_from_torch(img_data, title = None, debug_flag = False):
    # Restore the channel order: (C, H, W) -> (H, W, C)
    img_data = img_data.numpy()
    img_data = img_data.transpose(1,2,0)
    
    # Undo the normalization
    mean = [0.485, 0.456, 0.406]
    std  = [0.229, 0.224, 0.225]
    img_data = std * img_data + mean
    
    # Clamp pixel values to the displayable range
    img_data = np.clip(img_data, 0, 1)

    if(debug_flag == True):
        print("PIL Image data")
        #print("image_shape: ", img_data.shape)
        #print("image_dtype: ", img_data.dtype)
        print("image_type: ", type(img_data))
        print(img_data)    
    
    # Show the image
    fig, ax = plt.subplots()
    ax.imshow(img_data)
    ax.set_title(title)


def img_show_from_torch_batch(img_data, title = None, debug_flag = False):
    # Merge the batch of images into a single image grid
    img_data = utils.make_grid(img_data)
    
    # Display it as a single image
    img_show_from_torch(img_data, title = title, debug_flag = debug_flag)

(2) Fetch one batch of images

# Show one batch of images
print("Fetching one batch of images")
imgs, labels = next(iter(train_loader))
print(imgs.shape)
print(labels.shape)

(3) Display a single image

img_show_from_torch(img_data = imgs[0], debug_flag = False)

(4) Display the whole batch

img_show_from_torch_batch(imgs)

Chapter 3: Defining the Forward Network

3.1 Helper Functions for Setting and Inspecting Parameter Trainability

# Set the trainable attribute of the network's parameters,
# i.e. enable or disable gradient updates for them
def set_model_grad_state(model, trainable_state):
    for param in model.parameters():
        param.requires_grad = trainable_state

# Show the network parameters that are trainable,
# i.e. the parameters with gradient updates enabled
def show_model_grad_state_enabled(model):
    print("params to be trained:")
    for name, parameters in model.named_parameters():
        if(parameters.requires_grad == True):
            print(name, ':', parameters.requires_grad)

3.2 Defining the Network-Creation Function

The main tasks of this function:

  • Create one of the predefined torchvision networks (1000 classes by default); multiple architectures are supported and the list is extensible.
  • Lock the feature-extraction layers.
  • Replace the fully connected layer to match your own number of image classes (e.g. 100 or 10).
  • When use_pretrained = True, automatically download the pre-trained parameters (trained on ImageNet) and use them to initialize the network.

# model_name: name of the model
# num_classes: number of output classes
# use_pretrained: whether to initialize the network with pre-trained parameters
# feature_extact_trainable: whether the feature-extraction layers remain trainable,
#                           i.e. whether to lock the feature extractor (False = locked)
def initialize_model(model_name, num_classes, use_pretrained = False, feature_extact_trainable = True):
    model = None
    input_size = 0

    if(model_name == "resnet"):
        if(use_pretrained == True):
            # Use the pre-trained parameters
            model = models.resnet101(pretrained = True)
            
            # Lock (or unlock) the feature-extraction layers
            set_model_grad_state(model, feature_extact_trainable)
            
            # Replace the fully connected layer
            num_in_features  = model.fc.in_features
            model.fc = nn.Sequential(nn.Linear(num_in_features, num_classes))
        else:
            model = models.resnet101(pretrained = False, num_classes = num_classes)
        input_size = 224
    elif(model_name == "alexnet"):
        if(use_pretrained == True):
            # Use the pre-trained parameters
            model = models.alexnet(pretrained = True)
            
            # Lock (or unlock) the feature-extraction layers
            set_model_grad_state(model, feature_extact_trainable)
            
            # Replace the fully connected layer
            num_in_features  = model.classifier[6].in_features
            model.classifier[6] = nn.Sequential(nn.Linear(num_in_features, num_classes))
        else:
            model = models.alexnet(pretrained = False, num_classes = num_classes)
        input_size = 224
        
    elif(model_name == "vgg"):
        if(use_pretrained == True):
            # Use the pre-trained parameters
            model = models.vgg16(pretrained = True)
            
            # Lock (or unlock) the feature-extraction layers
            set_model_grad_state(model, feature_extact_trainable)
            
            # Replace the fully connected layer
            num_in_features  = model.classifier[6].in_features
            model.classifier[6] = nn.Sequential(nn.Linear(num_in_features, num_classes))
        else:
            model = models.vgg16(pretrained = False, num_classes = num_classes)
        input_size = 224
    return model, input_size

Note:

As the code above shows, the torchvision API makes it easy to instantiate a complex predefined network.

3.3 Creating and Displaying the Network

# Create the network instance
model, input_size = initialize_model(model_name = "resnet", num_classes = 100, use_pretrained = True, feature_extact_trainable=False)

print(input_size)
print(model)
224
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (6): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (7): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (8): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (9): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (10): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (11): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (12): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (13): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (14): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (15): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (16): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (17): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (18): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (19): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (20): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (21): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (22): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Sequential(
    (0): Linear(in_features=2048, out_features=100, bias=True)
  )
)

Note:

(fc): Sequential( (0): Linear(in_features=2048, out_features=100, bias=True)

  • This shows that the fully connected layer has been replaced: the 1000-class head is now a 100-class head. In other words, the loaded pre-trained parameters covered both the feature extractor and the original fully connected layer, but the latter has been swapped out and must be retrained.
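The replaced head can be exercised in isolation to confirm the new output shape. A sketch using the same in/out sizes as the printout above; the fake feature batch is an assumption for illustration:

```python
import torch
import torch.nn as nn

# Same head as the printout: 2048 pooled features -> 100 classes.
num_in_features, num_classes = 2048, 100
fc = nn.Sequential(nn.Linear(num_in_features, num_classes))

features = torch.randn(4, num_in_features)  # fake batch of pooled features
out = fc(features)
print(out.shape)  # torch.Size([4, 100])
```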

3.4 Displaying the Parameters to Be Trained

# Check which parameters will be trained
show_model_grad_state_enabled(model)
params to be trained:
fc.0.weight : True
fc.0.bias : True
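When only the head is trainable like this, it is common to hand the optimizer just those parameters. A minimal sketch with a toy two-layer stand-in, not this article's actual optimizer setup:

```python
import torch
import torch.nn as nn

# Toy stand-in: a frozen "feature" layer plus a trainable "fc" layer.
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False  # frozen, like the locked feature extractor

# Collect only the parameters that are still trainable,
# mirroring what show_model_grad_state_enabled reports.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-2)
```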

3.5 Re-enabling Trainable Parameters => Convolution + Fully Connected Layers (new)

# Reset the network attributes: enable training for all layers
set_model_grad_state(model, True)

# Check which parameters will be trained
show_model_grad_state_enabled(model)
params to be trained:
conv1.weight : True
bn1.weight : True
bn1.bias : True
layer1.0.conv1.weight : True
layer1.0.bn1.weight : True
layer1.0.bn1.bias : True
... (every conv, bn, and downsample parameter of layer1 through layer4 likewise prints as True; the full list is omitted here for brevity) ...
layer4.2.conv3.weight : True
layer4.2.bn3.weight : True
layer4.2.bn3.bias : True
fc.0.weight : True
fc.0.bias : True

Note: all network parameters now participate in training.
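As with the display helper, `set_model_grad_state` is defined earlier in the series. A minimal sketch consistent with its use here might be:

```python
import torch.nn as nn


def set_model_grad_state(model: nn.Module, requires_grad: bool) -> None:
    # Toggle requires_grad on every parameter: True opens the whole
    # network (feature extraction + fully connected) for fine-tuning,
    # False freezes it entirely.
    for param in model.parameters():
        param.requires_grad = requires_grad
```

This is the single switch that distinguishes step 2 (whole-network fine-tuning) from step 1, where only the fully connected layer was left trainable.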

3.6 Load the previous checkpoint (new)

# Load the checkpoint produced by the step-1 training
checkpoint_file = "../models/checkpoints/resnet101_cifar100_checkpoint.pth"

checkpoint = torch.load(checkpoint_file)
model.load_state_dict(checkpoint["state_dict"])

# Read back the accuracy the checkpointed model achieved
best_accuracy = checkpoint["best_accuracy"]
print("best_accuracy of checkpoint =", best_accuracy)
best_accuracy of checkpoint = 60.83
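For completeness, the checkpoint read above can be paired with a matching save routine. This is a sketch assuming the dictionary layout implied by the keys used in this article (`"state_dict"`, `"best_accuracy"`); the `"optimizer"` key and the `map_location` argument are common additions (not shown in the original code) that make resuming and CPU-only loading possible.

```python
import torch


def save_checkpoint(model, optimizer, best_accuracy, path):
    # Persist everything needed to resume: model weights, optimizer
    # state, and the best validation accuracy seen so far.
    torch.save({
        "state_dict": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "best_accuracy": best_accuracy,
    }, path)


def load_checkpoint(model, path, device="cpu"):
    # map_location lets a GPU-trained checkpoint load on a CPU-only machine.
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])
    return checkpoint["best_accuracy"]
```

Loading the step-1 checkpoint before opening the whole network for training is what makes step 2 a continuation rather than a fresh start.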

Chapter 4: Model training

4.1 Define the training flow and strategy (the key part)

# Definition of the transfer-learning / fine-tuning training loop:
# train on the training set while validating on the validation set.
# Strategy:
# keep the model parameters and optimizer state that achieve the highest
# accuracy over the ENTIRE validation set, not over a single validation batch.
# Whole validation set rather than one batch: improves generalization to the test set.
# Highest validation accuracy: guards against overfitting the training set.
def model_train(model, train_loader, test_loader, criterion, optimizer, device, num_epoches=1, check_point_filename=""):
    # Record the training start time
    time_train_start = time.time()
    print('+ Train start: num_epoches = {}'.format(num_epoches))
    
    # History, used for plotting later
    batch_loss_history = []
    batch_accuracy_history = []
    best_accuracy_history = []
    
    # Track the best accuracy so far; the model is saved at its best point,
    # not per epoch and not only at the end
    best_accuracy = 0
    best_epoch = 0
    
    # Use the current parameters as the initial "best model"
    best_model_state = copy.deepcopy(model.state_dict())
    
    # Move the model to the target device (e.g. GPU)
    model.to(device)
    
    # Epoch level
    for epoch in range(num_epoches):
        time_epoch_start = time.time()
        print('++ Epoch start: {}/{}'.format(epoch, num_epoches - 1))

        epoch_size     = 0
        epoch_loss_sum = 0
        epoch_corrects = 0

        # Dataset level:
        # after each training epoch, run one full pass over the training set
        # and one full pass over the validation set
        for dataset in ["train", "valid"]:
            time_dataset_start = time.time()
            print('+++ dataset start: epoch = {}, dataset = {}'.format(epoch, dataset))
            
            if dataset == "train":
                model.train()  # training mode
                data_loader = train_loader
            else:
                model.eval()   # evaluation mode
                data_loader = test_loader
            
            dataset_size = len(data_loader.dataset)
            dataset_loss_sum = 0
            dataset_corrects = 0
            
            # Batch level
            for batch, (inputs, labels) in enumerate(data_loader):
                # (0) batch size
                batch_size = inputs.size(0)
                
                # (1) Move the batch to the target device
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                # (2) Reset the optimizer's gradients
                optimizer.zero_grad()
                
                # Session level: track gradients only in the training phase
                with torch.set_grad_enabled(dataset == "train"):
                    # (3) Forward pass
                    outputs = model(inputs)
                    
                    # (4) Compute the loss
                    loss = criterion(outputs, labels)
                    
                    if dataset == "train":
                        # (5) Backward pass
                        loss.backward()
                        
                        # (6) Optimizer step
                        optimizer.step()
                    
                    # (7-1) Accumulate this batch's loss (train and valid alike)
