Pytorch CIFAR10图像分类 GoogLeNet篇

Posted 2021-09-01 Real&Love

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了Pytorch CIFAR10图像分类 GoogLeNet篇相关的知识，希望对你有一定的参考价值。

Pytorch CIFAR10图像分类 GoogLeNet篇

文章目录

Pytorch CIFAR10图像分类 GoogLeNet篇

这里贴一下汇总篇：汇总篇

4.定义网络（GoogLeNet）

GoogLeNet在2014年由Google团队提出（与VGG网络同年，注意GoogLeNet中的L大写是为了致敬LeNet），斩获当年ImageNet竞赛中Classification Task (分类任务) 第一名。原论文名称是《Going deeper with convolutions》，下面是网络结构图。

说说该网络中的亮点：

（1）引入了Inception结构（融合不同尺度的特征信息）

（2）使用1x1的卷积核进行降维以及映射处理（虽然VGG网络中也有，但该论文介绍的更详细）

（3）添加两个辅助分类器帮助训练

（4）丢弃全连接层，使用平均池化层（大大减少模型参数，除去两个辅助分类器，网络大小只有vgg的1/20）

接着我们来分析一下Inception结构：

Inception v1网络是一个精心设计的22层卷积网络，并提出了具有良好局部特征结构Inception模块，即对特征并行地执行多个大小不同的卷积运算与池化，最后再拼接到一起。由于1×1、3×3和5×5的卷积运算对应不同的特征图区域，因此这样做的好处是可以得到更好的图像表征信息。为了让四个分支的输出能够在深度方向进行拼接，必须保证四个分支输出的特征矩阵高度和宽度都相同）。

Inception模块如图所示，使用了三个不同大小的卷积核进行卷积运算，同时还有一个最大值池化，然后将这4部分级联起来（通道拼接），送入下一层。

分支1是卷积核大小为1x1的卷积层，stride=1，

分支2是卷积核大小为3x3的卷积层，stride=1，padding=1（保证输出特征矩阵的高和宽和输入特征矩阵相等），

分支3是卷积核大小为5x5的卷积层，stride=1，padding=2（保证输出特征矩阵的高和宽和输入特征矩阵相等），

分支4是池化核大小为3x3的最大池化下采样，stride=1，padding=1（保证输出特征矩阵的高和宽和输入特征矩阵相等）。

在上述模块的基础上，为进一步降低网络参数量，Inception又增加了多个1×1的卷积模块。如图3.14所示，这种1×1的模块可以先将特征图降维，再送给3×3和5×5大小的卷积核，由于通道数的降低，参数量也有了较大的减少。

Inception v1网络一共有9个上述堆叠的模块，共有22层，在最后的Inception模块处使用了全局平均池化

为了避免深层网络训练时带来的梯度消失问题，作者还引入了两个辅助的分类器，在第3个与第6个Inception模块输出后执行Softmax并计算损失，在训练时和最后的损失一并回传。
接着下来在看看辅助分类器结构，网络中的两个辅助分类器结构是一模一样的，如下图所示：

辅助分类器：

第一层是一个平均池化下采样层，池化核大小为5x5，stride=3

第二层是卷积层，卷积核大小为1x1，stride=1，卷积核个数是128

第三层是全连接层，节点个数是1024

第四层是全连接层，节点个数是1000（对应分类的类别个数）

Inception v1的参数量是AlexNet的1/12，VGGNet的1/3，适合处理大规模数据，尤其是对于计算资源有限的平台。

Inception v2

在Inception v1网络的基础上，随后又出现了多个Inception版本。Inception v2进一步通过卷积分解与正则化实现更高效的计算，增加了BN层，同时利用两个级联的3×3卷积取代Inception v1版本中的5×5卷积，如图3.15所示，这种方式既减少了卷积参数量，也增加了网络的非线性能力。

此外除了这两个版本，这几年还分别出了Inception v3和Inception v4。

Inception v3在Inception v2的基础上，使用了RMSProp优化器，在辅助的分类器部分增加了7×7的卷积，并且使用了标签平滑技术。

Inception v4则是将Inception的思想与残差网络进行了结合，显著提升了训练速度与模型准确率，这里对于模块细节不再展开讲述。至于残差网络这一里程碑式的结构，正是由下一节的网络ResNet引出的。

这里演示只展示一个Inception v1的网络结构有兴趣的话也可以尝试其他网络结构

首先我们还是得判断是否可以利用GPU，因为GPU的速度可能会比我们用CPU的速度快20-50倍左右，特别是对卷积神经网络来说，更是提升特别明显。

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# 定义一个卷积加一个 relu 激活函数和一个 batchnorm 作为一个基本的层结构
class BasicConv2d(nn.Module):
    def __init__(self,in_channel, out_channel, kernel_size, stride=1, padding=0):
        super(BasicConv2d,self).__init__()
        self.conv = nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding)
        self.batch = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(True)
    def forward(self, x):
        x = self.conv(x)
        x = self.batch(x)
        x = self.relu(x)
        return x

# 首先定义Inception
class Inception(nn.Module):
    def __init__(self, in_channel, n1x1, n3x3red, n3x3, n5x5red, n5x5, pool_proj):
        super(Inception, self).__init__()
        
        # 第一条线路
        self.branch1x1 = BasicConv2d(in_channel, n1x1, 1)
        
        # 第二条线路
        self.branch3x3 = nn.Sequential( 
            BasicConv2d(in_channel, n3x3red, 1),
            BasicConv2d(n3x3red, n3x3, 3, padding=1)
        )
        
        # 第三条线路
        self.branch5x5 = nn.Sequential(
            BasicConv2d(in_channel, n5x5red, 1),
            BasicConv2d(n5x5red, n5x5, 5, padding=2)
        )
        
        # 第四条线路
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size= 3, stride=1, padding=1),
            BasicConv2d(in_channel, pool_proj, 1)
        )
    def forward(self, x):
        f1 = self.branch1x1(x)
        f2 = self.branch3x3(x)
        f3 = self.branch5x5(x)
        f4 = self.branch_pool(x)
        output = torch.cat((f1, f2, f3, f4), dim=1)
        return output

class InceptionAux(nn.Module):
    def __init__(self, in_channel, num_classes):
        super(InceptionAux, self).__init__()
#         self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.averagePool = nn.AvgPool2d(kernel_size=2)
        self.conv = BasicConv2d(in_channel, 128, kernel_size=1)  # output[batch, 128, 4, 4]
        
        self.fc1 = nn.Sequential(
#             nn.Linear(2048, 1024),
            nn.Linear(128,64),
            nn.ReLU(True)
        )
#         self.drop = nn.Dropout(0.5)
#         self.fc2 = nn.Linear(1024, num_classes)
        self.fc2 = nn.Linear(64, num_classes)
    def forward(self, x):

        x = self.averagePool(x)
        x = self.conv(x)
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        x = self.fc1(x)
        x = F.dropout(x, 0.5, training=self.training)
        x = self.fc2(x)
        return x

我们可以测试一下Inception，是否会改变

test_net = Inception(3, 64, 48, 64, 64, 96, 32)
test_x =torch.zeros(1, 3, 32, 32)
print('input shape: {} x {} x {}'.format(test_x.shape[1], test_x.shape[2], test_x.shape[3]))
test_y = test_net(test_x)
print('output shape: {} x {} x {}'.format(test_y.shape[1], test_y.shape[2], test_y.shape[3]))

input shape: 3 x 32 x 32
output shape: 256 x 32 x 32

class GoogLeNet(nn.Module):
    def __init__(self, in_channel, num_classes, aux_logits=False, verbose=False, init_weights=True):
        super(GoogLeNet, self).__init__()
        self.verbose = verbose
        self.aux_logits = aux_logits
        
#         self.block1 = nn.Sequential(
#             BasicConv2d(in_channel, out_channel=64, kernel=7, stride=2, padding=3),
#             nn.MaxPool2d(3, 2, ceil_mode=True)
#         )
        
#         self.block2 = nn.Sequential(
#             BasicConv2d(64, 64, kernel=1),
#             BasicConv2d(64, 192, kernel=3, padding=1),
#             nn.MaxPool2d(3, 2, ceil_mode=True)
#         )
        
#         self.block3 = nn.Sequential(
#             inception(192, 64, 96, 128, 16, 32, 32),
#             inception(256, 128, 128, 192, 32, 96, 64),
#             nn.MaxPool2d(3, 2, ceil_mode=True)
#         )
        
#         self.block4 = nn.Sequential(
#             inception(480, 192, 96, 208, 16, 48, 64),
#             inception(512, 160, 112, 224, 24, 64, 64),
#             inception(512, 128, 128, 256, 24, 64, 64),
#             inception(512, 112, 144, 288, 32, 64, 64),
#             inception(528, 256, 160, 320, 32, 128, 128),
#             nn.MaxPool2d(3, 2, ceil_mode=True)
#         )
        
#         self.block5 = nn.Sequential(
#             inception(832, 256, 160, 320, 32, 128, 128),
#             inception(832, 384, 182, 384, 48, 128, 128),
            
#         )
        # block1
        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        # block2
        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        # block3
        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        # block4
        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        # block5
        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)
        
        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)
        
#         self.avgpool = nn.AvgPool2d(7)
        self.avgpool = nn.AvgPool2d(1) # 对32x32 不一样
        self.dropout = nn.Dropout(0.4)
        self.classifier = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()
        
            
    def forward(self, x):
#         x = self.block1(x)
        x = self.conv1(x)
        x = self.maxpool1(x)
        if self.verbose:
            print('block 1 output: {}'.format(x.shape))
#         x = self.block2(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.maxpool2(x)
        if self.verbose:
            print('block 2 output: {}'.format(x.shape))
#         x = self.block3(x)
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)
        if self.verbose:
            print('block 3 output: {}'.format(x.shape))
#         x = self.block4(x)
        x = self.inception4a(x)
        if self.training and self.aux_logits:    # eval model lose this layer
            aux1 = self.aux1(x)
            if self.verbose:
                print('aux 1 output: {}'.format(aux1.shape))
            
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        if self.training and self.aux_logits:    # eval model lose this layer
            aux2 = self.aux2(x)
            if self.verbose:
                print('aux 2 output: {}'.format(aux2.shape))
        x = self.inception4e(x)
        x = self.maxpool4(x)
        if self.verbose:
            print('block 4 output: {}'.format(x.shape))
#         x = self.block5(x)
        x = self.inception5a(x)
        x = self.inception5b(x)
        if self.verbose:
            print('block 5 output: {}'.format(x.shape))
        x = self.avgpool(x)
        x = x.view(x.shape[0], -1)
        x = self.dropout(x)
        x = self.classifier(x)
        if self.training and self.aux_logits:   # eval model lose this layer
            return x, aux2, aux1
        return x
    
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
net = GoogLeNet(3,10,aux_logits = True, verbose = False).to(device)

summary(net,(3,32,32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 16, 16]           9,472
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
       BasicConv2d-4           [-1, 64, 16, 16]               0
         MaxPool2d-5             [-1, 64, 8, 8]               0
            Conv2d-6             [-1, 64, 8, 8]           4,160
       BatchNorm2d-7             [-1, 64, 8, 8]             128
              ReLU-8             [-1, 64, 8, 8]               0
       BasicConv2d-9             [-1, 64, 8, 8以上是关于Pytorch CIFAR10图像分类 GoogLeNet篇的主要内容，如果未能解决你的问题，请参考以下文章