EfficientNet Code Walkthrough Notes
I. A quick review of the EfficientNet architecture
The EfficientNet-B0 baseline network table lists 9 stages; the operator used in stages 2-8 is MBConv throughout.
Structure of MBConv
On the main branch, the input first passes through a 1x1 expansion convolution whose output channel count is n times the input channels, followed by BN and the Swish activation. Next comes the DW (depthwise) convolution, again followed by BN and Swish, then the SE attention module, then a 1x1 projection convolution followed by BN, and finally a Dropout (drop path) layer.
Structure of the SE module
It is simply a global average pooling layer followed by two fully connected layers.
With that quick recap done, let's look at the code.
II. The code
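All of the code snippets below share the same set of imports, which the original notes do not show; I collect them here so the blocks can be run as-is:

import copy
import math
from collections import OrderedDict
from functools import partial
from typing import Callable, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor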
1. _make_divisible
Its job is to round a channel count to the nearest multiple of 8, which makes the model more hardware-friendly.
def _make_divisible(ch, divisor=8, min_ch=None):
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
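A few hand-checked calls illustrate the rounding behaviour (this little snippet is my own addition, not part of the original code):

# round to the nearest multiple of 8; the 10% guard bumps the result up one step if needed
print(_make_divisible(32 * 1.2))  # 38.4 -> 40
print(_make_divisible(35))        # 35   -> 32 (rounds down, but stays within 10%)
print(_make_divisible(19))        # 19   -> 24 (16 would lose more than 10%, so one divisor is added)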
2. ConvBNActivation
This class defines the Conv + BN + Activation block used throughout MBConv. Its constructor takes the input channels, the output channels, the kernel size, the stride, groups (which controls whether the convolution is a regular Conv or a DW conv), a normalization layer (BN) and an activation layer. The padding is computed from the kernel size, and the defaults for normalization and activation are BatchNorm2d and SiLU (the Swish alias, available since torch 1.7). The super().__init__ call is then given the layers to build: the convolution layer (in_channels, out_channels, kernel_size, stride, padding, groups, bias, all taken from the arguments we passed in), the normalization layer norm_layer applied to the output feature map's channels, and the activation layer.
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.SiLU  # alias Swish (torch >= 1.7)

        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer())
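To see how the groups argument switches between a regular and a depthwise convolution, a quick sketch (my own example, not from the original notes):

# groups=1 -> ordinary 3x3 convolution; groups=in_channels -> depthwise convolution
regular = ConvBNActivation(32, 64, kernel_size=3, stride=1, groups=1)
dw_conv = ConvBNActivation(64, 64, kernel_size=3, stride=1, groups=64)
x = torch.randn(1, 32, 56, 56)
print(dw_conv(regular(x)).shape)  # torch.Size([1, 64, 56, 56])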
3. The SE module
input_c is the channel count at the input of the MBConv block; expand_c is the channel count after the first (expansion) convolution — since the DW convolution keeps the channel dimension unchanged, the SE module sees the same number of channels; squeeze_c is the number of nodes of the first fully connected layer, equal to input_c / squeeze_factor, with squeeze_factor defaulting to 4 as in the paper.
First we compute squeeze_c. Both fully connected layers are implemented with 1x1 convolutions. The first one takes expand_c channels (the output of the expansion convolution) and reduces them to squeeze_c; it is followed by the Swish (SiLU) activation.
The second one takes squeeze_c channels and maps them back up to expand_c — the sizes have to match for the element-wise multiplication — and is followed by a Sigmoid activation.
The forward pass first applies global average pooling to the input, pooling each channel down to a single value, then runs the result through fc1, activation 1, fc2 and activation 2, and finally multiplies it with x to produce the output.
class SqueezeExcitation(nn.Module):
    def __init__(self,
                 input_c: int,    # block input channel
                 expand_c: int,   # block expand channel
                 squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = input_c // squeeze_factor
        self.fc1 = nn.Conv2d(expand_c, squeeze_c, 1)
        self.ac1 = nn.SiLU()  # alias Swish
        self.fc2 = nn.Conv2d(squeeze_c, expand_c, 1)
        self.ac2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = self.ac1(scale)
        scale = self.fc2(scale)
        scale = self.ac2(scale)
        return scale * x
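A shape check on a dummy tensor (my own illustration): the attention weights come out with shape (N, C, 1, 1) and are broadcast over the spatial dimensions when multiplied with x.

se = SqueezeExcitation(input_c=32, expand_c=192)
x = torch.randn(2, 192, 28, 28)   # feature map after the expansion and DW convolutions
print(se(x).shape)                # torch.Size([2, 192, 28, 28]) -- same shape, channels re-weighted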
4. InvertedResidualConfig
This class stores the configuration of one MBConv block. The parameters involved are: kernel (3 or 5); input_c, the channels entering the MB block; out_c, the channels leaving the MB block; expanded_ratio (1 or 6); stride (1 or 2); use_se, a boolean enabling the SE module; drop_rate; index (1a, 2a, ...), which records the block's name; and width_coefficient, the float width multiplier of the network.
The class also defines a static method, adjust_channels, which multiplies a channel count by the width multiplier and then calls the _make_divisible function from section 1 to round the result to the nearest multiple of 8.
In the constructor, the input and output channel counts both go through adjust_channels; the other arguments are stored as they are (with expanded_c computed as input_c * expanded_ratio).
class InvertedResidualConfig:
    # kernel_size, in_channel, out_channel, exp_ratio, strides, use_SE, drop_connect_rate
    def __init__(self,
                 kernel: int,          # 3 or 5
                 input_c: int,
                 out_c: int,
                 expanded_ratio: int,  # 1 or 6
                 stride: int,          # 1 or 2
                 use_se: bool,         # True
                 drop_rate: float,
                 index: str,           # 1a, 2a, 2b, ...
                 width_coefficient: float):
        self.input_c = self.adjust_channels(input_c, width_coefficient)
        self.kernel = kernel
        self.expanded_c = self.input_c * expanded_ratio
        self.out_c = self.adjust_channels(out_c, width_coefficient)
        self.use_se = use_se
        self.stride = stride
        self.drop_rate = drop_rate
        self.index = index

    @staticmethod
    def adjust_channels(channels: int, width_coefficient: float):
        return _make_divisible(channels * width_coefficient, 8)
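As an illustration (values chosen by hand, not taken from the notes), the first block of stage 2 in a width-1.0 network could be described like this:

cnf = InvertedResidualConfig(kernel=3, input_c=32, out_c=16,
                             expanded_ratio=1, stride=1, use_se=True,
                             drop_rate=0.2, index="1a", width_coefficient=1.0)
print(cnf.input_c, cnf.expanded_c, cnf.out_c)  # 32 32 16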
5. InvertedResidual
Its constructor takes cnf, the configuration object just described, and norm_layer, i.e. the BN layer.
First, the stride is checked: it must be 1 or 2, otherwise a ValueError is raised. Second, we decide whether a shortcut (residual) branch is used — it is only possible when the output has the same shape as the input, so the shortcut is enabled only when the stride is 1 and the input channels equal the output channels. Third, an ordered dictionary layers is created, the activation is set to SiLU, and the network is built layer by layer.
When n = 1 the first 1x1 expansion convolution is omitted, i.e. the MBConv blocks in stage 2 have no expansion convolution (just like in MobileNetV3). This is detected by comparing expanded_c with input_c: if they are equal, n = 1 and the expansion step is skipped; otherwise an expand_conv block is added using the ConvBNActivation class implemented earlier. Then the DW convolution is built, again with ConvBNActivation; then, if use_se is set, the SE module is added; finally the 1x1 projection convolution is built with activation_layer=nn.Identity, i.e. no activation. The ordered dictionary layers is passed to nn.Sequential, which produces the main branch of the MBConv block. The drop-path layer is only created when the shortcut is used and drop_rate > 0; otherwise it is replaced by nn.Identity (the DropPath helper itself is sketched right after the code below).
In the forward pass, the input feature map x goes through the main branch block and then through the dropout (drop path) layer; if the shortcut is used, x is added to the result before returning, otherwise the result is returned directly.
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers = OrderedDict()
        activation_layer = nn.SiLU  # alias Swish

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.update({"expand_conv": ConvBNActivation(cnf.input_c,
                                                           cnf.expanded_c,
                                                           kernel_size=1,
                                                           norm_layer=norm_layer,
                                                           activation_layer=activation_layer)})

        # depthwise
        layers.update({"dwconv": ConvBNActivation(cnf.expanded_c,
                                                  cnf.expanded_c,
                                                  kernel_size=cnf.kernel,
                                                  stride=cnf.stride,
                                                  groups=cnf.expanded_c,
                                                  norm_layer=norm_layer,
                                                  activation_layer=activation_layer)})

        if cnf.use_se:
            layers.update({"se": SqueezeExcitation(cnf.input_c,
                                                   cnf.expanded_c)})

        # project
        layers.update({"project_conv": ConvBNActivation(cnf.expanded_c,
                                                        cnf.out_c,
                                                        kernel_size=1,
                                                        norm_layer=norm_layer,
                                                        activation_layer=nn.Identity)})

        self.block = nn.Sequential(layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1

        # only use the drop-path layer when the shortcut connection is used
        if self.use_res_connect and cnf.drop_rate > 0:
            self.dropout = DropPath(cnf.drop_rate)
        else:
            self.dropout = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        result = self.dropout(result)
        if self.use_res_connect:
            result += x
        return result
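The DropPath layer used above is referenced but never defined in these notes. A minimal sketch of the usual stochastic-depth implementation (this follows the common recipe and is my assumption about the missing helper, not code from the original):

def drop_path(x: Tensor, drop_prob: float = 0., training: bool = False) -> Tensor:
    # randomly zero the whole residual branch for some samples in the batch
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one random value per sample
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    return x.div(keep_prob) * random_tensor


class DropPath(nn.Module):
    def __init__(self, drop_prob: float = 0.):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x: Tensor) -> Tensor:
        return drop_path(x, self.drop_prob, self.training)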
6. EfficientNet
Finally, the EfficientNet class itself. Its constructor takes width_coefficient, the width multiplier; depth_coefficient, the depth multiplier; num_classes, the number of classes; dropout_rate, the dropout placed in front of the final fully connected layer in stage 9; drop_connect_rate, the drop-path rate at the end of each MBConv block; block, the MBConv module (InvertedResidual); and norm_layer, the BN layer. default_cnf stores the default stage 2-8 configuration. depth_coefficient scales the number of repeats per stage (only for stages 2 through 8): for example, stage 7 of EfficientNet-B0 has L = 4 layers, so in B6 that becomes 4 x 2.6 = 10.4, which is rounded up to 11.
class EfficientNet(nn.Module):
    def __init__(self,
                 width_coefficient: float,
                 depth_coefficient: float,
                 num_classes: int = 1000,
                 dropout_rate: float = 0.2,
                 drop_connect_rate: float = 0.2,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(EfficientNet, self).__init__()

        # kernel_size, in_channel, out_channel, exp_ratio, strides, use_SE, drop_connect_rate, repeats
        default_cnf = [[3, 32, 16, 1, 1, True, drop_connect_rate, 1],
                       [3, 16, 24, 6, 2, True, drop_connect_rate, 2],
                       [5, 24, 40, 6, 2, True, drop_connect_rate, 2],
                       [3, 40, 80, 6, 2, True, drop_connect_rate, 3],
                       [5, 80, 112, 6, 1, True, drop_connect_rate, 3],
                       [5, 112, 192, 6, 2, True, drop_connect_rate, 4],
                       [3, 192, 320, 6, 1, True, drop_connect_rate, 1]]

        def round_repeats(repeats):
            """Round number of repeats based on depth multiplier."""
            return int(math.ceil(depth_coefficient * repeats))

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.1)

        adjust_channels = partial(InvertedResidualConfig.adjust_channels,
                                  width_coefficient=width_coefficient)

        # build inverted_residual_setting
        bneck_conf = partial(InvertedResidualConfig,
                             width_coefficient=width_coefficient)

        b = 0
        num_blocks = float(sum(round_repeats(i[-1]) for i in default_cnf))
        inverted_residual_setting = []
        for stage, args in enumerate(default_cnf):
            cnf = copy.copy(args)
            for i in range(round_repeats(cnf.pop(-1))):
                if i > 0:
                    # strides equal 1 except first cnf
                    cnf[-3] = 1      # strides
                    cnf[1] = cnf[2]  # input_channel equal output_channel

                cnf[-1] = args[-2] * b / num_blocks   # update dropout ratio
                index = str(stage + 1) + chr(i + 97)  # 1a, 2a, 2b, ...
                inverted_residual_setting.append(bneck_conf(*cnf, index))
                b += 1

        # create layers
        layers = OrderedDict()

        # first conv
        layers.update({"stem_conv": ConvBNActivation(in_planes=3,
                                                     out_planes=adjust_channels(32),
                                                     kernel_size=3,
                                                     stride=2,
                                                     norm_layer=norm_layer)})

        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.update({cnf.index: block(cnf, norm_layer)})

        # build top
        last_conv_input_c = inverted_residual_setting[-1].out_c
        last_conv_output_c = adjust_channels(1280)
        layers.update({"top": ConvBNActivation(in_planes=last_conv_input_c,
                                               out_planes=last_conv_output_c,
                                               kernel_size=1,
                                               norm_layer=norm_layer)})

        self.features = nn.Sequential(layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)

        classifier = []
        if dropout_rate > 0:
            classifier.append(nn.Dropout(p=dropout_rate, inplace=True))
        classifier.append(nn.Linear(last_conv_output_c, num_classes))
        self.classifier = nn.Sequential(*classifier)

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)
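The notes end with the class definition, but for completeness, here is how a B0 model could be instantiated with the official compound-scaling coefficients for B0 (width 1.0, depth 1.0, dropout 0.2); the factory function name efficientnet_b0 is my own choice:

def efficientnet_b0(num_classes: int = 1000) -> EfficientNet:
    # EfficientNet-B0 is trained at an input resolution of 224x224
    return EfficientNet(width_coefficient=1.0,
                        depth_coefficient=1.0,
                        dropout_rate=0.2,
                        num_classes=num_classes)


model = efficientnet_b0(num_classes=5)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 5])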