Tools for Counting the Parameters and FLOPs of Deep Learning Models
Posted by 非晚非晚
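0. Understanding the concepts
FLOPS (all uppercase) is short for floating point operations per second. It describes computing speed and is a measure of hardware performance.
FLOPs (lowercase s, marking the plural) is short for floating point operations. It describes the amount of computation and can be used to measure the complexity of an algorithm or model.
The examples below use vgg16.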
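To make "amount of computation" concrete, here is a rough hand count of the parameters and multiply-accumulate operations (MACs) for a single convolution layer. The helper names `conv2d_params` and `conv2d_macs` are made up for this sketch, and treating one MAC as two floating point operations is an assumption. Tools differ on whether they report MACs or FLOPs, so compare numbers from different tools with care.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Weights (k*k*c_in per output channel) plus one optional bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates: one k*k*c_in dot product per output element (bias ignored)."""
    return k * k * c_in * c_out * h_out * w_out


# First convolution of VGG16: 3 -> 64 channels, 3x3 kernel, 224x224 output (padding=1).
params = conv2d_params(3, 64, 3)
macs = conv2d_macs(3, 64, 3, 224, 224)
flops = 2 * macs  # assumption: 1 MAC = 1 multiply + 1 add

print(params)  # 1792
print(macs)    # 86704128
print(flops)   # 173408256
```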
1. Counting parameters in PyTorch
- Method 1:
(1) Count all parameters, whether trainable or not
sum(p.numel() for p in model.parameters())
(2) Count only the trainable parameters
sum(p.numel() for p in model.parameters() if p.requires_grad)
Example
import torch
import torchvision
# Randomly initialized VGG16; torchvision 0.13+ prefers weights=None over pretrained=False
model = torchvision.models.vgg16(pretrained=False)
device = torch.device('cpu')
model.to(device)
a = sum(p.numel() for p in model.parameters())  # all parameters
b = sum(p.numel() for p in model.parameters() if p.requires_grad)  # trainable only
print(a)
print(b)
Output:
138357544
138357544
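In the VGG16 example both counts are identical because every parameter is trainable by default. The two numbers only diverge once some layers are frozen. Below is a minimal sketch of that case, using a toy two-layer model that is purely hypothetical and only serves to show the effect of `requires_grad`.

```python
import torch.nn as nn

# Toy model: Linear(4 -> 3) has 4*3 + 3 = 15 parameters, Linear(3 -> 2) has 3*2 + 2 = 8.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Freeze the first layer so that its weights and bias are excluded from training.
for p in model[0].parameters():
    p.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(total)      # 23
print(trainable)  # 8
```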
- Method 2:
params = list(model.parameters())
num_params = 0
for param in params:
    # multiply the sizes of all dimensions to get this tensor's element count
    curr_num_params = 1
    for size_count in param.size():
        curr_num_params *= size_count
    num_params += curr_num_params
print("total number of parameters: " + str(num_params))
2. torchsummary
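- Installation
pip install torchsummary
- Usage
import torch
import torchvision
import torchsummary
model = torchvision.models.vgg16(pretrained=False)
# summary() defaults to device="cuda", so the model is moved to the GPU here;
# on a CPU-only machine, call torchsummary.summary(model, (3, 244, 244), device="cpu")
# Note: the input here is 244x244 (not the 224x224 used in the other sections), and the output below reflects that
torchsummary.summary(model.cuda(), (3, 244, 244))
Output: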
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 244, 244]           1,792
              ReLU-2         [-1, 64, 244, 244]               0
            Conv2d-3         [-1, 64, 244, 244]          36,928
              ReLU-4         [-1, 64, 244, 244]               0
         MaxPool2d-5         [-1, 64, 122, 122]               0
            Conv2d-6        [-1, 128, 122, 122]          73,856
              ReLU-7        [-1, 128, 122, 122]               0
            Conv2d-8        [-1, 128, 122, 122]         147,584
              ReLU-9        [-1, 128, 122, 122]               0
        MaxPool2d-10          [-1, 128, 61, 61]               0
           Conv2d-11          [-1, 256, 61, 61]         295,168
             ReLU-12          [-1, 256, 61, 61]               0
           Conv2d-13          [-1, 256, 61, 61]         590,080
             ReLU-14          [-1, 256, 61, 61]               0
           Conv2d-15          [-1, 256, 61, 61]         590,080
             ReLU-16          [-1, 256, 61, 61]               0
        MaxPool2d-17          [-1, 256, 30, 30]               0
           Conv2d-18          [-1, 512, 30, 30]       1,180,160
             ReLU-19          [-1, 512, 30, 30]               0
           Conv2d-20          [-1, 512, 30, 30]       2,359,808
             ReLU-21          [-1, 512, 30, 30]               0
           Conv2d-22          [-1, 512, 30, 30]       2,359,808
             ReLU-23          [-1, 512, 30, 30]               0
        MaxPool2d-24          [-1, 512, 15, 15]               0
           Conv2d-25          [-1, 512, 15, 15]       2,359,808
             ReLU-26          [-1, 512, 15, 15]               0
           Conv2d-27          [-1, 512, 15, 15]       2,359,808
             ReLU-28          [-1, 512, 15, 15]               0
           Conv2d-29          [-1, 512, 15, 15]       2,359,808
             ReLU-30          [-1, 512, 15, 15]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                 [-1, 4096]     102,764,544
             ReLU-34                 [-1, 4096]               0
          Dropout-35                 [-1, 4096]               0
           Linear-36                 [-1, 4096]      16,781,312
             ReLU-37                 [-1, 4096]               0
          Dropout-38                 [-1, 4096]               0
           Linear-39                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.68
Forward/backward pass size (MB): 258.51
Params size (MB): 527.79
Estimated Total Size (MB): 786.98
----------------------------------------------------------------
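The totals in this summary can be cross-checked without any library. The sketch below recomputes them from the layer configuration. `VGG16_CFG` and `vgg16_param_count` are names written for this example (they mirror the layer table above, not torchvision's source), and float32 storage at 4 bytes per parameter is assumed for the size estimate. Because the AdaptiveAvgPool2d layer always hands a 512x7x7 tensor to the classifier, the parameter count does not depend on the input resolution.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max-pooling layer (no parameters).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(num_classes=1000):
    total = 0
    c_in = 3
    for v in VGG16_CFG:
        if v == "M":
            continue
        total += 3 * 3 * c_in * v + v  # conv weights + bias
        c_in = v
    # Classifier: 512*7*7 -> 4096 -> 4096 -> num_classes
    for n_in, n_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]:
        total += n_in * n_out + n_out  # linear weights + bias
    return total


total_params = vgg16_param_count()
size_mb = total_params * 4 / 1024 ** 2  # assumption: float32, 4 bytes per parameter

print(f"{total_params:,}")  # 138,357,544
print(f"{size_mb:.2f}")     # 527.79
```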
3. thop
- Installation
pip install thop
- Usage
import torch
import torchvision
from thop import profile
from thop import clever_format
model = torchvision.models.vgg16(pretrained=False)
device = torch.device('cpu')
model.to(device)
my_input = torch.zeros((1, 3, 224, 224)).to(device)
flops, params = profile(model, inputs=(my_input,))
# convert both raw numbers to strings with unit suffixes (G, M, ...)
flops, params = clever_format([flops, params], '%.3f')
print(flops, params)
Output:
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.pooling.MaxPool2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.container.Sequential'>.
[INFO] Register count_adap_avgpool() for <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
15.470G 138.358M
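The 15.470G figure can be reproduced by hand if only the multiply-accumulate operations of the convolution and linear layers are counted. That suggests the value thop reports here is closer to MACs than to FLOPs in the strict sense. The sketch below is such a hand count. `VGG16_CFG` and `vgg16_macs` are names made up for this example, and bias terms and pooling layers are deliberately ignored.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_macs(size=224, num_classes=1000):
    macs = 0
    c_in = 3
    for v in VGG16_CFG:
        if v == "M":
            size //= 2  # 2x2 max pooling halves the spatial size
            continue
        # a 3x3 convolution with padding=1 keeps the spatial size
        macs += 3 * 3 * c_in * v * size * size
        c_in = v
    # Classifier: 512*7*7 -> 4096 -> 4096 -> num_classes
    for n_in, n_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]:
        macs += n_in * n_out
    return macs


total_macs = vgg16_macs()
print(total_macs)                  # 15470264320
print(f"{total_macs / 1e9:.3f}G")  # 15.470G
```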
4. torchstat
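torchstat reports more per layer than the tools above (parameters, memory, MAdd, FLOPs, memory reads and writes, and relative duration), so it is the recommended choice.
- Installation
pip install torchstat
- Usage
import torch
import torchvision
from torchstat import stat
model = torchvision.models.vgg16(pretrained=False)
device = torch.device('cpu')
model.to(device)
stat(model, (3, 224, 224))
Output: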
[MAdd]: AdaptiveAvgPool2d is not supported!
[Flops]: AdaptiveAvgPool2d is not supported!
[Memory]: AdaptiveAvgPool2d is not supported!
[MAdd]: Dropout is not supported!
[Flops]: Dropout is not supported!
[Memory]: Dropout is not supported!
[MAdd]: Dropout is not supported!
[Flops]: Dropout is not supported!
[Memory]: Dropout is not supported!
    module name  input shape  output shape    params  memory(MB)             MAdd            Flops  MemRead(B)  MemWrite(B)  duration[%]   MemR+W(B)
 0   features.0    3 224 224    64 224 224    1792.0       12.25    173,408,256.0     89,915,392.0    609280.0   12845056.0        3.15%  13454336.0
 1   features.1   64 224 224    64 224 224       0.0       12.25      3,211,264.0      3,211,264.0  12845056.0   12845056.0        0.67%  25690112.0
 2   features.2   64 224 224    64 224 224   36928.0       12.25  3,699,376,128.0  1,852,899,328.0  12992768.0   12845056.0       10.89%  25837824.0
 3   features.3   64 224 224    64 224 224       0.0       12.25      3,211,264.0      3,211,264.0  12845056.0   12845056.0        0.65%  25690112.0
 4   features.4   64 224 224    64 112 112       0.0        3.06      2,408,448.0      3,211,264.0  12845056.0    3211264.0        2.21%  16056320.0
 5   features.5   64 112 112   128 112 112   73856.0        6.12  1,849,688,064.0    926,449,664.0   3506688.0    6422528.0        5.10%   9929216.0
 6   features.6  128 112 112   128 112 112       0.0        6.12      1,605,632.0      1,605,632.0   6422528.0    6422528.0        0.09%  12845056.0
 7   features.7  128 112 112   128 112 112  147584.0        6.12  3,699,376,128.0  1,851,293,696.0   7012864.0    6422528.0        9.20%  13435392.0
 8   features.8  128 112 112   128 112 112       0.0        6.12      1,605,632.0      1,605,632.0   6422528.0    6422528.0        0.09%  12845056.0
 9   features.9  128 112 112     128 56 56       0.0        1.53      1,204,224.0      1,605,632.0   6422528.0    1605632.0        1.07%   8028160.0
10  features.10    128 56 56     256 56 56  295168.0        3.06  1,849,688,064.0    925,646,848.0   2786304.0    3211264.0        4.85%   5997568.0
11  features.11    256 56 56     256 56 56       0.0        3.06        802,816.0        802,816.0   3211264.0    3211264.0        0.12%   6422528.0
12 features.1...