Tools for Computing Parameter Counts and FLOPs of Deep Learning Models

Posted by 非晚非晚

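0. Key concepts

  • FLOPS (all uppercase): short for floating point operations per second, i.e. the number of floating-point operations executed per second. It describes computing speed and is a measure of hardware performance.
  • FLOPs (lowercase s): short for floating point operations, where the s marks the plural. It describes the amount of computation and can be used to measure the complexity of an algorithm or model.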
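The two quantities combine into a simple back-of-the-envelope estimate: dividing a model's FLOPs by a device's FLOPS gives an idealized lower bound on runtime. The sketch below is only an illustration. The function name `ideal_runtime_seconds` and both numbers are hypothetical, and real runtimes are higher because of memory traffic and imperfect hardware utilization.

```python
def ideal_runtime_seconds(model_flops: float, hardware_flops_per_second: float) -> float:
    """Idealized lower bound: workload (FLOPs) divided by throughput (FLOPS)."""
    if hardware_flops_per_second <= 0:
        raise ValueError("hardware throughput must be positive")
    return model_flops / hardware_flops_per_second


# Hypothetical numbers chosen for illustration only.
model_flops = 15.47e9      # amount of computation for one forward pass (FLOPs)
hardware_flops = 10e12     # assumed peak throughput of 10 TFLOPS (FLOPS)

estimate = ideal_runtime_seconds(model_flops, hardware_flops)
print(f"{estimate * 1e3:.3f} ms")
```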

The sections below use VGG16 as the example model.

1. Counting parameters in PyTorch

  • Method 1:

(1) Count all parameters, both trainable and frozen (requires_grad=False)

sum(p.numel() for p in model.parameters())

(2) Count only the trainable parameters

sum(p.numel() for p in model.parameters() if p.requires_grad)

Example

import torch
import torchvision

model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)

# Count every parameter registered on the model
a = sum(p.numel() for p in model.parameters())

# Count only the parameters that receive gradients
b = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(a)
print(b)

Output:

138357544
138357544
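The two counts above are identical because every VGG16 parameter is trainable by default. They diverge once some layers are frozen. The following minimal sketch uses a small hypothetical two-layer network (not VGG16) and freezes its first layer to show the difference.

```python
import torch.nn as nn

# A tiny stand-in model, used only to keep the numbers easy to check by hand.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first linear layer so it no longer receives gradients.
for p in model[0].parameters():
    p.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

# total covers both layers: (4*8 + 8) + (8*2 + 2)
# trainable covers only the second layer: 8*2 + 2
print(total, trainable)
```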
  • Method 2:
params = list(model.parameters())
num_params = 0
for param in params:
    curr_num_params = 1
    for size_count in param.size():
        curr_num_params *= size_count
    num_params += curr_num_params
print("total number of parameters: " + str(num_params))
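As a cross-check, the same total can be derived by hand from the layer shapes, without PyTorch. The sketch below assumes the standard VGG16 layout (3x3 convolutions with bias, followed by three fully connected layers). The helper names are illustrative and not part of any library.

```python
# VGG16 layout: integers are conv output channels, "M" marks 2x2 max-pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(in_ch, out_ch, k=3):
    """Weights (k*k*in_ch per output channel) plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


def vgg16_param_count(num_classes=1000):
    total, in_ch = 0, 3
    for v in VGG16_CFG:
        if v == "M":          # pooling layers have no parameters
            continue
        total += conv_params(in_ch, v)
        in_ch = v
    in_features = 512 * 7 * 7  # flattened feature map fed to the classifier
    for out_features in (4096, 4096, num_classes):
        total += linear_params(in_features, out_features)
        in_features = out_features
    return total


hand_count = vgg16_param_count()
print(hand_count)
```

The first convolution alone contributes `conv_params(3, 64)` parameters, and the first fully connected layer dominates the total.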

2. torchsummary

  • Installation
pip install torchsummary
  • Usage
import torch
import torchvision

model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)

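import torchsummary

# The model was moved to the CPU above, so ask torchsummary to build its
# dummy input on the CPU as well (its default device is "cuda").
# Note: the input here is 244x244, not the 224x224 used in the other sections.
torchsummary.summary(model, (3, 244, 244), device='cpu')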

Output:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 244, 244]           1,792
              ReLU-2         [-1, 64, 244, 244]               0
            Conv2d-3         [-1, 64, 244, 244]          36,928
              ReLU-4         [-1, 64, 244, 244]               0
         MaxPool2d-5         [-1, 64, 122, 122]               0
            Conv2d-6        [-1, 128, 122, 122]          73,856
              ReLU-7        [-1, 128, 122, 122]               0
            Conv2d-8        [-1, 128, 122, 122]         147,584
              ReLU-9        [-1, 128, 122, 122]               0
        MaxPool2d-10          [-1, 128, 61, 61]               0
           Conv2d-11          [-1, 256, 61, 61]         295,168
             ReLU-12          [-1, 256, 61, 61]               0
           Conv2d-13          [-1, 256, 61, 61]         590,080
             ReLU-14          [-1, 256, 61, 61]               0
           Conv2d-15          [-1, 256, 61, 61]         590,080
             ReLU-16          [-1, 256, 61, 61]               0
        MaxPool2d-17          [-1, 256, 30, 30]               0
           Conv2d-18          [-1, 512, 30, 30]       1,180,160
             ReLU-19          [-1, 512, 30, 30]               0
           Conv2d-20          [-1, 512, 30, 30]       2,359,808
             ReLU-21          [-1, 512, 30, 30]               0
           Conv2d-22          [-1, 512, 30, 30]       2,359,808
             ReLU-23          [-1, 512, 30, 30]               0
        MaxPool2d-24          [-1, 512, 15, 15]               0
           Conv2d-25          [-1, 512, 15, 15]       2,359,808
             ReLU-26          [-1, 512, 15, 15]               0
           Conv2d-27          [-1, 512, 15, 15]       2,359,808
             ReLU-28          [-1, 512, 15, 15]               0
           Conv2d-29          [-1, 512, 15, 15]       2,359,808
             ReLU-30          [-1, 512, 15, 15]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                 [-1, 4096]     102,764,544
             ReLU-34                 [-1, 4096]               0
          Dropout-35                 [-1, 4096]               0
           Linear-36                 [-1, 4096]      16,781,312
             ReLU-37                 [-1, 4096]               0
          Dropout-38                 [-1, 4096]               0
           Linear-39                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.68
Forward/backward pass size (MB): 258.51
Params size (MB): 527.79
Estimated Total Size (MB): 786.98
----------------------------------------------------------------
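The size figures at the bottom of the summary follow from simple arithmetic. The sketch below reproduces them under the assumption that every value is stored as a 4-byte float32 and that "MB" means 1024 * 1024 bytes. The forward/backward figure is copied from the table rather than recomputed.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_MB = 1024 ** 2


def size_mb(num_elements):
    """Memory taken by num_elements float32 values, in MB (1024 * 1024 bytes)."""
    return num_elements * BYTES_PER_FLOAT32 / BYTES_PER_MB


params_mb = size_mb(138_357_544)      # total parameter count from the table
input_mb = size_mb(3 * 244 * 244)     # one 3x244x244 input image
forward_backward_mb = 258.51          # taken directly from the summary above
total_mb = input_mb + forward_backward_mb + params_mb

print(f"{input_mb:.2f} {params_mb:.2f} {total_mb:.2f}")
```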

3. thop

  • Installation
pip install thop
  • Usage
import torch
import torchvision

model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)

from thop import profile
from thop import clever_format

my_input = torch.zeros((1,3,224,224)).to(device)
flops, params = profile(model.to(device), inputs = (my_input, ))
# Format both values as human-readable strings (e.g. with G / M suffixes)
flops, params = clever_format([flops, params], '%.3f')
print(flops, params)

Output:

[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.pooling.MaxPool2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.container.Sequential'>.
[INFO] Register count_adap_avgpool() for <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
15.470G 138.358M
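The 15.470G figure can be reproduced approximately by hand. The sketch below assumes that each multiply-accumulate counts as one operation and that bias additions, activations and pooling are ignored. These counting rules are inferred from the number above and are not a statement of how thop is implemented. If a multiply and an add are counted as two separate floating-point operations, the total doubles.

```python
# VGG16 layout: integers are conv output channels, "M" marks 2x2 max-pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_macs(input_hw=224, num_classes=1000):
    """Multiply-accumulate count for one forward pass of a single image."""
    macs, in_ch, hw = 0, 3, input_hw
    for v in VGG16_CFG:
        if v == "M":
            hw //= 2                      # 2x2 max-pooling halves each side
            continue
        # 3x3 conv, stride 1, padding 1: output keeps the hw x hw resolution
        macs += 3 * 3 * in_ch * v * hw * hw
        in_ch = v
    in_features = 512 * 7 * 7             # classifier input after 7x7 pooling
    for out_features in (4096, 4096, num_classes):
        macs += in_features * out_features
        in_features = out_features
    return macs


total_macs = vgg16_macs()
print(f"{total_macs / 1e9:.3f}G")
```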

4. torchstat

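torchstat reports the most detailed per-layer statistics of the tools covered here, so it is the recommended choice.

  • Installation
pip install torchstat
  • Usage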
import torch
import torchvision

model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)

from torchstat import stat

stat(model.to(device), (3,224,224))

Output:

[MAdd]: AdaptiveAvgPool2d is not supported!
[Flops]: AdaptiveAvgPool2d is not supported!
[Memory]: AdaptiveAvgPool2d is not supported!
[MAdd]: Dropout is not supported!
[Flops]: Dropout is not supported!
[Memory]: Dropout is not supported!
[MAdd]: Dropout is not supported!
[Flops]: Dropout is not supported!
[Memory]: Dropout is not supported!
        module name  input shape output shape       params memory(MB)              MAdd             Flops   MemRead(B)  MemWrite(B) duration[%]    MemR+W(B)
0        features.0    3 224 224   64 224 224       1792.0      12.25     173,408,256.0      89,915,392.0     609280.0   12845056.0       3.15%   13454336.0
1        features.1   64 224 224   64 224 224          0.0      12.25       3,211,264.0       3,211,264.0   12845056.0   12845056.0       0.67%   25690112.0
2        features.2   64 224 224   64 224 224      36928.0      12.25   3,699,376,128.0   1,852,899,328.0   12992768.0   12845056.0      10.89%   25837824.0
3        features.3   64 224 224   64 224 224          0.0      12.25       3,211,264.0       3,211,264.0   12845056.0   12845056.0       0.65%   25690112.0
4        features.4   64 224 224   64 112 112          0.0       3.06       2,408,448.0       3,211,264.0   12845056.0    3211264.0       2.21%   16056320.0
5        features.5   64 112 112  128 112 112      73856.0       6.12   1,849,688,064.0     926,449,664.0    3506688.0    6422528.0       5.10%    9929216.0
6        features.6  128 112 112  128 112 112          0.0       6.12       1,605,632.0       1,605,632.0    6422528.0    6422528.0       0.09%   12845056.0
7        features.7  128 112 112  128 112 112     147584.0       6.12   3,699,376,128.0   1,851,293,696.0    7012864.0    6422528.0       9.20%   13435392.0
8        features.8  128 112 112  128 112 112          0.0       6.12       1,605,632.0       1,605,632.0    6422528.0    6422528.0       0.09%   12845056.0
9        features.9  128 112 112  128  56  56          0.0       1.53       1,204,224.0       1,605,632.0    6422528.0    1605632.0       1.07%    8028160.0
10      features.10  128  56  56  256  56  56     295168.0       3.06   1,849,688,064.0     925,646,848.0    2786304.0    3211264.0       4.85%    5997568.0
11      features.11  256  56  56  256  56  56          0.0       3.06         802,816.0         802,816.0    3211264.0    3211264.0       0.12%    6422528.0
12      features.1
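The MAdd, Flops and memory columns for a convolution row can be recovered from the layer shape. The formulas in the sketch below are inferred from the values listed for features.0 and features.2, so treat them as an assumption about the counting convention rather than torchstat's documented definition.

```python
def conv_stats(in_ch, out_ch, out_h, out_w, k=3):
    """Reproduce the MAdd / Flops / memory(MB) columns for a k x k conv with bias."""
    out_elems = out_ch * out_h * out_w
    macs_per_output = k * k * in_ch

    # MAdd: every multiply and every add (bias included) counted separately.
    madd = 2 * macs_per_output * out_elems
    # Flops: one operation per multiply-accumulate, plus one bias add per output.
    flops = (macs_per_output + 1) * out_elems
    # memory: the float32 output feature map, in MB (1024 * 1024 bytes).
    memory_mb = out_elems * 4 / 1024 ** 2
    return madd, flops, memory_mb


# features.0: 3 -> 64 channels at 224 x 224
madd, flops, memory_mb = conv_stats(3, 64, 224, 224)
print(f"{madd:,} {flops:,} {memory_mb:.2f}")
```

The same function applied to features.2 (`conv_stats(64, 64, 224, 224)`) matches that row of the table as well.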
