FPN implementation details in Faster R-CNN: a code walkthrough

Posted by ywheunji


Code adapted from: https://github.com/DetectionTeamUCAS/FPN_Tensorflow

This post mainly analyzes how predictions are made from the multi-level outputs of the FPN pyramid.

The FPN pyramid is inserted into Faster R-CNN after the backbone feature maps are obtained and before the RPN.

The code is shown below.

Tracing the code structure down to the FPN part:

train.py (line 46: the build_whole_detection_network function)

    build_whole_network.py (line 372: the build_whole_detection_network function)

    Follow the comments, which break the network into seven steps:

    1. build base network
    
    2. build rpn
 
    3. generate_anchors
    4. postprocess rpn proposals. such as: decode, clip, NMS (first round of box processing)
    5. build Fast-RCNN (covering steps 5. roipooling, 6. feeding rois through fc layers to obtain features, and 7. cls and reg)
    6. postprocess_fastrcnn (final box processing)
    The FPN part lives in step 1, build base network; the plist obtained there is the set of pyramid feature maps.
        Tracing build_base_network step by step leads back to the function resnet_base (the FPN operations are here, in the code below):
def resnet_base(img_batch, scope_name, is_training=True):
    '''
    this code is derived from light-head rcnn.
    https://github.com/zengarden/light_head_rcnn

    It is convenient to freeze blocks. So we adapt this mode.
    '''
    if scope_name == 'resnet_v1_50':
        middle_num_units = 6
    elif scope_name == 'resnet_v1_101':
        middle_num_units = 23
    else:
        raise NotImplementedError('We only support resnet_v1_50 or resnet_v1_101. Check your network name....yjr')

    blocks = [resnet_v1_block('block1', base_depth=64, num_units=3, stride=2),
              resnet_v1_block('block2', base_depth=128, num_units=4, stride=2),
              resnet_v1_block('block3', base_depth=256, num_units=middle_num_units, stride=2),
              resnet_v1_block('block4', base_depth=512, num_units=3, stride=1)]
    # when use fpn . stride list is [1, 2, 2]

    with slim.arg_scope(resnet_arg_scope(is_training=False)):
        with tf.variable_scope(scope_name, scope_name):
            # Do the first few layers manually, because 'SAME' padding can behave inconsistently
            # for images of different sizes: sometimes 0, sometimes 1
            net = resnet_utils.conv2d_same(
                img_batch, 64, 7, stride=2, scope='conv1')
            net = tf.pad(net, [[0, 0], [1, 1], [1, 1], [0, 0]])
            net = slim.max_pool2d(
                net, [3, 3], stride=2, padding='VALID', scope='pool1')

    not_freezed = [False] * cfgs.FIXED_BLOCKS + (4 - cfgs.FIXED_BLOCKS) * [True]
    # Fixed_Blocks can be 1~3

    with slim.arg_scope(resnet_arg_scope(is_training=(is_training and not_freezed[0]))):
        C2, end_points_C2 = resnet_v1.resnet_v1(net,
                                                blocks[0:1],
                                                global_pool=False,
                                                include_root_block=False,
                                                scope=scope_name)

    # C2 = tf.Print(C2, [tf.shape(C2)], summarize=10, message='C2_shape')
    add_heatmap(C2, name='Layer2/C2_heat')

    with slim.arg_scope(resnet_arg_scope(is_training=(is_training and not_freezed[1]))):
        C3, end_points_C3 = resnet_v1.resnet_v1(C2,
                                                blocks[1:2],
                                                global_pool=False,
                                                include_root_block=False,
                                                scope=scope_name)

    # C3 = tf.Print(C3, [tf.shape(C3)], summarize=10, message='C3_shape')
    add_heatmap(C3, name='Layer3/C3_heat')
    with slim.arg_scope(resnet_arg_scope(is_training=(is_training and not_freezed[2]))):
        C4, end_points_C4 = resnet_v1.resnet_v1(C3,
                                                blocks[2:3],
                                                global_pool=False,
                                                include_root_block=False,
                                                scope=scope_name)

    add_heatmap(C4, name='Layer4/C4_heat')

    # C4 = tf.Print(C4, [tf.shape(C4)], summarize=10, message='C4_shape')
    with slim.arg_scope(resnet_arg_scope(is_training=is_training)):
        C5, end_points_C5 = resnet_v1.resnet_v1(C4,
                                                blocks[3:4],
                                                global_pool=False,
                                                include_root_block=False,
                                                scope=scope_name)
    # C5 = tf.Print(C5, [tf.shape(C5)], summarize=10, message='C5_shape')
    add_heatmap(C5, name='Layer5/C5_heat')

    feature_dict = {'C2': end_points_C2['{}/block1/unit_2/bottleneck_v1'.format(scope_name)],
                    'C3': end_points_C3['{}/block2/unit_3/bottleneck_v1'.format(scope_name)],
                    'C4': end_points_C4['{}/block3/unit_{}/bottleneck_v1'.format(scope_name, middle_num_units - 1)],
                    'C5': end_points_C5['{}/block4/unit_3/bottleneck_v1'.format(scope_name)],
                    # 'C5': end_points_C5['{}/block4'.format(scope_name)],
                    }

    # feature_dict = {'C2': C2,
    #                 'C3': C3,
    #                 'C4': C4,
    #                 'C5': C5}

    pyramid_dict = {}
    with tf.variable_scope('build_pyramid'):
        with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(cfgs.WEIGHT_DECAY),
                            activation_fn=None, normalizer_fn=None):

            P5 = slim.conv2d(C5,
                             num_outputs=256,
                             kernel_size=[1, 1],
                             stride=1, scope='build_P5')
            if "P6" in cfgs.LEVLES:
                P6 = slim.max_pool2d(P5, kernel_size=[1, 1], stride=2, scope='build_P6')
                pyramid_dict['P6'] = P6

            pyramid_dict['P5'] = P5

            for level in range(4, 1, -1):  # build [P4, P3, P2]

                pyramid_dict['P%d' % level] = fusion_two_layer(C_i=feature_dict["C%d" % level],
                                                               P_j=pyramid_dict["P%d" % (level + 1)],
                                                               scope='build_P%d' % level)
            for level in range(4, 1, -1):
                pyramid_dict['P%d' % level] = slim.conv2d(pyramid_dict['P%d' % level],
                                                          num_outputs=256, kernel_size=[3, 3], padding="SAME",
                                                          stride=1, scope="fuse_P%d" % level)
    for level in range(5, 1, -1):
        add_heatmap(pyramid_dict['P%d' % level], name='Layer%d/P%d_heat' % (level, level))

    # return [P2, P3, P4, P5, P6]
    print("we are in Pyramid::-======>>>>")
    print(cfgs.LEVLES)
    print("base_anchor_size are: ", cfgs.BASE_ANCHOR_SIZE_LIST)
    print(20 * "__")
    return [pyramid_dict[level_name] for level_name in cfgs.LEVLES]
    # return pyramid_dict  # return the dict. And get each level by key. But ensure the levels are consistent
    # return list rather than dict, to avoid the dict being unordered
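The trailing comments about returning a list instead of a dict can be illustrated with a tiny plain-Python sketch (the values here are hypothetical string stand-ins for tensors; LEVLES keeps the repo's own spelling of the config key):

```python
# Indexing an explicit, ordered list of level names guarantees the caller
# always receives the feature maps as [P2, P3, P4, P5, P6] in that order,
# regardless of dict iteration order (undefined before Python 3.7).
LEVLES = ['P2', 'P3', 'P4', 'P5', 'P6']  # the repo's (misspelled) config key

pyramid_dict = {'P6': 'feat_P6', 'P5': 'feat_P5', 'P4': 'feat_P4',
                'P3': 'feat_P3', 'P2': 'feat_P2'}  # stand-ins for tensors

p_list = [pyramid_dict[level_name] for level_name in LEVLES]
print(p_list)  # ['feat_P2', 'feat_P3', 'feat_P4', 'feat_P5', 'feat_P6']
```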

Observe the structure of the original backbone feature maps C2, C3, C4, C5, and of the pyramid maps P5, P4, P3, P2; together with P6 this forms a five-level feature pyramid.

The operations, as shown in the figure:

[Figure: FPN top-down pathway and lateral-connection diagram]

In total, the pyramid has five levels (P2, P3, P4, P5, P6).

P5 = conv2d(C5)                 (because each lower pyramid level is built by adding the 2x upsampling of the level above to the 1x1 conv of the corresponding C map on the left)

P6 = max_pool(P5)
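To make the level geometry concrete, here is a small sketch computing each level's spatial size for an example input. It assumes the standard FPN strides of 4/8/16/32/64 for P2..P6, which follow from C2..C5 plus the stride-2 pooling that builds P6; the 800x1280 input size is just an illustration:

```python
# Spatial size of each pyramid level for a given input image.
# P2..P5 inherit the strides of C2..C5 (4, 8, 16, 32); P6 is P5
# max-pooled with stride 2, giving an overall stride of 64.
strides = {'P2': 4, 'P3': 8, 'P4': 16, 'P5': 32, 'P6': 64}

def level_sizes(img_h, img_w):
    return {name: (img_h // s, img_w // s) for name, s in strides.items()}

sizes = level_sizes(800, 1280)
print(sizes['P2'])  # (200, 320)
print(sizes['P6'])  # (12, 20)
```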

 

The core fusion step appears in the code below:

P4 = C4 + P5

P3 = C3 + P4

P2 = C2 + P3

            for level in range(4, 1, -1):  # build [P4, P3, P2]

                pyramid_dict['P%d' % level] = fusion_two_layer(C_i=feature_dict["C%d" % level],
                                                               P_j=pyramid_dict["P%d" % (level + 1)],
                                                               scope='build_P%d' % level)

 

The final p_list contains the levels LEVLES = ['P2', 'P3', 'P4', 'P5', 'P6'].

 

Since there are now multiple feature maps, every later recognition stage runs once per level, by looping over p_list;

after roipooling, each pyramid feature map is processed exactly the same way as the single feature map used to be;

originally, a single map was used to predict and produce some proposals; after postprocessing, they were fed into fully connected layers for cls and reg;

with FPN, multiple maps predict and produce more proposals at different scales; after postprocessing, they are likewise fed into fully connected layers for cls and reg.
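The per-level prediction described above can be sketched as a loop over p_list. This is a hypothetical illustration: rpn_head_sketch is a stand-in that only counts anchor positions, the per-level sizes are example values, and 3 anchors per location is assumed (one scale per FPN level with a few aspect ratios):

```python
# Run the same prediction head over every pyramid level and pool the
# results, as FPN-style Faster R-CNN does with its p_list.
def rpn_head_sketch(feat_h, feat_w, num_anchors_per_location=3):
    # one proposal slot per anchor position (stand-in for the real head)
    return feat_h * feat_w * num_anchors_per_location

p_list = {'P2': (200, 320), 'P3': (100, 160), 'P4': (50, 80),
          'P5': (25, 40), 'P6': (12, 20)}

all_proposals = []
for level_name, (h, w) in p_list.items():
    all_proposals.append(rpn_head_sketch(h, w))

total = sum(all_proposals)
print(total)  # total anchor slots across all levels
```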

