Faster RCNN: Building the Dataset Factory and Registering Dataset Classes
In Faster RCNN, a factory module is built on top of the `imdb` base class: each concrete dataset gets its own subclass, and each subclass is registered as a `lambda` that instantiates it on demand.

First, a quick note on how such a lambda behaves:
```python
var = lambda para1, para2: func(para1, para2)
```

Here `var` does not hold the return value of `func`; it holds a new function object. Printing it shows:

```python
print(var)  # <function <lambda> at 0x000001B46BF97C80>
```

To get the actual result, you must call it with parentheses and arguments:

```python
print(var(para1, para2))  # if func is a function, this is its return value;
                          # if func is a class, this instantiates it
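A minimal stand-alone illustration of this (the names here are arbitrary):

```python
# A lambda stored in a variable is itself a callable, not a result.
power = lambda base, exp: base ** exp

print(power)        # e.g. <function <lambda> at 0x7f...> -- just a function object
print(power(2, 3))  # 8 -- calling it actually evaluates the body
```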
In `factory.py` under `datasets`, lambdas are used to register every dataset subclass in that folder into the dictionary `__sets`. Each key is a dataset name, and each value is a lambda that, when called, constructs the corresponding dataset class:
```python
for year in ['2015']:
    for split in ['test', 'test-dev']:
        name = 'coco_{}_{}'.format(year, split)
        __sets[name] = (lambda split=split, year=year: coco(split, year))
```
Note that in the code above the class is never actually executed: the value stored in `__sets[name]` is still a callable (a reference). Only when parentheses are appended is the class instantiated with the bound arguments, i.e. only then does `__init__()` run. Unrolling the loop, it is equivalent to:
```python
__sets['coco_2015_test'] = (lambda split='test', year='2015': coco(split, year))
__sets['coco_2015_test-dev'] = (lambda split='test-dev', year='2015': coco(split, year))
```
Calling `__sets[name]()` is then equivalent to:

```python
__sets['coco_2015_test']()  # same as coco('test', '2015')
```
Once we have the raw imdb dataset, we instantiate it:
```python
imdb = get_imdb(args.imdb_name)
```
(This instantiates a factory dataset class through the lambda mechanism explained above; the instance is simply named `imdb`.) The instantiated class carries a `roidb` property:
```python
@property
def roidb(self):
    # A roidb is a list of dictionaries, each with the following keys:
    #   boxes
    #   gt_overlaps
    #   gt_classes
    #   flipped
    if self._roidb is not None:
        return self._roidb
    self._roidb = self.roidb_handler()
    return self._roidb
```
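This property is a lazy, cached attribute: the handler runs only on first access, and later reads return the cached list. The pattern can be isolated in a toy class (`DemoImdb` is a hypothetical stand-in, not the real imdb base class):

```python
class DemoImdb(object):
    """Toy illustration of the lazily-cached roidb property."""
    def __init__(self):
        self._roidb = None
        self.calls = 0  # counts how often the (expensive) handler runs

    def roidb_handler(self):
        self.calls += 1
        return [{'boxes': [], 'gt_classes': []}]

    @property
    def roidb(self):
        if self._roidb is not None:   # cache hit: skip the handler
            return self._roidb
        self._roidb = self.roidb_handler()
        return self._roidb

db = DemoImdb()
_ = db.roidb
_ = db.roidb
print(db.calls)  # 1 -- the handler ran only once; the second read hit the cache
```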
The freshly instantiated dataset class cannot be used for training directly; its roidb must first be enriched by calling:
```python
def get_training_roidb(imdb):
    # ...
    rdl_roidb.prepare_roidb(imdb)  # this is usually the step that runs
    print('done')
    return imdb.roidb
```
Concretely, the work happens in `prepare_roidb`:
```python
def prepare_roidb(imdb):
    """Enrich the imdb's roidb by adding some derived quantities that
    are useful for training. This function precomputes the maximum
    overlap, taken over ground-truth boxes, between each ROI and each
    ground-truth box. The class with maximum overlap is also recorded.
    """
    # On entry, roidb comes from imdb.roidb (i.e. self.roidb_handler()):
    # a list of per-image dicts.
    sizes = [PIL.Image.open(imdb.image_path_at(i)).size
             for i in xrange(imdb.num_images)]
    roidb = imdb.roidb
    for i in xrange(len(imdb.image_index)):
        roidb[i]['image'] = imdb.image_path_at(i)  # add new K/V pairs to each per-image dict
        roidb[i]['width'] = sizes[i][0]
        roidb[i]['height'] = sizes[i][1]
        # need gt_overlaps as a dense array for argmax
        gt_overlaps = roidb[i]['gt_overlaps'].toarray()
        # max overlap with gt over classes (columns)
        max_overlaps = gt_overlaps.max(axis=1)
        # gt class that had the max overlap
        max_classes = gt_overlaps.argmax(axis=1)
        roidb[i]['max_classes'] = max_classes
        roidb[i]['max_overlaps'] = max_overlaps
        # sanity checks
        # max overlap of 0 => class should be zero (background)
        zero_inds = np.where(max_overlaps == 0)[0]
        assert all(max_classes[zero_inds] == 0)
        # max overlap > 0 => class should not be zero (must be a fg class)
        nonzero_inds = np.where(max_overlaps > 0)[0]
        assert all(max_classes[nonzero_inds] != 0)
```
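The overlap bookkeeping above can be checked on a tiny hand-made dense `gt_overlaps` array (the values here are made up purely for illustration):

```python
import numpy as np

# Hypothetical dense gt_overlaps for one image: rows are ROIs, columns are classes
# (column 0 is background). Each entry is the ROI's overlap with a gt box of that class.
gt_overlaps = np.array([[0.0, 0.8, 0.1],
                        [0.0, 0.0, 0.0],
                        [0.0, 0.2, 0.6]])

max_overlaps = gt_overlaps.max(axis=1)     # best overlap per ROI
max_classes = gt_overlaps.argmax(axis=1)   # class achieving that overlap

print(max_overlaps)  # [0.8 0.  0.6]
print(max_classes)   # [1 0 2]

# the same sanity checks prepare_roidb performs:
zero_inds = np.where(max_overlaps == 0)[0]
assert all(max_classes[zero_inds] == 0)     # zero overlap -> background class 0
nonzero_inds = np.where(max_overlaps > 0)[0]
assert all(max_classes[nonzero_inds] != 0)  # positive overlap -> a foreground class
```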
Recall that the raw roidb is a list whose elements are dictionaries. The loop above therefore just adds new key/value pairs to the dictionary at each image index; that is the whole difference between the original gt_roidb and the training roidb. A simple analogous example:
```python
a = list()
a1 = {'name': 'a1', 'num': 5}
a2 = {'name': 'a2', 'num': 6}
a3 = {'name': 'a3', 'num': 7}
a.append(a1)
a.append(a2)
a.append(a3)
print(a)
# [{'name': 'a1', 'num': 5}, {'name': 'a2', 'num': 6}, {'name': 'a3', 'num': 7}]
a[2]['width'] = 56
print(a)
# [{'name': 'a1', 'num': 5}, {'name': 'a2', 'num': 6}, {'name': 'a3', 'num': 7, 'width': 56}]
```
That completes the updated `roidb` property of the imdb class.
Next we obtain `output_dir`, `device_name`, and `network`, and finally start training:
```python
train_net(network, imdb, roidb, output_dir,
          pretrained_model=args.pretrained_model,
          max_iters=args.max_iters)
```
The whole training procedure lives in the `train_model` method of the `SolverWrapper` class in train.py. It consists of the following steps:
1. Create a RoIDataLayer:
```python
data_layer = get_data_layer(self.roidb, self.imdb.num_classes)
# equivalent to data_layer = RoIDataLayer(roidb, num_classes);
# RoIDataLayer is a class in roi_data_layer/layer.py
```
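As a rough sketch of what such a data layer does, the following hypothetical miniature shuffles the roidb once and hands out one entry per `forward()` call, reshuffling when exhausted (this is an illustration of the idea, not the real RoIDataLayer API, which returns blobs of image data):

```python
import numpy as np

class MiniRoIDataLayer(object):
    """Hypothetical miniature of a RoIDataLayer-style batch provider."""
    def __init__(self, roidb):
        self._roidb = roidb
        self._shuffle()

    def _shuffle(self):
        # random visiting order over the whole roidb
        self._perm = np.random.permutation(len(self._roidb))
        self._cur = 0

    def forward(self):
        # reshuffle once every entry has been served
        if self._cur >= len(self._roidb):
            self._shuffle()
        idx = self._perm[self._cur]
        self._cur += 1
        return self._roidb[idx]

layer = MiniRoIDataLayer([{'image': 'a.jpg'}, {'image': 'b.jpg'}])
batch = layer.forward()
print(batch['image'] in ('a.jpg', 'b.jpg'))  # True
```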
2. Define the final loss:
```python
loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
```
This is the total loss, combining the RPN classification loss (rpn_cross_entropy), the RPN bounding-box regression smooth L1 loss (rpn_loss_box, computed from rpn_smooth_l1), the R-CNN classification loss (cross_entropy), and the R-CNN bounding-box regression smooth L1 loss (loss_box).
2.1 RPN classification loss
Inputs: rpn_cls_score, rpn_label
Output: the cross-entropy, rpn_cross_entropy
```python
# get_output, defined in the Network base class, returns the value
# stored under the given key in the self.layers dictionary
rpn_cls_score = tf.reshape(self.net.get_output('rpn_cls_score_reshape'), [-1, 2])
rpn_label = tf.reshape(self.net.get_output('rpn-data')[0], [-1])
rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score,
                                     tf.where(tf.not_equal(rpn_label, -1))), [-1, 2])
rpn_label = tf.reshape(tf.gather(rpn_label,
                                 tf.where(tf.not_equal(rpn_label, -1))), [-1])
rpn_cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score,
                                                   labels=rpn_label))
```
Here rpn_label and rpn_cls_score both come from self.net (the network). rpn_cross_entropy is computed in three steps; the first two happen inside tf.nn.sparse_softmax_cross_entropy_with_logits:
[1] compute the softmax of logits = rpn_cls_score;
[2] compute the cross-entropy between that softmax and the labels;
[3] tf.reduce_mean then averages the cross-entropy over examples.
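The three steps can be sketched in NumPy (a hand-rolled illustration of the math, not TensorFlow's actual kernel):

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """[1] softmax over logits, [2] cross-entropy against integer labels,
    [3] mean over examples."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # [1]
    per_example = -np.log(probs[np.arange(len(labels)), labels])          # [2]
    return per_example.mean()                                             # [3]

logits = np.array([[2.0, 0.5],   # example 0: class 0 clearly favored
                   [0.1, 3.0]])  # example 1: class 1 clearly favored
labels = np.array([0, 1])        # both predictions agree with the labels
print(round(sparse_softmax_cross_entropy(logits, labels), 4))  # 0.1275
```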
2.2 RPN bounding-box regression L1 loss (rpn_smooth_l1)
Inputs: rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
Output: rpn_smooth_l1 → rpn_loss_box (averaged with tf.reduce_mean)
```python
# bounding box regression L1 loss
rpn_bbox_pred = self.net.get_output('rpn_bbox_pred')
rpn_bbox_targets = tf.transpose(self.net.get_output('rpn-data')[1], [0, 2, 3, 1])
rpn_bbox_inside_weights = tf.transpose(self.net.get_output('rpn-data')[2], [0, 2, 3, 1])
rpn_bbox_outside_weights = tf.transpose(self.net.get_output('rpn-data')[3], [0, 2, 3, 1])
rpn_smooth_l1 = self._modified_smooth_l1(3.0, rpn_bbox_pred, rpn_bbox_targets,
                                         rpn_bbox_inside_weights,
                                         rpn_bbox_outside_weights)
rpn_loss_box = tf.reduce_mean(tf.reduce_sum(rpn_smooth_l1,
                                            reduction_indices=[1, 2, 3]))
```
As in 2.1, the tensors are first pulled from the net and transposed (tf.transpose); a custom method then computes the smooth L1 term:
```python
def _modified_smooth_l1(self, sigma, bbox_pred, bbox_targets,
                        bbox_inside_weights, bbox_outside_weights):
    """
    ResultLoss = outside_weights * SmoothL1(inside_weights * (bbox_pred - bbox_targets))
    SmoothL1(x) = 0.5 * (sigma * x)^2,    if |x| < 1 / sigma^2
                  |x| - 0.5 / sigma^2,    otherwise
    """
```
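The docstring formula can be sketched in NumPy to see the two branches in action (a toy stand-alone version; the real method builds the same computation out of TF ops inside the graph):

```python
import numpy as np

def modified_smooth_l1(sigma, bbox_pred, bbox_targets,
                       inside_weights, outside_weights):
    """NumPy sketch of the docstring formula above."""
    sigma2 = sigma ** 2
    diff = inside_weights * (bbox_pred - bbox_targets)
    abs_diff = np.abs(diff)
    smooth = np.where(abs_diff < 1.0 / sigma2,
                      0.5 * (sigma * diff) ** 2,   # quadratic branch near zero
                      abs_diff - 0.5 / sigma2)     # linear branch for large errors
    return outside_weights * smooth

x = np.array([0.05, 2.0])  # one small and one large regression error
ones = np.ones_like(x)
out = modified_smooth_l1(3.0, x, np.zeros_like(x), ones, ones)
print(out)  # first element uses the quadratic branch, second the linear branch
```

With sigma = 3.0 the branch threshold is 1/9 ≈ 0.111, so 0.05 falls in the quadratic region and 2.0 in the linear one.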
The remaining two losses, the R-CNN cross-entropy and the R-CNN box regression, are computed in the same fashion.
3. Set the optimizer and learning rate.
4. Initialize the variables:
```python
# initialize all variables first
sess.run(tf.global_variables_initializer())
# if a pretrained model is given, overwrite the initialization with its weights
if self.pretrained_model is not None:
    print ('Loading pretrained model '
           'weights from {:s}').format(self.pretrained_model)
    self.net.load(self.pretrained_model, sess, self.saver, True)
# marker for the last snapshot iteration
last_snapshot_iter = -1
```
5. Train the model in a for loop. Each iteration has three parts: before the step (fetch one batch and build the feed dict for the SGD update), the step itself, and bookkeeping after the step. (Here we assume the net obtained earlier is an instance of VGGnet_train.)
```python
for iter in range(max_iters):
    # before the step: get one batch and build the feed dict for the SGD update
    blobs = data_layer.forward()
    feed_dict = {self.net.data: blobs['data'],
                 self.net.im_info: blobs['im_info'],
                 self.net.keep_prob: 0.5,
                 self.net.gt_boxes: blobs['gt_boxes']}
    run_options = None
    run_metadata = None
    if cfg.TRAIN.DEBUG_TIMELINE:
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
    # the step itself: run the graph with sess.run, timed
    timer.tic()
    rpn_loss_cls_value, rpn_loss_box_value, loss_cls_value, loss_box_value, _ = \
        sess.run([rpn_cross_entropy, rpn_loss_box, cross_entropy, loss_box, train_op],
                 feed_dict=feed_dict,
                 options=run_options,
                 run_metadata=run_metadata)
    timer.toc()
    # after the step: dump the timeline, print losses, snapshot at configured intervals
    if cfg.TRAIN.DEBUG_TIMELINE:
        trace = timeline.Timeline(step_stats=run_metadata.step_stats)
        trace_file = open(str(long(time.time() * 1000)) + '-train-timeline.ctf.json', 'w')
        trace_file.write(trace.generate_chrome_trace_format(show_memory=False))
        trace_file.close()
    if (iter + 1) % cfg.TRAIN.DISPLAY == 0:
        print 'iter: %d / %d, total loss: %.4f, rpn_loss_cls: %.4f, ' \
              'rpn_loss_box: %.4f, loss_cls: %.4f, loss_box: %.4f, lr: %f' % \
            (iter + 1, max_iters,
             rpn_loss_cls_value + rpn_loss_box_value + loss_cls_value + loss_box_value,
             rpn_loss_cls_value, rpn_loss_box_value, loss_cls_value, loss_box_value,
             lr.eval())
        print 'speed: {:.3f}s / iter'.format(timer.average_time)
    if (iter + 1) % cfg.TRAIN.SNAPSHOT_ITERS == 0:
        last_snapshot_iter = iter
        self.snapshot(sess, iter)
```
Here it is worth looking more closely at sess.run:
```python
run(fetches, feed_dict=None, options=None, run_metadata=None)
```
Runs operations and evaluates tensors in fetches. This method runs one "step" of TensorFlow computation, by running the necessary graph fragment to execute every Operation and evaluate every Tensor in fetches, substituting the values in feed_dict for the corresponding input values.
The fetches argument may be a single graph element, or an arbitrarily nested list, tuple, namedtuple, dict, or OrderedDict containing graph elements at its leaves. A graph element can be any of the following types:

- a tf.Operation: the corresponding fetched value will be None;
- a tf.Tensor: the corresponding fetched value will be a numpy ndarray containing the value of that tensor;
- a tf.SparseTensor: the corresponding fetched value will be a tf.compat.v1.SparseTensorValue containing the value of that sparse tensor;
- a get_tensor_handle op: the corresponding fetched value will be a numpy ndarray containing the handle of that tensor;
- a string which is the name of a tensor or operation in the graph.
In the training loop above, fetches = [rpn_cross_entropy, rpn_loss_box, cross_entropy, loss_box, train_op].