Fast R-CNN

2018-06-18  初七123

Introduction

R-CNN requires multi-stage training and is very slow at test time.
SPPnet improves efficiency noticeably by means of spatial pyramid pooling.

The proposed method improves on both R-CNN and SPPnet:
1. Higher detection quality (mAP) than R-CNN and SPPnet
2. Training is single-stage, using a multi-task loss
3. Training can update all network layers
4. No disk storage is required for feature caching

Fast R-CNN architecture and training

The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H ×W (e.g., 7×7), where H and W are layer hyper-parameters that are independent of any particular RoI

RoI pooling converts the features inside each region of interest into a map of fixed spatial size.
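As a rough illustration, here is a minimal NumPy sketch of RoI max pooling; the function name, RoI format, and sub-window rounding are assumptions for illustration, not the paper's released implementation:

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool one RoI of a conv feature map into a fixed H x W grid.

    feature_map: (C, H, W) array of conv features.
    roi: (x1, y1, x2, y2) in feature-map coordinates (assumed format).
    output_size: fixed output extent, e.g. 7x7 as in the paper.
    """
    x1, y1, x2, y2 = roi
    ph, pw = output_size
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    out = np.empty((feature_map.shape[0], ph, pw), dtype=feature_map.dtype)
    for i in range(ph):
        for j in range(pw):
            # Sub-window of the RoI covered by output cell (i, j).
            ys = y1 + int(np.floor(i * roi_h / ph))
            ye = y1 + int(np.ceil((i + 1) * roi_h / ph))
            xs = x1 + int(np.floor(j * roi_w / pw))
            xe = x1 + int(np.ceil((j + 1) * roi_w / pw))
            out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out
```

Each output cell takes the max over its sub-window of the RoI, so every RoI, whatever its size, yields the same H × W feature map for the fully connected layers.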

In Fast R-CNN training, stochastic gradient descent (SGD) minibatches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image.

Fast R-CNN samples mini-batches hierarchically: first N images are chosen, then R/N RoIs are drawn from each image to form the batch.

Mini-batch sampling

  1. Each SGD mini-batch is constructed from N = 2 images, chosen uniformly at random
  2. We use mini-batches of size R = 128, sampling 64 RoIs from each image
  3. We take 25% of the RoIs from object proposals that have intersection over union (IoU) overlap with a ground-truth bounding box of at least 0.5.
  4. The remaining RoIs are sampled from object proposals whose maximum IoU with ground truth lies in the interval [0.1, 0.5), following [11] (see the sampling sketch after this list).
  5. The lower threshold of 0.1 appears to act as a heuristic for hard example mining [8]
  6. During training, images are horizontally flipped with probability 0.5
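A short sketch of this per-image sampling rule (hypothetical helper; the IoU computation and proposal format are assumed to exist elsewhere):

```python
import numpy as np

def sample_rois(max_ious, rois_per_image=64, fg_fraction=0.25,
                fg_thresh=0.5, bg_lo=0.1, rng=None):
    """Sample foreground/background RoI indices for one image.

    max_ious: (num_proposals,) max IoU of each proposal with any
    ground-truth box (assumed precomputed).
    """
    rng = rng or np.random.default_rng()
    fg_inds = np.flatnonzero(max_ious >= fg_thresh)           # IoU >= 0.5
    bg_inds = np.flatnonzero((max_ious >= bg_lo) &
                             (max_ious < fg_thresh))          # [0.1, 0.5)
    num_fg = min(int(rois_per_image * fg_fraction), len(fg_inds))  # 25% fg
    num_bg = min(rois_per_image - num_fg, len(bg_inds))
    fg = rng.choice(fg_inds, size=num_fg, replace=False)
    bg = rng.choice(bg_inds, size=num_bg, replace=False)
    return fg, bg
```

With N = 2 and R = 128, calling this once per image yields the 64 RoIs per image described above.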

Multi-task loss
Each training RoI is labeled with a ground-truth class u and a ground-truth bounding-box regression target v. We use a multi-task loss L on each labeled RoI to jointly train for classification and bounding-box regression.

Each training RoI is labeled with its ground-truth class and bounding box; a multi-task loss is used to jointly train the classification and regression networks.

u is the ground-truth class label (u = 0 means the RoI is background, i.e., contains no object)
t is the predicted bounding-box regression offsets
v is the ground-truth bounding-box regression target
λ is a balancing hyper-parameter; the paper uses λ = 1
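For reference, the multi-task loss as defined in the paper is

$$
L(p, u, t^u, v) = L_{\mathrm{cls}}(p, u) + \lambda \,[u \ge 1]\, L_{\mathrm{loc}}(t^u, v)
$$

where $L_{\mathrm{cls}}(p, u) = -\log p_u$, and the localization term uses the smooth $L_1$ loss over the four box coordinates:

$$
L_{\mathrm{loc}}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_i^u - v_i),
\qquad
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
$$

The Iverson bracket $[u \ge 1]$ switches off the regression loss for background RoIs.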

Back-propagation through RoI pooling layers

In words, for each mini-batch RoI r and for each pooling output unit y_rj, the partial derivative ∂L/∂y_rj is accumulated if i is the argmax selected for y_rj by max pooling. In back-propagation, the partial derivatives ∂L/∂y_rj are already computed by the backwards function of the layer on top of the RoI pooling layer.
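The corresponding equation from the paper, where $i^*(r, j)$ is the index selected by the argmax for output $y_{rj}$:

$$
\frac{\partial L}{\partial x_i} = \sum_{r} \sum_{j} \left[\, i = i^*(r, j) \,\right] \frac{\partial L}{\partial y_{rj}}
$$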

Fast R-CNN detection

Large fully connected layers are easily accelerated by compressing them with truncated SVD

In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as W ≈ U Σ_t V^T, where U is a u × t matrix made of the first t left-singular vectors of W, Σ_t is a t × t diagonal matrix of the top t singular values, and V is a v × t matrix of the first t right-singular vectors. This reduces the parameter count from uv to t(u + v).
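A minimal NumPy sketch of this compression (the split into two smaller layers follows the paper; the helper name and shapes are illustrative):

```python
import numpy as np

def compress_fc(W, b, t):
    """Approximate one FC layer y = W @ x + b (W is u x v) with two
    smaller layers via truncated SVD, cutting uv params to t(u + v).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(s[:t]) @ Vt[:t]   # first layer: (t x v), no bias
    W2 = U[:, :t]                  # second layer: (u x t), bias b
    return W1, W2

# usage: y ≈ W2 @ (W1 @ x) + b
```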

Main results

  1. State-of-the-art mAP on VOC07, 2010, and 2012
  2. Fast training and testing compared to R-CNN, SPPnet
  3. Fine-tuning conv layers in VGG16 improves mAP

Accuracy: state-of-the-art mAP on the VOC benchmarks (results figure omitted).

Timing: large training and testing speedups over R-CNN and SPPnet (timing figure omitted).

Design evaluation

Does multi-task training help?
Yes; in the paper's ablation, training with the multi-task loss improves mAP over training the classification and localization tasks separately.

Scale invariance: to brute force or finesse?
Single-scale (brute-force) processing is nearly as accurate as multi-scale image pyramids while being much faster; deep ConvNets appear to learn scale invariance directly.

Do SVMs outperform softmax?
No; the softmax classifier trained end-to-end slightly outperforms post-hoc SVMs.

Do we need more training data?
Yes; mAP improves when the training set is enlarged (e.g., combining VOC07 with VOC12 data).

Are more proposals always better?
No; mAP rises slightly and then falls as the number of proposals per image grows.
