
Reading Note: S^3FD: Single Shot

2018-03-07  joshua_1988

TITLE: $S^3FD$: Single Shot Scale-invariant Face Detector

AUTHOR: Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, Stan Z. Li

ASSOCIATION: Chinese Academy of Sciences

FROM: arXiv:1708.05237

CONTRIBUTION

  1. Proposing a scale-equitable face detection framework with a wide range of anchor-associated layers and a series of reasonable anchor scales so as to handle different scales of faces well.
  2. Presenting a scale compensation anchor matching strategy to improve the recall rate of small faces.
  3. Introducing a max-out background label to reduce the high false positive rate of small faces.
  4. Achieving state-of-the-art results on AFW, PASCAL face, FDDB and WIDER FACE with real-time speed.

METHOD

There are three main reasons why the performance of anchor-based detectors drops dramatically as objects become smaller:

  1. Biased Framework. First, the stride of the lowest anchor-associated layer is too large, so few features are reliable for small faces. Second, the anchor scale does not match the receptive field, and both are too large to fit small faces.
  2. Anchor Matching Strategy. Anchor scales are discrete, but face scales are continuous. Faces whose scales fall far from the anchor scales, such as tiny and outer faces, cannot match enough anchors.
  3. Background from Small Anchors. Small anchors sharply increase the number of negative anchors on the background, which leads to many false positive faces.
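
The third reason can be made concrete with a little arithmetic. The sketch below counts anchors per detection layer under assumed settings that do not appear in this note: a 640x640 input, strides from 4 to 128, and one anchor per feature-map cell. Under these assumptions the smallest-stride layer contributes about three quarters of all anchors, and most of them lie on background.

```python
# Hypothetical settings for illustration only (not stated in this note):
# a 640x640 input and six detection layers with strides 4..128.
INPUT_SIZE = 640
STRIDES = [4, 8, 16, 32, 64, 128]


def anchors_per_layer(input_size, strides):
    """Return {stride: anchor count}, assuming one anchor per feature-map cell."""
    return {s: (input_size // s) ** 2 for s in strides}


counts = anchors_per_layer(INPUT_SIZE, STRIDES)
total = sum(counts.values())
for stride, n in counts.items():
    # Share of all anchors that come from this layer
    print(f"stride {stride:>3}: {n:>6} anchors ({100 * n / total:.1f}% of all)")
```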

The architecture of Single Shot Scale-invariant Face Detector is shown in the following figure.

Framework


Scale-equitable framework

Constructing Architecture

Designing scales for anchors

Scale compensation anchor matching strategy

A scale compensation anchor matching strategy is proposed to solve two problems: 1) the average number of matched anchors per face is about 3, which is not enough to recall faces with high scores; 2) the number of matched anchors depends strongly on the anchor scales. There are two stages:
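
A minimal Python sketch of a two-stage matching scheme, assuming stage one matches anchors above a relaxed IoU threshold and stage two tops up under-matched faces with their best remaining anchors. The thresholds `t1=0.35` and `t2=0.1`, the value of `top_n`, and the function names are assumptions for illustration, not values taken from this note. For simplicity, an anchor may match more than one face here.

```python
import numpy as np


def iou_matrix(faces, anchors):
    """Pairwise IoU between faces (F, 4) and anchors (A, 4), boxes as x1, y1, x2, y2."""
    x1 = np.maximum(faces[:, None, 0], anchors[None, :, 0])
    y1 = np.maximum(faces[:, None, 1], anchors[None, :, 1])
    x2 = np.minimum(faces[:, None, 2], anchors[None, :, 2])
    y2 = np.minimum(faces[:, None, 3], anchors[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_f = (faces[:, 2] - faces[:, 0]) * (faces[:, 3] - faces[:, 1])
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_f[:, None] + area_a[None, :] - inter)


def match_anchors(faces, anchors, t1=0.35, t2=0.1, top_n=3):
    """Return a boolean (F, A) matrix of face-to-anchor matches (assumed scheme)."""
    iou = iou_matrix(faces, anchors)
    matched = iou >= t1  # stage 1: relaxed IoU threshold
    for f in range(len(faces)):
        if matched[f].sum() >= top_n:
            continue  # this face already has enough anchors
        # stage 2: among anchors above the lower threshold, keep the top-N by IoU
        candidates = np.flatnonzero(iou[f] > t2)
        best = candidates[np.argsort(-iou[f, candidates])[:top_n]]
        matched[f, best] = True
    return matched


# One small 10x10 face and four 16x16 anchors at increasing offsets
faces = np.array([[0.0, 0.0, 10.0, 10.0]])
anchors = np.array([
    [0.0, 0.0, 16.0, 16.0],
    [4.0, 4.0, 20.0, 20.0],
    [8.0, 8.0, 24.0, 24.0],
    [100.0, 100.0, 116.0, 116.0],
])
matched = match_anchors(faces, anchors)
print(matched)
```

In this toy case only the first anchor passes stage one; the second is added in stage two because its IoU with the face is above the lower threshold.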

Max-out background label

For the conv3_3 detection layer, a max-out background label is applied. For each of the smallest anchors, $N_m$ scores are predicted for the background label, and the highest one is chosen as its final background score.
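
The max-out step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name and array shapes are hypothetical, and the final softmax over the background and face scores is an assumption about how the reduced score is used.

```python
import numpy as np


def maxout_background_probs(bg_scores, face_score):
    """bg_scores: (A, N_m) background logits; face_score: (A,) face logits.

    Returns (A, 2) probabilities ordered as [background, face].
    """
    bg = bg_scores.max(axis=1)  # max-out: keep the highest of the N_m background scores
    logits = np.stack([bg, face_score], axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # assumed softmax over the two classes


# One anchor with N_m = 3 background scores and one face score
probs = maxout_background_probs(np.array([[0.0, 2.0, 1.0]]), np.array([2.0]))
print(probs)
```

Because the strongest background score competes with the single face score, an anchor is labeled as a face only if it beats every background prediction.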

Training

  1. Training dataset and data augmentation, including color distortion, random cropping and horizontal flipping.
  2. The loss function is the multi-task loss defined in RPN.
  3. Hard negative mining.
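
Hard negative mining in the list above can be sketched as follows: keep every positive anchor and only the negatives with the highest classification loss. The function name and the negative-to-positive ratio of 3 are assumptions for illustration; this note does not state the ratio.

```python
import numpy as np


def hard_negative_mask(cls_loss, is_positive, neg_pos_ratio=3):
    """Select all positives plus the highest-loss negatives.

    cls_loss: (A,) per-anchor classification loss; is_positive: (A,) bool.
    neg_pos_ratio is an assumed setting, not taken from this note.
    """
    num_neg = min(int(neg_pos_ratio * is_positive.sum()), int((~is_positive).sum()))
    neg_loss = np.where(is_positive, -np.inf, cls_loss)  # exclude positives from ranking
    hardest = np.argsort(-neg_loss)[:num_neg]  # indices of the hardest negatives
    mask = is_positive.copy()
    mask[hardest] = True
    return mask


cls_loss = np.array([0.1, 0.9, 0.5, 0.2, 0.8, 0.05])
is_positive = np.array([True, False, False, False, False, False])
mask = hard_negative_mask(cls_loss, is_positive)
print(mask)
```

Only anchors selected by the mask would contribute to the classification loss, which keeps the easy background anchors from dominating training.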

The experiment result on WIDER FACE is illustrated in the following figure.

Experiment

