YOLOv3: An Incremental Improvement

2018-07-29 初七123

Introduction

We made some improvements to YOLO.

The Deal

Bounding Box Prediction

Coordinate calculation

YOLOv3 predicts an objectness score for each bounding box using logistic regression.
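
In the YOLOv3 paper, the box is computed from the raw network outputs as b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h), and the objectness score is σ(t_o). A minimal sketch in plain Python follows. The function name `decode_box` and its argument layout are made up for illustration and are not taken from Darknet.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, to, cx, cy, pw, ph):
    """Decode one raw prediction (hypothetical helper, for illustration).

    tx, ty, tw, th, to: raw network outputs for one box
    cx, cy: offset of the grid cell from the top-left corner, in cells
    pw, ph: width and height of the anchor (prior) box
    """
    bx = sigmoid(tx) + cx      # centre x stays inside its grid cell
    by = sigmoid(ty) + cy      # centre y stays inside its grid cell
    bw = pw * math.exp(tw)     # width is a scaling of the prior width
    bh = ph * math.exp(th)     # height is a scaling of the prior height
    objectness = sigmoid(to)   # logistic objectness score
    return bx, by, bw, bh, objectness
```

With all raw outputs at zero, the box sits in the middle of its cell, keeps the prior's size, and has an objectness of 0.5.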

Class Prediction

Each box predicts the classes the bounding box may contain using multilabel classification. We do not use a softmax, as we have found it is unnecessary for good performance; instead, we simply use independent logistic classifiers.
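
To see the difference between the two choices, here is a small sketch in plain Python. The logits and the helper names are invented for illustration. A softmax forces the class scores to sum to 1, while independent logistic classifiers score each class on its own, so several classes can be "on" for the same box.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Softmax over a list of logits (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multilabel_probs(logits):
    """Independent logistic classifier per class, as in the text above."""
    return [sigmoid(v) for v in logits]


# Hypothetical logits for three classes; the first two both apply.
logits = [2.0, 2.0, -3.0]
independent = multilabel_probs(logits)  # each class judged separately
exclusive = softmax(logits)             # classes compete for probability
```

Here `independent` gives both of the first two classes a score above 0.5, whereas `exclusive` has to split the probability mass between them.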

Predictions Across Scales

YOLOv3 predicts boxes at 3 different scales. Our system extracts features from those scales using a similar concept to feature pyramid networks [8].

In our experiments with COCO [10] we predict 3 boxes at each scale so the tensor is N×N×[3∗(4 + 1 + 80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions.
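
The arithmetic can be checked with a short sketch. The function name `output_shape` is hypothetical, and the grid sizes 13, 26, and 52 assume a 416×416 input with strides 32, 16, and 8, which the text above does not state.

```python
def output_shape(grid_size, boxes_per_scale=3, num_classes=80):
    """Shape of the prediction tensor at one scale: N x N x [B * (4 + 1 + C)]."""
    per_box = 4 + 1 + num_classes  # 4 box offsets, 1 objectness, C class scores
    return (grid_size, grid_size, boxes_per_scale * per_box)


# Assumed grid sizes for a 416x416 input (not given in the text above).
shapes = [output_shape(n) for n in (13, 26, 52)]
```

For COCO this works out to 3 × 85 = 255 channels at every scale.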

Feature Extractor

We use a new network for performing feature extraction. Our new network is a hybrid approach between the network used in YOLOv2, Darknet-19, and that newfangled residual network stuff.

Training

We still train on full images with no hard negative mining or any of that stuff. We use multi-scale training, lots of data augmentation, batch normalization, all the standard stuff. We use the Darknet neural network framework for training and testing [14].

How We Do

Things We Tried That Didn’t Work

Anchor box x, y offset predictions.
Linear x, y predictions instead of logistic.

Focal loss.
YOLOv3 may already be robust to the problem focal loss is trying to solve because it has separate objectness predictions and conditional class predictions.
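
For reference, focal loss scales the cross-entropy of a prediction by (1 - p_t)^γ, so that easy, well-classified examples contribute less to the total. Below is a minimal sketch of the binary form in plain Python. The default γ = 2.0 is a commonly used value and an assumption here, and the optional α class-weighting term is left out.

```python
import math


def focal_loss(p, target, gamma=2.0):
    """Binary focal loss for one prediction (sketch, no alpha weighting).

    p: predicted probability of the positive class, strictly in (0, 1)
    target: 1 for a positive example, 0 for a negative one
    gamma: focusing parameter; gamma = 0 gives plain cross-entropy
    """
    p_t = p if target == 1 else 1.0 - p  # probability given to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

An easy negative, say `focal_loss(0.1, 0)`, is down-weighted far more strongly than a hard one such as `focal_loss(0.9, 0)`.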