YOLOv3: An Incremental Improvement
Introduction
We made a few improvements to YOLO.
The Deal
Bounding Box Prediction
Coordinate computation
YOLOv3 predicts an objectness score for each bounding box using logistic regression.
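The coordinate computation turns four raw network outputs t_x, t_y, t_w, t_h into a box. It uses the offset (c_x, c_y) of the grid cell and the size (p_w, p_h) of the bounding box prior:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}

Here σ is the logistic (sigmoid) function. The minimal NumPy sketch below decodes one box together with its logistic objectness score. It assumes everything is in grid-cell units and that the fifth raw output t_o is the objectness logit. The function and argument names are hypothetical, chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_x, cell_y, prior_w, prior_h):
    """Decode raw outputs (t_x, t_y, t_w, t_h, t_o) into a box in grid units.

    cell_x, cell_y: offset (c_x, c_y) of the grid cell (hypothetical names)
    prior_w, prior_h: width and height (p_w, p_h) of the bounding box prior
    """
    t_x, t_y, t_w, t_h, t_o = t
    b_x = sigmoid(t_x) + cell_x   # sigmoid in (0, 1) keeps the centre inside the cell
    b_y = sigmoid(t_y) + cell_y
    b_w = prior_w * np.exp(t_w)   # width and height rescale the prior by exp(t)
    b_h = prior_h * np.exp(t_h)
    objectness = sigmoid(t_o)     # logistic regression score in (0, 1)
    return b_x, b_y, b_w, b_h, objectness

# All-zero outputs put the centre in the middle of cell (3, 5) at exactly the prior size
print(decode_box((0.0, 0.0, 0.0, 0.0, 0.0), 3, 5, 1.5, 2.0))
```

Because the centre passes through a sigmoid, a large t_x can move the box centre toward the cell edge but never past it.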
Class Prediction
Each box predicts the classes the bounding box may contain using multilabel classification. We do not use a softmax because we found it unnecessary for good performance; instead, we simply use independent logistic classifiers.
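A softmax forces the class scores to share a total of 1, which implicitly assumes exactly one class per box. Independent logistic classifiers score every class on its own, so overlapping labels can all be high at once. The sketch below contrasts the two. It assumes binary cross-entropy as the per-class training loss. The logits and the three example labels are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def class_bce(logits, targets, eps=1e-7):
    """Binary cross-entropy summed over independent per-class logistic outputs."""
    p = np.clip(sigmoid(logits), eps, 1 - eps)
    return float(-np.sum(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

# Hypothetical logits for three labels, e.g. "woman", "person", "car"
logits = np.array([2.0, 3.0, -4.0])
probs = sigmoid(logits)    # each class scored independently; the sum may exceed 1
soft = softmax(logits)     # scores compete and always sum to 1
predicted = probs > 0.5    # multilabel: any number of classes can fire
print(predicted)           # both "woman" and "person" are predicted
```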
Predictions Across Scales
YOLOv3 predicts boxes at 3 different scales. Our system extracts features from those scales using a concept similar to feature pyramid networks [8].
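One common way to build a feature-pyramid-like path is to upsample a coarse, deep feature map by 2× and concatenate it with a finer map from earlier in the network. The sketch below shows that step on (H, W, C) arrays. It assumes nearest-neighbour upsampling and channel-wise concatenation. All channel counts and function names are hypothetical and are not taken from the notes.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return np.repeat(np.repeat(fmap, 2, axis=0), 2, axis=1)

def merge_scales(coarse, fine):
    """Upsample the coarse map and concatenate it with the finer, earlier map."""
    up = upsample2x(coarse)
    if up.shape[:2] != fine.shape[:2]:
        raise ValueError("spatial sizes must match after upsampling")
    return np.concatenate([up, fine], axis=-1)  # stack along the channel axis

coarse = np.random.rand(13, 13, 256)  # hypothetical deep, low-resolution features
fine = np.random.rand(26, 26, 512)    # hypothetical earlier, higher-resolution features
merged = merge_scales(coarse, fine)
print(merged.shape)                   # (26, 26, 768)
```

Concatenation keeps both the semantic information from the deep map and the spatial detail from the earlier map for the next prediction head.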
In our experiments with COCO [10] we predict 3 boxes at each scale so the tensor is N×N×[3∗(4 + 1 + 80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions.
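The arithmetic behind that tensor is 3 × (4 + 1 + 80) = 255 channels per grid cell. The sketch below reshapes such a map into per-box attributes and counts boxes across scales. It assumes that each box's values are stored contiguously in the order offsets, objectness, classes. It also assumes a 416×416 input with strides 32, 16 and 8, which gives 13×13, 26×26 and 52×52 grids. Both assumptions are illustrative rather than stated above.

```python
import numpy as np

NUM_ANCHORS = 3                  # boxes predicted at each scale
NUM_CLASSES = 80                 # COCO classes
BOX_ATTRS = 4 + 1 + NUM_CLASSES  # 4 offsets + 1 objectness + 80 class scores

def output_channels(num_anchors=NUM_ANCHORS, num_classes=NUM_CLASSES):
    return num_anchors * (4 + 1 + num_classes)

def split_predictions(raw):
    """Reshape an (N, N, 255) map into (N, N, 3, 85) and slice out the parts."""
    n = raw.shape[0]
    per_box = raw.reshape(n, n, NUM_ANCHORS, BOX_ATTRS)
    offsets = per_box[..., 0:4]      # t_x, t_y, t_w, t_h
    objectness = per_box[..., 4]     # one score per box
    class_logits = per_box[..., 5:]  # 80 independent class logits
    return offsets, objectness, class_logits

def total_boxes(input_size=416, strides=(32, 16, 8)):
    """Count predicted boxes over all scales for a square input (assumed strides)."""
    return sum(NUM_ANCHORS * (input_size // s) ** 2 for s in strides)

print(output_channels())  # 255
print(total_boxes())      # 3 * (13*13 + 26*26 + 52*52) = 10647
```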
Feature Extractor
We use a new network for performing feature extraction. Our new network is a hybrid approach between the network used in YOLOv2, Darknet-19, and that newfangled residual network stuff.
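The "residual network stuff" means adding a block's input back to its output, y = x + F(x). The sketch below shows one such unit in NumPy. It assumes F is a 1×1 convolution that halves the channels followed by a 3×3 convolution that restores them, each with a leaky ReLU of slope 0.1. That layout, the omission of batch normalization, and all names are illustrative assumptions, not the exact network from the notes.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 convolution with zero 'same' padding.

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel, k odd.
    """
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero-pad height and width only
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # each kernel tap is a matrix product over the input channels
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)

def residual_unit(x, w1, w3):
    """1x1 conv reduces channels, 3x3 conv restores them, then add the shortcut."""
    h = leaky_relu(conv2d_same(x, w1))
    h = leaky_relu(conv2d_same(h, w3))
    return x + h  # shortcut: output has the same shape as the input

x = np.random.rand(8, 8, 64)
w1 = 0.1 * np.random.randn(1, 1, 64, 32)
w3 = 0.1 * np.random.randn(3, 3, 32, 64)
y = residual_unit(x, w1, w3)  # shape (8, 8, 64), same as x
```

If the convolutional branch learns nothing useful, its output can shrink toward zero and the unit passes its input through unchanged. This is what makes deep stacks of such units easier to train.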
Training
We still train on full images with no hard negative mining or any of that stuff. We use multi-scale training, lots of data augmentation, batch normalization, all the standard stuff. We use the Darknet neural network framework for training and testing [14].
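The notes do not spell out how multi-scale training works. The sketch below assumes a YOLOv2-style scheme: every 10 batches, draw a new square input size from the multiples of 32 between 320 and 608. These values, the seed and the function names are assumptions for illustration, not settings confirmed for YOLOv3.

```python
import random

def multiscale_sizes(min_size=320, max_size=608, step=32):
    """Candidate square input sizes; each is a multiple of the network stride."""
    return list(range(min_size, max_size + 1, step))

def input_size_schedule(num_batches, every=10, seed=0):
    """Return the input size used for each batch, re-drawn every `every` batches."""
    rng = random.Random(seed)
    sizes = multiscale_sizes()
    schedule = []
    current = None
    for b in range(num_batches):
        if b % every == 0:
            current = rng.choice(sizes)  # pick a new resolution for this block
        schedule.append(current)
    return schedule

print(input_size_schedule(30))
```

Keeping every size a multiple of 32 means every size downsamples to a whole-number grid at each of the three scales (32, 16 and 8 are all divisors of 32).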
How We Do
Things We Tried That Didn’t Work
Anchor box x,y offset predictions
Linear x,y predictions instead of logistic
Focal loss
YOLOv3 may already be robust to the problem focal loss is trying to solve because it has separate objectness predictions and conditional class predictions.
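Focal loss targets the imbalance between many easy background examples and few objects. It scales the cross-entropy by (1 − p_t)^γ, so confidently correct predictions contribute little to the loss. The sketch below uses the standard form FL(p_t) = −α_t (1 − p_t)^γ log(p_t) with a default γ = 2. These defaults and the function names are assumptions for illustration. Setting γ = 0 and leaving α unset recovers plain binary cross-entropy.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=None):
    """Focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1 - p
    alpha_t = 1.0 if alpha is None else (alpha if y == 1 else 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# An easy example (p = 0.95) is scaled by 0.05**2 = 0.0025,
# a hard one (p = 0.3) only by 0.7**2 = 0.49.
print(focal_loss(0.95, 1) / bce(0.95, 1), focal_loss(0.3, 1) / bce(0.3, 1))
```

This illustrates the point above. With a separate objectness score, the class predictions are conditioned on an object being present, so they may not face the same flood of easy negatives that focal loss is meant to suppress.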