[yolo] - Understanding the yolo (Darknet) cfg file
The yolo cfg file is quite rich: it configures a large number of network parameters. I have not yet found a particularly detailed reference for it, so I have compiled the following from scattered descriptions found online.
Explanation from the darknet repo author:
- saturation, exposure and hue values - ranges for random changes to the colours of images during training (data-augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV
  The larger the value, the more invariant the neural network becomes to changes in the lighting and colour of the objects.
- steps and scales values - `steps` is a list of checkpoints (iteration numbers) at which the `scales` entries are applied; `scales` is a list of coefficients by which `learning_rate` is multiplied at those checkpoints.
  Together they determine how the learning rate changes as the number of training iterations grows.
- anchors, bias_match - `anchors` are the most frequent initial <width,height> of objects, in terms of the output network resolution.
  `bias_match` is used only for training: if bias_match=1 the detected object takes its <width,height> from one of the anchors, while if bias_match=0 the anchor's <width,height> is refined by the neural network (lines 275 to 283 in c190406):
```c
box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
if(l.bias_match){
    pred.w = l.biases[2*n];
    pred.h = l.biases[2*n+1];
    if(DOABS){
        pred.w = l.biases[2*n]/l.w;
        pred.h = l.biases[2*n+1]/l.h;
    }
}
```
If you train with
`height=416,width=416,random=0`, then the max values of the anchors will be 13,13.
But if you train with `random=1`, then the max input resolution can be 608x608, and the max anchor values can be 19,19.
- jitter, rescore, thresh - `jitter` can be in [0-1] and is used to crop images during training for data augmentation. The larger the value of jitter, the more invariant the neural network becomes to changes in the size and aspect ratio of the objects (lines 513 to 528 in c190406):
```c
int dw = (ow*jitter);
int dh = (oh*jitter);

int pleft = rand_uniform(-dw, dw);
int pright = rand_uniform(-dw, dw);
int ptop = rand_uniform(-dh, dh);
int pbot = rand_uniform(-dh, dh);

int swidth = ow - pleft - pright;
int sheight = oh - ptop - pbot;

float sx = (float)swidth / ow;
float sy = (float)sheight / oh;

int flip = random_gen()%2;
image cropped = crop_image(orig, pleft, ptop, swidth, sheight);
```
`rescore` determines which loss (delta, cost, ...) function will be used - more about this: #185 (comment) (lines 302 to 305 in c190406):
```c
l.delta[best_index + 4] = l.object_scale * (1 - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
if (l.rescore) {
    l.delta[best_index + 4] = l.object_scale * (iou - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
}
```
`thresh` is the minimum IoU above which `delta_region_class()` is used during training (line 235 in c190406):
```c
if (best_iou > l.thresh) {
```
- object_scale, noobject_scale, class_scale, coord_scale values - all used for training
- object_scale - used in the loss (delta, cost, ...) function for objects: #185 (comment)
- noobject_scale - used in the loss (delta, cost, ...) function for objects and backgrounds (lines 232 to 233 in c190406):
```c
l.delta[index + 4] = l.noobject_scale * ((0 - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
if(l.classfix == -1) l.delta[index + 4] = l.noobject_scale * ((best_iou - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
```
- class_scale - used as the scale in `delta_region_class()` (line 108 in c190406):
```c
void delta_region_class(float *output, float *delta, int index, int class, int classes, tree *hier, float scale, float *avg_cat)
```
- coord_scale - used as the scale in `delta_region_box()` (line 87 in c190406):
```c
float delta_region_box(box truth, float *x, float *biases, int n, int index, int i, int j, int w, int h, float *delta, float scale)
```
- absolute - isn't used
Explanation from Stack Overflow:
Here is my current understanding of some of the variables. Not necessarily correct though:
[net]
- batch: That many images+labels are used in the forward pass to compute a gradient and update the weights via backpropagation.
- subdivisions: The batch is subdivided into this many "blocks". The images of a block are run in parallel on the GPU.
- decay: Maybe a term to diminish the weights to avoid having large values. For stability reasons I guess.
- channels: Better explained by a figure (omitted here): on the left we have a single channel with 4x4 pixels; the reorganization layer reduces the size by half, then creates 4 channels with adjacent pixels placed in different channels.
- momentum: I guess the new gradient is computed by momentum * previous_gradient + (1-momentum) * gradient_of_current_batch. Makes the gradient more stable.
- adam: Uses the adam optimizer? Doesn't work for me though
- burn_in: For the first x batches, slowly increase the learning rate until its final value (your learning_rate parameter value). Use this to decide on a learning rate by monitoring until what value the loss decreases (before it starts to diverge).
- policy=steps: Use the steps and scales parameters below to adjust the learning rate during training
- steps=500,1000: Adjust the learning rate after 500 and 1000 batches
- scales=0.1,0.2: After 500, multiply the LR by 0.1, then after 1000 multiply again by 0.2
- angle: augment image by rotation up to this angle (in degree)
layers
- filters: How many convolutional kernels there are in a layer.
- activation: Activation function, relu, leaky relu, etc. See src/activations.h
- stopbackward: Do backpropagation only up to this layer. Put it in the penultimate convolution layer before the first yolo layer to train only the layers after it, e.g. when using pretrained weights.
- random: Put in the yolo layers. If set to 1 do data augmentation by resizing the images to different sizes every few batches. Use to generalize over object sizes.
Many things are more or less self-explanatory (size, stride, batch_normalize, max_batches, width, height). If you have more questions, feel free to comment.
Again, please keep in mind that I am not 100% certain about many of those.
The content above is excerpted from:
https://github.com/AlexeyAB/darknet/issues/279
https://stackoverflow.com/questions/50390836/understanding-darknets-yolo-cfg-config-files