[yolo] - Understanding the yolo (Darknet) cfg file
The yolo cfg file is quite rich: it configures a large number of network parameters. I have not yet found a particularly detailed reference for it, so I have compiled the following from scattered descriptions found online.
Explanation from the darknet repo author:
- saturation, exposure and hue values - ranges for random changes to the colours of images during training (data-augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV
  The larger the value, the more invariant the neural network becomes to changes in the lighting and colour of the objects.
- steps and scales values - `steps` is a list of checkpoints (iteration numbers) at which the `scales` entries are applied; `scales` is a list of coefficients by which `learning_rate` is multiplied at those checkpoints.
  Together they determine how the learning rate changes as the number of training iterations grows.
- anchors, bias_match - `anchors` are the most frequent initial <width,height> of objects, in terms of the output network resolution.
  `bias_match` is used only for training: if bias_match=1 the detected object takes its <width,height> from one of the anchors, while if bias_match=0 the anchor's <width,height> is refined by the neural network (lines 275 to 283 in c190406):
```c
box pred = get_region_box(l.output, l.biases, n, index, i, j, l.w, l.h);
if(l.bias_match){
    pred.w = l.biases[2*n];
    pred.h = l.biases[2*n+1];
    if(DOABS){
        pred.w = l.biases[2*n]/l.w;
        pred.h = l.biases[2*n+1]/l.h;
    }
}
```
If you train with
`height=416,width=416,random=0`, then the max values of the anchors will be 13,13.
But if you train with `random=1`, then the max input resolution can be 608x608, and the max anchor values can be 19,19.
- jitter, rescore, thresh - `jitter` can be in [0-1] and is used to crop images during training for data augmentation. The larger the value of jitter, the more invariant the neural network becomes to changes in the size and aspect ratio of the objects (lines 513 to 528 in c190406):
```c
int dw = (ow*jitter);
int dh = (oh*jitter);

int pleft = rand_uniform(-dw, dw);
int pright = rand_uniform(-dw, dw);
int ptop = rand_uniform(-dh, dh);
int pbot = rand_uniform(-dh, dh);

int swidth = ow - pleft - pright;
int sheight = oh - ptop - pbot;

float sx = (float)swidth / ow;
float sy = (float)sheight / oh;

int flip = random_gen()%2;
image cropped = crop_image(orig, pleft, ptop, swidth, sheight);
```
`rescore` determines which loss (delta, cost, ...) function will be used - more about this: #185 (comment) (lines 302 to 305 in c190406):
```c
l.delta[best_index + 4] = l.object_scale * (1 - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
if (l.rescore) {
    l.delta[best_index + 4] = l.object_scale * (iou - l.output[best_index + 4]) * logistic_gradient(l.output[best_index + 4]);
}
```
`thresh` is the minimum IoU above which `delta_region_class()` is used during training (line 235 in c190406):
```c
if (best_iou > l.thresh) {
```
- object_scale, noobject_scale, class_scale, coord_scale values - all used for training
- object_scale - used in the loss (delta, cost, ...) function for objects: #185 (comment)
- noobject_scale - used in the loss (delta, cost, ...) function for objects and backgrounds (lines 232 to 233 in c190406):
```c
l.delta[index + 4] = l.noobject_scale * ((0 - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
if(l.classfix == -1) l.delta[index + 4] = l.noobject_scale * ((best_iou - l.output[index + 4]) * logistic_gradient(l.output[index + 4]));
```
- class_scale - used as the scale in `delta_region_class()` (line 108 in c190406):
```c
void delta_region_class(float *output, float *delta, int index, int class, int classes, tree *hier, float scale, float *avg_cat)
```
- coord_scale - used as the scale in `delta_region_box()` (line 87 in c190406):
```c
float delta_region_box(box truth, float *x, float *biases, int n, int index, int i, int j, int w, int h, float *delta, float scale)
```
- absolute - isn't used
Explanation from Stack Overflow:
Here is my current understanding of some of the variables. Not necessarily correct though:
[net]
- batch: That many images+labels are used in the forward pass to compute a gradient and update the weights via backpropagation.
- subdivisions: The batch is subdivided into this many "blocks". The images of a block are run in parallel on the GPU.
- decay: Maybe a term to diminish the weights to avoid having large values. For stability reasons I guess.
- channels: Better explained by a figure (omitted here): on the left we have a single channel with 4x4 pixels; the reorganization layer reduces the size by half, then creates 4 channels with adjacent pixels placed in different channels.
- momentum: I guess the new gradient is computed by momentum * previous_gradient + (1-momentum) * gradient_of_current_batch. Makes the gradient more stable.
- adam: Uses the adam optimizer? Doesn't work for me though
- burn_in: For the first x batches, slowly increase the learning rate until its final value (your learning_rate parameter value). Use this to decide on a learning rate by monitoring until what value the loss decreases (before it starts to diverge).
- policy=steps: Use the steps and scales parameters below to adjust the learning rate during training
- steps=500,1000: Adjust the learning rate after 500 and 1000 batches
- scales=0.1,0.2: After 500, multiply the LR by 0.1, then after 1000 multiply again by 0.2
- angle: augment image by rotation up to this angle (in degree)
layers
- filters: How many convolutional kernels there are in a layer.
- activation: Activation function, relu, leaky relu, etc. See src/activations.h
- stopbackward: Do backpropagation only up to this layer. Put it in the penultimate convolution layer before the first yolo layer to train only the layers after it, e.g. when using pretrained weights.
- random: Put in the yolo layers. If set to 1 do data augmentation by resizing the images to different sizes every few batches. Use to generalize over object sizes.
Many things are more or less self-explanatory (size, stride, batch_normalize, max_batches, width, height). If you have more questions, feel free to comment.
Again, please keep in mind that I am not 100% certain about many of those.
The content above is excerpted from:
https://github.com/AlexeyAB/darknet/issues/279
https://stackoverflow.com/questions/50390836/understanding-darknets-yolo-cfg-config-files