(1) caffe--usage

2019-05-16  Alfie20

This article describes how to use Caffe from an algorithm engineer's point of view. It has two main parts: training and inference.

1. Training

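A sample command to start Caffe training:

caffe train --solver=*.prototxt --weights=snapshots/*.caffemodel --gpu=5,6 2>&1 | tee log/*.log

Arguments:

--solver   the solver file, which specifies the network and its hyperparameters
--weights  the pretrained weights (.caffemodel) to initialize from, i.e. fine-tuning; several files can be separated by commas
--gpu      the IDs of the GPUs to use, separated by commas
--snapshot a .solverstate file to resume an interrupted run from. It cannot be combined with --weights (caffe aborts if both are given). Where new snapshots are saved is set by snapshot_prefix in the solver file, not by this flag.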
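When training runs are launched from scripts, the rule that --weights and --snapshot are mutually exclusive is easy to break. Below is a minimal Python sketch that assembles the caffe train command line and enforces that rule; build_train_cmd and its argument names are hypothetical, and only the flags described above are used.

```python
import shlex


def build_train_cmd(solver, weights=None, snapshot=None, gpus=None):
    """Assemble a `caffe train` argument list (hypothetical helper).

    weights:  .caffemodel file(s) to fine-tune from, comma-separated if several
    snapshot: .solverstate file to resume an interrupted run from
    gpus:     iterable of GPU ids, e.g. (5, 6)
    """
    if weights and snapshot:
        # caffe itself rejects this combination: fine-tune OR resume, not both
        raise ValueError("--weights and --snapshot cannot be used together")
    cmd = ["caffe", "train", "--solver=" + solver]
    if weights:
        cmd.append("--weights=" + weights)
    if snapshot:
        cmd.append("--snapshot=" + snapshot)
    if gpus:
        cmd.append("--gpu=" + ",".join(str(g) for g in gpus))
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; subprocess.run(cmd) would launch it instead
    cmd = build_train_cmd("solver.prototxt", weights="pretrained.caffemodel", gpus=(5, 6))
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting issues.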

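The solver *.prototxt usually contains the following. The fields are defined by SolverParameter, whose source is attached at the end of this article.

net: "pelee.prototxt"  # network definition to train
test_iter: 55     # number of forward passes in each test phase
test_interval: 210 # run a test phase every 210 training iterations
test_initialization: false # if true, run a test pass before the first training iteration
display: 100 # print training info every 100 iterations
max_iter: 100000  # maximum number of training iterations
base_lr:  0.01 # initial learning rate
lr_policy: "step" # learning-rate schedule, see the list below
gamma: 0.1 # decay factor: when the loss plateaus, the learning rate is multiplied by a constant factor; with "step" it is multiplied by gamma (here divided by 10) every stepsize iterations
momentum: 0.9 # momentum, makes SGD-based training faster and more stable; 0.9 follows the paper ImageNet Classification with Deep Convolutional Neural Networks
weight_decay: 1e-06 # weight of the regularization term that penalizes overfitting, see below
stepsize: 8000 # decay the learning rate every 8000 iterations, see below
clip_gradients: 40 # gradient-clipping threshold against exploding gradients, see below
regularization_type: "L1"  # "L1" or "L2" (the default); both are common regularizers against overfitting. L1 yields sparse weights and can be used for feature selection; L2 yields small weights and is more robust to perturbations.
snapshot: 500 # save intermediate results every 500 iterations, e.g. model_iter_500.caffemodel and model_iter_500.solverstate
snapshot_prefix: "snapshots" # where snapshots are saved (prefix of the snapshot file names)
solver_mode: GPU   # compute mode: GPU or CPU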
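How test_iter relates to the test batch size and the size of the test set is plain arithmetic (the summary of hyperparameters later in this article explains it in words). A small Python sketch with hypothetical function names; iters_per_epoch assumes that batch_size is per GPU in multi-GPU training and that iter_size accumulates gradients over iter_size batches, as the SolverParameter comment on iter_size states. Setting test_interval to iters_per_epoch(...) would test roughly once per epoch.

```python
import math


def compute_test_iter(num_test_images, test_batch_size):
    """Smallest test_iter for which test_iter * test_batch_size covers the whole test set."""
    return math.ceil(num_test_images / test_batch_size)


def iters_per_epoch(num_train_images, batch_size, iter_size=1, num_gpus=1):
    """Training iterations needed to go through the training set once.

    Assumption: each iteration consumes batch_size * iter_size images on every GPU.
    """
    images_per_iter = batch_size * iter_size * num_gpus
    return math.ceil(num_train_images / images_per_iter)


# Usage: 500 test images with a TEST-phase batch_size of 5 give test_iter = 100
print(compute_test_iter(500, 5))  # → 100
```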

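The possible values of lr_policy mean:

- fixed: keep base_lr unchanged.
- step: also requires stepsize; returns base_lr * gamma ^ floor(iter / stepsize), where iter is the current iteration.
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration.
- inv: also requires power; returns base_lr * (1 + gamma * iter) ^ (-power).
- multistep: also requires one or more stepvalue entries. It is similar to step, but where step decays at uniform intervals, multistep multiplies the learning rate by gamma each time iter reaches one of the stepvalue values.
- poly: polynomial decay, also requires power; returns base_lr * (1 - iter / max_iter) ^ power.
- sigmoid: sigmoid decay; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize)))).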
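The formulas above translate directly into code, which is handy for checking a schedule before a long run. A Python sketch (caffe_lr is a hypothetical name, not a pycaffe function); for multistep it assumes the stepvalue entries are in ascending order, as Caffe expects.

```python
import math


def caffe_lr(policy, it, base_lr, gamma=None, power=None, stepsize=None,
             stepvalue=(), max_iter=None):
    """Learning rate at iteration `it` for the lr_policy values listed above."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "multistep":
        # number of stepvalue milestones already reached
        passed = sum(1 for s in stepvalue if it >= s)
        return base_lr * gamma ** passed
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1 + math.exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: " + policy)


# The solver above (step, base_lr 0.01, gamma 0.1, stepsize 8000):
for it in (0, 8000, 87900):
    print(it, caffe_lr("step", it, 0.01, gamma=0.1, stepsize=8000))
```

At iteration 87900 this gives about 1e-12 (ten decays of 0.01), which is consistent with the lr = 1e-12 printed at that iteration in the training log excerpt in section 2.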

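To summarize the hyperparameters:
(1) When the learning rate decays is determined by stepsize, and by how much by gamma. For example, with stepsize=500, base_lr=0.01 and gamma=0.1, the learning rate decays for the first time at iteration 500, to lr = base_lr * gamma = 0.01 * 0.1 = 0.001, and the same happens every further 500 iterations. So stepsize is the decay interval of the learning rate and gamma is its decay factor.
(2) During training the network is tested periodically, and test_interval sets how often: with test_interval=1000, a test phase runs after every 1000 training iterations. How each test runs is determined by test_iter, the batch_size of the TEST-phase data layer, and the number of test images: each test iteration feeds batch_size images, and test_iter is the number of iterations needed to go through all test images. For example, with 500 test images and a test batch_size of 5, test_iter=100. In the solver file you therefore only need to set test_iter from the total number of test images (and the test batch size), and set test_interval as needed.
(3) In machine learning and pattern recognition, models can overfit, and as a network overfits its weights tend to grow. Weight decay is used to prevent overfitting: it is the coefficient in front of the regularization term in the loss function. The regularization term reflects model complexity, so weight decay controls how much complexity contributes to the loss; the larger the weight decay, the larger the loss of a complex model.
(4) clip_gradients is introduced to deal with exploding gradients. If the weights are updated too aggressively in one iteration, the loss easily diverges; gradient clipping keeps each weight update within a reasonable range. In detail:
1). Set clip_gradients in the solver.
2). After the forward and backward passes we have the gradient (diff) of every weight. Instead of using these gradients directly for the update, first compute the sum of squares of all weight gradients, sumsq_diff, and its square root, the global L2 norm l2norm_diff = sqrt(sumsq_diff). If l2norm_diff > clip_gradients, compute the scale factor scale_factor = clip_gradients / l2norm_diff, which lies in (0, 1); the larger the gradient norm, the smaller the factor.
3). Finally multiply all weight gradients by this factor; the result is the gradient actually used for the update.
This guarantees that in every update the L2 norm of all weight gradients taken together does not exceed clip_gradients.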
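The clipping procedure above, together with weight decay and momentum, can be sketched in plain Python. This is a simplified, hypothetical re-implementation that follows the order used by Caffe's SGD solver (clip, then regularize, then the momentum update); it ignores per-layer lr_mult/decay_mult and iter_size normalization and is not Caffe's own code.

```python
import math


def sgd_step(params, grads, history, lr, momentum=0.9, weight_decay=1e-6,
             regularization_type="L2", clip_gradients=-1.0):
    """One simplified Caffe-style SGD update, applied in place.

    params, grads, history: lists of parameter blobs, each a list of floats.
    clip_gradients < 0 disables clipping (Caffe's default is -1).
    """
    # 1) Clip by the global L2 norm taken over ALL parameter blobs together.
    if clip_gradients >= 0:
        l2norm = math.sqrt(sum(g * g for blob in grads for g in blob))
        if l2norm > clip_gradients:
            scale_factor = clip_gradients / l2norm  # lies in (0, 1)
            for blob in grads:
                for i in range(len(blob)):
                    blob[i] *= scale_factor
    for w, g, h in zip(params, grads, history):
        for i in range(len(w)):
            # 2) Regularization: add the weight-decay term to the gradient.
            if regularization_type == "L2":
                g[i] += weight_decay * w[i]
            elif regularization_type == "L1":
                sign = (w[i] > 0) - (w[i] < 0)
                g[i] += weight_decay * sign
            else:
                raise ValueError("unknown regularization_type: " + regularization_type)
            # 3) Momentum: history = momentum * history + lr * grad; weight -= history.
            h[i] = momentum * h[i] + lr * g[i]
            w[i] -= h[i]
```

For example, gradients [3, 4] have norm 5; with clip_gradients=1 they are scaled by 0.2 to [0.6, 0.8] before the update.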

The input data is specified by the input layer in the network's prototxt file, for example:

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN  # or TEST
  }
  image_data_param {
    source: "/root/data/total_data/train_seven_wuhe.txt"
    batch_size: 64
    shuffle: true
    label_size: 2
  }
}

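This layer has no bottom and its type is ImageData (a data layer), which is what makes it the input layer; its two tops, data and label, feed the rest of the network. image_data_param configures the input data: the path of the image list (source), batch_size, shuffle, label_size and so on. Note that label_size is not a field of ImageDataParameter in upstream Caffe; it presumably comes from a modified Caffe that supports several labels per image.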
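The file named by source is a plain-text list. In upstream Caffe's ImageData layer each line holds an image path and one integer label, and recent versions split the line at its last space (any root_folder is prepended to the path). A minimal Python sketch for writing and reading such a list; the helper names are hypothetical, skipping blank lines is an addition of this sketch, and a multi-label fork that uses label_size may expect a different line format.

```python
def write_image_list(path, samples):
    """Write (image_path, label) pairs as "<image_path> <label>" lines."""
    with open(path, "w") as f:
        for image_path, label in samples:
            f.write(f"{image_path} {int(label)}\n")


def read_image_list(path):
    """Parse a list file by splitting each line at its LAST space."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines (this sketch's choice)
            image_path, label = line.rsplit(" ", 1)
            samples.append((image_path, int(label)))
    return samples
```

Shuffling is best left to shuffle: true in the layer, so the list itself can stay in any order.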

2. Evaluating training

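Method 1: read the printed log. The two excerpts below are typical output that log analysis works on: the first shows the training loss and learning rate at an iteration, the second shows a snapshot being saved (test-phase results appear in similar "Test net output" lines). If the loss does not converge during training, the run has failed and should be stopped and adjusted.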

[solver.cpp:218] Iteration 87900 (1.3223 iter/s, 75.626s/100 iters), loss = 14.7485
[solver.cpp:237]     Train net output #0: lambda = 1
[solver.cpp:237]     Train net output #1: lambda2 = 1
[solver.cpp:237]     Train net output #2: loss = 7.81631 (* 1 = 7.81631 loss)
[solver.cpp:237]     Train net output #3: loss2 = 6.93214 (* 1 = 6.93214 loss)
[sgd_solver.cpp:105] Iteration 87900, lr = 1e-12

[solver.cpp:449] Snapshotting to binary proto file snapshots/*.caffemodel
[sgd_solver.cpp:273] Snapshotting solver state to binary proto file snapshots/*.solverstate
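The log analysis can be as simple as a few regular expressions over the file written by tee. Caffe also ships tools/extra/parse_log.py for this purpose; the sketch below is an independent, hypothetical minimal parser that only understands the two line types shown above, plus a check for a NaN or infinite loss as a sign of divergence.

```python
import math
import re

# "Iteration 87900 (1.3223 iter/s, 75.626s/100 iters), loss = 14.7485"
# (older Caffe versions print the same line without the parenthesized speed)
LOSS_RE = re.compile(r"Iteration (\d+)(?: \([^)]*\))?, loss = (\S+)")
# "Iteration 87900, lr = 1e-12"
LR_RE = re.compile(r"Iteration (\d+), lr = (\S+)")


def parse_caffe_log(lines):
    """Collect {iteration: {"loss": float, "lr": float}} from training-log lines."""
    records = {}
    for line in lines:
        for key, pattern in (("loss", LOSS_RE), ("lr", LR_RE)):
            m = pattern.search(line)
            if m:
                records.setdefault(int(m.group(1)), {})[key] = float(m.group(2))
    return records


def diverged(records):
    """True if any logged loss is NaN or infinite, a typical sign of divergence."""
    return any(not math.isfinite(r["loss"]) for r in records.values() if "loss" in r)
```

Usage: with open(...) as f: records = parse_caffe_log(f), then plot or inspect the loss per iteration.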

Method 2: run the saved models on the test set and pick a good one based on the inference results. Example code:

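import os
import numpy as np
import caffe

# Assumption: root/files come from walking the snapshot directory;
# snapshot_dir and net_file (the deploy prototxt) are placeholders
for root, _, files in os.walk(snapshot_dir):
  names = [name for name in files if name.endswith(".caffemodel")]
  # Sort by iteration number: a plain names.sort() is lexicographic and puts *_iter_10000 before *_iter_500
  names.sort(key=lambda n: int(n.rsplit("_iter_", 1)[1].split(".")[0]))
  for caffemodel in names:
    caffe_model = os.path.join(root, caffemodel)
    net = caffe.Net(net_file, caffe_model, caffe.TEST)
    # ...... (elided in the original)
    net.blobs['data'].data[...] = _in_[...]
    out = net.forward()
    resultData = net.blobs['prob'].data[0]
    output = np.argmax(resultData)
    # ..... (elided in the original)

Combined with the log analysis, this lets you compare how good the models are.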
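Choosing "a good model" needs a score per snapshot. A Python sketch that ranks models by top-1 accuracy; select_best_model and the predict callback are hypothetical names. With pycaffe, predict would wrap the loop body above (load the model, forward, np.argmax of prob); here it is a plain function so the selection logic can be tested without Caffe.

```python
def select_best_model(model_names, samples, labels, predict):
    """Score every model by top-1 accuracy and return (best_name, scores).

    predict(name, sample) -> predicted class index (hypothetical callback).
    """
    scores = {}
    for name in model_names:
        correct = sum(1 for x, y in zip(samples, labels) if predict(name, x) == y)
        scores[name] = correct / len(labels)
    # max() keeps the first maximum, so on ties the earlier model in model_names wins
    best = max(model_names, key=lambda name: scores[name])
    return best, scores
```

Passing model_names sorted by iteration means that, on a tie, the earlier snapshot is preferred.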

3. Inference

Running inference with Caffe is fairly simple and consists of the following steps.

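Loading the model

    std::shared_ptr<Net<float> > net_;  // member of the inference class
    Caffe::set_mode(Caffe::GPU);  // CPU or GPU mode
    /* Load the network. */
    // model_file: the network definition (*.prototxt); required.
    // For inference it should declare its input with an Input layer: net input blobs
    // come from Input layers, not from data layers such as ImageData.
    net_.reset(new Net<float>(model_file, TEST));
    net_->CopyTrainedLayersFrom(trained_file);  // trained_file: the trained weights (*.caffemodel) produced by training; required

    /* Get some values */
    Blob<float>* input_layer = net_->input_blobs()[0];
    num_channels_ = input_layer->channels();
    /* Load the preprocessing parameters */
    SetParam(param_file);  // param_file: the normalization parameter file (e.g. a .binaryproto) produced during training; optional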

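For the definitions and usage of Blob::Reshape, Net::Reshape and related functions, see my article https://www.jianshu.com/writer#/notebooks/36219573/notes/45233141
The mean values are used for normalization: they are read from param_file and used to preprocess the data before inference. An example: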

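void SetParam(const std::string& param_file) {  // member of the inference class
    input_geometry_ = cv::Size(256, 256);
    // Read param_file
    // ...... (elided in the original)
    meanB_ = 127.5;
    meanG_ = 127.5;
    meanR_ = 127.5;
    scale_ = 0.0078125;  // = 1/128
    // The mean image must match the network input in size and channel count
    mean_ = cv::Mat(input_geometry_, num_channels_ == 3 ? CV_32FC3 : CV_32FC1,
                    cv::Scalar(meanB_, meanG_, meanR_));
}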
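With mean 127.5 and scale 0.0078125 (= 1/128), subtracting the mean and then scaling maps the pixel range [0, 255] to roughly [-1, 1]. A tiny Python check of that arithmetic; normalize_bgr is a hypothetical name, and it takes the per-channel means in B, G, R order like meanB_, meanG_, meanR_.

```python
def normalize_bgr(pixel, means=(127.5, 127.5, 127.5), scale=0.0078125):
    """(value - mean) * scale for each channel of one (B, G, R) pixel."""
    return tuple((v - m) * scale for v, m in zip(pixel, means))


print(normalize_bgr((0, 255, 127.5)))  # → (-0.99609375, 0.99609375, 0.0)
```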

Running the model

1. Preprocessing: set the input shape
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(1, num_channels_, input_geometry_.height, input_geometry_.width);
net_->Reshape();  /* Forward dimension change to all layers. */
2. Prepare the data
std::vector<cv::Mat> input_channels;
WrapInputLayer(&input_channels);
Preprocess(img, &input_channels);
3. Inference
net_->Forward();
4. Fetch the output
/* Copy the output layer to a std::vector (batch size 1, one value per channel) */
Blob<float>* output_layer = net_->output_blobs()[0];
const float* begin = output_layer->cpu_data();
const float* end = begin + output_layer->channels();
std::vector<float> result(begin, end);
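For a classification network whose last layer is a Softmax (the prob blob read in the Python example of section 2), the vector from step 4 holds one probability per class. Picking the predicted class, or the k most likely classes, is then a sort. A Python sketch with a hypothetical top_k helper; the first entry corresponds to np.argmax in that example.

```python
def top_k(scores, k=5):
    """Return the k (class_index, score) pairs with the highest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


print(top_k([0.1, 0.7, 0.2], k=2))  # → [(1, 0.7), (2, 0.2)]
```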

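Preparing the data just means filling the input blob the way the network expects. WrapInputLayer makes each cv::Mat in input_channels a view onto one channel plane of the input blob's memory, so whatever is written into these Mats lands directly in the blob. For example: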

void WrapInputLayer(std::vector<cv::Mat>* input_channels) {
    Blob<float>* input_layer = net_->input_blobs()[0];
    int width = input_layer->width();
    int height = input_layer->height();
    float* input_data = input_layer->mutable_cpu_data();
    for (int i = 0; i < input_layer->channels(); ++i)
    {
        cv::Mat channel(height, width, CV_32FC1, input_data);
        input_channels->push_back(channel);
        input_data += width * height;
    }
}
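WrapInputLayer works because a Caffe blob stores its data contiguously in N×C×H×W order: for a batch of one image, each channel is a contiguous height×width plane, which is why the pointer advances by width * height. OpenCV, by contrast, stores pixels interleaved (H×W×C). The Python sketch below, with hypothetical names, shows the flat index formula (the same one Caffe's Blob::offset uses) and the interleaved-to-planar split that cv::split performs into the wrapped Mats.

```python
def blob_offset(n, c, h, w, channels, height, width):
    """Flat index of element (n, c, h, w) in a row-major NCHW blob."""
    return ((n * channels + c) * height + h) * width + w


def hwc_to_chw(pixels, height, width, channels):
    """Split an interleaved HWC pixel list (OpenCV layout) into one flat plane per channel."""
    planes = [[0.0] * (height * width) for _ in range(channels)]
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                planes[c][h * width + w] = pixels[(h * width + w) * channels + c]
    return planes


# Two BGR pixels -> a B plane, a G plane and an R plane
print(hwc_to_chw([1, 2, 3, 4, 5, 6], height=1, width=2, channels=3))  # → [[1, 4], [2, 5], [3, 6]]
```

Concatenating the planes in channel order gives exactly the memory layout of the input blob for n = 0.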

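Preprocessing converts the image into what the algorithm expects: grayscale or color conversion, resizing, normalization, splitting into B, G, R planes and so on. For example: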

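void Preprocess(const cv::Mat& img, std::vector<cv::Mat>* input_channels)
{
    /* Convert the input image to the input image format of the network. */
    cv::Mat sample;
    if (img.channels() == 3 && num_channels_ == 1)
        cv::cvtColor(img, sample, cv::COLOR_BGR2GRAY);
    else if (img.channels() == 1 && num_channels_ == 3)
        cv::cvtColor(img, sample, cv::COLOR_GRAY2BGR);
    else
        sample = img;
    cv::Mat sample_resized;
    cv::resize(sample, sample_resized, input_geometry_);
    cv::Mat sample_float;
    sample_resized.convertTo(sample_float, num_channels_ == 3 ? CV_32FC3 : CV_32FC1);
    // Subtract the mean, then scale (normalization); mean_ must have num_channels_ channels
    cv::Mat sample_normalized;
    cv::subtract(sample_float, mean_, sample_normalized);
    sample_normalized *= scale_;
    /* This operation will write the separate BGR planes directly to the
    * input layer of the network because it is wrapped by the cv::Mat
    * objects in input_channels. */
    cv::split(sample_normalized, *input_channels);
}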

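4. SolverParameter source

For reference, here is the definition of SolverParameter from the Caffe source (src/caffe/proto/caffe.proto), for looking up the meaning of each field: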

message SolverParameter {
  //////////////////////////////////////////////////////////////////////////////
  // Proto filename for the train net, possibly combined with one or more
  // test nets.
  optional string net = 24;
  // Inline train net param, possibly combined with one or more test nets.
  optional NetParameter net_param = 25;

  optional string train_net = 1; // Proto filename for the train net.
  repeated string test_net = 2; // Proto filenames for the test nets.
  optional NetParameter train_net_param = 21; // Inline train net params.
  repeated NetParameter test_net_param = 22; // Inline test net params.

  optional NetState train_state = 26;
  repeated NetState test_state = 27;

  // The number of iterations for each test net.
  repeated int32 test_iter = 3;

  // The number of iterations between two testing phases.
  optional int32 test_interval = 4 [default = 0];
  optional bool test_compute_loss = 19 [default = false];
  // If true, run an initial test pass before the first iteration,
  // ensuring memory availability and printing the starting value of the loss.
  optional bool test_initialization = 32 [default = true];
  optional float base_lr = 5; // The base learning rate
  // the number of iterations between displaying info. If display = 0, no info
  // will be displayed.
  optional int32 display = 6;
  // Display the loss averaged over the last average_loss iterations
  optional int32 average_loss = 33 [default = 1];
  optional int32 max_iter = 7; // the maximum number of iterations
  // accumulate gradients over `iter_size` x `batch_size` instances
  optional int32 iter_size = 36 [default = 1];

  optional string lr_policy = 8;
  optional float gamma = 9; // The parameter to compute the learning rate.
  optional float power = 10; // The parameter to compute the learning rate.
  optional float momentum = 11; // The momentum value.
  optional float weight_decay = 12; // The weight decay.
  // regularization types supported: L1 and L2
  // controlled by weight_decay
  optional string regularization_type = 29 [default = "L2"];
  // the stepsize for learning rate policy "step"
  optional int32 stepsize = 13;
  // the stepsize for learning rate policy "multistep"
  repeated int32 stepvalue = 34;

  optional float clip_gradients = 35 [default = -1];

  optional int32 snapshot = 14 [default = 0]; // The snapshot interval

  optional string snapshot_prefix = 15;
 
  optional bool snapshot_diff = 16 [default = false];
  enum SnapshotFormat {
    HDF5 = 0;
    BINARYPROTO = 1;
  }
  optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO];
  // the mode solver will use: 0 for CPU and 1 for GPU. Use GPU in default.
  enum SolverMode {
    CPU = 0;
    GPU = 1;
  }
  optional SolverMode solver_mode = 17 [default = GPU];
  // the device_id will that be used in GPU mode. Use device_id = 0 in default.
  optional int32 device_id = 18 [default = 0];
  // If non-negative, the seed with which the Solver will initialize the Caffe
  // random number generator -- useful for reproducible results. Otherwise,
  // (and by default) initialize using a seed derived from the system clock.
  optional int64 random_seed = 20 [default = -1];
  // type of the solver
  optional string type = 40 [default = "SGD"];
  // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam
  optional float delta = 31 [default = 1e-8];
  // parameters for the Adam solver
  optional float momentum2 = 39 [default = 0.999];
  optional float rms_decay = 38 [default = 0.99];
  optional bool debug_info = 23 [default = false];
  // If false, don't save a snapshot after training finishes.
  optional bool snapshot_after_train = 28 [default = true];
  // DEPRECATED: old solver enum types, use string instead
  enum SolverType {
    SGD = 0;
    NESTEROV = 1;
    ADAGRAD = 2;
    RMSPROP = 3;
    ADADELTA = 4;
    ADAM = 5;
  }
  // DEPRECATED: use type instead of solver_type
  optional SolverType solver_type = 30 [default = SGD];
  // Overlap compute and communication for data parallel training
  optional bool layer_wise_reduce = 41 [default = true];
  repeated string weights = 42;
}
