Implementing a custom CUDA loss layer in Caffe
Last time we had a first look at CUDA acceleration; this time we show how to actually implement it. I recommend modeling your layer on the loss layers that ship with Caffe. If your loss layer needs extra parameters, take the Crop layer as an example:
layer { bottom: "side5_up" bottom: "data" top: "side5_crop" type: "Crop" name: "side5_crop" crop_param { axis: 2 offset: 4 offset: 4 } }
The crop_param here is exactly such an extra parameter, and supporting it requires editing the caffe.proto file. Beyond that, a loss layer cannot do without three files: the header loss_yourname.hpp, the CPU implementation loss_yourname.cpp, and the GPU implementation loss_yourname.cu. To summarize, the core files are:
- loss_yourname.hpp goes under code_root/include/caffe/layers
- loss_yourname.cpp goes under code_root/src/caffe/layers
- loss_yourname.cu goes under code_root/src/caffe/layers
- caffe.proto (optional): edit the existing file under code_root/src/caffe/proto when your layer needs extra parameters
The walkthrough below follows Caffe's own sigmoid_cross_entropy_loss_layer and only points out the places that need attention.
Implementing loss_yourname.hpp
#ifndef CAFFE_SIGMOID_CROSS_ENTROPY_LOSS_LAYER_HPP_
#define CAFFE_SIGMOID_CROSS_ENTROPY_LOSS_LAYER_HPP_
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/layers/loss_layer.hpp"
#include "caffe/layers/sigmoid_layer.hpp"
namespace caffe {
template <typename Dtype>
class SigmoidCrossEntropyLossLayer : public LossLayer<Dtype> {
public:
explicit SigmoidCrossEntropyLossLayer(const LayerParameter& param)
: LossLayer<Dtype>(param),
sigmoid_layer_(new SigmoidLayer<Dtype>(param)),
sigmoid_output_(new Blob<Dtype>()) {}
virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual inline const char* type() const { return "SigmoidCrossEntropyLoss"; }
protected:
/// @copydoc SigmoidCrossEntropyLossLayer
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual Dtype get_normalizer(
LossParameter_NormalizationMode normalization_mode, int valid_count);
/// The internal SigmoidLayer used to map predictions to probabilities.
shared_ptr<SigmoidLayer<Dtype> > sigmoid_layer_;
/// sigmoid_output stores the output of the SigmoidLayer.
shared_ptr<Blob<Dtype> > sigmoid_output_;
/// bottom vector holder to call the underlying SigmoidLayer::Forward
vector<Blob<Dtype>*> sigmoid_bottom_vec_;
/// top vector holder to call the underlying SigmoidLayer::Forward
vector<Blob<Dtype>*> sigmoid_top_vec_;
/// Whether to ignore instances with a certain label.
bool has_ignore_label_;
/// The label indicating that an instance should be ignored.
int ignore_label_;
/// How to normalize the loss.
LossParameter_NormalizationMode normalization_;
Dtype normalizer_;
int outer_num_, inner_num_;
};
} // namespace caffe
#endif // CAFFE_SIGMOID_CROSS_ENTROPY_LOSS_LAYER_HPP_
The key places to modify are:
class SigmoidCrossEntropyLossLayer : public LossLayer<Dtype>
Change this to your own class name.
virtual inline const char* type() const { return "SigmoidCrossEntropyLoss"; }
Replace SigmoidCrossEntropyLoss here with the type of your own layer. In the prototxt example at the top, Crop is that layer's type.
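For example, a hypothetical layer class (all names invented here purely for illustration) would declare:
// Hypothetical names, for illustration only.
template <typename Dtype>
class MyLossLayer : public LossLayer<Dtype> {
 public:
  explicit MyLossLayer(const LayerParameter& param) : LossLayer<Dtype>(param) {}
  // "MyLoss" is the string you would write after type: in the prototxt.
  virtual inline const char* type() const { return "MyLoss"; }
};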
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
Forward_cpu and Backward_cpu are the forward and backward passes on the CPU; implementing them in loss_yourname.cpp is the bulk of your work there. Forward_gpu and Backward_gpu are the forward and backward passes on the GPU; implementing them in loss_yourname.cu is the bulk of your work there. If you do not need a CUDA version of your loss, simply remove the two _gpu declarations.
#ifndef CAFFE_SIGMOID_CROSS_ENTROPY_LOSS_LAYER_HPP_
#define CAFFE_SIGMOID_CROSS_ENTROPY_LOSS_LAYER_HPP_
#endif  // CAFFE_SIGMOID_CROSS_ENTROPY_LOSS_LAYER_HPP_
Remember to change these to your own include-guard macro name.
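For the hypothetical MyLossLayer above, the guard might become (sketch):
#ifndef CAFFE_MY_LOSS_LAYER_HPP_
#define CAFFE_MY_LOSS_LAYER_HPP_
// ... declarations ...
#endif  // CAFFE_MY_LOSS_LAYER_HPP_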
Note: the other member variables and helper functions can be added or removed to suit your own needs.
loss_yourname.cpp
#include <algorithm>
#include <vector>
#include "caffe/layers/sigmoid_cross_entropy_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"
namespace caffe {
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  /* omitted */
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  /* omitted */
}
// TODO(shelhamer) loss normalization should be pulled up into LossLayer,
// instead of duplicated here and in SoftMaxWithLossLayer
template <typename Dtype>
Dtype SigmoidCrossEntropyLossLayer<Dtype>::get_normalizer(
LossParameter_NormalizationMode normalization_mode, int valid_count) {
  /* omitted */
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// The forward pass computes the sigmoid outputs.
sigmoid_bottom_vec_[0] = bottom[0];
sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
// Compute the loss (negative log likelihood)
// Stable version of loss computation from input data
const Dtype* input_data = bottom[0]->cpu_data();
const Dtype* target = bottom[1]->cpu_data();
int valid_count = 0;
Dtype loss = 0;
for (int i = 0; i < bottom[0]->count(); ++i) {
const int target_value = static_cast<int>(target[i]);
if (has_ignore_label_ && target_value == ignore_label_) {
continue;
}
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
++valid_count;
}
normalizer_ = get_normalizer(normalization_, valid_count);
top[0]->mutable_cpu_data()[0] = loss / normalizer_;
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Backward_cpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) {
LOG(FATAL) << this->type()
<< " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) {
// First, compute the diff
const int count = bottom[0]->count();
const Dtype* sigmoid_output_data = sigmoid_output_->cpu_data();
const Dtype* target = bottom[1]->cpu_data();
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
caffe_sub(count, sigmoid_output_data, target, bottom_diff);
// Zero out gradient of ignored targets.
if (has_ignore_label_) {
for (int i = 0; i < count; ++i) {
const int target_value = static_cast<int>(target[i]);
if (target_value == ignore_label_) {
bottom_diff[i] = 0;
}
}
}
// Scale down gradient
Dtype loss_weight = top[0]->cpu_diff()[0] / normalizer_;
caffe_scal(count, loss_weight, bottom_diff);
}
}
#ifdef CPU_ONLY
STUB_GPU(SigmoidCrossEntropyLossLayer);
#endif
INSTANTIATE_CLASS(SigmoidCrossEntropyLossLayer);
REGISTER_LAYER_CLASS(SigmoidCrossEntropyLoss);
} // namespace caffe
The key places to modify are:
#include "caffe/layers/sigmoid_cross_entropy_loss_layer.hpp"
Include your own header file.
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
Rename the CPU forward function to match your class and implement it.
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Backward_cpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom)
Rename the CPU backward function to match your class and implement it.
#ifdef CPU_ONLY
STUB_GPU(SigmoidCrossEntropyLossLayer);
#endif
INSTANTIATE_CLASS(SigmoidCrossEntropyLossLayer);
REGISTER_LAYER_CLASS(SigmoidCrossEntropyLoss);
This is a part you must not overlook! Change SigmoidCrossEntropyLossLayer and SigmoidCrossEntropyLoss to your own layer's names. For example, if REGISTER_LAYER_CLASS(SigmoidCrossEntropyLoss); is missing, Caffe will complain that it cannot find a layer with the requested type when your custom loss layer is used.
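As a concrete sketch, for the hypothetical MyLossLayer the end of loss_yourname.cpp would look like this (names invented for illustration):
#ifdef CPU_ONLY
STUB_GPU(MyLossLayer);
#endif
INSTANTIATE_CLASS(MyLossLayer);
REGISTER_LAYER_CLASS(MyLoss);  // makes type: "MyLoss" resolvable from a prototxt
A prototxt would then reference the layer with type: "MyLoss".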
loss_yourname.cu
#include <vector>
#include "caffe/layers/sigmoid_cross_entropy_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"
namespace caffe {
template <typename Dtype>
__global__ void SigmoidCrossEntropyLossForwardGPU(const int nthreads,
const Dtype* input_data, const Dtype* target, Dtype* loss,
const bool has_ignore_label_, const int ignore_label_,
Dtype* counts) {
CUDA_KERNEL_LOOP(i, nthreads) {
const int target_value = static_cast<int>(target[i]);
if (has_ignore_label_ && target_value == ignore_label_) {
loss[i] = 0;
counts[i] = 0;
} else {
loss[i] = input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] *
(input_data[i] >= 0)));
counts[i] = 1;
}
}
}
template <typename Dtype>
__global__ void SigmoidCrossEntropyLossIgnoreDiffGPU(const int count,
const int ignore_label, const Dtype* target, Dtype* diff) {
CUDA_KERNEL_LOOP(i, count) {
const int target_value = static_cast<int>(target[i]);
if (target_value == ignore_label) {
diff[i] = 0;
}
}
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_gpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// The forward pass computes the sigmoid outputs.
sigmoid_bottom_vec_[0] = bottom[0];
sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
// Compute the loss (negative log likelihood)
const int count = bottom[0]->count();
// Stable version of loss computation from input data
const Dtype* input_data = bottom[0]->gpu_data();
const Dtype* target = bottom[1]->gpu_data();
// Since this memory is not used for anything until it is overwritten
// on the backward pass, we use it here to avoid having to allocate new GPU
// memory to accumulate intermediate results in the kernel.
Dtype* loss_data = bottom[0]->mutable_gpu_diff();
Dtype* count_data = bottom[1]->mutable_gpu_diff();
Dtype valid_count;
// NOLINT_NEXT_LINE(whitespace/operators)
SigmoidCrossEntropyLossForwardGPU<Dtype><<<CAFFE_GET_BLOCKS(count),
CAFFE_CUDA_NUM_THREADS>>>(count, input_data, target, loss_data,
has_ignore_label_, ignore_label_, count_data);
// Only launch another CUDA kernel if we actually need the valid count.
if (normalization_ == LossParameter_NormalizationMode_VALID &&
has_ignore_label_) {
caffe_gpu_asum(count, count_data, &valid_count);
} else {
valid_count = count;
}
Dtype loss;
caffe_gpu_asum(count, loss_data, &loss);
normalizer_ = get_normalizer(normalization_, valid_count);
top[0]->mutable_cpu_data()[0] = loss / normalizer_;
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Backward_gpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) {
LOG(FATAL) << this->type()
<< " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) {
// First, compute the diff
const int count = bottom[0]->count();
const Dtype* sigmoid_output_data = sigmoid_output_->gpu_data();
const Dtype* target = bottom[1]->gpu_data();
Dtype* bottom_diff = bottom[0]->mutable_gpu_diff();
caffe_copy(count, sigmoid_output_data, bottom_diff);
caffe_gpu_axpy(count, Dtype(-1), target, bottom_diff);
// Zero out gradient of ignored targets.
if (has_ignore_label_) {
// NOLINT_NEXT_LINE(whitespace/operators)
SigmoidCrossEntropyLossIgnoreDiffGPU<Dtype><<<CAFFE_GET_BLOCKS(count),
CAFFE_CUDA_NUM_THREADS>>>(count, ignore_label_, target, bottom_diff);
}
// Scale down gradient
Dtype loss_weight = top[0]->cpu_diff()[0] / normalizer_;
caffe_gpu_scal(count, loss_weight, bottom_diff);
}
}
INSTANTIATE_LAYER_GPU_FUNCS(SigmoidCrossEntropyLossLayer);
} // namespace caffe
The key places to modify are:
#include "caffe/layers/sigmoid_cross_entropy_loss_layer.hpp"
Include your own header file.
template <typename Dtype>
__global__ void SigmoidCrossEntropyLossForwardGPU
template <typename Dtype>
__global__ void SigmoidCrossEntropyLossIgnoreDiffGPU
These two kernel functions are the key to the CUDA speed-up! Here nthreads is the number of elements to process and i is the thread index; every pixel is given its own thread, so the computation runs in parallel. That is why it is so fast, unlike the CPU version above, which walks over every pixel with a for loop. Borrow this idea: use kernel functions to parallelize the computation-heavy parts of your own loss.
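As a minimal sketch of that idea (all names invented for illustration), a kernel that computes a per-element weighted squared error in parallel, one thread per element, following the same CUDA_KERNEL_LOOP pattern as above, could look like this:
// Hypothetical kernel: each thread writes its own partial loss into loss[i].
template <typename Dtype>
__global__ void MyWeightedL2ForwardGPU(const int nthreads,
    const Dtype* pred, const Dtype* target, const Dtype pos_weight,
    Dtype* loss) {
  CUDA_KERNEL_LOOP(i, nthreads) {
    const Dtype diff = pred[i] - target[i];
    const Dtype w = (target[i] > 0) ? pos_weight : Dtype(1);
    loss[i] = Dtype(0.5) * w * diff * diff;
  }
}
The per-element results can then be reduced with caffe_gpu_asum, exactly as the forward pass above does.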
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_gpu
Rename the GPU forward function to match your class and implement it.
SigmoidCrossEntropyLossForwardGPU<Dtype><<<CAFFE_GET_BLOCKS(count),
CAFFE_CUDA_NUM_THREADS>>>(count, input_data, target, loss_data,
has_ignore_label_, ignore_label_, count_data);
This is where the forward kernel is actually launched.
Dtype* loss_data = bottom[0]->mutable_gpu_diff();
Dtype* count_data = bottom[1]->mutable_gpu_diff();
These two buffers hold the results produced by the parallel computation; they reuse the memory that will later store the gradients, which avoids allocating new GPU memory. This is an important idea: avoid allocating extra space yourself whenever possible.
void SigmoidCrossEntropyLossLayer<Dtype>::Backward_gpu
Rename the GPU backward function to match your class and implement it.
SigmoidCrossEntropyLossIgnoreDiffGPU<Dtype><<<CAFFE_GET_BLOCKS(count),
CAFFE_CUDA_NUM_THREADS>>>(count, ignore_label_, target, bottom_diff);
Likewise, this launches the kernel used during the backward pass.
INSTANTIATE_LAYER_GPU_FUNCS(SigmoidCrossEntropyLossLayer);
This instantiates the GPU functions for the class SigmoidCrossEntropyLossLayer; change it to your own class name.
- One more thing to note: compared with the CPU version, every cpu in the data accessors is replaced with gpu here, because the data being processed must live on the GPU. The two exceptions are:
top[0]->mutable_cpu_data()[0]
top[0]->cpu_diff()[0]
Hard-won lessons from the pitfalls!!!
- If, inside the forward or backward functions of the .cu file, you allocate memory the CPU way and pass that pointer straight into a kernel function, the program will crash. Allocating on the CPU first and then copying to the GPU is not recommended either, since it costs a lot of performance. Just use cudaMalloc to allocate a one-dimensional buffer (look up the corresponding functions for 2-D or 3-D arrays), and remember to free it, free it, free it!!! A minimal sketch follows these notes.
- Defining a constant inside the forward or backward functions of the .cu file and passing it straight into a kernel function works fine, for example:
Dtype w_neg = 1.0;
Why mention this? Because I will extend the topic a little further below; read to the end if you have this need. The need is: dynamically obtain the current iteration inside the loss and adjust the loss weights according to it.
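Here is a minimal sketch of the cudaMalloc approach mentioned above, assuming you need a temporary buffer of count elements inside Forward_gpu (CUDA_CHECK is Caffe's usual error-checking macro):
// Sketch only: temporary GPU buffer allocated with cudaMalloc and freed before returning.
Dtype* tmp_data = NULL;
CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&tmp_data), count * sizeof(Dtype)));
// ... launch the kernel(s) that read/write tmp_data ...
CUDA_CHECK(cudaFree(tmp_data));  // do not forget to release it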
Setting extra parameters in caffe.proto
Suppose you need to add a parameter like this to your layer:
auto_weight_param { init_pos: 0.9 }
1. Add a message, as follows:
message AutoWeightParameter {
optional float init_pos = 1 [default=0.9];
}
2. Inside message LayerParameter {} add:
optional AutoWeightParameter auto_weight_param = 147;
The ID 147 here comes from the comment near the top of LayerParameter; 147 happened to be the next available ID in my copy. After you claim an ID for your own parameter, remember to update that comment accordingly, like so:
// LayerParameter next available layer-specific ID: 148 (last added: auto_weight_param)
3. In loss_yourname.cpp or loss_yourname.cu, read the extra parameter:
const Dtype init_pos = this->layer_param_.auto_weight_param().init_pos();//0.9
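For context, a common pattern (a sketch, not taken from the original layer; init_pos_ would be a member you add to your header) is to read the parameter once in LayerSetUp and cache it:
template <typename Dtype>
void MyLossLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  init_pos_ = this->layer_param_.auto_weight_param().init_pos();  // defaults to 0.9
}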
Dynamically obtaining the current iteration
Following an online reference, only the places that need changing are given below; the rest of the code stays unchanged.
- Declare the setter/getter methods and the variables in common.hpp
class Caffe {
 public:
  static Caffe& Get();
  // Return the current iteration
  inline static int cur_iter() { return Get().cur_iter_; }
  // Set the current iteration
  inline static void set_cur_iter(int iter) { Get().cur_iter_ = iter; }
  // Return the max iterations
  inline static int max_iter() { return Get().max_iter_; }
  // Set the max iterations
  inline static void set_max_iter(int iter) { Get().max_iter_ = iter; }
 protected:
  static int cur_iter_;
  static int max_iter_;
  // ... rest of the original class ...
};
- Initialize the variables in common.cpp
namespace caffe {
int Caffe::cur_iter_ = 0;
int Caffe::max_iter_ = 0;
}
- Assign the variables dynamically in solver.cpp
template <typename Dtype>
void Solver<Dtype>::Step(int iters) {
  const int start_iter = iter_;
  const int stop_iter = iter_ + iters;
  Caffe::set_max_iter(stop_iter);
  while (iter_ < stop_iter) {
    Caffe::set_cur_iter(iter_);
    // ... original loop body (forward/backward, parameter update, ++iter_) ...
  }
}
- In loss_yourname.cpp or loss_yourname.cu, read the current and maximum iteration dynamically:
const int cur_iter = Caffe::cur_iter();
const int max_iter = Caffe::max_iter();
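Putting the pieces together, the stated need (a loss weight that changes with training progress) could be met with a sketch like the following inside Forward_cpu or Forward_gpu; init_pos_ is the hypothetical cached parameter from above and the annealing schedule is only an example:
// Sketch: linearly move a positive-class weight from init_pos_ towards 0.5 over training.
const int cur_iter = Caffe::cur_iter();
const int max_iter = Caffe::max_iter();
const Dtype progress = (max_iter > 0) ? Dtype(cur_iter) / Dtype(max_iter) : Dtype(0);
const Dtype w_pos = init_pos_ + (Dtype(0.5) - init_pos_) * progress;
// w_pos can now be used in the CPU loop or passed to a CUDA kernel as a plain constant.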