【Tool】Caffe Usage Guide

2018-11-14  ItchyHiker

Tags: Caffe

I used Caffe for a while during my internship, but for the past few months at my current company I have been working with TensorFlow + Keras. I am now taking over a project whose earlier work was done in Caffe, and I have forgotten many of the commands, so I am writing down the basic ones here for quick reference.

Basic Usage

1. Training and testing: build/tools/caffe

usage: caffe <command> <args>

commands:
  train           train or finetune a model
  test            score a model
  device_query    show GPU diagnostic information
  time            benchmark model execution time

  Flags from /home/ubuntu/src/caffe_python_3/tools/caffe.cpp:
    -gpu (Optional; run in GPU mode on given device IDs separated by ','.Use
      '-gpu all' to run on all available GPUs. The effective training batch
      size is multiplied by the number of devices.) type: string default: ""
    -iterations (The number of iterations to run.) type: int32 default: 50
    -level (Optional; network level.) type: int32 default: 0
    -model (The model definition protocol buffer text file.) type: string
      default: ""
    -phase (Optional; network phase (TRAIN or TEST). Only used for 'time'.)
      type: string default: ""
    -sighup_effect (Optional; action to take when a SIGHUP signal is received:
      snapshot, stop or none.) type: string default: "snapshot"
    -sigint_effect (Optional; action to take when a SIGINT signal is received:
      snapshot, stop or none.) type: string default: "stop"
    -snapshot (Optional; the snapshot solver state to resume training.)
      type: string default: ""
    -solver (The solver definition protocol buffer text file.) type: string
      default: ""
    -stage (Optional; network stages (not to be confused with phase), separated
      by ','.) type: string default: ""
    -weights (Optional; the pretrained weights to initialize finetuning,
      separated by ','. Cannot be set simultaneously with snapshot.)
      type: string default: ""

Flag notes:
-gpu: run on the given GPU device IDs (or 'all'); omit to run on CPU
-iterations: number of iterations to run (used by test/time)
-level: network level, matched against the include { level: ... } rules in the prototxt
-model: path to the model definition (.prototxt)
-phase: TRAIN or TEST (only used by 'time')
-sigint_effect / -sighup_effect: what to do on SIGINT/SIGHUP (snapshot, stop or none)
-snapshot: path to a .solverstate file for resuming an interrupted training run
-solver: path to the solver definition (.prototxt)
-stage: network stages, matched against the include { stage: ... } rules (not the same as phase)
-weights: pretrained .caffemodel weights to initialize finetuning (cannot be combined with -snapshot)
Example:

../caffe/build/tools/caffe train \
-model /home/ubuntu/Zmoji/traffic_sign/models/traffic_sign_train.prototxt \
-solver /home/ubuntu/Zmoji/traffic_sign/models/solver.prototxt
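
The same run can also be driven from Python through pycaffe; a minimal sketch, reusing the solver path from the example above (the commented-out weights and snapshot files are hypothetical):

import caffe

caffe.set_mode_gpu()
caffe.set_device(0)                                   # analogous to -gpu 0

solver = caffe.get_solver('/home/ubuntu/Zmoji/traffic_sign/models/solver.prototxt')

# Equivalent of -weights: start finetuning from pretrained weights
# solver.net.copy_from('pretrained.caffemodel')        # hypothetical weights file

# Equivalent of -snapshot: resume an interrupted run
# solver.restore('traffic_sign_iter_2000.solverstate')  # hypothetical snapshot file

solver.solve()                                        # run to max_iter; solver.step(n) runs n iterations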

2. Data format conversion: build/tools/convert_imageset

./build/tools/convert_imageset 
convert_imageset: Convert a set of images to the leveldb/lmdb
format used as input for Caffe.
Usage:
    convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME

  Flags:
    -backend (The backend {lmdb, leveldb} for storing the result) type: string
      default: "lmdb"
    -check_size (When this option is on, check that all the datum have the same
      size) type: bool default: false
    -encode_type (Optional: What type should we encode the image as
      ('png','jpg',...).) type: string default: ""
    -encoded (When this option is on, the encoded image will be save in datum)
      type: bool default: false
    -gray (When this option is on, treat images as grayscale ones) type: bool
      default: false
    -resize_height (Height images are resized to) type: int32 default: 0
    -resize_width (Width images are resized to) type: int32 default: 0
    -shuffle (Randomly shuffle the order of images and their labels) type: bool
      default: false

FLAGS:
-backend: output database format, lmdb or leveldb
-check_size: check that all converted images (datums) have the same size
-encode_type: image encoding to use, e.g. png, jpg
-encoded: store the encoded image bytes in the datum instead of raw pixels
-gray: load images as grayscale
-resize_height: height to resize images to (0 keeps the original height)
-resize_width: width to resize images to (0 keeps the original width)
-shuffle: randomly shuffle the order of the image/label pairs
ROOTFOLDER:
root directory containing the images
LISTFILE:
text file listing image paths (relative to ROOTFOLDER) and their class labels; see the sketch after the example below for one way to generate it
DB_NAME:
output path of the generated lmdb/leveldb
Example:

echo "convert train set..."
rm -rf train_lmdb
../caffe/build/tools/convert_imageset \
-backend=lmdb \
-resize_height=112 \
-resize_width=112 \
-shuffle=true \
/home/ubuntu/ihandy_seg/data/traffic_sign \
train.txt \
train_lmdb
echo "convert val set..."
rm -rf val_lmdb
../caffe/build/tools/convert_imageset \
-backend=lmdb \
-resize_height=112 \
-resize_width=112 \
-shuffle=true \
/home/ubuntu/ihandy_seg/data/traffic_sign \
val.txt \
val_lmdb
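
The list files used above (train.txt, val.txt) contain one `relative/path/to/image.jpg label` pair per line, with paths relative to ROOTFOLDER. A minimal sketch for generating them, assuming a hypothetical layout with one subdirectory per class under the root folder:

import os
import random

root = '/home/ubuntu/ihandy_seg/data/traffic_sign'    # the ROOTFOLDER passed to convert_imageset
classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))

lines = []
for label, cls in enumerate(classes):
    for fname in sorted(os.listdir(os.path.join(root, cls))):
        if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
            lines.append('%s/%s %d' % (cls, fname, label))  # path is relative to ROOTFOLDER

random.shuffle(lines)
split = int(0.9 * len(lines))                         # arbitrary 90/10 train/val split
with open('train.txt', 'w') as f:
    f.write('\n'.join(lines[:split]) + '\n')
with open('val.txt', 'w') as f:
    f.write('\n'.join(lines[split:]) + '\n')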

3. Computing the mean file: build/tools/compute_image_mean

./build/tools/compute_image_mean 
compute_image_mean: Compute the mean_image of a set of images given by a leveldb/lmdb
Usage:
    compute_image_mean [FLAGS] INPUT_DB [OUTPUT_FILE]

FLAGS:
-backend: input database format, lmdb or leveldb
INPUT_DB:
path of the input lmdb/leveldb
OUTPUT_FILE:
output path for the generated mean file, usually with a .binaryproto extension
Example:

../caffe/build/tools/compute_image_mean train_lmdb train_mean.binaryproto
../caffe/build/tools/compute_image_mean val_lmdb val_mean.binaryproto
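
compute_image_mean has no Python wrapper, but the result is easy to cross-check by averaging the datums directly; a minimal sketch over the train_lmdb created above (it assumes the images were stored unencoded, as in the convert_imageset example):

import numpy as np
import lmdb
import caffe

env = lmdb.open('train_lmdb', readonly=True)
datum = caffe.proto.caffe_pb2.Datum()
mean_sum, count = None, 0
with env.begin() as txn:
    for _, value in txn.cursor():
        datum.ParseFromString(value)
        img = caffe.io.datum_to_array(datum).astype(np.float64)  # (channels, height, width)
        mean_sum = img if mean_sum is None else mean_sum + img
        count += 1
mean = mean_sum / count              # per-pixel mean, the same quantity the C++ tool writes out
np.save('train_mean.npy', mean)      # saved as .npy here; see the conversion section below for .binaryproto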

4. Feature extraction: build/tools/extract_features

This program takes in a trained network and an input data layer,
 and then extract features of the input data produced by the net.
Usage: 
extract_features  \ 
pretrained_net_param \
feature_extraction_proto_file  \
extract_feature_blob_name1[,name2,...]  \
save_feature_dataset_name1[,name2,...]  \
num_mini_batches \
db_type \
[CPU/GPU] [DEVICE_ID=0]
Note: you can extract multiple features in one pass by specifying
 multiple feature blob names and dataset names separated by ','. 
The names cannot contain white space characters and the number of 
blobs and datasets must be equal.

pretrained_net_param:
trained weights file (.caffemodel)
feature_extraction_proto_file:
model definition file (.prototxt)
extract_feature_blob_name1[,name2,...]:
names of the blobs whose features are extracted; several can be given
save_feature_dataset_name1[,name2,...]:
output paths for the extracted features, one per blob
num_mini_batches:
number of mini-batches to run
db_type:
output database type, lmdb or leveldb
CPU/GPU:
whether to run on CPU or GPU
DEVICE_ID:
which GPU to use
So how do you parse the resulting lmdb into a Python-friendly format for further processing?

import sys
import os

caffe_root = 'path_of_your_caffe'                     # path to your Caffe root directory
sys.path.insert(0, os.path.join(caffe_root, 'python'))
import caffe
import lmdb

lmdb_env = lmdb.open('directory_containing_lmdb')
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
datum = caffe.proto.caffe_pb2.Datum()
# 'data' holds the extracted feature array; 'label' is the label from the earlier file_list.txt
for key, value in lmdb_cursor:
    datum.ParseFromString(value)
    label = datum.label
    data = caffe.io.datum_to_array(datum)

Reference:
https://stackoverflow.com/questions/33117607/caffe-reading-lmdb-from-python
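
Building on the loop above, a minimal sketch that stacks all extracted features into a single NumPy array (the database name features_lmdb is hypothetical; use whatever you passed to extract_features):

import numpy as np
import lmdb
import caffe

features = []
env = lmdb.open('features_lmdb', readonly=True)       # hypothetical output db of extract_features
with env.begin() as txn:
    datum = caffe.proto.caffe_pb2.Datum()
    for _, value in txn.cursor():
        datum.ParseFromString(value)
        # each feature blob is stored as a (channels, height, width) datum; flatten it to one vector
        features.append(caffe.io.datum_to_array(datum).flatten())

features = np.vstack(features)                        # shape: (num_samples, feature_dim)
np.save('features.npy', features)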

Configuration Files

1. Model file
Example model definition file:

name: "Traffic_Sign_Net"
layer {
    name: "data"
    type: "Data"
    top: "data"
    top: "label"
    include {
        phase: TRAIN
    }
    transform_param {
        mirror: true
        crop_size: 112
        mean_file: "/home/ubuntu/Zmoji/traffic_sign/data/train_mean.binaryproto"
    }
    data_param {
        source: "/home/ubuntu/Zmoji/traffic_sign/data/train_lmdb"
        batch_size: 16
        backend: LMDB
    }
}
layer {
    name: "data"
    type: "Data"
    top: "data"
    top: "label"
    include {
        phase: TEST
    }
    transform_param {
        mirror: false
        crop_size: 112
        mean_file: "/home/ubuntu/Zmoji/traffic_sign/data/train_mean.binaryproto"
    }
    data_param {
        source: "/home/ubuntu/Zmoji/traffic_sign/data/val_lmdb"
        batch_size: 16
        backend: LMDB
    }
}
layer {
  name: "conv1_0"
  type: "Convolution"
  bottom: "data"
  top: "conv1_0"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 7
    stride: 2
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "bn1_0"
  type: "BatchNorm"
  bottom: "conv1_0"
  top: "bn1_0"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale1_0"
  type: "Scale"
  bottom: "bn1_0"
  top: "scale1_0"
  scale_param {
    bias_term: true
  }
}

layer {
  name: "relu1_0"
  type: "ReLU"
  bottom: "scale1_0"
  top: "relu1_0"
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "relu1_0"
  top: "conv1_1"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale1_1"
  type: "Scale"
  bottom: "bn1_1"
  top: "scale1_1"
  scale_param {
    bias_term: true
  }
}

layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "scale1_1"
  top: "relu1_1"
}
layer {
  name: "conv2_0"
  type: "Convolution"
  bottom: "relu1_1"
  top: "conv2_0"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 96
    kernel_size: 3
  pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
  }
}
layer {
  name: "bn2_0"
  type: "BatchNorm"
  bottom: "conv2_0"
  top: "bn2_0"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
  
}
layer {
  name: "scale2_0"
  type: "Scale"
  bottom: "bn2_0"
  top: "scale2_0"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_0"
  type: "ReLU"
  bottom: "scale2_0"
  top: "relu2_0"
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "relu2_0"
  top: "conv2_1"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 96
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
  }
}
layer {
  name: "bn2_1"
  type: "BatchNorm"
  bottom: "conv2_1"
  top: "bn2_1"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale2_1"
  type: "Scale"
  bottom: "bn2_1"
  top: "scale2_1"
  scale_param {
    bias_term: true
  }
}

layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "scale2_1"
  top: "relu2_1"
}
layer {
  name: "pool2_1"
  type: "Pooling"
  bottom: "relu2_1"
  top: "pool2_1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
    name: "drop2_1"
    type: "Dropout"
    bottom: "pool2_1"
    top: "drop2_1"
    dropout_param {
        dropout_ratio: 0.2
    }
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "drop2_1"
  top: "conv2_2"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 96
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
  }
}
layer {
  name: "bn2_2"
  type: "BatchNorm"
  bottom: "conv2_2"
  top: "bn2_2"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }      
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale2_2"
  type: "Scale"
  bottom: "bn2_2"
  top: "scale2_2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "scale2_2"
  top: "relu2_2"
}
layer {
  name: "conv3_0"
  type: "Convolution"
  bottom: "relu2_2"
  top: "conv3_0"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 128
    kernel_size: 3
  pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "bn3_0"
  type: "BatchNorm"
  bottom: "conv3_0"
  top: "bn3_0"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale3_0"
  type: "Scale"
  bottom: "bn3_0"
  top: "scale3_0"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_0"
  type: "ReLU"
  bottom: "scale3_0"
  top: "relu3_0"
}
layer {
  name: "conv4_0"
  type: "Convolution"
  bottom: "relu3_0"
  top: "conv4_0"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 128
    kernel_size: 3
  pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "bn4_0"
  type: "BatchNorm"
  bottom: "conv4_0"
  top: "bn4_0"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale4_0"
  type: "Scale"
  bottom: "bn4_0"
  top: "scale4_0"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_0"
  type: "ReLU"
  bottom: "scale4_0"
  top: "relu4_0"
}
layer {
  name: "pool4_0"
  type: "Pooling"
  bottom: "relu4_0"
  top: "pool4_0"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool4_0"
  top: "conv4_1"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 160
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "bn4_1"
  type: "BatchNorm"
  bottom: "conv4_1"
  top: "bn4_1"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale4_1"
  type: "Scale"
  bottom: "bn4_1"
  top: "scale4_1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "scale4_1"
  top: "relu4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "relu4_1"
  top: "conv4_2"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 160
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "bn4_2"
  type: "BatchNorm"
  bottom: "conv4_2"
  top: "bn4_2"
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
  param {
    lr_mult: 0
   decay_mult: 0
  }
    batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "scale4_2"
  type: "Scale"
  bottom: "bn4_2"
  top: "scale4_2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "scale4_2"
  top: "relu4_2"
}
layer {
  name: "pool4_2"
  type: "Pooling"
  bottom: "relu4_2"
  top: "pool4_2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv5_0"
  type: "Convolution"
  bottom: "pool4_2"
  top: "conv5_0"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 256
    kernel_size: 3
  pad: 1
    stride: 1
    bias_term: true
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "cccp5"
  type: "Convolution"
  bottom: "conv5_0"
  top: "cccp5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    kernel_size: 1
    group: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "poolcp5"
  type: "Pooling"
  bottom: "cccp5"
  top: "poolcp5"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 3
  }
}
layer {
  name: "cccp6"
  type: "Convolution"
  bottom: "poolcp5"
  top: "cccp6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    kernel_size: 3
    stride: 2
    pad: 1
    group: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "poolcp6"
  type: "Pooling"
  bottom: "cccp6"
  top: "poolcp6"
  pooling_param {
    pool: AVE
    global_pooling:true
  }
}
layer {
  name: "ip62"
  type: "InnerProduct"
  bottom: "poolcp6"
  top: "ip62"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 62
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name:"prob"
  type: "SoftmaxWithLoss"
  bottom: "ip62"
  bottom: "label"
  top: "prob"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip62"
  bottom: "label"
  top: "accuracy"
}
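
Before training, it is worth loading the prototxt once and checking that the blob shapes line up. A minimal sketch with pycaffe (it assumes the lmdb paths referenced by the Data layers exist; otherwise point it at a deploy-style prototxt):

import caffe

caffe.set_mode_cpu()
net = caffe.Net('/home/ubuntu/Zmoji/traffic_sign/models/traffic_sign_train.prototxt', caffe.TEST)

# Print every blob with its shape to verify strides, padding and the final ip62 output (62 classes)
for name, blob in net.blobs.items():
    print(name, blob.data.shape)

# Rough parameter count as a quick model-size check
n_params = sum(p.data.size for params in net.params.values() for p in params)
print('total parameters:', n_params)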

2. Solver file
The solver file coordinates the whole training run: it sets the number of iterations, the training strategy, the learning-rate schedule, how often to test, whether to run on GPU or CPU, and so on.

net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.9
type: SGD
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 20000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU
debug_info: true # whether to print per-layer debug information

net: the network definition; a single prototxt can contain both the TRAIN and TEST configurations
test_iter: number of test iterations needed to cover the whole test set once; e.g. with 1000 test images and a test batch size of 10, test_iter should be 100
test_interval: run a test pass every this many training iterations (usually once per epoch over the training set); e.g. with a batch size of 100 and test_interval of 10, a test pass runs after every 1000 training images
display: print training info to the screen every this many iterations
max_iter: maximum number of training iterations
snapshot: save a .caffemodel and .solverstate every this many iterations
snapshot_prefix: path prefix for the snapshot files
solver_mode: run on GPU or CPU
debug_info: print the data of every layer to the screen; very useful when debugging
type: solver algorithm, one of SGD/AdaDelta/AdaGrad/Nesterov/Adam/RMSProp
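
The two test-related values are just arithmetic on the dataset and batch sizes; a quick sketch (the dataset sizes are hypothetical, chosen so the results match the reference solver below):

import math

train_size, val_size = 4575, 2520         # hypothetical numbers of training / validation images
train_batch, test_batch = 16, 16          # batch_size values from the model prototxt above

test_iter = int(math.ceil(val_size / float(test_batch)))         # cover the whole validation set once
test_interval = int(math.ceil(train_size / float(train_batch)))  # test roughly once per training epoch
print(test_iter, test_interval)           # -> 158 286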

Official documentation of the solver settings:
https://github.com/BVLC/caffe/wiki/Solver-Prototxt
Another reference solver file:

net: "/home/ubuntu/Zmoji/traffic_sign/models/traffic_sign_train.prototxt"
test_iter: 158
test_interval: 286
base_lr: 0.1
momentum: 0.9
weight_decay:  0.005 
type:"AdaDelta"
delta:1e-3
gamma: 0.1
lr_policy: "multistep"
stepvalue: 5000
stepvalue: 9500
stepvalue: 15300
stepvalue: 19800
stepvalue: 22300
stepvalue: 27000
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 40000 
# snapshot intermediate results
snapshot: 2000
snapshot_prefix: "/home/ubuntu/Zmoji/traffic_sign/snapshot/traffic_sign"
# solver mode: CPU or GPU
solver_mode: GPU
random_seed: 786
# debug_info: True

Auxiliary Tools

1. Model .prototxt visualization tool
http://ethereon.github.io/netscope/#/editor
2. Redirecting Caffe's training output to a log file and plotting the training curves
Caffe ships with plotting helpers under the Caffe source tree:
./tools/extra/parse_log.sh
./tools/extra/extract_seconds.py
./tools/extra/plot_training_log.py.example

  1. Redirect the log: add a shell redirection to the training command so the screen output is written to a file:
    caffe train --solver=/path/to/solver >log/***.log 2>&1
  2. Parse the training log: copy the three scripts above into the log directory, then run
    sh parse_log.sh xxxx.log
  3. Plot the curves:
    python plot_training_log.py [0-7] save.png xxxx.log
    The selectable chart types 0-7 cover test accuracy, test loss, training loss and learning rate, each against iterations or seconds (run the script without arguments for the exact list); the sketch below shows how to plot the parsed log by hand.
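
parse_log.sh writes plain-text tables next to the log (e.g. xxxx.log.train and xxxx.log.test), so the curves can also be plotted by hand; a hedged sketch that assumes those file names and the default column order (check the header line of the generated files and adjust the column indices if they differ):

import numpy as np
import matplotlib.pyplot as plt

# Column 0 is the iteration count; the loss/accuracy columns depend on the parse_log.sh
# version, so verify them against the header line of the generated files.
train = np.loadtxt('xxxx.log.train', skiprows=1)
test = np.loadtxt('xxxx.log.test', skiprows=1)

plt.plot(train[:, 0], train[:, 2], label='train loss')
plt.plot(test[:, 0], test[:, 3], label='test loss')
plt.xlabel('iteration')
plt.legend()
plt.savefig('training_curves.png')
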
3. Converting mean files between C++ and Python
The C++ tools use the .binaryproto format while Python code uses .npy, so you will often need to convert between the two.

Python: .binaryproto -> .npy

import caffe
import numpy as np

MEAN_PROTO_PATH = 'mean.binaryproto'               # path of the protobuf mean file to convert
MEAN_NPY_PATH = 'mean.npy'                         # path of the resulting numpy mean file

blob = caffe.proto.caffe_pb2.BlobProto()           # create a protobuf BlobProto
data = open(MEAN_PROTO_PATH, 'rb').read()          # read the contents of mean.binaryproto
blob.ParseFromString(data)                         # parse the contents into the blob

array = np.array(caffe.io.blobproto_to_array(blob))# convert the blob to numpy; shape is (mean_number, channel, height, width)
mean_npy = array[0]                                # a blob can hold several means, so pick one by index
np.save(MEAN_NPY_PATH, mean_npy)
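
The reverse conversion (.npy back to .binaryproto, e.g. to feed a Python-computed mean to a C++ tool) is symmetric; a minimal sketch:

import caffe
import numpy as np

mean_npy = np.load('mean.npy')                               # shape (channels, height, width)
blob = caffe.io.array_to_blobproto(mean_npy[np.newaxis, :])  # array_to_blobproto expects a 4-D array
with open('mean.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())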

Constructing mean.npy from known per-channel means

import numpy as np

MEAN_NPY_PATH = 'mean.npy'

mean = np.ones([3, 256, 256], dtype=np.float32)   # per-channel means broadcast over a 256x256 image
mean[0, :, :] = 104
mean[1, :, :] = 117
mean[2, :, :] = 123

np.save(MEAN_NPY_PATH, mean)
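
At inference time the mean file is usually consumed through caffe.io.Transformer; a minimal sketch (deploy.prototxt, model.caffemodel and example.jpg are hypothetical placeholders):

import caffe
import numpy as np

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)   # hypothetical deploy files

mu = np.load('mean.npy').mean(1).mean(1)          # collapse (C, H, W) to per-channel means

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))      # HWC -> CHW
transformer.set_mean('data', mu)                  # subtract the per-channel mean
transformer.set_raw_scale('data', 255)            # caffe.io.load_image returns values in [0, 1]
transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR, the order Caffe data layers use

img = caffe.io.load_image('example.jpg')          # hypothetical test image
net.blobs['data'].data[0] = transformer.preprocess('data', img)
out = net.forward()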

4. Inspecting the training log
Setting debug_info: true in the solver file prints the forward and backward pass values of every layer; checking the weights and their gradients helps with debugging and hyperparameter tuning.
How to interpret the Caffe training log with debug info:
https://stackoverflow.com/questions/40510706/how-to-interpret-caffe-log-with-debug-info

5. How to add a new layer
Register the new layer in layer_factory.cpp; see:
https://github.com/runhang/caffe-ssd-windows/issues/1

Environment Setup

  1. How to check the CUDA/cuDNN version

CUDA:

cat /usr/local/cuda/version.txt

or:

nvcc -V

cuDNN:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

  2. How to check GPU information

nvidia-smi
# refresh every 10 seconds
watch -n 10 nvidia-smi
Column meanings:
Fan: fan speed as a percentage (0-100%) of the requested speed; shows N/A if the card is not fan-cooled or the fan is broken
Temp: GPU temperature in degrees Celsius
Perf: performance state, from P0 (maximum performance) to P12 (minimum performance)
Pwr: power consumption
Bus-Id: GPU bus information
Disp.A: Display Active, whether the GPU's display output is initialized
Memory Usage: GPU memory usage
Volatile GPU-Util: instantaneous GPU utilization
Compute M.: compute mode

Tutorials

  1. CS231n: Tips and Tricks for tuning NNs
    https://docs.google.com/presentation/d/183aCHcSq-YsaokZrqI3khuy_zPbehG-XgkyA6L5W4t4/edit#slide=id.g38a7d6b174_18_21