Pitfalls of deep learning model inference with MNN on Android

2020-01-20 qizhen816

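Over the past few days I built a small MobileNet-based semantic segmentation network that segments the ID-card region of an image. It had to be deployed on mobile, so I looked into NCNN and MNN. NCNN's support for some ONNX ops turned out to be weak: resize requires writing your own interface to modify the network structure, so I gave up on it and went straight to MNN.
To be fair, both are friendly deep learning deployment tools: they are actively maintained, have documentation and Chinese-language communities, and are thoroughly optimized to make full use of mobile compute.
I have no Android development experience and am not very fluent in C++, so I ran into a few problems: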

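1. Linker error when using OpenCV on mobile (I started with OpenCV-android-sdk 3.4):
undefined reference to `cv::imwrite(cv::String const&, cv::_InputArray const&, std::__ndk1::vector<int, std::__ndk1::allocator<int> > const&)'
The likely cause is that your OpenCV was built with gnustl while the NDK has switched to libc++, so the JNI build fails at link time. To check, open the build.gradle next to CMakeLists.txt; it will contain "-DANDROID_STL=c++_shared".
The fix (taken from a linked answer) comes in three flavors: change the project STL to "-DANDROID_STL=gnustl_shared" (which may cause other problems), rebuild OpenCV with c++_shared, or download the latest OpenCV 4.0.x library, which already supports NDK r18+. I recommend 4.0.x.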
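To confirm which STL your own JNI code is really compiled against, you can check the standard library's version macros. The std::__ndk1 in the error above is the inline namespace of the NDK's libc++, which suggests the calling code uses libc++ while the prebuilt OpenCV presumably exports gnustl symbols. The following is only a minimal sketch; stl_flavor is a hypothetical helper name and is not part of the NDK or OpenCV.

```cpp
#include <cassert>
#include <string>  // any standard header defines the library's version macro

// Hypothetical helper: names the C++ standard library this file was compiled against.
// _LIBCPP_VERSION is defined by libc++ (ANDROID_STL=c++_shared / c++_static).
// __GLIBCXX__ is defined by GNU libstdc++ (the old gnustl_* variants).
inline std::string stl_flavor() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++ (gnustl)";
#else
    return "unknown";
#endif
}
```

One possible use is a debug check when the native library loads, for example assert(stl_flavor() == "libc++"), or simply logging the value once and comparing it with how OpenCV was built.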
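2. ninja: build stopped: subcommand failed. error: use of undeclared identifier 'CV_BGR2RGBA'
In newer OpenCV versions, constants such as CV_BGR2GRAY have been renamed to the COLOR_BGR2GRAY form.
Fix: #include "opencv2/imgproc.hpp" and replace each constant according to the naming used in that header (for example CV_BGR2RGBA becomes cv::COLOR_BGR2RGBA).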
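3. Converting a TensorFlow / Keras model fails:
Converte Tensorflow's Op batch_normalization_16/cond/FusedBatchNorm , type = FusedBatchNorm, failed, may be some node is not const
Segmentation fault (core dumped)
The BN layers were not frozen, that is, the mean and variance accumulated during training were not saved into the graph.
Fix: in TensorFlow set is_training=False on the BN layers; in Keras use x = BatchNormalization()(y, training=False).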
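Why freezing matters can be seen from the arithmetic: once the mean and variance are constants, an inference-time BN layer is just a per-channel scale and shift, which is presumably what a converter wants to fold. The sketch below illustrates that folding in plain C++; FoldedBN and fold_batch_norm are hypothetical names, and this is not MNN's converter code. The default eps of 1e-3 is an assumption matching the Keras BatchNormalization default.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel affine form of an inference-mode BatchNorm: y = scale * x + shift.
struct FoldedBN {
    std::vector<float> scale;
    std::vector<float> shift;
};

// Folds frozen BN statistics into scale/shift (hypothetical helper).
// y = gamma * (x - mean) / sqrt(var + eps) + beta
//   = scale * x + shift, with scale = gamma / sqrt(var + eps)
//                             shift = beta - mean * scale
inline FoldedBN fold_batch_norm(const std::vector<float>& gamma,
                                const std::vector<float>& beta,
                                const std::vector<float>& moving_mean,
                                const std::vector<float>& moving_var,
                                float eps = 1e-3f) {  // assumed default
    const std::size_t c = gamma.size();
    assert(beta.size() == c && moving_mean.size() == c && moving_var.size() == c);
    FoldedBN out{std::vector<float>(c), std::vector<float>(c)};
    for (std::size_t i = 0; i < c; ++i) {
        out.scale[i] = gamma[i] / std::sqrt(moving_var[i] + eps);
        out.shift[i] = beta[i] - moving_mean[i] * out.scale[i];
    }
    return out;
}
```

If the statistics only exist as training-time branches (the "cond" in the op name hints at that), there are no constants to fold, which fits the "some node is not const" message.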
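4. The NCHW/NHWC format conversion was skipped, so the image dimensions were swapped and the output turned into a 3x3 tiled grid or was simply wrong.
Fix: in MNN you can swap the image dimensions just by copying between Tensors, the equivalent of numpy.transpose() in Python. The linked documentation explains it:
Format conversion
Taking OpenCV's HWC (height, width, channel) layout as an example, first wrap the data in a TensorFlow-format Tensor, which is also NHWC, then copy it into a Caffe-format Tensor, which is NCHW. The conversion happens automatically: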

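    // Wrap the OpenCV HWC pixel data (assumed to be float) as an NHWC host tensor.
    const std::vector<int> inputDims1 = {1, size_img, size_img, 3};
    auto nhwcTensor_1 = MNN::Tensor::create<float>(inputDims1, pImg.data, MNN::Tensor::TENSORFLOW);
    // Copying into the network input converts the layout automatically.
    g_input_1->copyFromHostTensor(nhwcTensor_1);
    g_input_1 = new Tensor(g_input_1, Tensor::CAFFE); // not needed if the network input g_input_1 is already in CAFFE format; the copy converts automatically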
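For intuition, this is roughly what the layout conversion amounts to for a single image. The sketch below is plain C++ with no MNN dependency; hwc_to_chw is a hypothetical helper written for illustration, not an MNN function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorders one image from HWC (OpenCV / TensorFlow layout) to CHW (Caffe layout).
// Hypothetical helper for illustration only.
inline std::vector<float> hwc_to_chw(const std::vector<float>& src,
                                     int height, int width, int channels) {
    assert(src.size() == static_cast<std::size_t>(height) * width * channels);
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < channels; ++c) {
                // HWC index: channels are interleaved per pixel.
                const std::size_t from = (static_cast<std::size_t>(y) * width + x) * channels + c;
                // CHW index: each channel is a contiguous plane.
                const std::size_t to = (static_cast<std::size_t>(c) * height + y) * width + x;
                dst[to] = src[from];
            }
        }
    }
    return dst;
}
```

Feeding HWC data to a network that expects CHW without this reordering interleaves the channels into the spatial dimensions, which would explain the tiled-grid output described above.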

5. Keras -> TensorFlow model conversion

Start to Convert Other Model Format To MNN Model...
Start to Optimize the MNN Net...
Segmentation fault (core dumped)

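First check that the weights were loaded successfully, that Batch Normalization is frozen, that Dropout is set to 0, and that the network input is defined with a fixed size: input = Input( shape=(img_h, 280, 1), name='the_input'). If the input is defined with a batch_size instead, the conversion also fails.
Fix: verify the points above, and remember to turn on [TFMODEL_OPTIMIZE] in tools/converter/CMakeLists.txt when building MNNConvert.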

6. Scaling the input image proportionally to the target size and padding the rest with zeros:
First set wrap in the config to zero fill:

    ImageProcess::Config config_data;
    config_data.filterType = BILINEAR;
    config_data.wrap = ZERO;  // samples that fall outside the source image become 0
    const float mean_vals[1] = {mean_val};
    const float norm_vals[1] = {std_val};
    ::memcpy(config_data.mean, mean_vals, sizeof(mean_vals));
    ::memcpy(config_data.normal, norm_vals, sizeof(norm_vals));
    config_data.sourceFormat = GRAY;  // single-channel in, single-channel out
    config_data.destFormat = GRAY;

Then set the matrix, with the width derived from the aspect ratio:

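    int width = img.cols;
    int height = img.rows;
    auto dims  = o_input->shape();  // NCHW: {batch, channels, height, width}
    int size_h = dims[2];
    int size_w = dims[3];           // destination width; columns beyond s are zero-padded
    MNN::CV::Matrix trans;
    // Width the image has after scaling its height to size_h with the aspect ratio kept.
    auto s = 1.0*width/height*size_h;
    // The matrix maps destination coordinates back to source coordinates.
    trans.postScale(1.0f/s, 1.0f/size_h);
    trans.postScale(width, height);
    pretreat_data->setMatrix(trans);
    pretreat_data->convert(img.data, width, height, 0, o_input);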
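To see why this matrix gives an aspect-preserving resize with zero padding on the right, it helps to work through the mapping by hand. The sketch below reproduces the two postScale calls as plain arithmetic, assuming (as I understand MNN's ImageProcess) that the matrix maps destination pixels back to source pixels. SrcPoint, dst_to_src and is_padding are hypothetical names, and the boundary test is only approximate compared with MNN's actual sampling.

```cpp
#include <cassert>

struct SrcPoint {
    float x;
    float y;
};

// Maps a destination (network input) pixel back to source image coordinates,
// mirroring postScale(1/s, 1/size_h) followed by postScale(width, height).
inline SrcPoint dst_to_src(float dst_x, float dst_y, int width, int height, int size_h) {
    assert(width > 0 && height > 0 && size_h > 0);
    // Width of the source image once its height is scaled to size_h.
    const float s = 1.0f * width / height * size_h;
    return {dst_x / s * width, dst_y / size_h * height};
}

// True when the mapped point lies outside the source image,
// i.e. where wrap = ZERO would supply the fill value.
inline bool is_padding(float dst_x, float dst_y, int width, int height, int size_h) {
    const SrcPoint p = dst_to_src(dst_x, dst_y, width, height, size_h);
    return p.x < 0.0f || p.y < 0.0f || p.x >= width || p.y >= height;
}
```

For a 200x100 source and size_h = 32, s is 64: destination pixel (32, 16) maps to source (100, 50), while destination column 100 maps to source x = 312.5, outside the image, so it is filled with zeros.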

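7. The compiled Android libmnn.so or libmnn.a is too large (over 50 MB):
Fix: remove the -g compile option in $ANDROID_NDK/build/cmake/android.toolchain.cmake, then rebuild.
8. As a bonus, C++ ArgMax and bounding-rectangle post-processing for semantic segmentation (adapted from code found online):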

    #include <algorithm>
    #include <iterator>
    #include <vector>
    #include <opencv2/imgproc.hpp>

    struct BBox {
        float xmin;
        float ymin;
        float xmax;
        float ymax;
    };

    // Index of the largest element in [first, last)
    template<class ForwardIterator>
    inline int argmax(ForwardIterator first, ForwardIterator last) {
        return static_cast<int>(std::distance(first, std::max_element(first, last)));
    }

    // img: size_img x size_img score map with 3 float channels; w, h: original image size
    static BBox post_process(const cv::Mat& img, const int size_img, const int w, const int h)
    {
        // Per-pixel argmax over the 3 class scores
        cv::Mat out = cv::Mat::zeros(size_img, size_img, CV_8U);
        for (int row = 0; row < size_img; ++row) {
            for (int col = 0; col < size_img; ++col) {
                const float *p = img.ptr<float>(row, col); // class scores of one pixel
                out.at<uint8_t>(row, col) = (uint8_t) argmax(p, p + 3);
            }
        }
        // Bounding rectangle of the first contour
        std::vector< std::vector< cv::Point> > contours;
        cv::findContours(
                out,
                contours,
                cv::noArray(),
                cv::RETR_TREE,
                cv::CHAIN_APPROX_SIMPLE
        );
        BBox box = {0, 0, 0, 0};
        if (contours.empty()) {
            return box; // nothing was segmented
        }
        cv::Rect boundRect = cv::boundingRect(contours[0]);
        // Scale the rectangle back to the original image size
        auto scale_x = 1.0*w/size_img;
        auto scale_y = 1.0*h/size_img;
        box.xmin = static_cast<int>(boundRect.x*scale_x);
        box.ymin = static_cast<int>(boundRect.y*scale_y);
        box.xmax = static_cast<int>((boundRect.x+boundRect.width)*scale_x);
        box.ymax = static_cast<int>((boundRect.y+boundRect.height)*scale_y);
        return box;
    }
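If you want to check the post-processing logic on a desktop without linking OpenCV, the same idea can be sketched in plain C++. Note this is a simplified variant, not the code above: it takes the bounding box of all non-background pixels instead of the first contour, and it assumes class 0 is background. Box and foreground_box are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Box in pixel coordinates; xmax/ymax are exclusive, like x + width of a rectangle.
struct Box {
    int xmin;
    int ymin;
    int xmax;
    int ymax;
    bool valid;  // false when no foreground pixel was found
};

// probs: HWC scores with num_classes values per pixel.
// Assumption: class 0 is background, every other class is foreground.
inline Box foreground_box(const std::vector<float>& probs,
                          int height, int width, int num_classes) {
    assert(num_classes > 0);
    assert(probs.size() == static_cast<std::size_t>(height) * width * num_classes);
    Box box{width, height, -1, -1, false};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float* p = probs.data() +
                (static_cast<std::size_t>(y) * width + x) * num_classes;
            // Per-pixel argmax over the class scores.
            const int label = static_cast<int>(
                std::distance(p, std::max_element(p, p + num_classes)));
            if (label == 0) continue;
            box.xmin = std::min(box.xmin, x);
            box.ymin = std::min(box.ymin, y);
            box.xmax = std::max(box.xmax, x + 1);
            box.ymax = std::max(box.ymax, y + 1);
            box.valid = true;
        }
    }
    return box;
}
```

The resulting box can then be multiplied by w/size_img and h/size_img to get back to original image coordinates, as in the OpenCV version.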
