Extend Python with C/C++: Write Your Own Py

2019-02-20 Vophan

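Recently, while implementing SRCNN, we ran into a problem. SRCNN first upscales the low-resolution image back to its original size with bicubic interpolation, but the Python implementations of bicubic interpolation found online are astonishingly slow. So I decided to go all the way in true geek spirit and write my first Python module. Once I actually got started, though, it turned out to be far from simple. What I could find by searching from behind the GFW was mostly experts' notes, written in a "this is so simple, no need to explain" tone, and for a beginner who has never touched the CPython API that is genuinely hard going. Hence this article. I hope you read it carefully; I'm sure you will get a lot out of it.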

Background

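I had long heard about mixing C/C++ with Python, but for technical reasons, and because I never needed it, I never tried it. Now that I have the chance, I want to understand it properly.
There are in fact many ways to extend Python with C/C++. According to what you find online, there is the ctypes library, SWIG, and Boost (Boost.Python) for C++, but I chose none of them:

swig: plenty of bugs and unfriendly, although easy to use: you don't have to hand-write wrapper functions, method tables, and so on
ctypes: the calling style is too ugly, not cool enough, hahaha
Boost: the Boost library is huge and brings dependencies, so you can't just pick it up and glue things together whenever you like
(just my opinion; corrections welcome)
So in the end I used the CPython API, the most authentic approach, to solve the problem.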

Approach

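The first obvious problem with extending Python in C/C++ (hereafter simply "C") is how C can understand Python's data structures, since different languages have different data structures, keywords, and so on.
Python and C communicate here through the adapter design pattern: we change neither Python's internals nor the C code, but write an adapter between the two.
We need the following pieces for this adapter:

Core function: the function whose functionality we want to expose to Python
Wrapper function: converts the Python arguments into C values, and converts the C result back into a Python object
Method table: collects all of the module's functions in one table
Module definition: the structure that describes the module
Module initialization function: the initialize function, called when the module is first imported
distutils: compiles our module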
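To make the adapter idea concrete, here is a small pure-C toy, not CPython code: a hypothetical tagged Value type stands in for PyObject*, average() plays the core function, and average_wrapper() plays the wrapper that converts the dynamic arguments into C doubles, calls the core function, and wraps the result back up. All names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a dynamically typed Python object (NOT CPython's PyObject). */
typedef enum { VAL_INT, VAL_FLOAT, VAL_ERROR } ValType;
typedef struct {
    ValType type;
    long i;     /* valid when type == VAL_INT */
    double f;   /* valid when type == VAL_FLOAT */
} Value;

/* 1. Core function: plain C, knows nothing about dynamic values. */
static double average(double a, double b) {
    return (a + b) / 2.0;
}

/* Convert one toy value to a C double; returns 0 on a type error. */
static int to_double(const Value *v, double *out) {
    assert(v != NULL && out != NULL);
    switch (v->type) {
    case VAL_INT:   *out = (double)v->i; return 1;
    case VAL_FLOAT: *out = v->f;         return 1;
    default:        return 0;
    }
}

/* 2. Wrapper: dynamic args -> C args -> core function -> dynamic result. */
static Value average_wrapper(const Value *args, size_t nargs) {
    Value result = { VAL_ERROR, 0, 0.0 };
    double a, b;
    if (nargs != 2 || !to_double(&args[0], &a) || !to_double(&args[1], &b)) {
        return result;              /* report the error to the caller */
    }
    result.type = VAL_FLOAT;
    result.f = average(a, b);
    return result;
}
```

The real wrapper below follows the same three steps, only with PyArg_ParseTuple() for the conversion in and Py_BuildValue() for the conversion out.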

Implementation

Wrapper function:

First, here is the code:

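// partial code
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* defined elsewhere in Bicubic.c (not shown) */
double bicubicInterpolate(double **p, double x, double y);

/* free rows[0..n-1] (NULL entries are fine) and the row-pointer array */
static void free_rows(double **rows, Py_ssize_t n) {
    Py_ssize_t i;
    for (i = 0; i < n; i++) {
        free(rows[i]);
    }
    free(rows);
}

static PyObject *BicubicConvert(PyObject *self, PyObject *args) {
    PyObject *arg;      /* object passed from Python (borrowed reference) */
    PyObject *seq2d;    /* list/tuple view of it (new reference) */
    double **d2d;
    double result;
    double x;
    double y;
    Py_ssize_t seq2dlen;
    Py_ssize_t seqlen;
    Py_ssize_t i;
    Py_ssize_t j;

    /* get arguments from python: one object and two doubles */
    if (!PyArg_ParseTuple(args, "Odd", &arg, &x, &y)) {
        return NULL;
    }
    seq2d = PySequence_Fast(arg, "argument must be iterable");
    if (!seq2d) {
        return NULL;
    }

    seq2dlen = PySequence_Fast_GET_SIZE(seq2d);
    /* assumption: bicubicInterpolate() reads exactly a 4x4 grid */
    if (seq2dlen != 4) {
        Py_DECREF(seq2d);
        PyErr_SetString(PyExc_ValueError, "expected a 4x4 grid");
        return NULL;
    }
    /* one pointer per row; zeroed so that free_rows() is safe at any point */
    d2d = calloc(seq2dlen, sizeof(double *));
    if (!d2d) {
        Py_DECREF(seq2d);
        return PyErr_NoMemory();
    }

    for (i = 0; i < seq2dlen; i++) {
        /* GET_ITEM returns a borrowed reference; PySequence_Fast returns a new one */
        PyObject *item2d = PySequence_Fast(PySequence_Fast_GET_ITEM(seq2d, i),
                                           "rows must be iterable");
        if (!item2d) {
            goto error;
        }

        seqlen = PySequence_Fast_GET_SIZE(item2d);
        if (seqlen != 4) {
            Py_DECREF(item2d);
            PyErr_SetString(PyExc_ValueError, "expected a 4x4 grid");
            goto error;
        }
        d2d[i] = malloc(seqlen * sizeof(double));   /* store the row in d2d */
        if (!d2d[i]) {
            Py_DECREF(item2d);
            PyErr_NoMemory();
            goto error;
        }

        for (j = 0; j < seqlen; j++) {
            PyObject *fitem = PyNumber_Float(PySequence_Fast_GET_ITEM(item2d, j));
            if (!fitem) {
                Py_DECREF(item2d);
                PyErr_SetString(PyExc_TypeError, "all items must be numbers");
                goto error;
            }
            d2d[i][j] = PyFloat_AS_DOUBLE(fitem);
            Py_DECREF(fitem);
        }
        Py_DECREF(item2d);
    }

    /* interpolate only after every row has been converted */
    result = bicubicInterpolate(d2d, x, y);
    free_rows(d2d, seq2dlen);
    Py_DECREF(seq2d);
    return Py_BuildValue("d", result);

error:
    free_rows(d2d, seq2dlen);
    Py_DECREF(seq2d);
    return NULL;
}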
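The post does not show bicubicInterpolate() itself (nor the cubicConvert() registered in the method table). Here is a minimal sketch of what it might look like, assuming Catmull-Rom cubic convolution (Keys' kernel with a = -0.5) over a 4x4 neighbourhood passed as row pointers, with x and y in [0, 1] measured from p[1][1]. The helper name cubicInterpolate and the axis convention are assumptions, not the author's code.

```c
#include <assert.h>
#include <stddef.h>

/* 1D cubic convolution (Catmull-Rom, i.e. Keys' kernel with a = -0.5):
   interpolates between p[1] (t = 0) and p[2] (t = 1); p[0] and p[3]
   are used to estimate the slopes at the two ends. */
static double cubicInterpolate(const double p[4], double t) {
    return p[1] + 0.5 * t * (p[2] - p[0]
        + t * (2.0 * p[0] - 5.0 * p[1] + 4.0 * p[2] - p[3]
        + t * (3.0 * (p[1] - p[2]) + p[3] - p[0])));
}

/* 2D bicubic interpolation on a 4x4 grid given as row pointers (a double **).
   Assumed convention: the row index follows x, the column index follows y,
   and (x, y) = (0, 0) lands exactly on p[1][1]. */
double bicubicInterpolate(double **p, double x, double y) {
    double col[4];
    int i;
    assert(p != NULL);
    /* interpolate along y inside each row ... */
    for (i = 0; i < 4; i++) {
        col[i] = cubicInterpolate(p[i], y);
    }
    /* ... then along x across the four row results */
    return cubicInterpolate(col, x);
}
```

Interpolating each row first and then the resulting column is what makes the 2D kernel separable: five one-dimensional cubic evaluations instead of a 16-term weighted sum.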

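Now let's explain what each part does:
First, the function is declared static and returns a PyObject*, a type defined in <Python.h>. (static is not strictly required, but it is conventional: it keeps the function from being exported from the shared library, where only the init function needs to be visible.) Its parameters are self and args: args is the tuple of arguments passed in from Python, and for a module-level function self is the module object.
Next, we use PyArg_ParseTuple() to parse and convert the arguments. The point to watch is its second argument, the format string: its format units are the pattern by which Python objects are converted into C values. Here "Odd" means one arbitrary Python object (kept as a PyObject*) followed by two numbers converted to C doubles.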

I won't go into format units in detail here; to learn more, see the official Python documentation, which explains them very clearly.
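To get a feel for how format units drive the conversion, here is a toy, pure-C parser modelled loosely on PyArg_ParseTuple(): each character of the format string consumes one argument and one output pointer from the variadic list. Arg and parse_args() are hypothetical names, and only the units "i" and "d" are modelled; the real function works on a tuple of PyObject* and supports many more units.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Toy argument type standing in for the items of the args tuple (NOT CPython). */
typedef enum { ARG_INT, ARG_FLOAT } ArgKind;
typedef struct {
    ArgKind kind;
    long i;
    double f;
} Arg;

/* Each format unit converts one argument and fills the next out-pointer:
   "i" -> int *, "d" -> double *. Returns 1 on success, 0 on a count or
   type mismatch. */
static int parse_args(const Arg *args, size_t nargs, const char *format, ...) {
    va_list ap;
    size_t k = 0;
    int ok = 1;
    assert(format != NULL);
    va_start(ap, format);
    for (; *format != '\0' && ok; format++, k++) {
        if (k >= nargs) {          /* more units than arguments */
            ok = 0;
            break;
        }
        switch (*format) {
        case 'i':                  /* integers only */
            if (args[k].kind != ARG_INT) {
                ok = 0;
                break;
            }
            *va_arg(ap, int *) = (int)args[k].i;
            break;
        case 'd':                  /* ints are accepted and widened, floats copied */
            *va_arg(ap, double *) =
                args[k].kind == ARG_FLOAT ? args[k].f : (double)args[k].i;
            break;
        default:                   /* unknown format unit */
            ok = 0;
        }
    }
    va_end(ap);
    return ok && k == nargs;       /* every argument must be consumed */
}
```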

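Moving on, notice that our argument is not a simple int or double but a two-dimensional array, so we need Python's PySequence_Fast protocol, which lets us work with an iterable argument, i.e. our list (other iterables work as well). It involves:

PySequence_Fast: if the argument is a list or tuple, it is returned as is (as a new reference); any other iterable is consumed into a new list. Either way, the result can then be accessed with PySequence_Fast_GET_ITEM. If the argument is not iterable, it returns NULL and raises TypeError with the message given as its second argument.
PySequence_Fast_GET_SIZE: returns the number of items
PySequence_Fast_GET_ITEM: returns the item at a given index as a borrowed reference, so it must not be passed to Py_DECREF
Py_DECREF/Py_INCREF: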

There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the incrementing and decrementing of the reference count. Py_DECREF() also frees the object when the count reaches zero. For flexibility, it doesn’t call free() directly — rather, it makes a call through a function pointer in the object’s type object. For this purpose (and others), every object also contains a pointer to its type object.
The big question now remains: when to use Py_INCREF(x) and Py_DECREF(x)? Let’s first introduce some terms. Nobody “owns” an object; however, you can own a reference to an object. An object’s reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling Py_DECREF() when the reference is no longer needed. Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call Py_DECREF(). Forgetting to dispose of an owned reference creates a memory leak.

In short, these two macros manage a PyObject's reference count, and thereby decide when the object is freed.

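PyErr_SetString, PyErr_NoMemory: error helpers provided by CPython. The first sets an exception of the given type with a message string; the second sets MemoryError and returns NULL, so its result can be returned directly. I won't go into them here; for details see intermezzo-errors-and-exceptions.
Finally, an important reminder: always free the memory you allocate, and watch out for memory leaks.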
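The reference-counting rules quoted above can be modelled in a few lines of plain C. This is a toy, not CPython's object layout: each object carries a count and a pointer to its type object, and the last decref() frees the object through the type's dealloc function pointer, as the quote describes for Py_DECREF(). TypeObject, toy_new(), incref() and decref() are invented names.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy object model (NOT CPython's structs): a reference count plus a
   pointer to a type object that knows how to deallocate instances. */
struct Object;
typedef struct {
    const char *name;
    void (*dealloc)(struct Object *);
} TypeObject;

typedef struct Object {
    long refcnt;
    const TypeObject *type;
} Object;

static int dealloc_calls = 0;   /* lets a test observe deallocation */

static void toy_dealloc(Object *op) {
    dealloc_calls++;
    free(op);
}

static const TypeObject ToyType = { "toy", toy_dealloc };

/* A new object starts with one owned reference, held by the caller. */
static Object *toy_new(void) {
    Object *op = malloc(sizeof *op);
    if (op) {
        op->refcnt = 1;
        op->type = &ToyType;
    }
    return op;
}

static void incref(Object *op) {
    op->refcnt++;
}

/* Dispose of one owned reference; the last one frees the object via its type. */
static void decref(Object *op) {
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        op->type->dealloc(op);
    }
}
```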
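In the wrapper, bicubicInterpolate() takes a double **, i.e. an array of row pointers, each pointing to its own row of doubles. A minimal sketch of allocating such a matrix without leaking on partial failure (alloc_matrix() and free_matrix() are hypothetical helpers, not part of the Python API):

```c
#include <assert.h>
#include <stdlib.h>

/* Free rows[0..nrows-1] and the row-pointer array. Safe on a partially
   filled array as long as it was zero-initialised (free(NULL) is a no-op). */
static void free_matrix(double **rows, size_t nrows) {
    size_t i;
    if (!rows) {
        return;
    }
    for (i = 0; i < nrows; i++) {
        free(rows[i]);
    }
    free(rows);
}

/* Allocate an nrows x ncols matrix as an array of row pointers. On any
   failure, everything allocated so far is released and NULL is returned. */
static double **alloc_matrix(size_t nrows, size_t ncols) {
    size_t i;
    double **rows;
    assert(nrows > 0 && ncols > 0);
    rows = calloc(nrows, sizeof *rows);        /* sizeof(double *), not sizeof(double) */
    if (!rows) {
        return NULL;
    }
    for (i = 0; i < nrows; i++) {
        rows[i] = malloc(ncols * sizeof **rows);
        if (!rows[i]) {
            free_matrix(rows, nrows);          /* remaining entries are still NULL */
            return NULL;
        }
    }
    return rows;
}
```

Zero-initialising the pointer array with calloc() is what keeps the cleanup path simple: rows that were never allocated are NULL, and free(NULL) does nothing.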

Method table

static PyMethodDef BicubicMethods [] = {
    {"cubic", cubicConvert, METH_VARARGS,
    "basic method of Bicubic."},
    {"bicubic", BicubicConvert, METH_VARARGS,
    "upsample the image with Bicubic Kernel."},
    {NULL, NULL, 0, NULL}
};

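Each pair of braces describes one function and has four fields:

name: the function name as called from Python
the wrapper function (the C function to call)
flags: the calling convention; METH_VARARGS means the arguments arrive as a single tuple, to be unpacked with PyArg_ParseTuple()
docstring: i.e. the function's doc
Don't forget to end the table with:
{NULL, NULL, 0, NULL}. This is called the sentinel; it marks the end of the table.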
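The sentinel is needed because the table carries no length: CPython walks the PyMethodDef array until it reaches an entry whose name is NULL. Here is a toy version of that walk, with a hypothetical ToyMethodDef mirroring the four fields above (it is not CPython's struct).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy mirror of a method-table entry (NOT CPython's PyMethodDef). */
typedef double (*ToyFunc)(double);
typedef struct {
    const char *name;
    ToyFunc func;
    int flags;
    const char *doc;
} ToyMethodDef;

static double twice(double v) { return 2.0 * v; }
static double square(double v) { return v * v; }

static const ToyMethodDef ToyMethods[] = {
    {"twice", twice, 0, "multiply by two"},
    {"square", square, 0, "multiply by itself"},
    {NULL, NULL, 0, NULL}            /* sentinel: marks the end of the table */
};

/* Walk the table until the sentinel; no separate length is needed. */
static const ToyMethodDef *find_method(const ToyMethodDef *table, const char *name) {
    const ToyMethodDef *m;
    assert(table != NULL && name != NULL);
    for (m = table; m->name != NULL; m++) {
        if (strcmp(m->name, name) == 0) {
            return m;
        }
    }
    return NULL;
}
```

Without the sentinel the loop would run off the end of the array, which is why forgetting it typically crashes at import time.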

Module definition

static struct PyModuleDef Bicubicmodule = {
    PyModuleDef_HEAD_INIT,
    "Bicubic",
    NULL,
    -1,
    BicubicMethods
};

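header: always PyModuleDef_HEAD_INIT
module name
docstring (NULL here)
m_size: the size of the per-module state; -1 means the module keeps its state in global variables and does not support sub-interpreters
method table
Module initialization function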

PyMODINIT_FUNC
PyInit_Bicubic(void)
{
    return PyModule_Create(&Bicubicmodule);
}

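Note:
this function must be named PyInit_name, where name is your module's name (the name used in import); it cannot be chosen freely.

At this point you may be wondering what all these pieces are actually for, and how to use them to build an extension module.
Don't worry, here is the answer: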

When the Python program imports module spam for the first time, PyInit_spam() is called. (See below for comments about embedding Python.) It calls PyModule_Create(), which returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition. PyModule_Create() returns a pointer to the module object that it creates. It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily. The init function must return the module object to its caller, so that it then gets inserted into sys.modules.

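As we can see:
When we import the module in Python, our PyInit_name function is called. It calls PyModule_Create(), which uses our module definition to instantiate the module object; the module definition holds our method table, from which the module's functions are created. With that, 90% of the work is done. What remains is compiling and linking. We could compile and link by hand with gcc, but here we use distutils, which ships with Python.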
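This flow also explains why the init function's name is fixed: on Linux and macOS, the import machinery loads the compiled shared library and looks up the exported symbol PyInit_<name> (via dlopen()/dlsym()), raising ImportError if it is missing. The toy below replaces the shared library with a hypothetical table of exported symbols; ToyModule, toy_init_bicubic() and toy_import() are illustrative names, not CPython APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy module object: only the name from the module definition is modelled. */
typedef struct {
    const char *name;
} ToyModule;

typedef ToyModule *(*InitFunc)(void);

static ToyModule bicubic_module = { "Bicubic" };

/* Plays the role of PyInit_Bicubic(): returns the module object. */
static ToyModule *toy_init_bicubic(void) {
    return &bicubic_module;
}

/* Stand-in for the shared library's exported symbols (what dlsym() searches). */
static const struct {
    const char *symbol;
    InitFunc init;
} exported[] = {
    {"PyInit_Bicubic", toy_init_bicubic},
    {NULL, NULL}                      /* sentinel */
};

/* "import name": build "PyInit_<name>", look it up, call the init function. */
static ToyModule *toy_import(const char *name) {
    char symbol[64];
    int n;
    size_t i;
    assert(name != NULL);
    n = snprintf(symbol, sizeof symbol, "PyInit_%s", name);
    if (n < 0 || (size_t)n >= sizeof symbol) {
        return NULL;                  /* name too long for this toy */
    }
    for (i = 0; exported[i].symbol != NULL; i++) {
        if (strcmp(exported[i].symbol, symbol) == 0) {
            return exported[i].init();
        }
    }
    return NULL;                      /* the real import raises ImportError here */
}
```

Because the lookup is by exact symbol name, importing "bicubic" instead of "Bicubic" fails even though the library is there.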

distutils

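We create a file named setup.py with the following content (distutils was deprecated and removed from the standard library in Python 3.12; on newer versions, import setup and Extension from setuptools instead, which accept the same arguments):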

from distutils.core import setup, Extension

setup(name = "Bicubic", maintainer = "Vophan Lee", maintainer_email =
    "vophanlee@gmail.com", ext_modules = [Extension('Bicubic',sources=['Bicubic.c'])]
)

Then we run:

python setup.py build

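That completes the whole process: we now have a module of our own. python setup.py build places the compiled extension under build/; python setup.py build_ext --inplace puts it next to setup.py, so that import Bicubic works from that directory.
Here is the result:

[Figure: result]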

References:

Translating a Python Sequence into a C Array with the PySequence_Fast Protocol
Extending Python with C or C++
How Python Communicates with C++ (python如何与C++通信)

write your legend
