Python Descriptor Objects

2019-08-06 RoyTien

Reproduce from

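Among Python's many native features, descriptors are probably one of the least often customized, yet the methods and attributes built on them under the hood are in use all the time. Their elegant implementation reflects the simplicity of Python.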

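Introduction

Python descriptors are a way of creating managed object attributes. They have many advantages, such as protecting an attribute from modification, checking an attribute's type, and automatically updating the value of a dependent attribute.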

Definition

In general, a descriptor is an object attribute with "binding behavior", one whose attribute access has been overridden by methods in the descriptor protocol.
Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an object, it is said to be a descriptor.
The default behavior for attribute access is to get, set, or delete the attribute from an object's dictionary. For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a), excluding metaclasses. If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined.
Descriptors are a powerful, general purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and super().

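Descriptor protocol

__get__(self, instance, owner)

def __get__(self, instance, owner):
    '''
    :param self: the descriptor object itself
    :param instance: the instance of the class that uses the descriptor (None when accessed through the class)
    :param owner: the class that owns the descriptor
    '''

__set__(self, instance, value)

def __set__(self, instance, value):
    '''
    :param value: the value being assigned to the attribute
    '''

__delete__(self, instance)

def __delete__(self, instance):
    '''
    :param instance: the instance whose attribute is being deleted
    '''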

Example

class Descriptor:
    def __init__(self):
        self._name = ''

    def __get__(self, instance, owner):
        return self._name

    def __set__(self, instance, value):
        # Store the title-cased value on the descriptor itself.
        self._name = value.title()

    def __delete__(self, instance):
        del self._name

class Person:
    name = Descriptor()
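The following is a small usage sketch of the two classes above, re-declared so that the block runs on its own. The variable names alice and bob are illustrative only. It also shows a consequence of this particular design that is easy to miss: the value lives on the descriptor object, which is a class attribute, so all Person instances share it.

```python
class Descriptor:
    def __init__(self):
        self._name = ''

    def __get__(self, instance, owner):
        return self._name

    def __set__(self, instance, value):
        self._name = value.title()

    def __delete__(self, instance):
        del self._name


class Person:
    name = Descriptor()


alice = Person()
bob = Person()

alice.name = 'alice smith'   # routed to Descriptor.__set__
print(alice.name)            # routed to Descriptor.__get__ -> Alice Smith
print(bob.name)              # Alice Smith: the state sits on the shared descriptor
print(alice.__dict__)        # {}: nothing was stored on the instance
```

Storing the value per instance, as the Integer descriptor later in this article does, avoids the sharing.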

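Why descriptors are needed

Python is a dynamically typed, interpreted language. Unlike statically typed, compiled languages such as C or Java, where data types can be verified at compile time, Python needs extra checking logic in the code to achieve the same thing.

Suppose we have a class like this: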

class Movie:
    def __init__(self, title, description, score, ticket):
        self.title = title
        self.description = description
        self.score = score
        self.ticket = ticket

Here a movie's score must never be negative; that would be invalid, and we want the Movie class to guard against it.

class Movie:
    def __init__(self, title, description, score, ticket):
        self.title = title
        self.description = description
        self.ticket = ticket
        if score < 0:
            raise ValueError("Negative value not allowed:{}".format(score))
        self.score = score

This change prevents a negative score when the object is initialized, but it does nothing for an instance that already exists. If someone runs movie.score = -1, nothing stops them.
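A short sketch of that gap, using the class above. The title and description strings are placeholder values made up for the illustration.

```python
class Movie:
    def __init__(self, title, description, score, ticket):
        self.title = title
        self.description = description
        self.ticket = ticket
        if score < 0:
            raise ValueError("Negative value not allowed:{}".format(score))
        self.score = score


# The check in the constructor works as intended ...
try:
    Movie('Example', 'A placeholder movie', -1, 10)
except ValueError:
    rejected_at_init = True
else:
    rejected_at_init = False

# ... but a later assignment is a plain attribute write with no check at all.
movie = Movie('Example', 'A placeholder movie', 8, 10)
movie.score = -1

print(rejected_at_init)  # True
print(movie.score)       # -1
```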

Getter & Setter

Implement getter() and setter() methods for score to keep it from dropping below 0.

class Movie:
    def __init__(self, title, description, score, ticket):
        self.title = title
        self.description = description
        self.ticket = ticket
        if score < 0:
            raise ValueError("Negative value not allowed:{}".format(score))
        self.score = score

    def set_score(self, score):
        if score >= 0:
            self.score = score
        else:
            self.score = 0

    def get_score(self):
        return self.score

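However, a large number of getter() and setter() methods bloats the class definition and tangles its logic. From an OOP point of view, only the attribute itself knows best what values it accepts, not the class it lives in. If the checking logic could be rooted inside the attribute itself, the problem would be solved, and that is what @property offers.

Property

Note that in the class below self._score is the real attribute of the object, while type(Movie.score) is property. Every access to obj.score actually calls the property's getter, setter, or deleter. If the setter also wrote self.score = score, it would call itself and recurse without end.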

class Movie:
    def __init__(self, ticket, score):
        self.score = score
        self.ticket = ticket

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, score):
        if score < 0:
            raise ValueError("Negative value not allowed:{}".format(score))
        self._score = score

    @score.deleter
    def score(self):
        raise AttributeError("Can not delete score")
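As a rough check of the behavior described above, the sketch below repeats the class (without the deleter, to keep it short) and exercises the setter both from __init__ and from outside. The argument values are made up for the illustration.

```python
class Movie:
    def __init__(self, ticket, score):
        self.score = score      # goes through the setter below
        self.ticket = ticket    # plain instance attribute

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, score):
        if score < 0:
            raise ValueError("Negative value not allowed:{}".format(score))
        self._score = score


movie = Movie(ticket=10, score=8)

try:
    movie.score = -1            # rejected by the setter
except ValueError as exc:
    error = str(exc)
else:
    error = None

print(error)                    # Negative value not allowed:-1
print(movie.score)              # 8, the old value is untouched
print(type(Movie.score))        # <class 'property'>
print(sorted(movie.__dict__))   # ['_score', 'ticket']
```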

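Shortcomings of property

The biggest shortcoming of property is that it cannot be reused. If several attributes need to be written as properties, the same code and logic gets repeated many times. A property makes the class interface look clean from the outside, but it cannot keep the inside equally clean.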

Descriptor

Here is how descriptors solve the duplicated logic of property described above.

If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).

Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance's dictionary.

If an instance's dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance's dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.
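The precedence rule can be observed directly. In this sketch the class names DataDescriptor, NonDataDescriptor, and Demo are hypothetical; the instance dictionary is written to directly so that both kinds of descriptor face a same-named entry.

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner):
        return 'from data descriptor'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class NonDataDescriptor:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner):
        return 'from non-data descriptor'


class Demo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


demo = Demo()
# Write straight into the instance dictionary, bypassing __set__.
demo.__dict__['data'] = 'from instance dict'
demo.__dict__['non_data'] = 'from instance dict'

print(demo.data)      # from data descriptor
print(demo.non_data)  # from instance dict
```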

class Integer:
    def __init__(self, name):
        print('descriptor __init__')
        self.name = name

    def __get__(self, instance, owner):
        print('descriptor __get__')
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        print('descriptor __set__')
        if value < 0:
            raise ValueError("Negative value not allowed")
        instance.__dict__[self.name] = value

>>> class Movie:
...     # class attribute
...     score = Integer('score')
...     ticket = Integer('ticket')
...
...     def __init__(self):
...         pass
descriptor __init__
descriptor __init__
>>> movie = Movie()
>>> movie.__dict__['ticket']
KeyError: 'ticket'
>>> movie.ticket = 1
descriptor __set__
>>> movie.ticket
descriptor __get__
1
>>> movie.__dict__['ticket']
1

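When movie.ticket = 1 runs, the descriptor's __set__() executes instance.__dict__[self.name] = value, which adds a new entry to the Movie instance's __dict__ and stores the value there.

This is still a bit clumsy, though, because the attributes only exist after they are first assigned; what is missing is a constructor that sets them.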

class Integer:
    def __init__(self, name):
        print('descriptor __init__')
        self.name = name

    def __get__(self, instance, owner):
        print('descriptor __get__')
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        print('descriptor __set__')
        if value < 0:
            raise ValueError("Negative value not allowed")
        instance.__dict__[self.name] = value


class Movie:
    # class attribute
    score = Integer('score')
    ticket = Integer('ticket')
    
    def __init__(self, score, ticket):
        # Assigning through self.attr does not create a plain instance
        # attribute; it finds the class-level descriptor and calls its __set__().
        self.score = score
        self.ticket = ticket
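A usage sketch of this design follows, with the print calls dropped for brevity. The `if instance is None` guard in __get__ is an addition that is not in the article's code; it lets Movie.score return the descriptor itself instead of failing. The argument values are made up.

```python
class Integer:
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:        # accessed on the class (added guard)
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("Negative value not allowed")
        instance.__dict__[self.name] = value


class Movie:
    score = Integer('score')
    ticket = Integer('ticket')

    def __init__(self, score, ticket):
        self.score = score
        self.ticket = ticket


movie = Movie(score=8, ticket=30)

# One validation rule now guards both attributes.
rejected = []
for attr in ('score', 'ticket'):
    try:
        setattr(movie, attr, -1)
    except ValueError:
        rejected.append(attr)

print(rejected)         # ['score', 'ticket']
print(movie.__dict__)   # {'score': 8, 'ticket': 30}
```

The values end up in each instance's own __dict__, while the two Integer objects stay on the class and only carry the checking logic.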

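This way, getting and setting either attribute goes through Integer's __get__() and __set__() (deleting would go through __delete__() if Integer defined it), which removes the duplicated logic.

So how does a class attribute turn into an instance attribute? __init__ accesses self.score and self.ticket on the instance, so how are these tied to the class attributes score and ticket? And in what order are things called?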

Invoking Descriptors

Here I quote the official Python descriptor documentation, because together with the MRO and Python's magic methods it already explains this in considerable detail.

A descriptor can be called directly by its method name. For example, d.__get__(obj), or Movie.__dict__['ticket'].__get__(m, None) for an instance m of the Movie class above.

Alternatively, it is more common for a descriptor to be invoked automatically upon attribute access. For example, obj.d looks up d in the dictionary of obj. If d defines the method __get__(), then d.__get__(obj) is invoked according to the precedence rules listed below.

The details of invocation depend on whether obj is an object or a class.
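To make the difference between the call styles visible, the sketch below uses a hypothetical descriptor, Spy, whose __get__ simply returns the arguments it received.

```python
class Spy:
    """Non-data descriptor that reports the arguments __get__ receives."""

    def __get__(self, instance, owner=None):
        return (instance, owner)


class Holder:
    d = Spy()


obj = Holder()

# Direct call: fetch the raw descriptor from the class dictionary first.
descriptor = Holder.__dict__['d']
direct = descriptor.__get__(obj)

# Automatic calls triggered by attribute access.
automatic = obj.d      # instance access passes (obj, Holder)
on_class = Holder.d    # class access passes (None, Holder)

print(direct == (obj, None))       # True
print(automatic == (obj, Holder))  # True
print(on_class == (None, Holder))  # True
```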

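A guide to Python's magic methods
First, a quick review of the relevant magic methods.

__getattribute__(self, name)
__getattribute__ is only available on new-style classes (in Python 3, every class). Accessing an instance attribute with obj.attr actually calls __getattribute__.

__getattr__(self, name)
__getattr__(self, name) is called when the attribute being accessed does not exist (or does not exist yet), that is, after the normal lookup has raised AttributeError.

__call__(self, [args...])
Defining __call__ makes instances callable: obj(args) calls type(obj).__call__(obj, args). Calling a class, for example obj = MyClass(), works the same way one level up and calls the __call__ of its metaclass, which is type.__call__(MyClass) by default.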

For objects, the machinery is in object.__getattribute__() which transforms b.x into type(b).__dict__['x'].__get__(b, type(b)). The implementation works through a precedence chain that gives data descriptors priority over instance variables, instance variables priority over non-data descriptors, and assigns lowest priority to __getattr__() if provided. The full C implementation can be found in PyObject_GenericGetAttr() in Objects/object.c.

For classes, the machinery is in type.__getattribute__() which transforms B.x into B.__dict__['x'].__get__(None, B). In pure Python, it looks like:

def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v

The important points to remember are:

descriptors are invoked by the __getattribute__() method
overriding __getattribute__() prevents automatic descriptor calls
object.__getattribute__() and type.__getattribute__() make different calls to __get__()
data descriptors always override instance dictionaries
non-data descriptors may be overridden by instance dictionaries

The object returned by super() also has a custom __getattribute__() method for invoking descriptors. The call super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then returns A.__dict__['m'].__get__(obj, B). If not a descriptor, m is returned unchanged. If not in the dictionary, m reverts to a search using object.__getattribute__().
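The super() rule can be checked by hand. In this sketch A and B are minimal hypothetical classes that both define m.

```python
class A:
    def m(self):
        return 'A.m'


class B(A):
    def m(self):
        return 'B.m'


obj = B()

# Let super() find A in obj.__class__.__mro__ and bind A's m to obj.
via_super = super(B, obj).m()

# The same steps spelled out: take the function from A's dictionary
# and invoke its __get__ to obtain a method bound to obj.
bound = A.__dict__['m'].__get__(obj, B)
manual = bound()

print(via_super)           # A.m
print(manual)              # A.m
print(bound.__self__ is obj)  # True
```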

The details above show that the mechanism for descriptors is embedded in the __getattribute__() methods for object, type, and super(). Classes inherit this machinery when they derive from object or if they have a meta-class providing similar functionality. Likewise, classes can turn-off descriptor invocation by overriding __getattribute__().

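Whether the access starts from an instance or from a class, the descriptor is looked up in a class dictionary (type(obj).__dict__['x']), never in the instance dictionary;

internally the search follows the MRO, from the class through its parent classes, until the name is found;

if the value found is a data descriptor, its __get__() is called and the result is returned;

otherwise the instance's __dict__ is searched next, and a match there is returned;

if the instance's __dict__ has no such entry, the __get__() of the non-data descriptor is called (a class attribute that is not a descriptor is returned as is);

if every step fails with AttributeError, __getattr__() is called as the fallback, when the class defines it.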
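The steps above can be sketched in pure Python. This is a simplified emulation for instances only, not CPython's actual code: the helper names find_in_mro and generic_getattr are made up, the method cache is ignored, and the __getattr__ fallback is left out.

```python
_MISSING = object()


def find_in_mro(tp, name):
    """Return the first entry for `name` in the class dictionaries along the MRO."""
    for base in tp.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def generic_getattr(obj, name):
    tp = type(obj)
    descr = find_in_mro(tp, name)

    getter = None
    if descr is not _MISSING:
        getter = getattr(type(descr), '__get__', None)
        is_data = (hasattr(type(descr), '__set__')
                   or hasattr(type(descr), '__delete__'))
        # 1. A data descriptor wins over everything else.
        if getter is not None and is_data:
            return getter(descr, obj, tp)

    # 2. Then the instance dictionary.
    inst_dict = getattr(obj, '__dict__', None)
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name]

    # 3. Then a non-data descriptor found on the class.
    if getter is not None:
        return getter(descr, obj, tp)

    # 4. Then a plain class attribute.
    if descr is not _MISSING:
        return descr

    raise AttributeError(
        "'{}' object has no attribute '{}'".format(tp.__name__, name))


class Demo:
    plain = 'class value'

    @property
    def data(self):            # property is a data descriptor
        return 'from property'

    def method(self):          # functions are non-data descriptors
        return 'from method'


d = Demo()
d.__dict__['data'] = 'instance value'
d.__dict__['method'] = 'instance value'

print(generic_getattr(d, 'data'))    # from property
print(generic_getattr(d, 'method'))  # instance value
print(generic_getattr(d, 'plain'))   # class value
```

For these three names the sketch agrees with the built-in getattr(), which is a quick way to sanity-check the order.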

Source code analysis

The CPython source code can be used to verify what the official documentation says above.

PyObject_GenericGetAttr

PyObject *
PyObject_GenericGetAttr(PyObject *obj, PyObject *name)
{
    return _PyObject_GenericGetAttrWithDict(obj, name, NULL, 0);
}

PyObject *
_PyObject_GenericGetAttrWithDict(PyObject *obj, PyObject *name,
                                 PyObject *dict, int suppress)
{
    /* Make sure the logic of _PyObject_GetMethod is in sync with
       this method.
       When suppress=1, this function suppress AttributeError.
    */

    PyTypeObject *tp = Py_TYPE(obj);
    PyObject *descr = NULL;
    PyObject *res = NULL;
    descrgetfunc f;
    Py_ssize_t dictoffset;
    PyObject **dictptr;

    if (!PyUnicode_Check(name)){
        PyErr_Format(PyExc_TypeError,
                     "attribute name must be string, not '%.200s'",
                     name->ob_type->tp_name);
        return NULL;
    }
    Py_INCREF(name);

    if (tp->tp_dict == NULL) {
        if (PyType_Ready(tp) < 0)
            goto done;
    }

    descr = _PyType_Lookup(tp, name);

    f = NULL;
    if (descr != NULL) {
        Py_INCREF(descr);
        f = descr->ob_type->tp_descr_get;
        if (f != NULL && PyDescr_IsData(descr)) {
            res = f(descr, obj, (PyObject *)obj->ob_type);
            if (res == NULL && suppress &&
                    PyErr_ExceptionMatches(PyExc_AttributeError)) {
                PyErr_Clear();
            }
            goto done;
        }
    }

    if (dict == NULL) {
        /* Inline _PyObject_GetDictPtr */
        dictoffset = tp->tp_dictoffset;
        if (dictoffset != 0) {
            if (dictoffset < 0) {
                Py_ssize_t tsize;
                size_t size;

                tsize = ((PyVarObject *)obj)->ob_size;
                if (tsize < 0)
                    tsize = -tsize;
                size = _PyObject_VAR_SIZE(tp, tsize);
                _PyObject_ASSERT(obj, size <= PY_SSIZE_T_MAX);

                dictoffset += (Py_ssize_t)size;
                _PyObject_ASSERT(obj, dictoffset > 0);
                _PyObject_ASSERT(obj, dictoffset % SIZEOF_VOID_P == 0);
            }
            dictptr = (PyObject **) ((char *)obj + dictoffset);
            dict = *dictptr;
        }
    }
    if (dict != NULL) {
        Py_INCREF(dict);
        res = PyDict_GetItemWithError(dict, name);
        if (res != NULL) {
            Py_INCREF(res);
            Py_DECREF(dict);
            goto done;
        }
        else {
            Py_DECREF(dict);
            if (PyErr_Occurred()) {
                if (suppress && PyErr_ExceptionMatches(PyExc_AttributeError)) {
                    PyErr_Clear();
                }
                else {
                    goto done;
                }
            }
        }
    }

    if (f != NULL) {
        res = f(descr, obj, (PyObject *)Py_TYPE(obj));
        if (res == NULL && suppress &&
                PyErr_ExceptionMatches(PyExc_AttributeError)) {
            PyErr_Clear();
        }
        goto done;
    }

    if (descr != NULL) {
        res = descr;
        descr = NULL;
        goto done;
    }

    if (!suppress) {
        PyErr_Format(PyExc_AttributeError,
                     "'%.50s' object has no attribute '%U'",
                     tp->tp_name, name);
    }
  done:
    Py_XDECREF(descr);
    Py_DECREF(name);
    return res;
}

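Reading the source above, we can see that:

PyTypeObject *tp = Py_TYPE(obj); fetches the type of obj, and PyType_Ready(tp) fills in tp_dict if it is still NULL;
then descr = _PyType_Lookup(tp, name); looks the name up in the type and its bases, which is where a descriptor is found;
the check f != NULL && PyDescr_IsData(descr), with f = descr->ob_type->tp_descr_get, identifies a data descriptor, whose __get__() is called immediately;
if it is not a data descriptor and dict != NULL, the result of PyDict_GetItemWithError(dict, name) is returned when the instance dictionary holds the name;
if the instance dictionary does not hold it, f (descr->ob_type->tp_descr_get), the __get__() of the non-data descriptor, is called; failing that, descr itself is returned if it exists, and otherwise AttributeError is raised.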

_PyType_Lookup

/* Internal API to look for a name through the MRO.
   This returns a borrowed reference, and doesn't set an exception! */
PyObject *
_PyType_Lookup(PyTypeObject *type, PyObject *name)
{
    PyObject *res;
    int error;
    unsigned int h;
    
    /* ... method cache lookup code omitted ... */

    /* We may end up clearing live exceptions below, so make sure it's ours. */
    assert(!PyErr_Occurred());

    res = find_name_in_mro(type, name, &error);

    /* ... remaining code omitted ... */
}

As shown, the result of the earlier descr = _PyType_Lookup(tp, name); call comes from find_name_in_mro(type, name, &error);, so the descriptor is found by searching the class and its parent classes in MRO order.

find_name_in_mro

/* Internal API to look for a name through the MRO, bypassing the method cache.
   This returns a borrowed reference, and might set an exception.
   'error' is set to: -1: error with exception; 1: error without exception; 0: ok */
static PyObject *
find_name_in_mro(PyTypeObject *type, PyObject *name, int *error)
{
    Py_ssize_t i, n;
    PyObject *mro, *res, *base, *dict;
    Py_hash_t hash;

    /* ... code omitted ... */

    /* Look in tp_dict of types in MRO */
    mro = type->tp_mro;

    if (mro == NULL) {
        if ((type->tp_flags & Py_TPFLAGS_READYING) == 0) {
            if (PyType_Ready(type) < 0) {
                *error = -1;
                return NULL;
            }
            mro = type->tp_mro;
        }
        if (mro == NULL) {
            *error = 1;
            return NULL;
        }
    }

    /* ... remaining code omitted ... */
}

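The comment says "Look in tp_dict of types in MRO", and mro = type->tp_mro; shows that every type carries a tp_mro, which determines the order of traversal.

The behavior of this code confirms the descriptor invocation order described in the documentation above.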

Property

class property(fget=None, fset=None, fdel=None, doc=None)

With that in mind, let us come back to the commonly used property.

Calling property() is a succinct way of building a data descriptor that triggers function calls upon access to an attribute.

property can be used in two ways: as a function call or as a decorator.

Function form

class C:
    def __init__(self):
        self._x = None
    
    def getx(self):
        return self._x

    def setx(self, value):
        self._x = value

    def delx(self):
        del self._x

    x = property(getx, setx, delx, "I'm the 'x' property.")

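To use property(), the class must be a new-style class (a subclass of object); Python 3 only has new-style classes. If c is an instance of C, c.x calls fget(), which here is getx(); c.x = value calls fset(), which here is setx(); and del c.x calls fdel(), which here is delx().

The benefit of property is that checks can be performed whenever the attribute is accessed. If there is no strict requirement for that, a plain instance attribute may be more convenient.

Decorator form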

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

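Note: the three functions must share the same name, which is also the name of the attribute that will be accessed.

With property it is very easy to control reading and writing of an attribute. To make an attribute read-only, provide only the getter.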

class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        return self._x

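For descriptors in general, one that implements only the get function is a non-data descriptor, and by the attribute lookup precedence a non-data descriptor can be overridden (hidden) by a real instance attribute. Yet running the following code gives: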

>>> c = C()
>>> c.x
>>> c.x = 3
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    c.x = 3
AttributeError: can't set attribute

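The error message shows that c.x = 3 did not dynamically create an instance attribute, which means x is not a non-data descriptor. Why? The reason lies in property itself: although on the surface the attribute x only has a getter, property is a class that implements __get__(), __set__(), and __delete__() all at once, so it is a data descriptor. An attribute created with property is therefore always a data descriptor.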
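This can be confirmed with a few checks on the built-in property type. The sketch reuses the read-only class C from above and plants an entry in the instance dictionary to show that the property still takes precedence.

```python
class C:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        return self._x


# property defines all three protocol methods, with or without a setter.
has_get = hasattr(property, '__get__')
has_set = hasattr(property, '__set__')
has_delete = hasattr(property, '__delete__')

c = C()
c.__dict__['x'] = 3   # plant an instance entry behind the property's back
value = c.x           # still None: the data descriptor wins

try:
    c.x = 3           # property.__set__ runs and finds no fset
except AttributeError:
    blocked = True
else:
    blocked = False

print(has_get, has_set, has_delete)  # True True True
print(value)                         # None
print(blocked)                       # True
```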

A pure Python emulation of property follows. It shows that the "AttributeError: can't set attribute" exception above is actually raised inside Property's __set__(), because the user did not supply fset:

class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
