How cached_property Works and How to Extend It

2020-11-14 Yucz

Reposting is not permitted. Original link: https://www.jianshu.com/p/b228c37f2fb1


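While reading the Bottle and Flask source code, I noticed that each of them carries its own implementation of cached_property. It is a very practical descriptor that caches an attribute: on every access after the first, the value is read straight from the cache, which reduces the number of function calls.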

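Below is a brief analysis of how cached_property works. It relies on how Python resolves attribute access on an instance; if that is not clear to you, I suggest first reading the previous post on Python's attribute lookup process (Python属性的调用流程).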

1 The implementation in Bottle

from functools import update_wrapper


class cached_property(object):
    """ A property that is only computed once per instance and then replaces
        itself with an ordinary attribute. Deleting the attribute resets the
        property. """

    def __init__(self, func):
        update_wrapper(self, func)
        self.func = func

    def __get__(self, obj, cls):
        if obj is None: return self
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value

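The implementation is very short: __get__ calls the wrapped function and stores the result in the instance's __dict__ under the function's name.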

2 A simple test

class Test:
    def __init__(self):
        self._count = 100

    @cached_property
    def count(self):
        self._count += 50
        return self._count

t = Test()
# The first access runs the count function
t.count
Out[4]: 150
# Later accesses do not run the count function again
t.count
Out[5]: 150
t.count
Out[6]: 150

The test shows that the count function really runs only once; every later access to t.count is served from the cache.
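The docstring in section 1 also says that deleting the attribute resets the property. The following is a minimal sketch of that behaviour, reusing the Bottle-style class and the Test class from above; the names first, second, and third are only for illustration.

```python
from functools import update_wrapper


class cached_property(object):
    """Same shape as the Bottle version shown in section 1."""

    def __init__(self, func):
        update_wrapper(self, func)
        self.func = func

    def __get__(self, obj, cls):
        if obj is None:
            return self
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value


class Test:
    def __init__(self):
        self._count = 100

    @cached_property
    def count(self):
        self._count += 50
        return self._count


t = Test()
first = t.count    # runs count(): 100 + 50
second = t.count   # read from t.__dict__, count() is not called
del t.count        # removes the cached entry from t.__dict__
third = t.count    # runs count() again: 150 + 50
print(first, second, third)  # 150 150 200
```

The del works because the descriptor defines neither __set__ nor __delete__, so the deletion falls through to the instance's __dict__.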

3 How it works

From the implementation we can see that cached_property is a non-data descriptor: it defines __get__ but neither __set__ nor __delete__.
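The distinction matters because an entry in the instance's __dict__ shadows a non-data descriptor but not a data descriptor. Here is a small sketch of that rule; the classes NonData, Data, and Demo are hypothetical and exist only for this illustration.

```python
class NonData:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, cls):
        return "from NonData.__get__"


class Data:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, cls):
        return "from Data.__get__"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    a = NonData()
    b = Data()


d = Demo()
# Write straight into the instance dict, bypassing any descriptor logic.
d.__dict__["a"] = "from instance __dict__"
d.__dict__["b"] = "from instance __dict__"

print(d.a)  # from instance __dict__ (instance dict wins over a non-data descriptor)
print(d.b)  # from Data.__get__ (a data descriptor wins over the instance dict)
```

cached_property depends on the first case: once the value is stored in the instance's __dict__, the descriptor is no longer consulted.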

First access

On the first access to t.count, count is found in Test.__dict__ but is not a data descriptor, and count is not yet present in t.__dict__. Because count is a non-data descriptor, the attribute lookup order falls through to Test.__dict__['count'].__get__(t, Test).

t = Test()
Test.__dict__
Out[4]:  # count lives on the class, but it is a non-data descriptor
mappingproxy({'__dict__': <attribute '__dict__' of 'Test' objects>,
              '__doc__': None,
              '__init__': <function __main__.Test.__init__>,
              '__module__': '__main__',
              '__weakref__': <attribute '__weakref__' of 'Test' objects>,
              'count': <__main__.cached_property at 0x7f662a6337f0>})
t.__dict__
Out[5]: {'_count': 100} # count is not in t.__dict__
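The lookup order that leads to __get__ can be written out as a rough sketch. The function lookup below is hypothetical and deliberately simplified: it ignores __getattr__, __slots__, and metaclasses, and only mirrors the three steps that matter here (data descriptor on the class, then the instance's __dict__, then non-data descriptor or plain class attribute).

```python
def lookup(obj, name):
    """Simplified sketch of instance attribute lookup, not the real algorithm."""
    cls = type(obj)
    found = False
    class_attr = None
    # Search the class and its bases for the name.
    for klass in cls.__mro__:
        if name in klass.__dict__:
            class_attr = klass.__dict__[name]
            found = True
            break
    attr_type = type(class_attr)
    has_get = found and hasattr(attr_type, "__get__")
    is_data = found and (hasattr(attr_type, "__set__")
                         or hasattr(attr_type, "__delete__"))
    # Step 1: a data descriptor on the class wins.
    if has_get and is_data:
        return attr_type.__get__(class_attr, obj, cls)
    # Step 2: otherwise the instance __dict__ is consulted.
    if name in obj.__dict__:
        return obj.__dict__[name]
    # Step 3: a non-data descriptor, or a plain class attribute.
    if has_get:
        return attr_type.__get__(class_attr, obj, cls)
    if found:
        return class_attr
    raise AttributeError(name)


class cached_property(object):
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, cls):
        if obj is None:
            return self
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value


class Test:
    def __init__(self):
        self._count = 100

    @cached_property
    def count(self):
        self._count += 50
        return self._count


t = Test()
first = lookup(t, "count")   # step 3: goes through __get__
second = lookup(t, "count")  # step 2: found in t.__dict__
print(first, second, t._count)  # 150 150 150
```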

Now look again at the implementation of __get__ in cached_property:

    def __get__(self, obj, cls):
        # Accessed directly through the class: return the descriptor itself
        if obj is None:
            return self
        # Store count in the instance's __dict__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        # obj.__dict__: {'_count': 150, 'count': 150}
        print("obj.__dict__:", obj.__dict__)
        return value

__get__ calls the count function that was passed in and keeps the result in the instance's __dict__; the key is the function name, that is, count.

So after the first access to t.count, obj.__dict__ contains a count entry:

t.__dict__
Out[5]: {'_count': 150, 'count': 150}

Second access

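On the second access, t.__dict__ already contains count, so by the attribute lookup order the call never reaches __get__; the value is read directly from t.__dict__, which is what makes the attribute cached.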
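One way to check that later accesses bypass __get__ is to count how often it runs. The class counting_cached_property and its get_calls attribute are hypothetical additions for this sketch; they are not part of Bottle.

```python
class counting_cached_property(object):
    """Hypothetical variant of cached_property that counts __get__ calls."""

    def __init__(self, func):
        self.func = func
        self.get_calls = 0

    def __get__(self, obj, cls):
        if obj is None:
            return self
        self.get_calls += 1
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value


class Test:
    def __init__(self):
        self._count = 100

    @counting_cached_property
    def count(self):
        self._count += 50
        return self._count


t = Test()
values = [t.count, t.count, t.count]
# Fetch the descriptor object itself from the class dict.
descriptor = Test.__dict__["count"]
print(values)                # [150, 150, 150]
print(descriptor.get_calls)  # 1
```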

4 Supporting multiple threads

The analysis above shows that Bottle's cached_property is not thread-safe, so using it from several threads causes problems. For example:

import time

from threading import Thread

class Test:
    def __init__(self):
        self._count = 100

    @cached_property
    def count(self):
        time.sleep(1)
        self._count += 50
        return self._count

t = Test()
threads = []

for x in range(10):
    thread = Thread(target=lambda : t.count)
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

# Prints: t.count = 600
print("t.count =", t.count)

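The printed result is 600, which means the count function ran 10 times: every thread entered __get__ before any of them had stored a value in t.__dict__, so nothing was cached.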

Let's reimplement cached_property so that it supports multiple threads. The code is as follows:

import threading


class cached_property(object):
    def __init__(self, func):
        self.func = func
        self.lock = threading.RLock()

    def __get__(self, obj, cls):
        if obj is None:
            return self

        with self.lock:
            try:
                # From the second call on, the name is already in __dict__, so read it from there
                return obj.__dict__[self.func.__name__]
            except KeyError:
                # On the first access the name is not in obj.__dict__: call func, then store the result
                return obj.__dict__.setdefault(self.func.__name__, self.func(obj))

Running the test code above again now gives a count of 150, which shows that the multi-threaded test passes.
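For convenience, here is a self-contained sketch of that rerun. It assumes a shorter sleep (0.1 s) than the original test, and the calls attribute is a hypothetical counter added only to confirm how many times the function body ran.

```python
import threading
import time


class cached_property(object):
    """The lock-based version from this section."""

    def __init__(self, func):
        self.func = func
        self.lock = threading.RLock()

    def __get__(self, obj, cls):
        if obj is None:
            return self
        with self.lock:
            try:
                return obj.__dict__[self.func.__name__]
            except KeyError:
                return obj.__dict__.setdefault(self.func.__name__, self.func(obj))


class Test:
    def __init__(self):
        self._count = 100
        self.calls = 0  # hypothetical counter, for illustration only

    @cached_property
    def count(self):
        time.sleep(0.1)
        self.calls += 1
        self._count += 50
        return self._count


t = Test()
threads = [threading.Thread(target=lambda: t.count) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(t.count, t.calls)  # 150 1
```

Note that the lock belongs to the descriptor, so it is shared by all instances of the class: first accesses on different instances also wait for each other. A per-instance lock would avoid that, at the cost of more bookkeeping.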

5 References

https://pypi.org/project/cached-property/