Da Shixiong's Python Source Study Notes (55): Python's Memory

2022-02-18  superkmi

Da Shixiong's Python Source Study Notes (54): Python's Memory Management Mechanism (9)
Da Shixiong's Python Source Study Notes (56): Python's Memory Management Mechanism (11)

V. Garbage Collection in Python

3. The Mark-and-Sweep Method
Modules/gcmodule.c

/*** list functions ***/

static void
gc_list_init(PyGC_Head *list)
{
    list->gc.gc_prev = list;
    list->gc.gc_next = list;
}
Modules/gcmodule.c

/* append list `from` onto list `to`; `from` becomes an empty list */
static void
gc_list_merge(PyGC_Head *from, PyGC_Head *to)
{
    PyGC_Head *tail;
    assert(from != to);
    if (!gc_list_is_empty(from)) {
        tail = to->gc.gc_prev;
        tail->gc.gc_next = from->gc.gc_next;
        tail->gc.gc_next->gc.gc_prev = tail;
        to->gc.gc_prev = from->gc.gc_prev;
        to->gc.gc_prev->gc.gc_next = to;
    }
    gc_list_init(from);
}
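To make the pointer manipulation in gc_list_merge easier to follow, here is a minimal Python sketch of the same idea: a circular doubly linked list with a sentinel head. The `Node` class and the helpers `list_init`, `list_is_empty`, `list_append`, `list_merge` and `to_values` are hypothetical stand-ins written for illustration; they are not part of CPython.

```python
class Node:
    """Hypothetical stand-in for PyGC_Head: a node in a circular doubly linked list."""

    def __init__(self, value=None):
        self.value = value
        self.prev = self
        self.next = self


def list_init(head):
    # An empty list is a sentinel that points to itself in both directions.
    head.prev = head
    head.next = head


def list_is_empty(head):
    return head.next is head


def list_append(node, head):
    # Link `node` in just before the sentinel, i.e. at the tail.
    tail = head.prev
    node.prev = tail
    node.next = head
    tail.next = node
    head.prev = node


def list_merge(src, dst):
    # Splice every node of `src` onto the tail of `dst`, then empty `src`.
    assert src is not dst
    if not list_is_empty(src):
        tail = dst.prev
        tail.next = src.next
        tail.next.prev = tail
        dst.prev = src.prev
        dst.prev.next = dst
    list_init(src)


def to_values(head):
    # Walk forward from the sentinel until we arrive back at it.
    values = []
    node = head.next
    while node is not head:
        values.append(node.value)
        node = node.next
    return values


young, old = Node(), Node()
for v in ("a", "b"):
    list_append(Node(v), old)
for v in ("c", "d"):
    list_append(Node(v), young)

list_merge(young, old)
merged = to_values(old)      # ['a', 'b', 'c', 'd']
emptied = to_values(young)   # []
```

Because the list is circular and the sentinel is always present, the merge needs no special case for an empty destination: four pointer updates are enough.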
>>> list1 = []
>>> list2 = []
>>> list1.append(list2)
>>> list2.append(list1)
>>> a = list1
>>> list3 = []
>>> list4 = []
>>> list3.append(list4)
>>> list4.append(list3)
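The effect of these two cycles can be observed from Python itself. The sketch below rebuilds the same structure inside a function, keeps only `a` alive, and uses weak references to see which objects survive a collection. `TrackedList` and `build` are hypothetical names introduced here, needed only because plain lists cannot be weakly referenced; `gc.disable()` is called so that no automatic collection runs before the first check.

```python
import gc
import weakref


class TrackedList(list):
    """Hypothetical list subclass; unlike plain lists it supports weak references."""


def build():
    list1, list2 = TrackedList(), TrackedList()
    list1.append(list2)
    list2.append(list1)
    a = list1                      # extra reference to the first cycle
    list3, list4 = TrackedList(), TrackedList()
    list3.append(list4)
    list4.append(list3)
    # Only `a` escapes as a strong reference; the rest are observed weakly.
    return a, weakref.ref(list2), weakref.ref(list3), weakref.ref(list4)


gc.disable()                       # rule out an automatic collection in between
a, ref2, ref3, ref4 = build()

# Reference counting alone cannot free either cycle.
before = (ref2() is not None, ref3() is not None, ref4() is not None)
# (True, True, True)

gc.collect()                       # run the cycle collector explicitly

# list2 is still reachable through `a`; list3 and list4 were garbage.
after = (ref2() is not None, ref3() is not None, ref4() is not None)
# (True, False, False)
gc.enable()
```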

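2.1 Finding the Root Object Set
  • Suppose there are two objects, A and B;
  • Starting from A: because it holds a reference to B, decrement B's reference count by 1;
  • Then follow the reference to B: because it holds a reference to A, decrement A's reference count by 1 as well;
  • In this way the cycle between mutually referencing objects is broken.
  • Now suppose a container object A in the collectable-object list holds a reference to an object C, and C is not in that list;
  • If C's reference count is decremented by 1 and A ends up not being reclaimed, then C's reference count has been wrongly reduced by 1, which will leave a dangling reference to C at some point in the future;
  • For this reason, a better approach is to leave the real reference count untouched and modify a copy of it instead.
  • The sole purpose of this copy is to find the root object set; it is the gc.gc_refs field of PyGC_Head.
  • The first step of garbage collection is therefore to walk the collectable-object list and set each object's gc.gc_refs to its ob_refcnt.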
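The reasoning above can be tried out on a toy model. The sketch below is a deliberately simplified simulation, not CPython's implementation: `Obj`, `add_ref` and `find_roots` are hypothetical names, and "not in the collectable list" is modelled by leaving the working copy at `None`, whereas CPython marks such objects with a special negative value.

```python
class Obj:
    """Hypothetical toy container: a name, a real refcount, and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0       # plays the role of ob_refcnt
        self.gc_refs = None   # plays the role of gc.gc_refs (the working copy)
        self.refs = []


def add_ref(src, dst):
    src.refs.append(dst)
    dst.refcnt += 1


def find_roots(collectable):
    # Step 1: copy the real counts into the working copy.
    for obj in collectable:
        obj.gc_refs = obj.refcnt
    # Step 2: cancel references that originate inside the collectable list.
    # Targets outside the list (gc_refs is None) are left untouched.
    for obj in collectable:
        for target in obj.refs:
            if target.gc_refs is not None:
                target.gc_refs -= 1
    # A positive copy means "still referenced from outside the list".
    return [obj.name for obj in collectable if obj.gc_refs > 0]


a, b, c, d, e = (Obj(n) for n in "ABCDE")
add_ref(a, b)
add_ref(b, a)      # cycle A <-> B
add_ref(d, e)
add_ref(e, d)      # cycle D <-> E
a.refcnt += 1      # one reference to A from outside, e.g. a variable
add_ref(a, c)      # A also refers to C, which is NOT in the list

collectable = [a, b, d, e]
roots = find_roots(collectable)   # ['A']
```

Only A ends up in the root set. The real counts (`refcnt`) are exactly what they were before the scan, and C, which lives outside the list, was never touched, so no dangling reference can arise.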
Modules/gcmodule.c

/* Set all gc_refs = ob_refcnt.  After this, gc_refs is > 0 for all objects
 * in containers, and is GC_REACHABLE for all tracked gc objects not in
 * containers.
 */
static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc = gc->gc.gc_next) {
        assert(_PyGCHead_REFS(gc) == GC_REACHABLE);
        _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc)));
        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        assert(_PyGCHead_REFS(gc) != 0);
    }
}
Modules/gcmodule.c

/* Subtract internal references from gc_refs.  After this, gc_refs is >= 0
 * for all objects in containers, and is GC_REACHABLE for all tracked gc
 * objects not in containers.  The ones with gc_refs > 0 are directly
 * reachable from outside containers, and so can't be collected.
 */
static void
subtract_refs(PyGC_Head *containers)
{
    traverseproc traverse;
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc=gc->gc.gc_next) {
        traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
        (void) traverse(FROM_GC(gc),
                       (visitproc)visit_decref,
                       NULL);
    }
}
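  • Here traverse is specific to each kind of container object and is defined in the container's type object (its tp_traverse slot).
  • In general, traverse visits every reference held by the container object and performs some action on each one.
  • In subtract_refs that action is visit_decref, which is passed to traverse as a callback function.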
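The traverse/visit split described in these points is a visitor pattern: each container type knows how to enumerate its own references, and the caller decides what to do with each one. A minimal Python sketch follows, assuming two hypothetical container types, `ToyList` and `ToyMap`. The body of `visit_decref` here is an approximation written for this illustration, since the C function itself is not shown above.

```python
class ToyList:
    """Hypothetical container whose references live in a Python list."""

    def __init__(self, *items):
        self.items = list(items)
        self.gc_refs = 0

    def traverse(self, visit, arg):
        # Like a tp_traverse slot: call `visit` once per contained reference.
        for item in self.items:
            visit(item, arg)


class ToyMap:
    """Hypothetical container whose references are its keys and values."""

    def __init__(self, **entries):
        self.entries = dict(entries)
        self.gc_refs = 0

    def traverse(self, visit, arg):
        for key, value in self.entries.items():
            visit(key, arg)
            visit(value, arg)


def visit_decref(obj, arg):
    # Only containers with a positive working copy are decremented;
    # ints, strings and other non-containers are ignored.
    if isinstance(obj, (ToyList, ToyMap)) and obj.gc_refs > 0:
        obj.gc_refs -= 1
    return 0


def subtract_refs(containers):
    # The loop never needs to know what kind of container it is handling.
    for container in containers:
        container.traverse(visit_decref, None)


inner = ToyList(1, 2)
outer = ToyMap(child=inner)
inner.items.append(outer)      # cycle: inner -> outer -> inner

inner.gc_refs = 2              # one reference from outer, one from outside
outer.gc_refs = 1              # one reference from inner
subtract_refs([inner, outer])
counts = (inner.gc_refs, outer.gc_refs)   # (1, 0)

# The same traverse method works with a different action.
seen = []
inner.traverse(lambda obj, arg: arg.append(obj), seen)
```

At the Python level, `gc.get_referents()` returns the objects that a real object's tp_traverse visits, which can be handy for checking what a given container reports.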