大师兄的Python源码学习笔记(六): Dict对象
2021-03-02 本文已影响0人
superkmi
大师兄的Python源码学习笔记(五): List对象
大师兄的Python源码学习笔记(七): Set对象
一、Python中的字典
- Python中的dict对象是由PyDictObject实现的。
- 由于dict在Python被大量使用,对效率要求极高,所以没有如C中的map一样,采用平衡二叉树,而是采用了哈希表(hash table)。
- 而解决哈希碰撞的方法采用了开放寻址法。
- 以上内容可以参考我的另一篇文章大师兄的数据结构学习笔记(十四): 词典。
1. Python中的词条(entry)
- 在字典中包含的每一个键(key)值(value)对称为一个词条(entry)。
- Python中的entry由PyDictEntry实现。
Objects\dict-common.h typedef struct { /* Cached hash code of me_key. */ Py_hash_t me_hash; PyObject *me_key; PyObject *me_value; /* This field is only meaningful for combined tables */ } PyDictKeyEntry;
- me_hash储存哈希值。
- me_key储存键值(key)。
- me_value储存对应的值(value)。
- 由于key和value全都是PyObject,所以理论上放什么都可以。
- 在PyDictObject生存变化中,entry会在Unused、Active、Dummy和Pending四种不同状态间切换:
状态 | 条件 |
---|---|
Unused | me_key和me_value为NULL。 |
Active | me_key和me_value都不为NULL。 |
Dummy | me_key和me_value被删除时,原因是开放寻址法的搜索方式。 |
Pending | me_key不为NULL而me_value为NULL。 |
2. 字典分类
- 在Python3.4后,字典被分为分离字典(split-table dictionaries)与联合字典(combined-table dictonaries)两类。
- 分离字典用来保存对象的__dict__属性,它们的键表被缓存在类型属性中,并允许所有该类型的实例共享值。
- 联合字典直接通过 dict 內建函数与{}生成,大部分字典都是联合字典。
3. 关联容器
- 所谓容器就是PyDictObject对象,一个PyDictObject对象是一堆词条(entry)的集合。
Include\dictobject.h
typedef struct _dictkeysobject PyDictKeysObject;
/* The ma_values pointer is NULL for a combined table
* or points to an array of PyObject* for a split table
*/
typedef struct {
PyObject_HEAD
/* Number of items in the dictionary */
Py_ssize_t ma_used;
/* Dictionary version: globally unique, value change each time
the dictionary is modified */
uint64_t ma_version_tag;
PyDictKeysObject *ma_keys;
/* If ma_values is NULL, the table is "combined": keys and values
are stored in ma_keys.
If ma_values is not NULL, the table is splitted:
keys are stored in ma_keys and values are stored in ma_values */
PyObject **ma_values;
} PyDictObject;
- ma_used用于维护处于Active状态的词条数量。
- ma_version_tag是全局唯一的标识,随字典每次的改变而改变。
- ma_keys指向PyDictKeysObject。
- 如果ma_values为空,这是一个联合字典,键和值储存在ma_keys。
- 如果ma_values不为空,这是一个分离字典,键保存在ma_keys,值保存在ma_values。
Objects\dict-common.h
/* See dictobject.c for actual layout of DictKeysObject */
struct _dictkeysobject {
Py_ssize_t dk_refcnt;
/* Size of the hash table (dk_indices). It must be a power of 2. */
Py_ssize_t dk_size;
/* Function to lookup in the hash table (dk_indices):
- lookdict(): general-purpose, and may return DKIX_ERROR if (and
only if) a comparison raises an exception.
- lookdict_unicode(): specialized to Unicode string keys, comparison of
which can never raise an exception; that function can never return
DKIX_ERROR.
- lookdict_unicode_nodummy(): similar to lookdict_unicode() but further
specialized for Unicode string keys that cannot be the <dummy> value.
- lookdict_split(): Version of lookdict() for split tables. */
dict_lookup_func dk_lookup;
/* Number of usable entries in dk_entries. */
Py_ssize_t dk_usable;
/* Number of used entries in dk_entries. */
Py_ssize_t dk_nentries;
/* Actual hash table of dk_size entries. It holds indices in dk_entries,
or DKIX_EMPTY(-1) or DKIX_DUMMY(-2).
Indices must be: 0 <= indice < USABLE_FRACTION(dk_size).
The size in bytes of an indice depends on dk_size:
- 1 byte if dk_size <= 0xff (char*)
- 2 bytes if dk_size <= 0xffff (int16_t*)
- 4 bytes if dk_size <= 0xffffffff (int32_t*)
- 8 bytes otherwise (int64_t*)
Dynamically sized, SIZEOF_VOID_P is minimum. */
char dk_indices[]; /* char is required to avoid strict aliasing. */
/* "PyDictKeyEntry dk_entries[dk_usable];" array follows:
see the DK_ENTRIES() macro */
};
- dk_refcnt是引用计数。
- dk_size是哈希表的大小,必须是2的倍数。
- dk_lookup是用于查找哈希值的函数。
- dk_usable是可用的词条数。
- dk_nentries已经使用的词条数。
- dk_indices[]是已经存入的词条。
4.字典的类型对象
- PyDictObject的类型对象为PyDict_Type。
Objects\dictobject.c
PyTypeObject PyDict_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"dict",
sizeof(PyDictObject),
0,
(destructor)dict_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
(reprfunc)dict_repr, /* tp_repr */
0, /* tp_as_number */
&dict_as_sequence, /* tp_as_sequence */
&dict_as_mapping, /* tp_as_mapping */
PyObject_HashNotImplemented, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
Py_TPFLAGS_BASETYPE | Py_TPFLAGS_DICT_SUBCLASS, /* tp_flags */
dictionary_doc, /* tp_doc */
dict_traverse, /* tp_traverse */
dict_tp_clear, /* tp_clear */
dict_richcompare, /* tp_richcompare */
0, /* tp_weaklistoffset */
(getiterfunc)dict_iter, /* tp_iter */
0, /* tp_iternext */
mapp_methods, /* tp_methods */
0, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
dict_init, /* tp_init */
PyType_GenericAlloc, /* tp_alloc */
dict_new, /* tp_new */
PyObject_GC_Del, /* tp_free */
};
二、字典的创建
- 创建字典从_PyDict_NewPresized开始,该函数的作用是生成并初始化一个字典。
Objects\dictobject.c
PyObject *
_PyDict_NewPresized(Py_ssize_t minused)
{
const Py_ssize_t max_presize = 128 * 1024;
Py_ssize_t newsize;
PyDictKeysObject *new_keys;
/* There are no strict guarantee that returned dict can contain minused
* items without resize. So we create medium size dict instead of very
* large dict or MemoryError.
*/
if (minused > USABLE_FRACTION(max_presize)) {
newsize = max_presize;
}
else {
Py_ssize_t minsize = ESTIMATE_SIZE(minused);
newsize = PyDict_MINSIZE;
while (newsize < minsize) {
newsize <<= 1;
}
}
assert(IS_POWER_OF_2(newsize));
new_keys = new_keys_object(newsize);
if (new_keys == NULL)
return NULL;
return new_dict(new_keys, NULL);
}
- 首先用一个中间值作为计算字典容量,并计算出字典的实际容量。
const Py_ssize_t max_presize = 128 * 1024; Py_ssize_t newsize; PyDictKeysObject *new_keys; if (minused > USABLE_FRACTION(max_presize)) { newsize = max_presize; } else { Py_ssize_t minsize = ESTIMATE_SIZE(minused); newsize = PyDict_MINSIZE; while (newsize < minsize) { newsize <<= 1; } }
- 之后调用了new_keys_object初始化一个PyDictKeysObject,再生成并返回一个PyDictObject。
Objects\dictobject.c
static PyDictKeysObject *new_keys_object(Py_ssize_t size)
{
PyDictKeysObject *dk;
Py_ssize_t es, usable;
assert(size >= PyDict_MINSIZE);
assert(IS_POWER_OF_2(size));
usable = USABLE_FRACTION(size);
if (size <= 0xff) {
es = 1;
}
else if (size <= 0xffff) {
es = 2;
}
#if SIZEOF_VOID_P > 4
else if (size <= 0xffffffff) {
es = 4;
}
#endif
else {
es = sizeof(Py_ssize_t);
}
if (size == PyDict_MINSIZE && numfreekeys > 0) {
dk = keys_free_list[--numfreekeys];
}
else {
dk = PyObject_MALLOC(sizeof(PyDictKeysObject)
+ es * size
+ sizeof(PyDictKeyEntry) * usable);
if (dk == NULL) {
PyErr_NoMemory();
return NULL;
}
}
DK_DEBUG_INCREF dk->dk_refcnt = 1;
dk->dk_size = size;
dk->dk_usable = usable;
dk->dk_lookup = lookdict_unicode_nodummy;
dk->dk_nentries = 0;
memset(&dk->dk_indices[0], 0xff, es * size);
memset(DK_ENTRIES(dk), 0, sizeof(PyDictKeyEntry) * usable);
return dk;
}
- 主要内容是通过传入的size,检查是否符合生成PyDictKeysObject的条件,如果满足则申请内存生成一个PyDictKeysObject。
- 之后调用new_dict初始化并返回一个PyDictObject。
Objects\dictobject.c
/* Consumes a reference to the keys object */
static PyObject *
new_dict(PyDictKeysObject *keys, PyObject **values)
{
PyDictObject *mp;
assert(keys != NULL);
if (numfree) {
mp = free_list[--numfree];
assert (mp != NULL);
assert (Py_TYPE(mp) == &PyDict_Type);
_Py_NewReference((PyObject *)mp);
}
else {
mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
if (mp == NULL) {
DK_DECREF(keys);
free_values(values);
return NULL;
}
}
mp->ma_keys = keys;
mp->ma_values = values;
mp->ma_used = 0;
mp->ma_version_tag = DICT_NEXT_VERSION();
assert(_PyDict_CheckConsistency(mp));
return (PyObject *)mp;
}
- 根据keys和values判断是从缓冲池获取或者生成新的PyDictObject。
三、字典的查找
- 字典中查找提供了两种策略,分别是lookdict和lookdict_unicode。
- 两种策略实际上使用了相同的算法,lookdict_unicode只是针对字符串搜索的特殊形式。
- lookdict_unicode_nodummy是lookdict_unicode的无dummy状态函数,可以提高搜索效率。
Objects\dictobject.c
static Py_ssize_t _Py_HOT_FUNCTION
lookdict(PyDictObject *mp, PyObject *key,
Py_hash_t hash, PyObject **value_addr)
{
size_t i, mask, perturb;
PyDictKeysObject *dk;
PyDictKeyEntry *ep0;
top:
dk = mp->ma_keys;
ep0 = DK_ENTRIES(dk);
mask = DK_MASK(dk);
perturb = hash;
i = (size_t)hash & mask;
for (;;) {
Py_ssize_t ix = dk_get_index(dk, i);
if (ix == DKIX_EMPTY) {
*value_addr = NULL;
return ix;
}
if (ix >= 0) {
PyDictKeyEntry *ep = &ep0[ix];
assert(ep->me_key != NULL);
if (ep->me_key == key) {
*value_addr = ep->me_value;
return ix;
}
if (ep->me_hash == hash) {
PyObject *startkey = ep->me_key;
Py_INCREF(startkey);
int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
Py_DECREF(startkey);
if (cmp < 0) {
*value_addr = NULL;
return DKIX_ERROR;
}
if (dk == mp->ma_keys && ep->me_key == startkey) {
if (cmp > 0) {
*value_addr = ep->me_value;
return ix;
}
}
else {
/* The dict was mutated, restart */
goto top;
}
}
}
perturb >>= PERTURB_SHIFT;
i = (i*5 + perturb + 1) & mask;
}
Py_UNREACHABLE();
}
- 首先找到哈希表的起点并计算哈希值。
dk = mp->ma_keys; mask = DK_MASK(dk);
- 如果找到相同地址,则找到目标值。
if (ep->me_key == key) { *value_addr = ep->me_value; return ix; }
- 如果哈希值相同,则进行比较。
if (ep->me_hash == hash) { PyObject *startkey = ep->me_key; Py_INCREF(startkey); int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ); Py_DECREF(startkey); if (cmp < 0) { *value_addr = NULL; return DKIX_ERROR; } if (dk == mp->ma_keys && ep->me_key == startkey) { if (cmp > 0) { *value_addr = ep->me_value; return ix; } } else { /* The dict was mutated, restart */ goto top; } }
Objects\dictobject.c
/* Specialized version for string-only keys */
static Py_ssize_t _Py_HOT_FUNCTION
lookdict_unicode(PyDictObject *mp, PyObject *key,
Py_hash_t hash, PyObject **value_addr)
{
assert(mp->ma_values == NULL);
/* Make sure this function doesn't have to handle non-unicode keys,
including subclasses of str; e.g., one reason to subclass
unicodes is to override __eq__, and for speed we don't cater to
that here. */
if (!PyUnicode_CheckExact(key)) {
mp->ma_keys->dk_lookup = lookdict;
return lookdict(mp, key, hash, value_addr);
}
PyDictKeyEntry *ep0 = DK_ENTRIES(mp->ma_keys);
size_t mask = DK_MASK(mp->ma_keys);
size_t perturb = (size_t)hash;
size_t i = (size_t)hash & mask;
for (;;) {
Py_ssize_t ix = dk_get_index(mp->ma_keys, i);
if (ix == DKIX_EMPTY) {
*value_addr = NULL;
return DKIX_EMPTY;
}
if (ix >= 0) {
PyDictKeyEntry *ep = &ep0[ix];
assert(ep->me_key != NULL);
assert(PyUnicode_CheckExact(ep->me_key));
if (ep->me_key == key ||
(ep->me_hash == hash && unicode_eq(ep->me_key, key))) {
*value_addr = ep->me_value;
return ix;
}
}
perturb >>= PERTURB_SHIFT;
i = mask & (i*5 + perturb + 1);
}
Py_UNREACHABLE();
}
- 如果不是unicode,则直接调用lookdict。
if (!PyUnicode_CheckExact(key)) { mp->ma_keys->dk_lookup = lookdict; return lookdict(mp, key, hash, value_addr); }
- 获取索引位置对象。
Py_ssize_t ix = dictkeys_get_index(mp->ma_keys, i);
- 如果找到相同的key,则获得映射的value。
if (ep->me_key == key || (ep->me_hash == hash && unicode_eq(ep->me_key, key))) { *value_addr = ep->me_value; return ix; }
Objects\dictobject.c
/* Faster version of lookdict_unicode when it is known that no <dummy> keys
* will be present. */
static Py_ssize_t _Py_HOT_FUNCTION
lookdict_unicode_nodummy(PyDictObject *mp, PyObject *key,
Py_hash_t hash, PyObject **value_addr)
{
assert(mp->ma_values == NULL);
/* Make sure this function doesn't have to handle non-unicode keys,
including subclasses of str; e.g., one reason to subclass
unicodes is to override __eq__, and for speed we don't cater to
that here. */
if (!PyUnicode_CheckExact(key)) {
mp->ma_keys->dk_lookup = lookdict;
return lookdict(mp, key, hash, value_addr);
}
PyDictKeyEntry *ep0 = DK_ENTRIES(mp->ma_keys);
size_t mask = DK_MASK(mp->ma_keys);
size_t perturb = (size_t)hash;
size_t i = (size_t)hash & mask;
for (;;) {
Py_ssize_t ix = dk_get_index(mp->ma_keys, i);
assert (ix != DKIX_DUMMY);
if (ix == DKIX_EMPTY) {
*value_addr = NULL;
return DKIX_EMPTY;
}
PyDictKeyEntry *ep = &ep0[ix];
assert(ep->me_key != NULL);
assert(PyUnicode_CheckExact(ep->me_key));
if (ep->me_key == key ||
(ep->me_hash == hash && unicode_eq(ep->me_key, key))) {
*value_addr = ep->me_value;
return ix;
}
perturb >>= PERTURB_SHIFT;
i = mask & (i*5 + perturb + 1);
}
Py_UNREACHABLE();
}
- 与lookdict_unicode类似,区别是增加了dummy状态检查。
assert (ix != DKIX_DUMMY);
四、字典的插入
- 插入调用了字典对象的
tp_as_mapping
方法集的mp_ass_subscript
方法。
Objects\dictobject.c
static PyMappingMethods dict_as_mapping = {
(lenfunc)dict_length, /*mp_length*/
(binaryfunc)dict_subscript, /*mp_subscript*/
(objobjargproc)dict_ass_sub, /*mp_ass_subscript*/
};
- 从
dict_ass_sub
方法中可以看出,删除key对应PyDict_DelItem
方法,而设置key对应PyDict_SetItem
方法。
Objects\dictobject.c
static int
dict_ass_sub(PyDictObject *mp, PyObject *v, PyObject *w)
{
if (w == NULL)
return PyDict_DelItem((PyObject *)mp, v);
else
return PyDict_SetItem((PyObject *)mp, v, w);
}
Objects\dictobject.c
/* CAUTION: PyDict_SetItem() must guarantee that it won't resize the
* dictionary if it's merely replacing the value for an existing key.
* This means that it's safe to loop over a dictionary with PyDict_Next()
* and occasionally replace a value -- but you can't insert new keys or
* remove them.
*/
int
PyDict_SetItem(PyObject *op, PyObject *key, PyObject *value)
{
PyDictObject *mp;
Py_hash_t hash;
if (!PyDict_Check(op)) {
PyErr_BadInternalCall();
return -1;
}
assert(key);
assert(value);
mp = (PyDictObject *)op;
if (!PyUnicode_CheckExact(key) ||
(hash = ((PyASCIIObject *) key)->hash) == -1)
{
hash = PyObject_Hash(key);
if (hash == -1)
return -1;
}
/* insertdict() handles any resizing that might be necessary */
return insertdict(mp, key, hash, value);
}
- PyDict_SetItem首先判断传入的值
op
是否是字典类型。if (!PyDict_Check(op)) { PyErr_BadInternalCall(); return -1; }
- 然后判断哈希值是否为-1。
if (!PyUnicode_CheckExact(key) || (hash = ((PyASCIIObject *) key)->hash) == -1) { hash = PyObject_Hash(key); if (hash == -1) return -1; }
- 在PyDict_SetItem的结尾处,调用insertdict完成插入动作。
Objects\dictobject.c
/*
Internal routine to insert a new item into the table.
Used both by the internal resize routine and by the public insert routine.
Returns -1 if an error occurred, or 0 on success.
*/
static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
PyObject *old_value;
PyDictKeyEntry *ep;
Py_INCREF(key);
Py_INCREF(value);
if (mp->ma_values != NULL && !PyUnicode_CheckExact(key)) {
if (insertion_resize(mp) < 0)
goto Fail;
}
Py_ssize_t ix = mp->ma_keys->dk_lookup(mp, key, hash, &old_value);
if (ix == DKIX_ERROR)
goto Fail;
assert(PyUnicode_CheckExact(key) || mp->ma_keys->dk_lookup == lookdict);
MAINTAIN_TRACKING(mp, key, value);
/* When insertion order is different from shared key, we can't share
* the key anymore. Convert this instance to combine table.
*/
if (_PyDict_HasSplitTable(mp) &&
((ix >= 0 && old_value == NULL && mp->ma_used != ix) ||
(ix == DKIX_EMPTY && mp->ma_used != mp->ma_keys->dk_nentries))) {
if (insertion_resize(mp) < 0)
goto Fail;
ix = DKIX_EMPTY;
}
if (ix == DKIX_EMPTY) {
/* Insert into new slot. */
assert(old_value == NULL);
if (mp->ma_keys->dk_usable <= 0) {
/* Need to resize. */
if (insertion_resize(mp) < 0)
goto Fail;
}
Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
dk_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
ep->me_key = key;
ep->me_hash = hash;
if (mp->ma_values) {
assert (mp->ma_values[mp->ma_keys->dk_nentries] == NULL);
mp->ma_values[mp->ma_keys->dk_nentries] = value;
}
else {
ep->me_value = value;
}
mp->ma_used++;
mp->ma_version_tag = DICT_NEXT_VERSION();
mp->ma_keys->dk_usable--;
mp->ma_keys->dk_nentries++;
assert(mp->ma_keys->dk_usable >= 0);
assert(_PyDict_CheckConsistency(mp));
return 0;
}
if (_PyDict_HasSplitTable(mp)) {
mp->ma_values[ix] = value;
if (old_value == NULL) {
/* pending state */
assert(ix == mp->ma_used);
mp->ma_used++;
}
}
else {
assert(old_value != NULL);
DK_ENTRIES(mp->ma_keys)[ix].me_value = value;
}
mp->ma_version_tag = DICT_NEXT_VERSION();
Py_XDECREF(old_value); /* which **CAN** re-enter (see issue #22653) */
assert(_PyDict_CheckConsistency(mp));
Py_DECREF(key);
return 0;
Fail:
Py_DECREF(value);
Py_DECREF(key);
return -1;
}
- 首先value不为空,则重新设置字典的容量。
if (mp->ma_values != NULL && !PyUnicode_CheckExact(key)) { if (insertion_resize(mp) < 0) goto Fail; }
- 之后调用查询方法dk_lookup(这里对应lookdict_unicode_nodummy),查询值在字典中是否已存在,返回Entry的index。
Py_ssize_t ix = mp->ma_keys->dk_lookup(mp, key, hash, &old_value); if (ix == DKIX_ERROR) goto Fail;
- 查询key和value是否需要加入垃圾回收。
MAINTAIN_TRACKING(mp, key, value);
- 查看是否是分离表,并且查看缓冲池中是否有旧值。
/* When insertion order is different from shared key, we can't share * the key anymore. Convert this instance to combine table. */ if (_PyDict_HasSplitTable(mp) && ((ix >= 0 && old_value == NULL && mp->ma_used != ix) || (ix == DKIX_EMPTY && mp->ma_used != mp->ma_keys->dk_nentries))) { if (insertion_resize(mp) < 0) goto Fail; ix = DKIX_EMPTY; }
- 如果是新值,则执行插入动作。
if (ix == DKIX_EMPTY) { /* Insert into new slot. */ assert(old_value == NULL); if (mp->ma_keys->dk_usable <= 0) { /* Need to resize. */ if (insertion_resize(mp) < 0) goto Fail; } Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash); ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries]; dk_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries); ep->me_key = key; ep->me_hash = hash; if (mp->ma_values) { assert (mp->ma_values[mp->ma_keys->dk_nentries] == NULL); mp->ma_values[mp->ma_keys->dk_nentries] = value; } else { ep->me_value = value; } mp->ma_used++; mp->ma_version_tag = DICT_NEXT_VERSION(); mp->ma_keys->dk_usable--; mp->ma_keys->dk_nentries++; assert(mp->ma_keys->dk_usable >= 0); assert(_PyDict_CheckConsistency(mp)); return 0; }
- 如果是分离表,直接对应影射的value。
if (_PyDict_HasSplitTable(mp)) { mp->ma_values[ix] = value; if (old_value == NULL) { /* pending state */ assert(ix == mp->ma_used); mp->ma_used++; } }
- 否则使用旧的value。
else { assert(old_value != NULL); DK_ENTRIES(mp->ma_keys)[ix].me_value = value; }
五、字典的删除
- 在插入中已经提到,删除调用PyDict_DelItem方法。
Objects\dictobject.c
int
PyDict_DelItem(PyObject *op, PyObject *key)
{
Py_hash_t hash;
assert(key);
if (!PyUnicode_CheckExact(key) ||
(hash = ((PyASCIIObject *) key)->hash) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return -1;
}
return _PyDict_DelItem_KnownHash(op, key, hash);
}
- 检查key的合法性,并获得哈希值后,继续调用了内部函数_PyDict_DelItem_KnownHash。
Objects\dictobject.c
int
_PyDict_DelItem_KnownHash(PyObject *op, PyObject *key, Py_hash_t hash)
{
Py_ssize_t ix;
PyDictObject *mp;
PyObject *old_value;
if (!PyDict_Check(op)) {
PyErr_BadInternalCall();
return -1;
}
assert(key);
assert(hash != -1);
mp = (PyDictObject *)op;
ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value);
if (ix == DKIX_ERROR)
return -1;
if (ix == DKIX_EMPTY || old_value == NULL) {
_PyErr_SetKeyError(key);
return -1;
}
// Split table doesn't allow deletion. Combine it.
if (_PyDict_HasSplitTable(mp)) {
if (dictresize(mp, DK_SIZE(mp->ma_keys))) {
return -1;
}
ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value);
assert(ix >= 0);
}
return delitem_common(mp, hash, ix, old_value);
}
- 与插入类似,查询传入的
op
是否是字典类型,并搜索词条。if (!PyDict_Check(op)) { PyErr_BadInternalCall(); return -1; } assert(key); assert(hash != -1); mp = (PyDictObject *)op; ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value); if (ix == DKIX_ERROR) return -1; if (ix == DKIX_EMPTY || old_value == NULL) { _PyErr_SetKeyError(key); return -1; }
- 由于分离字典不能删除值,如果是分离字典,则转换为联合字典。
if (_PyDict_HasSplitTable(mp)) { if (dictresize(mp, DK_SIZE(mp->ma_keys))) { return -1; } ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &old_value); assert(ix >= 0); }
- 之后调用delitem_common方法,执行具体的删除操作。
Objects\dictobject.c
static int
delitem_common(PyDictObject *mp, Py_hash_t hash, Py_ssize_t ix,
PyObject *old_value)
{
PyObject *old_key;
PyDictKeyEntry *ep;
Py_ssize_t hashpos = lookdict_index(mp->ma_keys, hash, ix);
assert(hashpos >= 0);
mp->ma_used--;
mp->ma_version_tag = DICT_NEXT_VERSION();
ep = &DK_ENTRIES(mp->ma_keys)[ix];
dk_set_index(mp->ma_keys, hashpos, DKIX_DUMMY);
ENSURE_ALLOWS_DELETIONS(mp);
old_key = ep->me_key;
ep->me_key = NULL;
ep->me_value = NULL;
Py_DECREF(old_key);
Py_DECREF(old_value);
assert(_PyDict_CheckConsistency(mp));
return 0;
}
六、字典的缓冲池
- 与PyListObject类似,PyDictObject同样使用了缓冲池。
Objects\dictobject.c
#ifndef PyDict_MAXFREELIST
#define PyDict_MAXFREELIST 80
#endif
static PyDictObject *free_list[PyDict_MAXFREELIST];
static int numfree = 0;
static PyDictKeysObject *keys_free_list[PyDict_MAXFREELIST];
static int numfreekeys = 0;
- 缓冲池的机制也与List类似,在PyDictObject销毁时,将空的PyDictObject装入。
Objects\dictobject.c
static void
dict_dealloc(PyDictObject *mp)
{
PyObject **values = mp->ma_values;
PyDictKeysObject *keys = mp->ma_keys;
Py_ssize_t i, n;
/* bpo-31095: UnTrack is needed before calling any callbacks */
PyObject_GC_UnTrack(mp);
Py_TRASHCAN_SAFE_BEGIN(mp)
if (values != NULL) {
if (values != empty_values) {
for (i = 0, n = mp->ma_keys->dk_nentries; i < n; i++) {
Py_XDECREF(values[i]);
}
free_values(values);
}
DK_DECREF(keys);
}
else if (keys != NULL) {
assert(keys->dk_refcnt == 1);
DK_DECREF(keys);
}
if (numfree < PyDict_MAXFREELIST && Py_TYPE(mp) == &PyDict_Type)
free_list[numfree++] = mp;
else
Py_TYPE(mp)->tp_free((PyObject *)mp);
Py_TRASHCAN_SAFE_END(mp)
}
- 在创建新的PyDictObject时,优先从缓冲池中取出。
Objects\dictobject.c
static PyObject *
new_dict(PyDictKeysObject *keys, PyObject **values)
{
PyDictObject *mp;
assert(keys != NULL);
if (numfree) {
mp = free_list[--numfree];
assert (mp != NULL);
assert (Py_TYPE(mp) == &PyDict_Type);
_Py_NewReference((PyObject *)mp);
}
else {
mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
if (mp == NULL) {
DK_DECREF(keys);
free_values(values);
return NULL;
}
}
mp->ma_keys = keys;
mp->ma_values = values;
mp->ma_used = 0;
mp->ma_version_tag = DICT_NEXT_VERSION();
assert(_PyDict_CheckConsistency(mp));
return (PyObject *)mp;
}