Redis 内部数据结构详解

Redis 存储效率的追求-ziplist

2019-08-22  本文已影响0人  多多的大白

前面章节介绍过zskiplist ,其中多次提到ziplist。主要说明redis sorted set 结构为了节约内存开销,在数据量比较小的情况,会使用ziplist 来作为数据结构。下面我们主要介绍zip的结构以及这样的结构为了节省内存做了哪些的约定。
作为一直用高级语言Java来讨生活的笔者来说,一直认为这种数据结构的思路是可以运用到我们实际的项目中,值得我们去思考一番。

1、什么是ziplist

Redis 官方在源码上注释如:

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.

上面说明来自redis-4.0-14 ziplist.c ,该文件上有对ziplist 做了比较详细的解释,包含ziplist的结构。
翻译大意就是:

ziplist 是为了存储效率提供的一种经过特殊编码的双向链表。它用于存储字符串和整型,其中整型用二进制来进行编码,它能以O(1)的时间复杂度在表的两端提供push和pop操作。

/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE+1;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}

官方解释ziplist 是一个双向链表,我们都知道一个普通的链表 每一项都占用独立的内存空间,然后通过指针或者引用来进行关联,这会导致大量的内存碎片。ziplist占用一整块的内存区域,从上面创建ziplist,我们就可以看出,也可以理解它就是一个表,而非链表。

2、ziplist 结构

ziplist 并不像前面章节所写的zskiplist、sds 等有自定义的结构,从ziplistNew代码就可以看出来。ziplist 为了节省内存在很多细节也做了约定,后面有很多地方可以看出。

ziplist的内存结构如下

/**
 * The general layout of the ziplist is as follows:
 *  <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 **/
 *
 * NOTE: all fields are stored in little endian, if not specified otherwise.
 *
 * <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> is the number of entries. When there are more than
 * 2^16-2 entires, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.

各个部分在内存中都是连续的,ziplist.c 注释上都对内存结构进行了解释,下面看下各个部分的定义:

下面我们在看下entry的构成

 /** Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 **/
  <prevlen> <encoding> <entry-data>
/** The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsinged 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 *
 * So practically an entry is encoded in the following way:
 *
 * <prevlen from 0 to 253> <encoding> <data>
 *
 * Or alternatively if the previous entry length is greater than 253 bytes
 * the following encoding is used:
 *
 * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <data>
**/

这句话的意思是:有俩种可能,一种是1个字节表示,一种是5个字节表示。
1、如果前一项字节数小于254的话,prevlen就用一个字节表示。
2、如果大于253的话,就用5个字节表示,第一个字节为FE (即254),后面4个字节真正存储前一个数据项的占用字节数。

  1. |00pppppp| - 1 byte。第1个字节最高两个bit是00,那么<encoding>字段只有1个字节,剩余的6个bit用来表示字符串的长度值,最高可以表示63 (2^6-1)。
  2. |01pppppp|qqqqqqqq| - 2 bytes。第1个字节最高两个bit是01,那么<encoding>字段占2个字节,总共有14个bit用来表示字符串的长度值,最高可以表示16383 (2^14-1)。
  3. 10__|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes。第1个字节最高两个bit是10,那么len字段占5个字节,总共使用32个bit来表示字符串的长度值(6个bit舍弃不用),最高可以表示2^32-1。
    在这三种情况下,<data>都是按字符串来存储的;从下面第4种情况开始,<data>开始变为按整数来存储了。
  4. |11000000| - 1 byte。<encoding>字段占用1个字节,值为0xC0,后面的数据<data>存储为2个字节的int16_t类型。
  5. |11010000| - 1 byte。<encoding>字段占用1个字节,值为0xD0,后面的数据<data>存储为4个字节的int32_t类型。
  6. |11100000| - 1 byte。<encoding>字段占用1个字节,值为0xE0,后面的数据<data>存储为8个字节的int64_t类型。
  7. |11110000| - 1 byte。<encoding>字段占用1个字节,值为0xF0,后面的数据<data>存储为3个字节长的整数。
  8. |11111110| - 1 byte。<encoding>字段占用1个字节,值为0xFE,后面的数据<data>存储为1个字节的整数。
  9. |1111xxxx| - - (xxxx的值在0001和1101之间)。这是一种特殊情况,xxxx从1到13一共13个值,这时就用这13个值来表示真正的数据。注意,这里是表示真正的数据,而不是数据长度了。也就是说,在这种情况下,后面不再需要一个单独的<data>字段来表示真正的数据了,而是<encoding>和<data>合二为一了。另外,由于xxxx只能取0001和1101这13个值了(其它可能的值和其它情况冲突了,比如0000和1110分别同前面第7种第8种情况冲突,1111跟结束标记冲突),而小数值应该从0开始,因此这13个值分别表示0到12,即xxxx的值减去1才是它所要表示的那个整数数据的值。
demo

做个简单的demo。可以根据上图的解释理解。

3、连锁更新

上面我们对ziplist存储结构已经清楚了,就是存储一块大的连续的内存块里面,如果针对这样一个结构进行插入数据、删除数据,看着都觉得很麻烦。我们可以想象下插入数据的一个过程,需要计算插入的位置,还需要计算entry里面的prevlen,prevlen的规则又比较复杂。

entry

假设我们现在有上图这样一个ziplist存储,其中entry1-entryN的data 都是小于254
entry1-entryN的prevlen的长度都占用1个字节。
如果在此时我们插入了一个数据 长度大于254 暂且叫它entryM吧,插入entry1 的前面,这时会发现我们entry1-entryN 的prevlen长度都必须从1个字节变成5个字节(前面介绍过prevlen,它是一个变长编码),整个ziplist 就会发生连锁反应,全部都得变,大量的内存拷贝性能肯定会急剧下降。作为做内存存储的redis ,我们肯定会认为是灾难。
其实并没有我们想象的那么糟糕,Redis 肯定早就考虑过这样的情况,所以才敢大胆的这么玩。
下面介绍为什么可以不用这么慌:

zset-max-ziplist-entries 128
zset-max-ziplist-value 64

这俩个配置参数在前面章节《Redis zskiplist》多次介绍过,sorted set 在数据量比较小的情况下使用ziplist ,数据量达到上面的配置参数就直接用zskiplist来进行存储了。

hash-max-ziplist-entries 512
hash-max-ziplist-value 64

这俩个配置参数是在使用Hash 情况的ziplist转换,当数据量达到上面的数据,ziplist 就会转换成dist。

当然,这些配置写redis.conf里面,意味着我们可以随意配置。但是在清楚这4个配置参数的情况下,我们肯定不会随性的配置这里面的值。

综合上面的介绍:
1.压缩列表里要恰好有多个连续的、长度在250-253 字节之间的节点连锁更新才有可能被引发, 在实际中, 这种情况并不多见。
2.即使出现连锁更新, 但只要被更新的节点数量不多, 就不会对性能造成较大的影响。

4、ziplist的插入

有这么多规矩和约定的ziplist,想必插入操作肯定是很复杂的。下面我们介绍下
ziplist的插入API----ziplistInsert。

/* Insert item at "p". */
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;
    /* Find out prevlen for the entry that is inserted. */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }
    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);
    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }
    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;
    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);
        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);
        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);
        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }
    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }
    /* Write the entry */
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

我们来简单解析一下这段代码:

ziplist API
unsigned char *ziplistNew(void);
unsigned char *ziplistMerge(unsigned char **first, unsigned char **second);
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where);
unsigned char *ziplistIndex(unsigned char *zl, int index);
unsigned char *ziplistNext(unsigned char *zl, unsigned char *p);
unsigned char *ziplistPrev(unsigned char *zl, unsigned char *p);
unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen);
unsigned char *ziplistDelete(unsigned char *zl, unsigned char **p);
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip);
unsigned int ziplistLen(unsigned char *zl);

5、总结

上一篇下一篇

猜你喜欢

热点阅读