Redis 内部对象编码转换触发条件：ziplist 与 hashtable 的切换

大家好，今天咱们聊聊 Redis 里面的一个很有意思的现象：ziplist 和 hashtable 之间“身份切换”的故事。就像咱们人类一样，数据在不同的场景下也需要不同的“活法”，Redis 为了极致的性能和内存效率，就搞出了这么一套灵活的编码转换机制。

开场：Redis 的“双面”数据结构

Redis 里面的数据结构，比如 Hash，List，Set，Sorted Set，表面上看起来就那么几个，但实际上，它们内部可没那么简单。为了在不同的场景下达到最佳的性能和内存占用，Redis 偷偷地给它们披上了不同的“马甲”，也就是内部对象编码。

举个例子，一个 Hash，可能一开始用的是 ziplist（压缩列表）来存储，但随着数据的膨胀，它又会“变身”成 hashtable（哈希表）。这种“变身”可不是随便发生的，它是有条件的，今天咱们就来扒一扒 Hash 在 ziplist 和 hashtable 之间切换的那些事儿。

主角登场：ziplist 和 hashtable

在深入转换条件之前，咱们先来简单认识一下今天的主角：ziplist 和 hashtable。

ziplist（压缩列表）：内存紧凑型选手

ziplist 是一种为了节省内存而设计的特殊编码方式。它把数据都紧挨着存储在一块连续的内存空间里，就像一个被压得扁扁的“压缩包”。

优点： 内存利用率高，适合存储小数据。

缺点： 插入、删除操作可能触发连锁更新，性能相对较差。

ziplist 的结构大概是这样的：
```
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
```
- zlbytes: 整个 ziplist 占用的字节数。
- zltail: 最后一个 entry 的偏移量，方便倒序遍历。
- zllen: entry 的数量。
- entry: 实际存储的数据。
- zlend: ziplist 的结束标志。

hashtable（哈希表）：查找速度担当

hashtable 是一种经典的键值对存储结构，通过哈希函数将键映射到数组的索引，从而实现快速的查找。

优点： 查找速度快，适合存储大数据。

缺点： 内存占用相对较高。

hashtable 的基本结构是这样的：

typedef struct dict {
    dictType *type;  // 类型特定函数
    void *privdata; // 私有数据
    dictht ht[2];   // 两个哈希表，用于 rehash
    long rehashidx; // rehash 索引，-1 表示没有 rehash
    long iterators; // 迭代器数量
} dict;

typedef struct dictht {
    dictEntry **table; // 哈希表数组
    unsigned long size; // 哈希表大小
    unsigned long sizemask; // 哈希表大小掩码，用于计算索引
    unsigned long used; // 已使用的节点数量
} dictht;

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; // 链表指针，解决哈希冲突
} dictEntry;

简单来说，hashtable 就像一个数组，每个数组元素（bucket）指向一个链表，链表里存储着键值对。当不同的键哈希到同一个索引时，就会发生冲突，Redis 使用链地址法来解决冲突。

转换的触发条件：量变引起质变

好，现在主角都登场了，咱们来聊聊 Hash 从 ziplist “变身”成 hashtable 的触发条件。这就像一个人的成长过程，小时候可能穿小码衣服，但长大了肯定要换大码的。

主要有两个参数控制着这个转换：

hash-max-ziplist-entries: ziplist 中允许存储的最大 entry 数量。
hash-max-ziplist-value: ziplist 中允许存储的 value 的最大长度。

这两个参数都在 redis.conf 配置文件中可以找到。默认值是：

hash-max-ziplist-entries 512
hash-max-ziplist-value 64

也就是说，当 Hash 满足以下任意一个条件时，就会触发 ziplist 到 hashtable 的转换：

Hash 中存储的 entry 数量超过 hash-max-ziplist-entries。
Hash 中存储的任意一个 value 的长度超过 hash-max-ziplist-value。

代码演示：模拟转换过程

光说不练假把式，咱们来写一段简单的代码，模拟一下这个转换过程。为了方便演示，这里用 Python 模拟 Redis 的行为。

class ZiplistHash:
    def __init__(self, max_entries, max_value_len):
        self.data = []
        self.max_entries = max_entries
        self.max_value_len = max_value_len

    def add(self, key, value):
        if len(self.data) >= self.max_entries or len(str(value)) > self.max_value_len:
            return False  # 触发转换
        self.data.append((key, value))
        return True

    def __str__(self):
        return str(self.data)

class HashtableHash:
    def __init__(self):
        self.data = {}

    def add(self, key, value):
        self.data[key] = value

    def __str__(self):
        return str(self.data)

def test_hash_conversion(max_entries, max_value_len):
    ziplist_hash = ZiplistHash(max_entries, max_value_len)
    print("初始状态: Ziplist")

    for i in range(max_entries):
        if not ziplist_hash.add(f"key{i}", f"value{i}"):
            print("转换触发！")
            hashtable_hash = HashtableHash()
            for k, v in ziplist_hash.data:
                hashtable_hash.add(k, v)
            print("转换成 Hashtable")
            return hashtable_hash
    # 添加超过最大 entry 的例子
    if not ziplist_hash.add(f"key{max_entries}", f"value{max_entries}"):
            print("转换触发！")
            hashtable_hash = HashtableHash()
            for k, v in ziplist_hash.data:
                hashtable_hash.add(k, v)
            print("转换成 Hashtable")
            return hashtable_hash
    return ziplist_hash

# 模拟 entry 数量超过限制
print("测试1：entry 数量超过限制")
hash_obj = test_hash_conversion(5, 10)  # 限制 5 个 entry
print(hash_obj)

# 模拟 value 长度超过限制
print("n测试2：value 长度超过限制")
hash_obj = test_hash_conversion(3, 5)  # 限制 value 长度为 5
print(hash_obj)

这段代码模拟了 Hash 从 ziplist 到 hashtable 的转换。 ZiplistHash 类模拟 ziplist 的行为，HashtableHash 类模拟 hashtable 的行为。 test_hash_conversion 函数模拟了添加数据，并在满足转换条件时进行转换的过程。

运行结果分析

运行上面的代码，你会看到：

当 entry 数量超过 max_entries 时，程序会打印 "转换触发！"，然后将数据从 ZiplistHash 转移到 HashtableHash。
当 value 的长度超过 max_value_len 时，也会触发类似的转换。

源码剖析：Redis 内部的“变形金刚”

咱们再深入到 Redis 的源码层面，看看它是如何实现这个转换的。相关的代码主要在 t_hash.c 文件中。其中，hashTypeSet 函数负责向 Hash 中添加键值对，它会检查是否需要进行编码转换。

/*
 * Set the specified hash field to the specified value.
 *
 * If the field already exists, overwrite it.
 *
 * Return 1 if the field was added, 0 if it was overwritten.
 *
 * This function will also take care of converting the hash to a more
 * memory efficient representation when needed.
 */
int hashTypeSet(robj *o, sds field, sds value) {
    int update = 0;

    /* Check if the hash needs to be converted to a more efficient
     * representation.  Note that we can't convert to a ziplist if
     * the field or value are too large. */
    if (o->encoding == OBJ_ENCODING_ZIPLIST &&
        hashTypeSetFromZiplist(o, field, value, &update) == C_OK)
    {
        return update;
    } else {
        /* Convert to hashtable if needed.  Note that we can only
         * convert to hashtable if we have enough memory. */
        if (o->encoding == OBJ_ENCODING_ZIPLIST &&
            hashTypeConvert(o, OBJ_ENCODING_HT) == C_ERR)
        {
            /* OOM */
            return -1;
        }
        return hashTypeSetFromHashTable(o, field, value, &update);
    }
}

/* Convert an object encoding to a different one.
 *
 * This function takes care of allocating the new object, populating it
 * with the encoded data from the old object, and finally deallocating
 * the old object.
 *
 * The object 'o' is updated with the new encoding, the new pointer
 * to the object is returned. */
robj *hashTypeConvert(robj *o, int enc) {
    hashTypeIterator *hi;
    void *field, *value;

    if (o->encoding == enc) return o; /* Already in the right encoding. */

    if (o->encoding == OBJ_ENCODING_ZIPLIST) {
        hi = hashTypeInitIterator(o);
        if (enc == OBJ_ENCODING_HT) {
            o->encoding = OBJ_ENCODING_HT;
            o->ptr = dictCreate(&hashDictType,NULL);
            while (hashTypeNext(hi, &field, &value) != C_ERR) {
                sds sfield = sdsdup(field);
                sds svalue = sdsdup(value);
                dictAdd(o->ptr, sfield, svalue);
            }
        } else {
            serverPanic("Unsupported hash conversion");
        }
        hashTypeReleaseIterator(hi);
    } else {
        serverPanic("Unsupported hash conversion");
    }
    return o;
}

这段代码的关键在于：

hashTypeSet 函数首先会判断 Hash 当前的编码方式是否为 ziplist。
如果是 ziplist，它会尝试将新的键值对添加到 ziplist 中。
如果添加失败（因为超过了 hash-max-ziplist-entries 或 hash-max-ziplist-value 的限制），就会调用 hashTypeConvert 函数进行编码转换。
hashTypeConvert 函数会将 ziplist 中的所有数据复制到新的 hashtable 中，然后释放 ziplist 的内存，并将 Hash 对象的编码方式更新为 hashtable。

编码转换的影响：性能的平衡术

ziplist 到 hashtable 的转换，本质上是一种性能的平衡术。 ziplist 在存储小数据时，内存效率很高，但随着数据量的增加，其性能会下降。 hashtable 虽然内存占用较高，但查找速度很快，适合存储大数据。

因此，Redis 会根据数据的实际情况，动态地选择合适的编码方式，以达到最佳的性能和内存利用率。

实战建议：如何优化 Hash 的存储

了解了 Hash 的编码转换机制，咱们就可以根据实际情况，对 Hash 的存储进行优化。

控制 Hash 的大小： 尽量避免 Hash 存储过多的 entry，可以考虑将数据拆分成多个 Hash。
限制 value 的长度： 尽量避免 value 过长，可以考虑对 value 进行压缩或分片。
合理配置参数： 根据实际情况，调整 hash-max-ziplist-entries 和 hash-max-ziplist-value 的值。

总结：Redis 的“变形”智慧

Redis 的内部对象编码转换机制，是一种非常巧妙的设计。它让 Redis 能够根据数据的实际情况，动态地选择合适的存储方式，从而在性能和内存利用率之间找到最佳的平衡点。

下次当你使用 Redis 的 Hash 时，不妨想一想，它现在是用 ziplist 还是 hashtable 存储的呢？也许，它正在默默地进行着一场“变形”呢！

发表回复 取消回复

发表回复取消回复