Redis Primer 4: Databases

Databases Inside the Server

The Redis server keeps all of its databases in the db array of the redis.h/redisServer structure; each redisDb structure represents one database.

/* The Redis server structure */
struct redisServer {
    //...
    /* An array holding all of the server's databases */
    redisDb *db;
    /* dbnum decides how many databases are created at initialization time */
    int dbnum;

    //...
};

/* A single database */
typedef struct redisDb {
    dict *dict;                   /* The keyspace for this DB */
    dict *expires;                /* Timeout of keys with a timeout set */
    dict *blocking_keys;          /* Keys with clients waiting for data (BLPOP) */
    dict *ready_keys;             /* Blocked keys that received a PUSH */
    dict *watched_keys;           /* WATCHED keys for MULTI/EXEC CAS */
    int id;                       /* Database ID */
    long long avg_ttl;            /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;           /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

By default the server creates dbnum = 16 databases; the count comes from the databases directive in redis.conf.

Every Redis client has its own target database; by default a client's target database is database 0.

How does each client pick its target database? Databases are not synchronized with one another, so if every client just used the same database, what would the other databases be for?

A client can switch its target database with SELECT N.
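
Internally, switching databases just repoints the client's db pointer at another element of the server's db array. A minimal sketch, modeled on the selectDb helper in the Redis source (db.c), with details simplified:

/* Sketch: make the client target database "id".
 * Returns C_ERR for an out-of-range id, C_OK otherwise. */
int selectDb(client *c, int id) {
    if (id < 0 || id >= server.dbnum)
        return C_ERR;
    c->db = &server.db[id];   /* only the pointer moves; no data is copied */
    return C_OK;
}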

The Database Keyspace

The dict dictionary of the redisDb structure holds all of the database's key-value pairs; this dictionary is called the keyspace (key space).

  • The keys of the keyspace are the database's keys; every key is a string object.

  • The values of the keyspace are the database's values; each value can be any one of the Redis object types: a string object, list object, hash object, set object, or sorted set object.

Because the keyspace is a dictionary, every operation against the database (adding, deleting, updating, or looking up a key) is implemented as an operation on this keyspace dictionary.
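
For example, creating a brand-new key ultimately means inserting an entry into db->dict. A minimal sketch, loosely following the dbAdd helper in the Redis source (the real function also signals blocked clients and updates cluster bookkeeping):

/* Sketch: add a key/value pair to the keyspace. The key is duplicated as an
 * sds string so the keyspace owns its own copy; the value object is stored as-is. */
void dbAdd(redisDb *db, robj *key, robj *val) {
    sds copy = sdsdup(key->ptr);          /* key->ptr is the underlying sds string */
    int retval = dictAdd(db->dict, copy, val);
    serverAssert(retval == DICT_OK);      /* the key must not already exist */
}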

How does the Redis keyspace resolve hash collisions?
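
The underlying dict resolves collisions by separate chaining: entries that hash to the same bucket are linked through a next pointer, and when the table gets too full it grows via incremental rehashing. A sketch of the entry layout, based on dict.h as of Redis 6:

/* Sketch of the hash table entry used by the keyspace (see dict.h). */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;   /* chains entries that collide in the same bucket */
} dictEntry;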

Besides plain reads and writes, the server also performs some maintenance work on every access (a sketch of the write-side bookkeeping follows the list):

  • After reading a value, the server updates the key's LRU (last access) time, which is later used to compute the key's idle time. OBJECT IDLETIME key shows how long a key has been idle.
  • If the server finds that the key it is reading has already expired, it deletes that key before doing anything else.
  • If a client is watching a key with the WATCH command, the server marks the key as dirty when it is modified, so that the transaction knows the key has changed.
  • Every time the server modifies a key it increments the dirty counter by 1; this counter feeds the server's persistence and replication triggers.
  • If the server has keyspace notifications enabled, modifying a key sends the corresponding notification according to the configuration.
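
A minimal sketch of the write-side bookkeeping (the last three items above). The helpers signalModifiedKey and notifyKeyspaceEvent exist in the Redis source, but the surrounding command tail here is a hypothetical, simplified example:

/* Hypothetical sketch: what a write command typically does after modifying "key". */
void hypotheticalWriteCommandTail(client *c, robj *key) {
    /* Mark the key as dirty for clients WATCHing it (MULTI/EXEC CAS). */
    signalModifiedKey(c, c->db, key);
    /* Keyspace notification, mirroring the event SET sends. */
    notifyKeyspaceEvent(NOTIFY_STRING, "set", key, c->db->id);
    /* Bump the dirty counter that feeds persistence and replication triggers. */
    server.dirty++;
}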

Key TTL and Expiration Time

The expires dictionary of the redisDb structure holds the expiration times of all keys in the database; this dictionary is called the expires dictionary (a sketch of how an entry gets added follows the list below).

  • Each key of the expires dictionary is a pointer to a key object in the keyspace (that is, to some database key).
  • Each value of the expires dictionary is a long long integer holding the expiration time of the database key that the entry points to: a millisecond-precision UNIX timestamp.
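
A minimal sketch of how a TTL ends up in db->expires, loosely following the setExpire helper in the Redis source (the real function also handles expire propagation for writable replicas):

/* Sketch: record that "key" expires at the absolute time "when" (ms UNIX timestamp).
 * The expires dict reuses the key object already stored in the keyspace. */
void setExpire(client *c, redisDb *db, robj *key, long long when) {
    dictEntry *kde = dictFind(db->dict, key->ptr);   /* the key must exist in the keyspace */
    serverAssert(kde != NULL);
    dictEntry *de = dictAddOrFind(db->expires, dictGetKey(kde));
    dictSetSignedIntegerVal(de, when);               /* the value is just the timestamp */
}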

Deciding Whether a Key Has Expired

Using the expires dictionary, the program checks whether a given key has expired in two steps (sketched in code after the list):

  1. Check whether the given key exists in the expires dictionary; if it does, fetch the key's expiration time.

  2. Check whether the current UNIX timestamp is greater than that expiration time; if it is, the key has expired.
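
A minimal sketch of that two-step check, modeled on getExpire/keyIsExpired in the Redis source (simplified; the real code also special-cases RDB loading and script execution time):

/* Sketch: return 1 if "key" has a TTL set and that TTL is already in the past. */
int keyIsExpiredSketch(redisDb *db, robj *key) {
    /* Step 1: is there an entry for this key in the expires dict at all? */
    dictEntry *de = dictFind(db->expires, key->ptr);
    if (de == NULL) return 0;                 /* no TTL: the key never expires */
    long long when = dictGetSignedIntegerVal(de);
    /* Step 2: compare against the current millisecond-precision UNIX time. */
    return mstime() > when;
}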

Expired-Key Deletion Strategies

All key expiration times are stored in the expires dictionary, so when exactly does an expired key get deleted? Three different deletion strategies are possible:

  • Timed deletion: when a key's expiration time is set, create a timer that deletes the key the moment its expiration time arrives.
  • Lazy deletion: leave expired keys alone, but every time a key is fetched from the keyspace, check whether it has expired; if it has, delete it, otherwise return it.
  • Periodic deletion: at regular intervals, scan the databases and delete the expired keys found there; how many expired keys to delete, and how many databases to check, is decided by the algorithm.
Timed Deletion

Timed deletion is the most memory-friendly strategy: by using timers, it guarantees that expired keys are deleted as soon as possible, freeing the memory they occupy.

Drawbacks:

  1. It is the least CPU-friendly strategy: when there are many expired keys, deleting them can consume a significant share of CPU time. When memory is plentiful but CPU time is tight, spending CPU on deleting keys unrelated to the current workload will inevitably hurt the server's response time and throughput.
  2. Creating a timer requires a time event in the Redis server, and time events are currently implemented as an unsorted linked list, where finding an event costs O(N); this cannot handle a large number of time events efficiently (a hypothetical sketch follows).
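
To make the cost concrete, this is roughly what per-key timers would look like if they were built on Redis's ae time events. Redis does not actually do this; the sketch is purely hypothetical, and expireTimerCtx, expireTimerProc, and hypotheticalSetExpireWithTimer are invented names:

/* Hypothetical: one ae time event per key with a TTL. With millions of keys,
 * just managing the unsorted time-event list becomes an O(N) burden. */
typedef struct expireTimerCtx { redisDb *db; robj *key; } expireTimerCtx;

static int expireTimerProc(struct aeEventLoop *el, long long id, void *clientData) {
    expireTimerCtx *ctx = clientData;
    dbDelete(ctx->db, ctx->key);   /* delete the key the moment its timer fires */
    zfree(ctx);
    return AE_NOMORE;              /* one-shot event, do not reschedule */
}

void hypotheticalSetExpireWithTimer(redisDb *db, robj *key, long long ttl_ms) {
    expireTimerCtx *ctx = zmalloc(sizeof(*ctx));
    ctx->db = db;
    ctx->key = key;
    aeCreateTimeEvent(server.el, ttl_ms, expireTimerProc, ctx, NULL);
}
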
Lazy Deletion

Lazy deletion is the most CPU-friendly strategy: the program only checks a key for expiration when that key is actually fetched.

Drawback: it is the least memory-friendly strategy. If a key has expired but is still kept in the database, the memory it occupies will never be released as long as the expired key is not deleted.

Periodic Deletion

Used on their own, timed deletion and lazy deletion both have obvious flaws:

  1. Timed deletion consumes too much CPU time, hurting the server's response time and throughput.
  2. Lazy deletion wastes too much memory and carries the risk of a memory-leak-like buildup.

Periodic deletion is an integration and compromise of the two: it runs an expired-key deletion pass at regular intervals, and it limits both the duration and the frequency of those passes to reduce their impact on CPU time.

The difficulty is choosing the interval between passes and how long each pass may run:

  • If passes run too often or for too long, periodic deletion degenerates into timed deletion and wastes too much CPU on deleting expired keys.
  • If passes run too rarely or for too short a time, periodic deletion behaves like lazy deletion and ends up wasting memory.

The Expiration Strategy Redis Actually Uses

Redis actually combines two strategies: lazy deletion and periodic deletion.

Lazy deletion: every Redis read or write command calls the expireIfNeeded function on the key before executing; if the key has expired, the function deletes it from the database:

int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    /* If we are running in the context of a slave, instead of
     * evicting the expired key from the database, we return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return 1;

    /* If clients are paused, we keep the current dataset constant,
     * but return to the client what we believe is the right state. Typically,
     * at the end of the pause we will properly expire the key OR we will
     * have failed over and the new primary will send us the expire. */
    if (checkClientPauseTimeoutAndReturnIfPaused()) return 1;

    /* Delete the key */
    deleteExpiredKeyAndPropagate(db,key);
    return 1;
}

Note the server.masterhost != NULL check above: if masterhost is non-NULL, this server is a replica. In that case, even if the key has expired, the function merely returns status 1 and never performs the deletion below.

Periodic deletion: whenever the server's periodic job serverCron runs, the activeExpireCycle function is called. Within a bounded amount of time it walks the server's databases over multiple passes, randomly samples some of the keys in each database's expires dictionary, and deletes the expired ones it finds.

  • Each run of the function takes a limited number of random keys from a limited number of databases, checks them, and deletes any expired keys among them.
  • The static variable current_db records how far activeExpireCycle has progressed, and the next call to activeExpireCycle resumes from where the previous one stopped.

void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Next DB to test. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (checkClientPauseTimeoutAndReturnIfPaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;

    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle there are still
         * a big percentage of keys to expire, compared to the number of keys
         * we scanned. The percentage, stored in config_cycle_acceptable_stale
         * is not fixed, but depends on the Redis configured "expire effort". */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;

            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;

            while (sampled < num && checked_buckets < max_buckets) {
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */
        } while (sampled == 0 ||
                 (expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}

How RDB, AOF, and Replication Handle Expired Keys

  1. Generating an RDB file: when SAVE or BGSAVE creates a new RDB file, the program checks the keys in the database; keys that have already expired are not saved into the new RDB file.
  2. Loading an RDB file: when the Redis server starts with RDB loading enabled, the server loads the RDB file:
    • If the server runs as a master, the keys saved in the file are checked while loading, and expired keys are ignored.
    • If the server runs as a replica, all keys are loaded whether expired or not. This causes no problems, because the replica's database is emptied when it synchronizes with its master, so expired keys loaded from the RDB file have no lasting effect.
  3. AOF writes: when the server runs with AOF persistence, a key that has expired but has not yet been removed by lazy or periodic deletion causes no change to the AOF file. Once a deletion strategy removes the key, the program appends a DEL command to the AOF file to record the deletion explicitly.
  4. AOF rewrite: during an AOF rewrite, the program checks the keys in the database; keys that have already expired are not saved into the rewritten AOF file.
  5. Replication: when the server runs in replication mode, expired-key deletion on replicas is controlled by the master:
    • After the master deletes an expired key, it explicitly sends a DEL command to all of its replicas, telling them to delete that key.
    • When a replica executes a read command from a client and encounters an expired key, it does not delete the key; it keeps handling it as if it had not expired.
    • A replica deletes an expired key only after it receives the DEL command from the master. The lookupKeyReadWithFlags function below shows how this plays out on the read path:

robj *lookupKeyReadWithFlags(redisDb *db, robj *key, int flags) {
    robj *val;

    if (expireIfNeeded(db,key) == 1) {
        /* If we are in the context of a master, expireIfNeeded() returns 1
         * when the key is no longer valid, so we can return NULL ASAP. */
        if (server.masterhost == NULL)
            goto keymiss;

        /* On a replica, a read-only command from a normal client gets NULL back. */
        if (server.current_client &&
            server.current_client != server.master &&
            server.current_client->cmd &&
            server.current_client->cmd->flags & CMD_READONLY)
        {
            goto keymiss;
        }
    }
    val = lookupKey(db,key,flags);
    if (val == NULL)
        goto keymiss;
    server.stat_keyspace_hits++;
    return val;

keymiss:
    if (!(flags & LOOKUP_NONOTIFY)) {
        notifyKeyspaceEvent(NOTIFY_KEY_MISS, "keymiss", key, db->id);
    }
    server.stat_keyspace_misses++;
    return NULL;
}
  1. If the command is a read-only command from a normal client, the replica reports a miss and returns NULL for the expired key.
  2. Otherwise, even though the key has already expired, the replica can still return its value.

Summary

  1. All of a Redis server's databases are stored in the redisServer.db array, and the number of databases is recorded in the redisServer.dbnum field.
  2. A client switches databases by repointing its target-database pointer at a different element of the redisServer.db array.
  3. A database mainly consists of the dict and expires dictionaries: dict stores the key-value pairs, while expires stores the keys' expiration times.
  4. A database key is always a string object, while the value can be any Redis object type.
  5. A key in the expires dictionary points to a key in the database, and its value records that key's expiration time as a millisecond-precision UNIX timestamp.
  6. Redis deletes expired keys using two strategies: lazy deletion and periodic deletion.
  7. After an expired key is deleted, the server appends a DEL command to the existing AOF file to record the deletion explicitly. A replica never deletes an expired key on its own, even if it notices the expiration; it waits for the DEL command sent by the master.

References

  1. Redis Design and Implementation (《Redis 设计与实现》)