Redis Primer 4: Databases

Databases Inside the Server

The Redis server keeps all of its databases in the db array of the redis.h/redisServer structure; each redisDb structure represents one database.

/* The Redis server structure */
struct redisServer {
    //...
    /* An array holding all of the server's databases */
    redisDb *db;
    /* dbnum decides how many databases are created at initialization time */
    int dbnum;

    //...
};

/* A single database */
typedef struct redisDb {
    dict *dict;                   /* The keyspace for this DB */
    dict *expires;                /* Timeout of keys with a timeout set */
    dict *blocking_keys;          /* Keys with clients waiting for data (BLPOP) */
    dict *ready_keys;             /* Blocked keys that received a PUSH */
    dict *watched_keys;           /* WATCHED keys for MULTI/EXEC CAS */
    int id;                       /* Database ID */
    long long avg_ttl;            /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;           /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

By default the server creates dbnum = 16 databases; the count comes from the databases directive in redis.conf.

Every Redis client has its own target database; by default a client's target database is database 0.

How does each client pick its target database? Databases are not synchronized with one another, so if every client just used the same database, what would the other databases be for?

A client can switch its target database with SELECT N.
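
Internally, switching databases just repoints the client's db pointer at another element of the server's db array. A minimal sketch, modeled on the selectDb helper in the Redis source (db.c), with details simplified:

/* Sketch: make the client target database "id".
 * Returns C_ERR for an out-of-range id, C_OK otherwise. */
int selectDb(client *c, int id) {
    if (id < 0 || id >= server.dbnum)
        return C_ERR;
    c->db = &server.db[id];   /* only the pointer moves; no data is copied */
    return C_OK;
}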

The Database Keyspace

The dict dictionary of the redisDb structure holds all of the database's key-value pairs; this dictionary is called the keyspace (key space).

  • The keys of the keyspace are the database's keys; every key is a string object.

  • The values of the keyspace are the database's values; each value can be any one of the Redis object types: a string object, list object, hash object, set object, or sorted set object.

Because the keyspace is a dictionary, every operation against the database (adding, deleting, updating, or looking up a key) is implemented as an operation on this keyspace dictionary.
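
For example, creating a brand-new key ultimately means inserting an entry into db->dict. A minimal sketch, loosely following the dbAdd helper in the Redis source (the real function also signals blocked clients and updates cluster bookkeeping):

/* Sketch: add a key/value pair to the keyspace. The key is duplicated as an
 * sds string so the keyspace owns its own copy; the value object is stored as-is. */
void dbAdd(redisDb *db, robj *key, robj *val) {
    sds copy = sdsdup(key->ptr);          /* key->ptr is the underlying sds string */
    int retval = dictAdd(db->dict, copy, val);
    serverAssert(retval == DICT_OK);      /* the key must not already exist */
}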

How does the Redis keyspace resolve hash collisions?
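
The underlying dict resolves collisions by separate chaining: entries that hash to the same bucket are linked through a next pointer, and when the table gets too full it grows via incremental rehashing. A sketch of the entry layout, based on dict.h as of Redis 6:

/* Sketch of the hash table entry used by the keyspace (see dict.h). */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;   /* chains entries that collide in the same bucket */
} dictEntry;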

Besides plain reads and writes, the server also performs some maintenance work on every access (a sketch of the write-side bookkeeping follows the list):

  • After reading a value, the server updates the key's LRU (last access) time, which is later used to compute the key's idle time. OBJECT IDLETIME key shows how long a key has been idle.
  • If the server finds that the key it is reading has already expired, it deletes that key before doing anything else.
  • If a client is watching a key with the WATCH command, the server marks the key as dirty when it is modified, so that the transaction knows the key has changed.
  • Every time the server modifies a key it increments the dirty counter by 1; this counter feeds the server's persistence and replication triggers.
  • If the server has keyspace notifications enabled, modifying a key sends the corresponding notification according to the configuration.
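
A minimal sketch of the write-side bookkeeping (the last three items above). The helpers signalModifiedKey and notifyKeyspaceEvent exist in the Redis source, but the surrounding command tail here is a hypothetical, simplified example:

/* Hypothetical sketch: what a write command typically does after modifying "key". */
void hypotheticalWriteCommandTail(client *c, robj *key) {
    /* Mark the key as dirty for clients WATCHing it (MULTI/EXEC CAS). */
    signalModifiedKey(c, c->db, key);
    /* Keyspace notification, mirroring the event SET sends. */
    notifyKeyspaceEvent(NOTIFY_STRING, "set", key, c->db->id);
    /* Bump the dirty counter that feeds persistence and replication triggers. */
    server.dirty++;
}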

Key TTL and Expiration Time

The expires dictionary of the redisDb structure holds the expiration times of all keys in the database; this dictionary is called the expires dictionary (a sketch of how an entry gets added follows the list below).

  • Each key of the expires dictionary is a pointer to a key object in the keyspace (that is, to some database key).
  • Each value of the expires dictionary is a long long integer holding the expiration time of the database key that the entry points to: a millisecond-precision UNIX timestamp.
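
A minimal sketch of how a TTL ends up in db->expires, loosely following the setExpire helper in the Redis source (the real function also handles expire propagation for writable replicas):

/* Sketch: record that "key" expires at the absolute time "when" (ms UNIX timestamp).
 * The expires dict reuses the key object already stored in the keyspace. */
void setExpire(client *c, redisDb *db, robj *key, long long when) {
    dictEntry *kde = dictFind(db->dict, key->ptr);   /* the key must exist in the keyspace */
    serverAssert(kde != NULL);
    dictEntry *de = dictAddOrFind(db->expires, dictGetKey(kde));
    dictSetSignedIntegerVal(de, when);               /* the value is just the timestamp */
}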

Deciding Whether a Key Has Expired

Using the expires dictionary, the program checks whether a given key has expired in two steps (sketched in code after the list):

  1. Check whether the given key exists in the expires dictionary; if it does, fetch the key's expiration time.

  2. Check whether the current UNIX timestamp is greater than that expiration time; if it is, the key has expired.
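
A minimal sketch of that two-step check, modeled on getExpire/keyIsExpired in the Redis source (simplified; the real code also special-cases RDB loading and script execution time):

/* Sketch: return 1 if "key" has a TTL set and that TTL is already in the past. */
int keyIsExpiredSketch(redisDb *db, robj *key) {
    /* Step 1: is there an entry for this key in the expires dict at all? */
    dictEntry *de = dictFind(db->expires, key->ptr);
    if (de == NULL) return 0;                 /* no TTL: the key never expires */
    long long when = dictGetSignedIntegerVal(de);
    /* Step 2: compare against the current millisecond-precision UNIX time. */
    return mstime() > when;
}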

Expired-Key Deletion Strategies

All key expiration times are stored in the expires dictionary, so when exactly does an expired key get deleted? Three different deletion strategies are possible:

  • Timed deletion: when a key's expiration time is set, create a timer that deletes the key the moment its expiration time arrives.
  • Lazy deletion: leave expired keys alone, but every time a key is fetched from the keyspace, check whether it has expired; if it has, delete it, otherwise return it.
  • Periodic deletion: at regular intervals, scan the databases and delete the expired keys found there; how many expired keys to delete, and how many databases to check, is decided by the algorithm.
Timed Deletion

Timed deletion is the most memory-friendly strategy: by using timers, it guarantees that expired keys are deleted as soon as possible, freeing the memory they occupy.

Drawbacks:

  1. It is the least CPU-friendly strategy: when there are many expired keys, deleting them can consume a significant share of CPU time. When memory is plentiful but CPU time is tight, spending CPU on deleting keys unrelated to the current workload will inevitably hurt the server's response time and throughput.
  2. Creating a timer requires a time event in the Redis server, and time events are currently implemented as an unsorted linked list, where finding an event costs O(N); this cannot handle a large number of time events efficiently (a hypothetical sketch follows).
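
To make the cost concrete, this is roughly what per-key timers would look like if they were built on Redis's ae time events. Redis does not actually do this; the sketch is purely hypothetical, and expireTimerCtx, expireTimerProc, and hypotheticalSetExpireWithTimer are invented names:

/* Hypothetical: one ae time event per key with a TTL. With millions of keys,
 * just managing the unsorted time-event list becomes an O(N) burden. */
typedef struct expireTimerCtx { redisDb *db; robj *key; } expireTimerCtx;

static int expireTimerProc(struct aeEventLoop *el, long long id, void *clientData) {
    expireTimerCtx *ctx = clientData;
    dbDelete(ctx->db, ctx->key);   /* delete the key the moment its timer fires */
    zfree(ctx);
    return AE_NOMORE;              /* one-shot event, do not reschedule */
}

void hypotheticalSetExpireWithTimer(redisDb *db, robj *key, long long ttl_ms) {
    expireTimerCtx *ctx = zmalloc(sizeof(*ctx));
    ctx->db = db;
    ctx->key = key;
    aeCreateTimeEvent(server.el, ttl_ms, expireTimerProc, ctx, NULL);
}
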
Lazy Deletion

Lazy deletion is the most CPU-friendly strategy: the program only checks a key for expiration when that key is actually fetched.

Drawback: it is the least memory-friendly strategy. If a key has expired but is still kept in the database, the memory it occupies will never be released as long as the expired key is not deleted.

Periodic Deletion

Used on their own, timed deletion and lazy deletion both have obvious flaws:

  1. Timed deletion consumes too much CPU time, hurting the server's response time and throughput.
  2. Lazy deletion wastes too much memory and carries the risk of a memory-leak-like buildup.

Periodic deletion is an integration and compromise of the two: it runs an expired-key deletion pass at regular intervals, and it limits both the duration and the frequency of those passes to reduce their impact on CPU time.

The difficulty is choosing the interval between passes and how long each pass may run:

  • If passes run too often or for too long, periodic deletion degenerates into timed deletion and wastes too much CPU on deleting expired keys.
  • If passes run too rarely or for too short a time, periodic deletion behaves like lazy deletion and ends up wasting memory.

The Expiration Strategy Redis Actually Uses

Redis actually combines two strategies: lazy deletion and periodic deletion.

Lazy deletion: every Redis read or write command calls the expireIfNeeded function on the key before executing; if the key has expired, the function deletes it from the database:

int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    /* If we are running in the context of a slave, instead of
     * evicting the expired key from the database, we return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return 1;

    /* If clients are paused, we keep the current dataset constant,
     * but return to the client what we believe is the right state. Typically,
     * at the end of the pause we will properly expire the key OR we will
     * have failed over and the new primary will send us the expire. */
    if (checkClientPauseTimeoutAndReturnIfPaused()) return 1;

    /* Delete the key */
    deleteExpiredKeyAndPropagate(db,key);
    return 1;
}

Note the server.masterhost != NULL check above: if masterhost is non-NULL, this server is a replica. In that case, even if the key has expired, the function merely returns status 1 and never performs the deletion below.

Periodic deletion: whenever the server's periodic job serverCron runs, the activeExpireCycle function is called. Within a bounded amount of time it walks the server's databases over multiple passes, randomly samples some of the keys in each database's expires dictionary, and deletes the expired ones it finds.

  • Each run of the function takes a limited number of random keys from a limited number of databases, checks them, and deletes any expired keys among them.
  • The static variable current_db records how far activeExpireCycle has progressed, and the next call to activeExpireCycle resumes from where the previous one stopped.

void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Next DB to test. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (checkClientPauseTimeoutAndReturnIfPaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;

    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle there are still
         * a big percentage of keys to expire, compared to the number of keys
         * we scanned. The percentage, stored in config_cycle_acceptable_stale
         * is not fixed, but depends on the Redis configured "expire effort". */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;

            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;

            while (sampled < num && checked_buckets < max_buckets) {
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */
        } while (sampled == 0 ||
                 (expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}

How RDB, AOF, and Replication Handle Expired Keys

  1. Generating an RDB file: when SAVE or BGSAVE creates a new RDB file, the program checks the keys in the database; keys that have already expired are not saved into the new RDB file.
  2. Loading an RDB file: when the Redis server starts with RDB loading enabled, the server loads the RDB file:
    • If the server runs as a master, the keys saved in the file are checked while loading, and expired keys are ignored.
    • If the server runs as a replica, all keys are loaded whether expired or not. This causes no problems, because the replica's database is emptied when it synchronizes with its master, so expired keys loaded from the RDB file have no lasting effect.
  3. AOF writes: when the server runs with AOF persistence, a key that has expired but has not yet been removed by lazy or periodic deletion causes no change to the AOF file. Once a deletion strategy removes the key, the program appends a DEL command to the AOF file to record the deletion explicitly.
  4. AOF rewrite: during an AOF rewrite, the program checks the keys in the database; keys that have already expired are not saved into the rewritten AOF file.
  5. Replication: when the server runs in replication mode, expired-key deletion on replicas is controlled by the master:
    • After the master deletes an expired key, it explicitly sends a DEL command to all of its replicas, telling them to delete that key.
    • When a replica executes a read command from a client and encounters an expired key, it does not delete the key; it keeps handling it as if it had not expired.
    • A replica deletes an expired key only after it receives the DEL command from the master. The lookupKeyReadWithFlags function below shows how this plays out on the read path:

robj *lookupKeyReadWithFlags(redisDb *db, robj *key, int flags) {
    robj *val;

    if (expireIfNeeded(db,key) == 1) {
        /* If we are in the context of a master, expireIfNeeded() returns 1
         * when the key is no longer valid, so we can return NULL ASAP. */
        if (server.masterhost == NULL)
            goto keymiss;

        /* On a replica, a read-only command from a normal client gets NULL back. */
        if (server.current_client &&
            server.current_client != server.master &&
            server.current_client->cmd &&
            server.current_client->cmd->flags & CMD_READONLY)
        {
            goto keymiss;
        }
    }
    val = lookupKey(db,key,flags);
    if (val == NULL)
        goto keymiss;
    server.stat_keyspace_hits++;
    return val;

keymiss:
    if (!(flags & LOOKUP_NONOTIFY)) {
        notifyKeyspaceEvent(NOTIFY_KEY_MISS, "keymiss", key, db->id);
    }
    server.stat_keyspace_misses++;
    return NULL;
}
  1. If the command is a read-only command from a normal client, the replica reports a miss and returns NULL for the expired key.
  2. Otherwise, even though the key has already expired, the replica can still return its value.

Summary

  1. All of a Redis server's databases are stored in the redisServer.db array, and the number of databases is recorded in the redisServer.dbnum field.
  2. A client switches databases by repointing its target-database pointer at a different element of the redisServer.db array.
  3. A database mainly consists of the dict and expires dictionaries: dict stores the key-value pairs, while expires stores the keys' expiration times.
  4. A database key is always a string object, while the value can be any Redis object type.
  5. A key in the expires dictionary points to a key in the database, and its value records that key's expiration time as a millisecond-precision UNIX timestamp.
  6. Redis deletes expired keys using two strategies: lazy deletion and periodic deletion.
  7. After an expired key is deleted, the server appends a DEL command to the existing AOF file to record the deletion explicitly. A replica never deletes an expired key on its own, even if it notices the expiration; it waits for the DEL command sent by the master.

References

  1. Redis Design and Implementation (《Redis 设计与实现》)