I AOF
II AOFRW

III Problems with AOFRW
1 Memory overhead
aof_pending_rewrite:0
aof_buffer_length:35500
aof_rewrite_buffer_length:34000
aof_pending_bio_fsync:0

3351:M 25 Jan 2022 09:55:39.655 * Background append only file rewriting started by pid 6817
3351:M 25 Jan 2022 09:57:51.864 * AOF rewrite child asks to stop sending diffs.
6817:C 25 Jan 2022 09:57:51.864 * Parent agreed to stop sending diffs. Finalizing AOF...
6817:C 25 Jan 2022 09:57:51.864 * Concatenating 2135.60 MB of AOF diff received from parent.
3351:M 25 Jan 2022 09:57:56.545 * Background AOF buffer size: 100 MB
2 CPU overhead
During AOFRW, the main process spends CPU time writing data into aof_rewrite_buf, and it uses the event loop to send the data in aof_rewrite_buf to the child process:
    /* Append data to the AOF rewrite buffer, allocating new blocks if needed. */
    void aofRewriteBufferAppend(unsigned char *s, unsigned long len) {
        // other details omitted...
        /* Install a file event to send data to the rewrite child if there is
         * not one already. */
        if (!server.aof_stop_sending_diff &&
            aeGetFileEvents(server.el,server.aof_pipe_write_data_to_child) == 0)
        {
            aeCreateFileEvent(server.el, server.aof_pipe_write_data_to_child,
                AE_WRITABLE, aofChildWriteDiffData, NULL);
        }
        // other details omitted...
    }
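Late in the rewrite, the child process repeatedly reads the incremental data that the main process sends through the pipe and appends it to the temporary AOF file: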
    int rewriteAppendOnlyFile(char *filename) {
        // other details omitted...
        /* Read again a few times to get more data from the parent.
         * We can't read forever (the server may receive data from clients
         * faster than it is able to send data to the child), so we try to read
         * some more data in a loop as soon as there is a good chance more data
         * will come. If it looks like we are wasting time, we abort (this
         * happens after 20 ms without new data). */
        int nodata = 0;
        mstime_t start = mstime();
        while(mstime()-start < 1000 && nodata < 20) {
            if (aeWait(server.aof_pipe_read_data_from_parent, AE_READABLE, 1) <= 0)
            {
                nodata++;
                continue;
            }
            nodata = 0; /* Start counting from zero, we stop on N *contiguous*
                           timeouts. */
            aofReadDiffFromParent();
        }
        // other details omitted...
    }
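After the child process finishes the rewrite, the main process does the cleanup work in backgroundRewriteDoneHandler. One of its tasks is to write to the temporary AOF file whatever data in aof_rewrite_buf was not consumed during the rewrite. If a lot of data is left in aof_rewrite_buf, this step also costs CPU time.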
    void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
        // other details omitted...
        /* Flush the differences accumulated by the parent to the rewritten AOF. */
        if (aofRewriteBufferWrite(newfd) == -1) {
            serverLog(LL_WARNING,
                "Error trying to flush the parent diff to the rewritten AOF: %s", strerror(errno));
            close(newfd);
            goto cleanup;
        }
        // other details omitted...
    }
3 Disk I/O overhead
4 Code complexity
/* AOF pipes used to communicate between parent and child during rewrite. */
int aof_pipe_write_data_to_child;
int aof_pipe_read_data_from_parent;
int aof_pipe_write_ack_to_parent;
int aof_pipe_read_ack_from_child;
int aof_pipe_write_ack_to_child;
int aof_pipe_read_ack_from_parent;
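IV MP-AOF implementation
1 Design overview
BASE: the base AOF. It is usually produced by the child process through a rewrite, and at most one such file exists.
INCR: an incremental AOF. It is usually created when an AOFRW starts, and several such files may exist.
HISTORY: a historical AOF, derived from BASE and INCR AOFs. Each time an AOFRW completes successfully, the BASE and INCR AOFs that existed before that AOFRW become HISTORY. Redis deletes HISTORY AOFs automatically.

2 Key implementation details
Manifest
1) In-memory representation
aofInfo: describes a single AOF file; currently it holds only the file name, the file sequence number, and the file type
base_aof_info: describes the BASE AOF; this field is NULL when no BASE AOF exists
incr_aof_list: holds the information for all INCR AOF files, ordered by the time each file was opened
history_aof_list: holds the HISTORY AOF information; its elements are all moved over from base_aof_info and incr_aof_list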
    typedef struct {
        sds file_name; /* file name */
        long long file_seq; /* file sequence */
        aof_file_type file_type; /* file type */
    } aofInfo;
    typedef struct {
        aofInfo *base_aof_info; /* BASE file information. NULL if there is no BASE file. */
        list *incr_aof_list; /* INCR AOFs list. We may have multiple INCR AOF when rewrite fails. */
        list *history_aof_list; /* HISTORY AOF list. When the AOFRW success, The aofInfo contained in
                                   `base_aof_info` and `incr_aof_list` will be moved to this list. We
                                   will delete these AOF files when AOFRW finish. */
        long long curr_base_file_seq; /* The sequence number used by the current BASE file. */
        long long curr_incr_file_seq; /* The sequence number used by the current INCR file. */
        int dirty; /* 1 Indicates that the aofManifest in the memory is inconsistent with
                      disk, we need to persist it immediately. */
    } aofManifest;

    struct redisServer {
        // other details omitted...
        aofManifest *aof_manifest; /* Used to track AOFs. */
        // other details omitted...
    }
2) On-disk representation
file appendonly.aof.1.base.rdb seq 1 type b
file appendonly.aof.1.incr.aof seq 1 type i
file appendonly.aof.2.incr.aof seq 2 type i

file appendonly.aof.1.base.rdb seq 1 type b newkey newvalue
file appendonly.aof.1.incr.aof type i seq 1
# this is annotations
seq 2 type i file appendonly.aof.2.incr.aof
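The second manifest example above has keys in a different order on each line, an extra key/value pair, and a `#` comment line. A minimal sketch of a line parser that tolerates all three is shown below. It assumes tokens are separated by whitespace and that file names contain no spaces; `ManifestEntry` and `parseManifestLine` are hypothetical names for illustration, not the Redis implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one manifest line (not the Redis aofInfo). */
typedef struct {
    char file_name[256];
    long long file_seq;
    char file_type;   /* 'b', 'i' or 'h' */
} ManifestEntry;

/* Parse one manifest line made of whitespace-separated "key value" pairs.
 * Returns 1 if an entry was filled in, 0 for blank or comment lines,
 * and -1 if a required key is missing or a key has no value. */
int parseManifestLine(const char *line, ManifestEntry *out) {
    char buf[1024];
    int have_file = 0, have_seq = 0, have_type = 0;

    if (strlen(line) >= sizeof(buf)) return -1;
    strcpy(buf, line);

    char *key = strtok(buf, " \t\r\n");
    if (key == NULL || key[0] == '#') return 0;   /* blank line or annotation */

    while (key != NULL) {
        char *val = strtok(NULL, " \t\r\n");
        if (val == NULL) return -1;               /* key without a value */
        if (strcmp(key, "file") == 0) {
            if (strlen(val) >= sizeof(out->file_name)) return -1;
            strcpy(out->file_name, val);
            have_file = 1;
        } else if (strcmp(key, "seq") == 0) {
            out->file_seq = strtoll(val, NULL, 10);
            have_seq = 1;
        } else if (strcmp(key, "type") == 0) {
            out->file_type = val[0];
            have_type = 1;
        }
        /* Any other key (such as "newkey") is skipped in this sketch. */
        key = strtok(NULL, " \t\r\n");
    }
    return (have_file && have_seq && have_type) ? 1 : -1;
}
```

Because the keys are looked up by name rather than by position, a line such as `seq 2 type i file appendonly.aof.2.incr.aof` yields the same entry as its reordered form.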
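File naming rules
seq is the file's sequence number; it starts at 1 and increases monotonically, and BASE and INCR files are numbered independently
type is the AOF type, indicating whether the file is a BASE or an INCR AOF
format indicates the encoding used inside the AOF. Because Redis supports the RDB preamble mechanism, a BASE AOF may be encoded either in RDB format or in AOF format: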
    #define BASE_FILE_SUFFIX ".base"
    #define INCR_FILE_SUFFIX ".incr"
    #define RDB_FORMAT_SUFFIX ".rdb"
    #define AOF_FORMAT_SUFFIX ".aof"
    #define MANIFEST_NAME_SUFFIX ".manifest"

    appendonly.aof.1.base.rdb // RDB preamble enabled
    appendonly.aof.1.base.aof // RDB preamble disabled
    appendonly.aof.1.incr.aof
    appendonly.aof.2.incr.aof
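The example names above all follow the pattern base name, sequence number, type suffix, format suffix. A minimal sketch of building such a name is shown below; `makeAofFileName` and its flag parameters are hypothetical and only illustrate the pattern, they are not taken from the Redis source.

```c
#include <stdio.h>

#define BASE_FILE_SUFFIX ".base"
#define INCR_FILE_SUFFIX ".incr"
#define RDB_FORMAT_SUFFIX ".rdb"
#define AOF_FORMAT_SUFFIX ".aof"

/* Write "<basename>.<seq><type suffix><format suffix>" into dst.
 * is_base selects .base vs .incr; rdb_format selects .rdb vs .aof.
 * Returns the value of snprintf (the length the full name needs). */
int makeAofFileName(char *dst, size_t dstlen, const char *basename,
                    long long seq, int is_base, int rdb_format) {
    return snprintf(dst, dstlen, "%s.%lld%s%s", basename, seq,
                    is_base ? BASE_FILE_SUFFIX : INCR_FILE_SUFFIX,
                    rdb_format ? RDB_FORMAT_SUFFIX : AOF_FORMAT_SUFFIX);
}
```

Calling it with base name `appendonly.aof`, sequence 1, `is_base` set and `rdb_format` set produces `appendonly.aof.1.base.rdb`, the first name in the list above.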
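Compatibility with upgrades from older versions
When a file named server.aof_filename exists in the data directory, Redis enters upgrade mode in any of the following three situations:
The appenddirname directory does not exist
The appenddirname directory exists, but it contains no manifest file
The appenddirname directory exists and contains a manifest file, the manifest holds only BASE AOF information, the name of that BASE AOF is the same as server.aof_filename, and no file named server.aof_filename exists in the appenddirname directory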
    /* Load the AOF files according to the aofManifest pointed by am. */
    int loadAppendOnlyFiles(aofManifest *am) {
        // other details omitted...
        /* If the 'server.aof_filename' file exists in dir, we may be starting
         * from an old redis version. We will enter upgrade mode in three situations.
         *
         * 1. If the 'server.aof_dirname' directory not exist
         * 2. If the 'server.aof_dirname' directory exists but the manifest file is missing
         * 3. If the 'server.aof_dirname' directory exists and the manifest file it contains
         *    has only one base AOF record, and the file name of this base AOF is 'server.aof_filename',
         *    and the 'server.aof_filename' file not exist in 'server.aof_dirname' directory
         * */
        if (fileExist(server.aof_filename)) {
            if (!dirExists(server.aof_dirname) ||
                (am->base_aof_info == NULL && listLength(am->incr_aof_list) == 0) ||
                (am->base_aof_info != NULL && listLength(am->incr_aof_list) == 0 &&
                 !strcmp(am->base_aof_info->file_name, server.aof_filename) && !aofFileExist(server.aof_filename)))
            {
                aofUpgradePrepare(am);
            }
        }
        // other details omitted...
    }
Construct a BASE AOF record using server.aof_filename as the file name
Persist that BASE AOF record to the manifest file
Use rename to move the old AOF file into the appenddirname directory
    void aofUpgradePrepare(aofManifest *am) {
        // other details omitted...
        /* 1. Manually construct a BASE type aofInfo and add it to aofManifest. */
        if (am->base_aof_info) aofInfoFree(am->base_aof_info);
        aofInfo *ai = aofInfoCreate();
        ai->file_name = sdsnew(server.aof_filename);
        ai->file_seq = 1;
        ai->file_type = AOF_FILE_TYPE_BASE;
        am->base_aof_info = ai;
        am->curr_base_file_seq = 1;
        am->dirty = 1;
        /* 2. Persist the manifest file to AOF directory. */
        if (persistAofManifest(am) != C_OK) {
            exit(1);
        }
        /* 3. Move the old AOF file to AOF directory. */
        sds aof_filepath = makePath(server.aof_dirname, server.aof_filename);
        if (rename(server.aof_filename, aof_filepath) == -1) {
            sdsfree(aof_filepath);
            exit(1);
        }
        // other details omitted...
    }
Multi-file loading and progress calculation
    int loadAppendOnlyFiles(aofManifest *am) {
        // other details omitted...
        /* Here we calculate the total size of all BASE and INCR files in
         * advance, it will be set to `server.loading_total_bytes`. */
        total_size = getBaseAndIncrAppendOnlyFilesSize(am);
        startLoading(total_size, RDBFLAGS_AOF_PREAMBLE, 0);
        /* Load BASE AOF if needed. */
        if (am->base_aof_info) {
            aof_name = (char*)am->base_aof_info->file_name;
            updateLoadingFileName(aof_name);
            loadSingleAppendOnlyFile(aof_name);
        }
        /* Load INCR AOFs if needed. */
        if (listLength(am->incr_aof_list)) {
            listNode *ln;
            listIter li;
            listRewind(am->incr_aof_list, &li);
            while ((ln = listNext(&li)) != NULL) {
                aofInfo *ai = (aofInfo*)ln->value;
                aof_name = (char*)ai->file_name;
                updateLoadingFileName(aof_name);
                loadSingleAppendOnlyFile(aof_name);
            }
        }
        server.aof_current_size = total_size;
        server.aof_rewrite_base_size = server.aof_current_size;
        server.aof_fsync_offset = server.aof_current_size;
        stopLoading();
        // other details omitted...
    }
AOFRW Crash Safety
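The BASE AOF name contains the file sequence number, which guarantees that a newly created BASE AOF never collides with an earlier BASE AOF;
The AOF rename is performed first, and the manifest file is modified afterwards;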
file appendonly.aof.1.base.rdb seq 1 type b
file appendonly.aof.1.incr.aof seq 1 type i

file appendonly.aof.1.base.rdb seq 1 type b
file appendonly.aof.1.incr.aof seq 1 type i
file appendonly.aof.2.incr.aof seq 2 type i

file appendonly.aof.2.base.rdb seq 2 type b
file appendonly.aof.1.base.rdb seq 1 type h
file appendonly.aof.1.incr.aof seq 1 type h
file appendonly.aof.2.incr.aof seq 2 type i
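In the last manifest snapshot above, the old BASE and the first INCR have become type h, while the most recently opened INCR keeps type i. The sketch below reproduces just that type transition on a small array. It assumes entries are stored in the order the files were opened, so the last entry of type i is the newest INCR; `Entry` and `markRewrittenAsHistory` are hypothetical names, and adding the new BASE entry is left out.

```c
/* Hypothetical, simplified manifest entry: sequence number and type only. */
typedef struct {
    long long seq;
    char type;   /* 'b' = BASE, 'i' = INCR, 'h' = HISTORY */
} Entry;

/* Mark the BASE and every INCR except the newest INCR as HISTORY.
 * Entries are assumed to be in file-open order.
 * Returns the number of entries whose type was changed. */
int markRewrittenAsHistory(Entry *entries, int n) {
    int last_incr = -1, marked = 0;

    /* Find the newest INCR: the last entry of type 'i'. */
    for (int i = 0; i < n; i++)
        if (entries[i].type == 'i') last_incr = i;

    for (int i = 0; i < n; i++) {
        if (entries[i].type == 'b' ||
            (entries[i].type == 'i' && i != last_incr)) {
            entries[i].type = 'h';
            marked++;
        }
    }
    return marked;
}
```

Applied to the middle snapshot (one BASE with seq 1, INCRs with seq 1 and 2), it changes the first two entries to h and leaves the INCR with seq 2 as i.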
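Before modifying the in-memory server.aof_manifest, dup a temporary manifest structure; all subsequent modifications are made to this temporary manifest. The benefit is that if any later step fails, we can simply destroy the temporary manifest to roll back the whole operation, without polluting the global server.aof_manifest structure;
Get the new BASE AOF file name from the temporary manifest (call it new_base_filename), and mark the previous BASE AOF (if there is one) as HISTORY;
Rename the temp-rewriteaof-bg-pid.aof temporary file produced by the child process to new_base_filename;
Mark all INCR AOFs from the previous round in the temporary manifest as HISTORY;
Persist the temporary manifest to disk (persistAofManifest guarantees internally that the modification of the manifest itself is atomic);
If all the steps above succeed, we can safely point the in-memory server.aof_manifest at the temporary manifest structure (and free the previous manifest structure); at this point the whole modification becomes visible to Redis;
Clean up the HISTORY AOFs. This step is allowed to fail, because a failure here does not cause any data consistency problem.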
    void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
        snprintf(tmpfile, 256, "temp-rewriteaof-bg-%d.aof",
            (int)server.child_pid);
        /* 1. Dup a temporary aof_manifest for subsequent modifications. */
        temp_am = aofManifestDup(server.aof_manifest);
        /* 2. Get a new BASE file name and mark the previous (if we have)
         * as the HISTORY type. */
        new_base_filename = getNewBaseFileNameAndMarkPreAsHistory(temp_am);
        /* 3. Rename the temporary aof file to 'new_base_filename'. */
        if (rename(tmpfile, new_base_filename) == -1) {
            aofManifestFree(temp_am);
            goto cleanup;
        }
        /* 4. Change the AOF file type in 'incr_aof_list' from AOF_FILE_TYPE_INCR
         * to AOF_FILE_TYPE_HIST, and move them to the 'history_aof_list'. */
        markRewrittenIncrAofAsHistory(temp_am);
        /* 5. Persist our modifications. */
        if (persistAofManifest(temp_am) == C_ERR) {
            bg_unlink(new_base_filename);
            aofManifestFree(temp_am);
            goto cleanup;
        }
        /* 6. We can safely let `server.aof_manifest` point to 'temp_am' and free the previous one. */
        aofManifestFreeAndUpdate(temp_am);
        /* 7. We don't care about the return value of `aofDelHistoryFiles`, because the history
         * deletion failure will not cause any problems. */
        aofDelHistoryFiles();
    }
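The steps above rely on the manifest being persisted atomically. The article does not show how persistAofManifest achieves this, so the sketch below only illustrates one common POSIX technique for it: write the new content to a temporary file, fsync it, then rename it over the target. `atomicWriteFile` and the temporary-file naming are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

/* Replace the file at `path` with `content` so that a reader sees either the
 * old file or the complete new one. Returns 0 on success, -1 on error. */
int atomicWriteFile(const char *path, const char *content) {
    char tmp[512];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp-%d", path, (int)getpid());
    if (n < 0 || n >= (int)sizeof(tmp)) return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    /* Write the whole buffer, handling short writes. */
    size_t len = strlen(content), off = 0;
    while (off < len) {
        ssize_t w = write(fd, content + off, len - off);
        if (w <= 0) { close(fd); unlink(tmp); return -1; }
        off += (size_t)w;
    }

    /* Make sure the data is on disk before the name is switched. */
    if (fsync(fd) == -1) { close(fd); unlink(tmp); return -1; }
    if (close(fd) == -1) { unlink(tmp); return -1; }

    /* rename() replaces the target in a single step on POSIX file systems. */
    if (rename(tmp, path) == -1) { unlink(tmp); return -1; }
    return 0;
}
```

If the process dies before the rename, the old manifest is still intact and only a stray temporary file is left behind. A production version would typically also fsync the containing directory so that the rename itself is durable.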
Support for AOF truncate
    if (ftruncate(server.aof_fd, server.aof_last_incr_size) == -1) {
        // other details omitted...
    }
AOFRW rate limiting
    if (server.aof_state == AOF_ON &&
        !hasActiveChildProcess() &&
        server.aof_rewrite_perc &&
        server.aof_current_size > server.aof_rewrite_min_size &&
        !aofRewriteLimited())
    {
        long long base = server.aof_rewrite_base_size ?
            server.aof_rewrite_base_size : 1;
        long long growth = (server.aof_current_size*100/base) - 100;
        if (growth >= server.aof_rewrite_perc) {
            rewriteAppendOnlyFileBackground();
        }
    }
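The growth check in the snippet above is plain integer arithmetic, and a standalone version makes it easy to try with concrete numbers. `aofGrowthPercent` is a hypothetical helper that copies only the two arithmetic lines; the other conditions (child process, minimum size, rate limiting) are not modelled.

```c
/* Percentage by which the AOF has grown relative to its size after the last
 * rewrite. A base size of 0 is treated as 1, as in the snippet above. */
long long aofGrowthPercent(long long current_size, long long base_size) {
    long long base = base_size ? base_size : 1;
    return (current_size * 100 / base) - 100;
}
```

For example, a base size of 100 bytes and a current size of 200 bytes gives a growth of 100, so a rewrite would be triggered when server.aof_rewrite_perc is 100 or lower. A current size of 150 bytes gives 50, which stays below that threshold.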
V Summary
This article is reposted from 阿里技术 (Alibaba Tech).