LevelDB 完全解析（5）：Cache

coredump 2021-04-25

1070

前文回顾

LevelDB 完全解析（0）：基本原理和整体架构
LevelDB 完全解析（1）：MemTable
LevelDB 完全解析（2）：Log
LevelDB 完全解析（3）：SSTable
LevelDB 完全解析（4）：Manifest

根据功能的不同，LevelDB 中有两种 cache：

Block cache：缓存解压后的 data block，可以加快热数据的查询。
Table cache：缓存打开的 SSTable 文件描述符和对应的 index block、meta block 等信息。

在 LevelDB 中，block cache 和 table cache 都是基于 ShardedLRUCache^[1] 实现的。

ShardedLRUCache

ShardedLRUCache^[2] 是在 LRUCache^[3] 上包装了一层分片——根据 key 的哈希值的前 4 位（kNumShardBits^[4]）分 16 个（kNumShards^[5]） LRUCache。

分片的作用是减少多线程对同一个 LRUCache 对象的争用。

LRUCache

LRU，全称是 Least Recently Used（最近最少使用），是一种利用局部性原理（如果数据最近被访问过，那么将来被访问的几率也更高）的缓存淘汰策略。LRUCache 即是一个基于 LRU 淘汰策略的缓存。当热点数据比较集中时，LRUCache 的效率比较高。

LevelDB 的 LRUCache^[6] 的实现由一个哈希表和两个链表组成：

链表 lru_^[7]：维护 cache 中的缓存对象的使用热度。数据每次被访问的时候，都会被插入到这个链表最新的地方。lru_->next 指向最旧的数据， lru_->prev 指向最新的数据。当 cache 占用的内存超过限制时，则从 lru_->next 开始清理数据。
链表 in_use_^[8]：维护 cache 中有哪些缓存对象被返回给调用端使用。这些数据不能被淘汰。
哈希表 table_^[9]：保存所有 key -> 缓存对象，用于快速查找数据。

LRUCache 的 Insert 和 Lookup 的时间复杂度都是 O(1)。

LRUHandle

LRUHandle^[10] 是 LRUCache 中的一个对象。

struct LRUHandle {
  void* value;
  void (*deleter)(const Slice&, void* value);
  LRUHandle* next_hash;
  LRUHandle* next;
  LRUHandle* prev;
  size_t charge;  // TODO(opt): Only allow uint32_t?
  size_t key_length;
  bool in_cache;     // Whether entry is in the cache.
  uint32_t refs;     // References, including cache reference, if present.
  uint32_t hash;     // Hash of key(); used for fast sharding and comparisons
  char key_data[1];  // Beginning of key

  Slice key() const {
    // next_ is only equal to this if the LRU handle is the list head of an
    // empty list. List heads never have meaningful keys.
    assert(next != this);

    return Slice(key_data, key_length);
  }
};

重点关注三个 LRUHandle
的指针：

next_hash
: 哈希表的实现采用的是拉链法来处理哈希冲突，用 next_hash
维护落到同一个 bucket 的对象。
next
/ prev
: LRUCache 通过链表来维护缓存对象的使用“热度”和是否正在被调用端使用，这两个指针就是用来实现链表的。

HandleTable

HandleTable^[11] 是 LevelDB 采用拉链法实现的一个哈希表，主要提供了 Lookup^[12]、Insert^[13]、Remove^[14] 三个接口。没有做什么特殊优化，逻辑很清楚。

当哈希表的元素数量超过 list_
的长度 length_
，会调用 Resize^[15] 进行重新哈希（rehash）。直接扫描整个哈希表进行全量 rehash，如果哈希表很大，遇到 rehash 可能会导致系统抖动。

缓存淘汰策略

LRU

LRU 是一种常用的缓存淘汰策略，因为大部分情况下，数据的访问都是具有局部性的——最近访问过的数据，短时间内还被访问的概率比较大；而比较久没被访问的数据，短时间内会被访问的概率比较小。

当热点数据比较集中时，LRU 的缓存命中率比较高。但是在某些场景下，LRU 的缓存命中率会急剧下降，比如批量遍历。

LevelDB 在读参数 ReadOptions
提供了一个参数 fill_cache
，让上层控制是否要将 data block 放入到 block cache。

MySQL 的 InnoDB 的 LRU 缓存实现为了避免扫描操作污染 cache，采用了两级的 LRU cache。数据会先进入第一级 cache，一段时间之后还有访问再放到第二级 cache。

FIFO

FIFO，全称 First In First Out，其实就是一个队列，按先进先出的方法淘汰数据。新的数据插入队列尾部（入队），队列满了之后从队列的头部开始删除（出队）。

LFU

LFU，全称 Least Frequently Used，根据数据的访问频率来淘汰数据，适用于“如果数据过去被访问多次，那么将来被访问的频率也更高”的场景。

Random

随机淘汰缓存对象。好处是不用根据 LRU、FIFO、LFU 的要求维护缓存对象的顺序。

Block Cache

一个 SSTable 在被打开的时候，会通过 options.block_cache
的 NewId^[16] 为其分配一个唯一的 cache_id^[17]。

每个 data block 保存到 block cache 的 key 为 cache_id + offset^[18]。

由于 SSTable 是只读的，block 的读取和 block cache 的维护非常简单，具体参考 Table::BlockReader^[19]。

Table Cache

Table cache 的具体实现在 table_cache.h ^[20]和 table_cache.cc^[21]。

Table cache 的 key 是 SSTable 的 file_number^[22]，value 是一个 TableAndFile ^[23]对象。

TableAndFile
有两个成员变量 file
和 table
，重点是 table
。

struct TableAndFile {
  RandomAccessFile* file;
  Table* table;
};

Table 内部封装了 index 和 filter，以及其他一些 SSTable 的元数据（参考代码^[24]）。

struct Table::Rep {
  ~Rep() {
    delete filter;
    delete[] filter_data;
    delete index_block;
  }

  Options options;
  Status status;
  RandomAccessFile* file;
  uint64_t cache_id;
  FilterBlockReader* filter;
  const char* filter_data;

  BlockHandle metaindex_handle;  // Handle to metaindex_block: saved from footer
  Block* index_block;
};

参考资料

[1]

ShardedLRUCache: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L338

[2]

ShardedLRUCache: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L338

[3]

LRUCache: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L150

[4]

kNumShardBits: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L335

[5]

kNumShards: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L336

[6]

LRUCache: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L150

[7]

链表 lru_: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L188

[8]

链表 in_use_: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L192

[9]

哈希表 table_: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L194

[10]

LRUHandle: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L42

[11]

HandleTable: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L69

[12]

Lookup: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L74

[13]

Insert: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L78

[14]

Remove: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L94

[15]

Resize: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L122

[16]

NewId: https://github.com/google/leveldb/blob/1.22/util/cache.cc#L378

[17]

cache_id: https://github.com/google/leveldb/blob/1.22/table/table.cc#L74

[18]

cache_id + offset: https://github.com/google/leveldb/blob/1.22/table/table.cc#L172

[19]

Table::BlockReader: https://github.com/google/leveldb/blob/1.22/table/table.cc#L155

[20]

table_cache.h : https://github.com/google/leveldb/blob/1.22/db/table_cache.h

[21]

table_cache.cc: https://github.com/google/leveldb/blob/1.22/db/table_cache.cc

[22]

SSTable 的 file_number: https://github.com/google/leveldb/blob/1.22/db/table_cache.cc#L45

[23]

TableAndFile : https://github.com/google/leveldb/blob/1.22/db/table_cache.cc#L14

[24]

参考代码: https://github.com/google/leveldb/blob/1.22/table/table.cc#L20

数据库

文章转载自coredump，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。