Speedy Transactions in Multicore In-Memory Databases
Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden
MIT CSAIL and Harvard University
Abstract
Silo is a new in-memory database that achieves excellent performance and scalability on modern multicore machines. Silo was designed from the ground up to use system memory and caches efficiently. For instance, it avoids all centralized contention points, including that of centralized transaction ID assignment. Silo’s key contribution is a commit protocol based on optimistic concurrency control that provides serializability while avoiding all shared-memory writes for records that were only read. Though this might seem to complicate the enforcement of a serial order, correct logging and recovery is provided by linking periodically-updated epochs with the commit protocol. Silo provides the same guarantees as any serializable database without unnecessary scalability bottlenecks or much additional latency. Silo achieves almost 700,000 transactions per second on a standard TPC-C workload mix on a 32-core machine, as well as near-linear scalability. Considered per core, this is several times higher than previously reported results.
1 Introduction
Thanks to drastic increases in main memory sizes and processor core counts for server-class machines, modern high-end servers can have several terabytes of RAM and 80 or more cores. When used effectively, this is enough processing power and memory to handle data sets and computations that used to be spread across many disks and machines. However, harnessing this power is tricky; even single points of contention, like compare-and-swaps on a shared-memory word, can limit scalability.

This paper presents Silo, a new main-memory database that achieves excellent performance on multicore machines. We designed Silo from the ground up to use system memory and caches efficiently. We avoid all centralized contention points and make all
synchronization scale with the data, allowing larger databases to support more concurrency.
Silo uses a Masstree-inspired tree structure for its underlying indexes. Masstree [23] is a fast concurrent B-tree-like structure optimized for multicore performance. But Masstree only supports non-serializable, single-key transactions, whereas any real database must support transactions that affect multiple keys and occur in some serial order. Our core result, the Silo commit protocol, is a minimal-contention serializable commit protocol that provides these properties.
Silo uses a variant of optimistic concurrency control (OCC) [18]. An OCC transaction tracks the records it reads and writes in thread-local storage. At commit time, after validating that no concurrent transaction’s writes overlapped with its read set, the transaction installs all written records at once. If validation fails, the transaction aborts. This approach has several benefits for scalability. OCC writes to shared memory only at commit time, after the transaction’s compute phase has completed; this short write period reduces contention. And thanks to the validation step, read-set records need not be locked. This matters because the memory writes required for read locks can induce contention [11].
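
To make this pattern concrete, the following is a minimal C++ sketch of the OCC approach just described: reads and buffered writes are tracked in thread-local sets, the write set is locked at commit, the read set is validated against per-record versions, and writes are installed only if validation succeeds. It is an illustration under simplified assumptions (one version word per record, string values, no epochs or logging), not Silo's actual code; the type and function names are ours.

// A minimal OCC commit sketch (illustrative; not Silo's implementation).
#include <atomic>
#include <cstdint>
#include <map>
#include <string>

struct Record {
    std::atomic<uint64_t> vlock{0};  // low bit = lock bit, upper bits = version
    std::string value;
};

constexpr uint64_t LOCK_BIT = 1;

void lock_record(Record* r) {
    // Spin until we atomically set the lock bit on an unlocked word.
    uint64_t unlocked = r->vlock.load(std::memory_order_relaxed) & ~LOCK_BIT;
    while (!r->vlock.compare_exchange_weak(unlocked, unlocked | LOCK_BIT,
                                           std::memory_order_acquire)) {
        unlocked &= ~LOCK_BIT;
    }
}

void unlock_record(Record* r, bool bump_version) {
    uint64_t v = r->vlock.load(std::memory_order_relaxed) & ~LOCK_BIT;
    r->vlock.store(bump_version ? v + 2 : v, std::memory_order_release);
}

struct Transaction {
    std::map<Record*, uint64_t> reads;      // record -> version observed
    std::map<Record*, std::string> writes;  // record -> value to install

    std::string read(Record* r) {
        reads[r] = r->vlock.load(std::memory_order_acquire) & ~LOCK_BIT;
        return r->value;  // a real system would re-check vlock for a clean read
    }

    void write(Record* r, std::string v) { writes[r] = std::move(v); }

    bool commit() {
        // Phase 1: lock the write set in a fixed (pointer) order -- std::map
        // already iterates in that order -- so committers cannot deadlock.
        for (auto& w : writes) lock_record(w.first);

        // Phase 2: validate the read set. Abort if a record we read has a new
        // version, or is locked by some other committing transaction.
        for (auto& rd : reads) {
            uint64_t cur = rd.first->vlock.load(std::memory_order_acquire);
            bool locked_by_other = (cur & LOCK_BIT) && !writes.count(rd.first);
            if (locked_by_other || (cur & ~LOCK_BIT) != rd.second) {
                for (auto& w : writes) unlock_record(w.first, false);
                return false;  // abort; the caller may retry the transaction
            }
        }

        // Phase 3: install the buffered writes, then release each lock while
        // advancing the record's version so later validations see the change.
        for (auto& w : writes) {
            w.first->value = w.second;
            unlock_record(w.first, true);
        }
        return true;
    }
};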
Previous OCC implementations are not free of scaling bottlenecks, however, with a key reason being the requirement for tracking “anti-dependencies” (write-after-read conflicts). Consider a transaction t1 that reads a record from the database, and a concurrent transaction t2 that overwrites the value t1 saw. A serializable system must order t1 before t2 even after a potential crash and recovery from persistent logs. To achieve this ordering, most systems require that t1 communicate with t2, such as by posting its read sets to shared memory or via a centrally-assigned, monotonically-increasing transaction ID [18, 19]. Some non-serializable systems can avoid this communication, but they suffer from anomalies like snapshot isolation’s “write skew” [2].
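
As a concrete instance of such an anti-dependency, the fragment below (reusing the hypothetical Record/Transaction sketch above) has t1 read a record and commit without writing anything to shared memory, after which t2 overwrites the value t1 saw. Any serial order, and any post-crash replay, must place t1 before t2, yet t2 has no way to learn that t1 ever read the record.

// Anti-dependency (write-after-read) example, using the sketch above.
#include <cassert>

void anti_dependency_example() {
    Record x;
    x.value = "old";

    Transaction t1, t2;
    std::string seen = t1.read(&x);  // t1 observes "old" at version 0
    bool ok1 = t1.commit();          // read-only: validates, installs nothing

    t2.write(&x, "new");
    bool ok2 = t2.commit();          // overwrites the value t1 saw

    assert(ok1 && ok2 && seen == "old");
    // Correct recovery must order t1 before t2, even though t1 left no
    // trace in shared memory for t2 to observe.
}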
Silo provides serializability while avoiding all shared-memory writes for read transactions. The commit protocol was carefully designed using memory fences to scalably produce results consistent with a serial order. This leaves the problem of correct recovery, which we solve using a form of epoch-based group commit. Time is divided into a series of short epochs. Even though transaction results always agree with a serial order, the system does not explicitly know the serial order except across epoch boundaries: if t1’s epoch is before t2’s, then t1 precedes t2 in the serial order. The system logs transactions in units of whole epochs and releases results to clients at epoch boundaries. As a result, Silo provides the same guarantees as any serializable database without unnecessary scalability bottlenecks or much additional latency. Epochs have other benefits as well; for example, we use them to provide database snapshots that long-lived read-only transactions can use to reduce aborts.
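
A rough sketch of how such epoch-based group commit could be wired up is shown below: one designated thread advances a global epoch counter, workers stamp each commit record with the epoch they observed while committing, and results are released to clients only once their whole epoch has been made durable. The names, the epoch period, and the durable-epoch bookkeeping are our illustrative assumptions, not Silo's implementation.

// Epoch-based group commit sketch (illustrative; names are hypothetical).
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

std::atomic<uint64_t> global_epoch{1};   // advanced only by one thread
std::atomic<uint64_t> durable_epoch{0};  // highest epoch whose log is on disk

// A single designated thread advances the epoch on a short, fixed period
// (tens of milliseconds), bounding how long commit acknowledgments wait.
void epoch_advancer(std::atomic<bool>& stop) {
    while (!stop.load(std::memory_order_relaxed)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(40));
        global_epoch.fetch_add(1, std::memory_order_acq_rel);
    }
}

struct CommitRecord {
    uint64_t epoch;  // epoch observed while committing; log payload omitted
};

// Worker side: stamp the commit with the current epoch; transactions in the
// same epoch are logged together, so only epoch boundaries need ordering.
CommitRecord stamp_commit() {
    return CommitRecord{global_epoch.load(std::memory_order_acquire)};
}

// Client side: a result may be released only after every transaction in its
// epoch could have reached the log, i.e. once the epoch is durable (loggers
// would advance durable_epoch after persisting a whole epoch's log records).
bool can_release_to_client(const CommitRecord& c) {
    return c.epoch <= durable_epoch.load(std::memory_order_acquire);
}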
On a single 32-core machine, Silo achieves roughly 700,000 transactions per second on the standard TPC-C benchmark for online transaction processing (OLTP). This is about 22,000 transactions per second per core. Per-core transaction throughput at 32 cores is 91% of that at 8 cores, a small drop indicating that Silo scales well. For context, the database literature for high-performance in-memory OLTP systems cites per-core TPC-C throughput at least several times lower than Silo’s [16, 25, 27, 28], and benchmarks on our hardware with a commercial main-memory database system perform at most 3,000 transactions per second per core.
An important Silo design choice is its shared-memory store: any database worker can potentially access the whole database. Several recent main-memory database designs instead use data partitioning, in which database workers effectively own subsets of the data [25, 32]. Partitioning can shrink table sizes and avoids the expense of managing fine-grained locks, but works best when the query load matches the partitioning. To understand the tradeoffs, we built and evaluated a partitioned variant of Silo. Partitioning performs better for some workloads, but a shared-memory design wins when cross-partition transactions are frequent or partitions are overloaded.
Silo assumes a one-shot request model in which all parameters for each client request are available at the start, and requests always complete without further client interaction. This model is well-suited for OLTP workloads. If high-latency client communication were part of a transaction, the likelihood of abort due to concurrent updates would grow.
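
For illustration, the sketch below shows what a one-shot request might look like for a TPC-C-style new-order transaction: every parameter arrives with the request, and the worker executes and commits it in one go, retrying locally on an OCC abort rather than ever pausing for the client. The request and result types are hypothetical, not Silo's API.

// One-shot request sketch (illustrative; these types are not Silo's API).
#include <cstdint>
#include <utility>
#include <vector>

struct NewOrderRequest {
    // All parameters the transaction needs are supplied up front.
    uint32_t warehouse_id = 0;
    uint32_t district_id = 0;
    uint32_t customer_id = 0;
    std::vector<std::pair<uint32_t, uint32_t>> items;  // (item_id, quantity)
};

struct NewOrderResult {
    bool committed = false;
    uint64_t order_id = 0;
};

NewOrderResult run_new_order(const NewOrderRequest& req) {
    NewOrderResult res;
    // The whole transaction (reads, computation, buffered writes, commit)
    // runs here without any client round trip; on an OCC abort the worker
    // simply retries, keeping the conflict window short.
    for (int attempt = 0; attempt < 10 && !res.committed; ++attempt) {
        // ... execute against the store using req, then attempt to commit ...
        res.committed = true;             // placeholder for a successful commit
        res.order_id = req.items.size();  // placeholder result
    }
    return res;
}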
Our performance is higher than a full system would observe since our clients do not currently use the network. (In Masstree, network communication reduced throughput by 23% [23]; we would expect a similar reduction for key-value workloads, less for workloads with more computation-heavy transactions.) Nevertheless, our experiments show that Silo has very high performance, that transactions can be serialized without contention, and that cache-friendly design principles work well for shared-memory serializable databases.
2 Related work
A number of recent systems have proposed storage abstractions for main-memory and multicore systems. These can be broadly classified according to whether or not they provide transactional support.
2.1 Non-transactional systems
The non-transactional system most related to Silo is Masstree [23], an extremely scalable and high-throughput main-memory B+-tree. Masstree uses techniques such as version validation instead of read locks and efficient fine-grained locking algorithms. It builds on several prior trees, including OLFIT [4], Bronson et al. [3], and Blink-trees [20], and adds new techniques, including a trie-like structure that speeds up key comparisons. Silo’s underlying concurrent B+-tree implementation was inspired by Masstree.
PALM [31] is another high-throughput B+-tree structure designed for multicore systems. It uses a batching technique, extensive prefetching, and intra-core SIMD parallelism to provide extremely high throughput. PALM’s techniques could potentially speed up Silo’s tree operations.
The Bw-tree [21] is a high-throughput multiversion tree structure optimized for multicore flash storage. New versions of data are installed using delta records and compare-and-swaps on a mapping table; there are no locks or overwrites (we found both helpful for performance). Silo’s data structures are designed for main memory, whereas many of the Bw-tree’s structures are designed for flash. Like Silo, the Bw-tree uses RCU-style epochs for garbage collection; Silo’s epochs also support scalable serializable logging and snapshots. LLAMA [22] adds support for transactional logging, but its logger is centralized.
2.2 Transactional systems
Silo uses optimistic concurrency control [10, 18], which has several advantages on multicore machines, including a relatively short period of contention. However, OCC and its variants (e.g., [1, 12, 34]) often induce contention in the commit phase, such as centralized transaction ID assignment or communication among all concurrently executing transactions.
Larson et al. [19] recently revisited the performance of locking and OCC-based multi-version concurrency control (MVCC) systems versus that of traditional single-copy locking systems in the context of multicore main-memory databases. Their OCC implementation exploits MVCC to avoid installing writes until commit time, and avoids many of the centralized critical sections present in classic OCC. These techniques form the basis for concurrency control in Hekaton [8], the main-memory component of SQL Server. However, their de-