does not explicitly know the serial order except across epoch boundaries: if t1's epoch is before t2's, then t1 precedes t2 in the serial order. The system logs transactions
in units of whole epochs and releases results to clients
at epoch boundaries. As a result, Silo provides the same
guarantees as any serializable database without unneces-
sary scalability bottlenecks or much additional latency.
Epochs have other benefits as well; for example, we use
them to provide database snapshots that long-lived read-
only transactions can use to reduce aborts.
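As a rough illustration of how epochs alone fix the externally visible order (a sketch only; the Tid fields epoch and seq are hypothetical names, not Silo's actual TID layout):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical transaction ID that embeds its commit epoch; the field
// layout and names (epoch, seq) are illustrative, not Silo's actual format.
struct Tid {
    uint64_t epoch;  // global epoch in which the transaction committed
    uint64_t seq;    // per-worker sequence number; carries no global order
};

// Order is defined only across epoch boundaries: transactions that commit
// in the same epoch have no externally visible serial order.
std::optional<bool> precedes(const Tid& a, const Tid& b) {
    if (a.epoch != b.epoch)
        return a.epoch < b.epoch;
    return std::nullopt;  // same epoch: order not exposed to clients
}
```

Because results are released to clients only at epoch boundaries, leaving the intra-epoch order unexposed is consistent with what clients can observe.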
On a single 32-core machine, Silo achieves roughly
700,000 transactions per second on the standard TPC-
C benchmark for online transaction processing (OLTP).
This is about 22,000 transactions per second per core.
Per-core transaction throughput at 32 cores is 91%
of that at 8 cores, a small drop indicating that Silo
scales well. For context, the database literature for high-
performance in-memory OLTP systems cites per-core
TPC-C throughput at least several times lower than
Silo’s [16, 25, 27, 28], and benchmarks on our hardware
with a commercial main-memory database system per-
form at most 3,000 transactions per second per core.
An important Silo design choice is its shared-memory
store: any database worker can potentially access the
whole database. Several recent main-memory database
designs instead use data partitioning, in which database
workers effectively own subsets of the data [25, 32]. Par-
titioning can shrink table sizes and avoid the expense of
managing fine-grained locks, but works best when the
query load matches the partitioning. To understand the
tradeoffs, we built and evaluated a partitioned variant of
Silo. Partitioning performs better for some workloads,
but a shared-memory design wins when cross-partition
transactions are frequent or partitions are overloaded.
Silo assumes a one-shot request model in which
all parameters for each client request are available at
the start, and requests always complete without further
client interaction. This model is well-suited for OLTP
workloads. If high-latency client communication were
part of a transaction, the likelihood of abort due to con-
current updates would grow.
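To make the one-shot model concrete, here is a minimal sketch of a retry wrapper under that assumption; the names Status and run_one_shot are hypothetical and not part of Silo's interface:

```cpp
#include <functional>

enum class Status { Committed, Aborted };

// Illustrative retry wrapper: because the client supplied every parameter
// up front and observes no partial results, an aborted attempt can simply
// be re-run by the worker without any client round trip.
Status run_one_shot(const std::function<Status()>& txn_body,
                    int max_retries = 10) {
    for (int i = 0; i < max_retries; ++i) {
        if (txn_body() == Status::Committed)
            return Status::Committed;
        // Abort due to a concurrent update: safe to retry locally.
    }
    return Status::Aborted;
}
```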
Our performance is higher than a full system would
observe since our clients do not currently use the net-
work. (In Masstree, network communication reduced
throughput by 23% [23]; we would expect a similar
reduction for key-value workloads, less for workloads
with more computation-heavy transactions.) Neverthe-
less, our experiments show that Silo has very high per-
formance, that transactions can be serialized without
contention, and that cache-friendly design principles
work well for shared-memory serializable databases.
2 Related work
A number of recent systems have proposed storage
abstractions for main-memory and multicore systems.
These can be broadly classified according to whether or
not they provide transactional support.
2.1 Non-transactional systems
The non-transactional system most related to Silo
is Masstree [23], an extremely scalable and high-
throughput main-memory B+-tree. Masstree uses tech-
niques such as version validation instead of read locks
and efficient fine-grained locking algorithms. It builds
on several prior trees, including OLFIT [4], Bronson et
al. [3], and B-link trees [20], and adds new techniques, in-
cluding a trie-like structure that speeds up key compar-
isons. Silo's underlying concurrent B+-tree implemen-
tation was inspired by Masstree.
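As a hedged sketch of version validation in place of read locks (greatly simplified relative to what OLFIT and Masstree actually do; the Node layout and LOCK_BIT here are illustrative assumptions), a reader snapshots a node's version, reads the contents, and retries if a concurrent writer changed the version:

```cpp
#include <atomic>
#include <cstdint>

// Simplified node: a version word whose low bit marks "locked by a writer"
// and whose upper bits count completed modifications.
struct Node {
    std::atomic<uint64_t> version{0};
    std::atomic<uint64_t> payload{0};  // stand-in for keys/values/children
};

constexpr uint64_t LOCK_BIT = 1;

// Optimistic read: no read lock is taken. The result is used only if the
// version was stable (unlocked) before the read and unchanged after it;
// otherwise the reader retries.
uint64_t optimistic_read(const Node& n) {
    for (;;) {
        uint64_t v1 = n.version.load(std::memory_order_acquire);
        if (v1 & LOCK_BIT)
            continue;  // a writer currently holds the node
        uint64_t data = n.payload.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        uint64_t v2 = n.version.load(std::memory_order_relaxed);
        if (v1 == v2)
            return data;  // no writer intervened; the read is consistent
    }
}
```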
PALM [31] is another high-throughput B+-tree struc-
ture designed for multicore systems. It uses a batch-
ing technique, extensive prefetching, and intra-core
SIMD parallelism to provide extremely high throughput.
PALM’s techniques could potentially speed up Silo’s
tree operations.
The Bw-tree [21] is a high-throughput multiversion
tree structure optimized for multicore flash storage. New
versions of data are installed using delta records and
compare-and-swaps on a mapping table; there are no
locks or overwrites (we found both helpful for perfor-
mance). Silo’s data structures are designed for main
memory, whereas many of the Bw-tree’s structures are
designed for flash. Like Silo, the Bw-tree uses RCU-
style epochs for garbage collection; Silo’s epochs also
support scalable serializable logging and snapshots.
LLAMA [22] adds support for transactional logging, but
its logger is centralized.
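The delta-install idea can be sketched roughly as follows (MappingTable and DeltaRecord are illustrative names; the Bw-tree's real structures are considerably more elaborate): an update prepends a delta record and publishes it with one compare-and-swap on the page's mapping-table slot, with no locks and no in-place overwrite.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Illustrative delta record describing one update; it points at the prior
// state of the logical page (older deltas or the base page).
struct DeltaRecord {
    uint64_t key;
    uint64_t value;
    const DeltaRecord* next;
};

// Illustrative mapping table: logical page id -> head of the delta chain.
struct MappingTable {
    std::array<std::atomic<const DeltaRecord*>, 1024> slots{};

    // Prepend a delta and publish it with a single compare-and-swap;
    // existing records are never locked or overwritten in place.
    bool install_delta(std::size_t page_id, DeltaRecord* d) {
        const DeltaRecord* head = slots[page_id].load(std::memory_order_acquire);
        d->next = head;
        return slots[page_id].compare_exchange_strong(
            head, d, std::memory_order_release, std::memory_order_relaxed);
    }
};
```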
2.2 Transactional systems
Silo uses optimistic concurrency control [10, 18], which
has several advantages on multicore machines, including
a relatively short period of contention. However, OCC
and its variants (e.g., [1, 12, 34]) often induce contention in the commit phase, for example through centralized transaction ID assignment or communication among all concurrently executing transactions.
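As a generic illustration of such a bottleneck (not drawn from any particular cited system), a commit protocol that assigns transaction IDs from one global counter forces every committing worker through the same cache line:

```cpp
#include <atomic>
#include <cstdint>

// One global counter: simple and correct, but every committing transaction
// on every core must update this same cache line, so commits serialize on
// coherence traffic as core counts grow.
std::atomic<uint64_t> global_tid{0};

uint64_t assign_commit_tid() {
    return global_tid.fetch_add(1, std::memory_order_relaxed);
}
```

Even though the increment is a single atomic instruction, the counter's cache line bounces between cores on every commit, which limits scalability at high core counts.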
Larson et al. [19] recently revisited the performance
of locking and OCC-based multi-version concurrency
control (MVCC) systems versus that of traditional
single-copy locking systems in the context of multi-
core main-memory databases. Their OCC implementa-
tion exploits MVCC to avoid installing writes until com-
mit time, and avoids many of the centralized critical sec-
tions present in classic OCC. These techniques form the
basis for concurrency control in Hekaton [8], the main-
memory component of SQL Server. However, their de-