
CIDR’22, January 10-13, 2022, Chaminade, CA, USA Pat Helland
system. When all goes well, the system itself oers its expected
latency as each of its components meets their SLO for prompt
service. When services within the system are less punctual, we can
see impact on the larger system’s responsiveness
5
.
Retrying slow requests to an alternate server is a common tech-
nique to mitigate this [
20
]. Proactively sending two or more requests
to dierent servers and accepting the rst response dramatically
lowers the expected latency for the rst response.
In the DB world, we see latency bounding for log subsystems.
One example is Apache Bo okkeeper [
7
]. Log entries are appended by
writing to N log-storage-replicas (called Bookies). When Q of N log-
storage-replicas have acknowledged the receipt of the log entry, it is
durable. Since N - Q log-storage-replicas may be slow, the expected
and observed SLO for writing to the log is dramatically reduced.
The database system is predictably faster with less variability.
AWS Aurora[
46
,
48
] carries this further with replicated AWS-
storage-servers for log replay and block creation. These AWS-storage-
servers are eectively the lower half of the database. They are placed
with 2 servers in each of 3 AZs (Availability Zones). When 4 of
these servers have acknowledged the log entry, Aurora considers
the log entry to be durable. Transactions having commit records
in the entry may be conrmed to the waiting human. Even when
AZ+1
6
failures have happened, the commit record can be read later.
Logging is fast even if 2 of the 6 AWS-storage-servers are slow.
Still, major portions of the database run in a single server.
If that server is jittery, humans will experience unpredictable delays.
Can ALL of the database be responsive
atop unpredictable servers?
Can every aspect of a database be decentralized
and responsive even when some of its pieces are not?
1.2 Applicability to a Broad Range of Systems.
This paper is about jitter-avoidance through the use of quorum. It
investigates techniques to manage complex systems based on an al-
ternate form of order (called seniority) that is largely decoupled from
classic happened before partial-order messaging guarantees[35].
By sketching a design for a transactional database, we
demonstrates the broad applicability of these principles
to many complex distributed systems.
Paxos[36] provides linear order but it jitters
We provide partial order and avoid jitter
1.3 Availability in a Complex System
The phone system over land lines oers amazing availability. Occa-
sionally, a call is dropped but you can redial and connect.
Availability is dened as the opportunity to dial another call.
In transactional database systems, transactions may fail. Lock
conicts or other problems can cause work to be discarded. You
don’t want this to be common but once in a while is OK. When a
transaction aborts, the application can restart the work and hope-
fully succeed. In a distributed database, restarted transactions may
5
See the paper Thinking about Availability in Large Service Infrastructures [
39
] for great
discussions of SLAs (Service Level Agreements), SLOs (Service Level Objectives), and
SLIs (Service Level Indicators). The discussion also includes multi-dimensional SLOs.
6
Tolerating AZ+1 failures means an entire AZ can be lost at the same time as one more
server is down for other reasons [46].
route to a dierent database server. If a single transaction gets stuck,
the application can abandon it, retry, and probably succeed.
Our decoupled transactions database may occasionally timeout
while doing a transaction. Even so, it remains open for new business.
1.4 Snapshot Isolation Guarantees
The goal for this design is to imagine a database that runs many
existing applications with excellent availability and responsiveness.
Special attention is required to when, how, and with what guar-
antees the application sees and changes its data. Snapshot Isolation
[
13
,
41
] denes the database behavior that an application sees when
running with concurrent work. It is arguably the most common
isolation guarantee provided by modern commercial databases. In
its basic form, snapshot isolation provides guarantees when reading
and committing updates in a transaction:
•
Snapshot reads: Each transaction gets a snapshot time as it
begins and, as it reads data, sees updates from transactions
committed before the snapshot time and its own updates.
•
Conict detection prior to commit: As a transaction is
about to commit, the database will ensure that none of the
updated records has been changed by other transactions
since this transaction’s snapshot time.
Snapshot isolation is extremely popular. Many existing applications
depend on these guarantees for their successful execution.
1.5 What’s Jitter?
Jitter mean behavior dierent than the expected norm. In particular,
for response time varying from what’s expected. In electronics, it
means deviation from the circuit’s expected clock timing.
In networking, jitter describes larger than expected variance in
the delay moving through the network [
17
]. Servers may jitter for
many reasons from gray failure to Operating Systems to Java VMs.
See §12: (Appendix A: Building on a Jittery Foundation).
Within a distributed system, jitter comprises both variability
from the network and variability from the server processing a request.
From the standpoint of the client issuing a request, it can’t tell the
cause of the delay, just that the response is not as zippy as hoped.
If part of the cluster is not responding, those servers may or
may not be doing work. For servers spread widely within or across
data centers, an observer may see some servers responding quickly
but not all of them. The ones WE see responding quickly may not be
providing prompt responses to other servers in the datacenter.
What can we assume so we can make progress?
1.6 Quorum: Avoiding Snapshot Isolation Jitter
Snapshot isolation transactions can do independent changes to the
database as long as there are no conicting updates and each trans-
action sees only snapshot reads. What if these two things can be
accomplished without stalling behind jittery servers?
We can avoid stalling
7
when we do work with quorums. The
idea is to ask a collection of servers to do each operation and wait
for only some of them to answer. Suppose we have a quorum of N
servers and we only need answers from Q of them (where Q > N/2).
When we get Q answers, we know that at least one server of the Q
has already seen any single previous operation. By combining all
Q answers, we can know about previous work. F is our maximum
7
The probability of stalling drops dramatically when we don’t wait for every server.
评论