Decoupled Transactions: Low Tail Latency Online Transactions Atop Jittery Servers



Pat Helland

phelland@salesforce.com

Salesforce

San Francisco, CA, USA

ABSTRACT

Modern cloud data centers are busy places that share lots of resources. It is common for services to fluctuate in their responsiveness, sometimes becoming slow or very slow. Many distributed systems experience cascading slowness as one or a few slow servers (or their network) bring the entire system to its knees.

Non-transactional work copes by using idempotent retries bypassing the laggards. For transactional databases, it’s not so simple. This paper sketches a design for a distributed database providing responsive snapshot isolation transactions even when some of its servers and connections stop or, more perniciously, just slow down.

We present a thought experiment for a decoupled transactions database system that avoids cascading slowdown when a subset of its servers are sick but not necessarily dead. The goal is to provide low tail latency online transactions atop servers and networks that may sometimes go slow. Assume at most F recalcitrant servers in the database. Can we design a robust system that makes predictable progress while not waiting for F slow servers? Can we use these ideas for practical deployments in modern data centers with availability zones and today’s expected operational challenges?

This hypothetical design explores techniques to dampen application-visible jitter in a database system running in a cloud datacenter when most of the servers are responsive. This inevitably causes us to examine the nature of a database’s knowledge of correctness and how that can exist without a centralized authority.

ACM Reference Format:

Pat Helland. 2022. Decoupled Transactions: Low Tail Latency Online Transactions Atop Jittery Servers. In Proceedings of 12th Annual Conf on Innovative Data Systems Research (CIDR ’22) (CIDR’22). ACM, New York, NY, USA, 30 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION

Typically, online transactional databases are deployed with expensive dedicated hardware in data centers owned by their enterprise. It is desirable to run these solutions in cloud data center environments to avail ourselves of their tremendous advantages in flexible deployments and elasticity of resources. However, there are new challenges as these shared environments do not offer predictable response time to requests flowing across servers. In the cloud, responses exhibit probabilistic latency. Worse, the expected response time varies as the environment experiences pressure. Some servers will take noticeably longer while others provide their normal expected response time distributions.

Footnote: High quality servers, data storage in SANs (Storage Area Networks) [ ] and bespoke networks are common deployments for mission-critical databases.

This article is published under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits distribution and reproduction in any medium as well as allowing derivative works, provided that you attribute the original work to the author(s) and CIDR 2022. 12th Annual Conference on Innovative Data Systems Research (CIDR ’22). January 9-12, 2022, Chaminade, USA.

CIDR’22, January 10-13, 2022, Chaminade, CA, USA

ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . .$15.00

https://doi.org/10.1145/1122445.1122456

Human-facing non-transactional work provides a vibrant and responsive experience using retries of idempotent operations. Can we provide high-availability low-latency responses from a high-throughput externally consistent distributed database that tolerates jittery or sick servers? Can we know what happened in the past quickly with low tail-latency so we can briskly make changes in the future based on what happened? Can we rapidly protect against conflicting concurrent updates as we commit transactions? How the heck can we build a system without a centralized authority to remember what’s happened and decide what should happen next?

It is not our goal to define a super-scalable SQL system.

We hope to scale to tens of servers with predictable response

time while running in a largely unpredictable environment.

This thought exercise aspires to pry apart the cross-server dependencies in our imaginary system and push our minds to understand the nature of how state is represented in a system and how that state may be decentralized. Can we bound the internal dependencies within the implementation of a database so transactions can commit rapidly even when some components aren’t responsive?

It is hoped that this can empower future systems to blithely ignore performance irregularities in large data centers and just give prompt answers when enough servers participate.

1.1 Inspiration for This Work

In the cloud, we frequently measure the latency between a request and its response as one service invokes another. These response latencies are probabilistic and can be expressed as a CDF (Cumulative Distribution Function) showing the expected likelihood a response will be received by a specified time.

For example, an SLO (Service Level Objective) for a service may specify a 99.9% probability of responding within 2 milliseconds.

A system will typically have multiple dimensions to its SLO such as windows of time in which the target response latency is actually met. For example, 99.9% of the responses will be received within 2 milliseconds at least 98% of the time.
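A minimal sketch of checking such a two-dimensional SLO follows. It is our illustration only: the function names, the grouping of measurements into windows, and the sample latencies are all hypothetical. Each window's empirical CDF is evaluated at the latency threshold, and the SLO holds when enough windows meet the per-window target.

```python
from bisect import bisect_right

def fraction_within(latencies_ms, threshold_ms):
    """Empirical CDF at threshold_ms: share of responses at or under it."""
    ordered = sorted(latencies_ms)
    return bisect_right(ordered, threshold_ms) / len(ordered)

def meets_slo(windows, threshold_ms=2.0, per_window_target=0.999,
              window_target=0.98):
    """True when at least window_target of the windows have at least
    per_window_target of their responses within threshold_ms."""
    good = sum(
        1 for window in windows
        if fraction_within(window, threshold_ms) >= per_window_target
    )
    return good / len(windows) >= window_target

# Hypothetical measurements: 1000 responses per window.
fast_window = [1.0] * 1000                 # every response within 2 ms
slow_window = [1.0] * 990 + [5.0] * 10     # only 99.0% within 2 ms

mostly_good = [fast_window] * 99 + [slow_window] * 1   # 99% good windows
too_many_bad = [fast_window] * 97 + [slow_window] * 3  # 97% good windows

print(meets_slo(mostly_good))
print(meets_slo(too_many_bad))
```

The first call reports that the SLO is met and the second that it is not, because only 97% of the windows reach the 99.9% target.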

When a larger system is built using smaller services, the SLO of the smaller services can have significant impact on the resulting system. When all goes well, the system itself offers its expected latency as each of its components meets its SLO for prompt service. When services within the system are less punctual, we can see impact on the larger system’s responsiveness.

Footnote: See The Tail at Scale [ ] for an excellent discussion about bounding tail-latency probabilities by retrying to an alternate server.

Footnote: "Externally consistent" means snapshot isolation [13, 41] as seen external to a distributed database.

Footnote: See §12 (Appendix A: Building on a Jittery Foundation) for an in-depth discussion of the challenges we face within modern data centers.
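To make the impact of component SLOs concrete, here is a small worked calculation. It rests on two simplifying assumptions of ours that the text does not state: a request must wait for all of the services it calls, and their latencies are independent. The function name is hypothetical.

```python
def prob_all_on_time(p_single, n_services):
    """Probability that every one of n_services independent calls meets
    its latency target, when each does so with probability p_single."""
    return p_single ** n_services

# Each service meets its 2 ms target 99.9% of the time.
for n in (1, 10, 100):
    print(n, round(prob_all_on_time(0.999, n), 4))
```

Under these assumptions, a request that waits on 100 such services finds all of them on time only about 90% of the time, even though each service alone meets its target 99.9% of the time.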

Retrying slow requests to an alternate server is a common technique to mitigate this [ ]. Proactively sending two or more requests to different servers and accepting the first response dramatically lowers the expected latency for the first response.
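The effect of sending a request to two servers can be illustrated with a small simulation. This is a sketch under assumptions of our own choosing: each server independently answers in 1 ms with probability 0.95 and in 50 ms otherwise, and all names and numbers are hypothetical.

```python
import random

def sample_latency_ms(rng, p_slow=0.05, fast_ms=1.0, slow_ms=50.0):
    """One response time from a hypothetical two-mode latency model."""
    return slow_ms if rng.random() < p_slow else fast_ms

def percentile(values, fraction):
    """Value at the given fraction of the sorted sample."""
    ordered = sorted(values)
    return ordered[min(int(fraction * len(ordered)), len(ordered) - 1)]

rng = random.Random(2022)  # fixed seed so the run is repeatable
trials = 100_000

# One request to one server.
single = [sample_latency_ms(rng) for _ in range(trials)]

# The same request sent to two servers; keep whichever answers first.
hedged = [
    min(sample_latency_ms(rng), sample_latency_ms(rng))
    for _ in range(trials)
]

p99_single = percentile(single, 0.99)
p99_hedged = percentile(hedged, 0.99)
print("p99 single:", p99_single, "ms")
print("p99 hedged:", p99_hedged, "ms")
```

In this model a single request is slow about 5% of the time, so its 99th percentile sits at the slow value. The duplicated request is slow only when both servers are slow, about 0.25% of the time, so its 99th percentile stays at the fast value.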

In the DB world, we see latency bounding for log subsystems. One example is Apache BookKeeper [ ]. Log entries are appended by writing to N log-storage-replicas (called Bookies). When Q of N log-storage-replicas have acknowledged the receipt of the log entry, it is durable. Since N - Q log-storage-replicas may be slow, the expected and observed SLO for writing to the log is dramatically reduced. The database system is predictably faster with less variability.
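The latency benefit of waiting for Q of N acknowledgements can be seen in a few lines. This sketch does not use any BookKeeper API; the function names and the acknowledgement times are hypothetical. The append completes when the Q-th fastest replica answers, so up to N - Q slow replicas do not delay it.

```python
def all_replicas_latency(ack_latencies_ms):
    """Completion time when every replica must acknowledge."""
    return max(ack_latencies_ms)

def quorum_latency(ack_latencies_ms, q):
    """Completion time when only q acknowledgements are needed:
    the q-th smallest acknowledgement latency."""
    if not 1 <= q <= len(ack_latencies_ms):
        raise ValueError("q must be between 1 and the number of replicas")
    return sorted(ack_latencies_ms)[q - 1]

# Hypothetical acknowledgement times for N = 3 replicas; one is jittery.
acks = [0.9, 1.1, 40.0]

print(all_replicas_latency(acks))   # waits behind the slow replica
print(quorum_latency(acks, q=2))    # done once two replicas have answered
```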

AWS Aurora [ ] carries this further with replicated AWS-storage-servers for log replay and block creation. These AWS-storage-servers are effectively the lower half of the database. They are placed with 2 servers in each of 3 AZs (Availability Zones). When 4 of these servers have acknowledged the log entry, Aurora considers the log entry to be durable. Transactions having commit records in the entry may be confirmed to the waiting human. Even when AZ+1 failures have happened, the commit record can be read later.

Logging is fast even if 2 of the 6 AWS-storage-servers are slow.

Still, major portions of the database run in a single server.

If that server is jittery, humans will experience unpredictable delays.
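The claim that a log entry acknowledged by 4 of 6 servers survives an AZ+1 failure can be checked by brute force. The sketch below is ours and makes one simplifying assumption: the commit record counts as readable if at least one server that acknowledged it is still up. The server naming and function names are hypothetical and do not reflect Aurora's implementation.

```python
from itertools import combinations

AZS = ("az1", "az2", "az3")
# 2 storage servers in each of 3 AZs, as described above.
SERVERS = [(az, i) for az in AZS for i in (0, 1)]

def az_plus_one_failures():
    """Yield every failure set made of one whole AZ plus one more server."""
    for az in AZS:
        lost_az = {s for s in SERVERS if s[0] == az}
        for extra in SERVERS:
            if extra not in lost_az:
                yield lost_az | {extra}

def commit_always_readable(write_quorum):
    """True if, for every set of acknowledging servers of the given size
    and every AZ+1 failure, some acknowledging server survives."""
    for ackers in combinations(SERVERS, write_quorum):
        for failed in az_plus_one_failures():
            if not set(ackers) - failed:
                return False
    return True

print(commit_always_readable(4))  # 4 acks always leave a surviving copy
print(commit_always_readable(3))  # 3 acks can all be lost to AZ+1
```

An AZ+1 failure removes 3 of the 6 servers, so 4 acknowledgements always leave at least one copy, while 3 acknowledgements can land entirely on the failed servers.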

Can ALL of the database be responsive

atop unpredictable servers?

Can every aspect of a database be decentralized

and responsive even when some of its pieces are not?

1.2 Applicability to a Broad Range of Systems

This paper is about jitter-avoidance through the use of quorum. It investigates techniques to manage complex systems based on an alternate form of order (called seniority) that is largely decoupled from classic happened-before partial-order messaging guarantees [35].

By sketching a design for a transactional database, we

demonstrate the broad applicability of these principles

to many complex distributed systems.

Paxos[36] provides linear order but it jitters

We provide partial order and avoid jitter

1.3 Availability in a Complex System

The phone system over land lines offers amazing availability. Occasionally, a call is dropped but you can redial and connect.

Availability is defined as the opportunity to dial another call.

In transactional database systems, transactions may fail. Lock conflicts or other problems can cause work to be discarded. You don’t want this to be common but once in a while is OK. When a transaction aborts, the application can restart the work and hopefully succeed. In a distributed database, restarted transactions may route to a different database server. If a single transaction gets stuck, the application can abandon it, retry, and probably succeed.

Footnote: See the paper Thinking about Availability in Large Service Infrastructures [ ] for great discussions of SLAs (Service Level Agreements), SLOs (Service Level Objectives), and SLIs (Service Level Indicators). The discussion also includes multi-dimensional SLOs.

Footnote: Tolerating AZ+1 failures means an entire AZ can be lost at the same time as one more server is down for other reasons [46].

Our decoupled transactions database may occasionally time out while doing a transaction. Even so, it remains open for new business.

1.4 Snapshot Isolation Guarantees

The goal for this design is to imagine a database that runs many existing applications with excellent availability and responsiveness.

Special attention must be paid to when, how, and with what guarantees the application sees and changes its data. Snapshot Isolation [ ] defines the database behavior that an application sees when running with concurrent work. It is arguably the most common isolation guarantee provided by modern commercial databases. In its basic form, snapshot isolation provides guarantees when reading and committing updates in a transaction:

• Snapshot reads: Each transaction gets a snapshot time as it begins and, as it reads data, sees updates from transactions committed before the snapshot time and its own updates.

• Conflict detection prior to commit: As a transaction is about to commit, the database will ensure that none of the updated records has been changed by other transactions since this transaction’s snapshot time.

Snapshot isolation is extremely popular. Many existing applications

depend on these guarantees for their successful execution.
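The two guarantees can be illustrated with a toy, single-process sketch. It is deliberately centralized (one in-memory store with one logical clock), which is exactly what the design in this paper seeks to avoid, so it shows only the behavior an application sees. All class and method names are hypothetical.

```python
class ConflictError(Exception):
    """Raised when a record was changed after the transaction's snapshot."""

class Store:
    def __init__(self):
        self.clock = 0       # logical time of the latest commit
        self.versions = {}   # key -> list of (commit_time, value), ascending

    def begin(self):
        return Txn(self, snapshot_time=self.clock)

class Txn:
    def __init__(self, store, snapshot_time):
        self.store = store
        self.snapshot_time = snapshot_time
        self.writes = {}     # this transaction's own uncommitted updates

    def read(self, key):
        # Snapshot reads: own updates first, then the newest version
        # committed at or before the snapshot time.
        if key in self.writes:
            return self.writes[key]
        for commit_time, value in reversed(self.store.versions.get(key, [])):
            if commit_time <= self.snapshot_time:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Conflict detection: abort if any updated record has a version
        # committed after this transaction's snapshot time.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_time:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append(
                (self.store.clock, value))

store = Store()
setup = store.begin()
setup.write("x", 1)
setup.commit()

t1 = store.begin()
t2 = store.begin()
t1.write("x", 2)
t1.commit()

t2_view = t2.read("x")   # t2 still reads from its snapshot
t2.write("x", 3)
try:
    t2.commit()
    aborted = False
except ConflictError:
    aborted = True

print(t2_view, aborted)
```

Here t2 keeps reading the value from its snapshot even after t1 commits, and t2's own commit is rejected because the record it updated changed after its snapshot time.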

1.5 What’s Jitter?

Jitter means behavior different from the expected norm, in particular response time varying from what’s expected. In electronics, it means deviation from the circuit’s expected clock timing.

In networking, jitter describes larger than expected variance in the delay moving through the network [ ]. Servers may jitter for many reasons from gray failure to Operating Systems to Java VMs. See §12 (Appendix A: Building on a Jittery Foundation).

Within a distributed system, jitter comprises both variability from the network and variability from the server processing a request. The client issuing a request can’t tell the cause of the delay, just that the response is not as zippy as hoped.

If part of the cluster is not responding, those servers may or may not be doing work. For servers spread widely within or across data centers, an observer may see some servers responding quickly but not all of them. The ones WE see responding quickly may not be providing prompt responses to other servers in the datacenter.

What can we assume so we can make progress?

1.6 Quorum: Avoiding Snapshot Isolation Jitter

Snapshot isolation transactions can do independent changes to the database as long as there are no conflicting updates and each transaction sees only snapshot reads. What if these two things can be accomplished without stalling behind jittery servers?

We can avoid stalling when we do work with quorums. The idea is to ask a collection of servers to do each operation and wait for only some of them to answer. Suppose we have a quorum of N servers and we only need answers from Q of them (where Q > N/2). When we get Q answers, we know that at least one server of the Q has already seen any single previous operation. By combining all Q answers, we can know about previous work. F is our maximum

Footnote: The probability of stalling drops dramatically when we don’t wait for every server.
