the same data. One replica tablet is elected as the Paxos
leader for the group, and that leader is the entry point for
all transactional activity for the group. Groups may also
include readonly replicas, which do not vote in the Paxos
algorithm and cannot become the group leader.
Spanner provides serializable pessimistic transactions us-
ing strict two-phase locking. A transaction includes mul-
tiple reads, taking shared or exclusive locks, followed by a
single write that upgrades locks and atomically commits the
transaction. All commits are synchronously replicated us-
ing Paxos. Transactions are most efficient when updating
data co-located in a single group. Spanner also supports
transactions across multiple groups, called transaction par-
ticipants, using a two-phase commit (2PC) protocol on top
of Paxos. 2PC adds an extra network round trip so it usu-
ally doubles observed commit latency. 2PC scales well up to
10s of participants, but abort frequency and latency increase
significantly with 100s of participants [7].
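To make the locking discipline concrete, the following Python sketch models the pattern described above against a toy in-process lock table; the class and function names are ours rather than Spanner's API, and Paxos replication and 2PC coordination are elided.

    # Toy model of the strict two-phase locking pattern described above: reads
    # take shared locks, the single write phase upgrades to exclusive locks, and
    # all locks are held until the atomic commit. Illustrative only; not
    # Spanner's implementation or client API.
    from collections import defaultdict

    class LockTable:
        """Per-key shared/exclusive locks (single process, no deadlock handling)."""

        def __init__(self):
            self.shared = defaultdict(set)   # key -> txn ids holding shared locks
            self.exclusive = {}              # key -> txn id holding the exclusive lock

        def lock_shared(self, txn_id, key):
            if self.exclusive.get(key, txn_id) != txn_id:
                raise RuntimeError(f"conflict on {key}: exclusive lock held")
            self.shared[key].add(txn_id)

        def upgrade_exclusive(self, txn_id, key):
            if self.shared[key] - {txn_id} or self.exclusive.get(key, txn_id) != txn_id:
                raise RuntimeError(f"conflict on {key}: cannot upgrade")
            self.exclusive[key] = txn_id

        def release_all(self, txn_id):
            for holders in self.shared.values():
                holders.discard(txn_id)
            for key in [k for k, t in self.exclusive.items() if t == txn_id]:
                del self.exclusive[key]

    def run_transaction(locks, store, txn_id, read_keys, writes):
        """Multiple reads under shared locks, then one write that upgrades and commits."""
        snapshot = {}
        for key in read_keys:                # reads take shared locks
            locks.lock_shared(txn_id, key)
            snapshot[key] = store.get(key)
        for key in writes:                   # single write phase: upgrade locks
            locks.lock_shared(txn_id, key)
            locks.upgrade_exclusive(txn_id, key)
        store.update(writes)                 # atomic commit (Paxos replication elided)
        locks.release_all(txn_id)            # strict 2PL: release only after commit
        return snapshot

A commit confined to a single group proceeds this way over Paxos; when multiple groups participate, the commit step additionally coordinates the participant leaders with 2PC.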
Spanner has very strong consistency and timestamp se-
mantics. Every transaction is assigned a commit timestamp,
and these timestamps provide a global total ordering for
commits. Spanner uses a novel mechanism to pick glob-
ally ordered timestamps in a scalable way using hardware
clocks deployed in Google datacenters. Spanner uses these
timestamps to provide multi-versioned consistent reads, in-
cluding snapshot reads of current data, without taking
read locks. For guaranteed non-blocking, globally consis-
tent reads, Spanner provides a global safe timestamp, below
which no in-flight or future transaction can possibly com-
mit. The global safe timestamp typically lags current time
by 5-10 seconds. Reads at this timestamp can normally run
on any replica tablet, including readonly replicas, and they
never block behind running transactions.
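As a rough illustration of how reads at or below the global safe timestamp avoid read locks, here is a toy multi-versioned key-value store in Python; the names and integer timestamps are hypothetical, and TrueTime, replication, and version garbage collection are all omitted.

    # Toy multi-versioned store: every commit is tagged with its timestamp, and a
    # read at or below safe_ts returns the newest version committed at or before
    # the read timestamp, without any locking.
    class MultiVersionStore:
        def __init__(self):
            self.versions = {}   # key -> list of (commit_ts, value), appended in ts order
            self.safe_ts = 0     # no in-flight or future commit can receive ts <= safe_ts

        def commit(self, key, value, commit_ts):
            # Commit timestamps form a global total order, so per-key version
            # lists stay sorted by simple appending.
            self.versions.setdefault(key, []).append((commit_ts, value))

        def snapshot_read(self, key, read_ts):
            """Lock-free read of the newest version with commit_ts <= read_ts."""
            assert read_ts <= self.safe_ts, "reads above safe_ts might block on writers"
            result = None
            for commit_ts, value in self.versions.get(key, []):
                if commit_ts <= read_ts:
                    result = value
                else:
                    break
            return result

    store = MultiVersionStore()
    store.commit("Customer/1", {"name": "advertiser-a"}, commit_ts=100)
    store.safe_ts = 103          # e.g. a timestamp a few seconds behind current time
    print(store.snapshot_read("Customer/1", read_ts=103))   # {'name': 'advertiser-a'}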
3. DATA MODEL
3.1 Hierarchical Schema
The F1 data model is very similar to the Spanner data
model. In fact, Spanner’s original data model was more like
Bigtable, but Spanner later adopted F1’s data model. At
the logical level, F1 has a relational schema similar to that
of a traditional RDBMS, with some extensions including
explicit table hierarchy and columns with Protocol Buffer
data types.
Logically, tables in the F1 schema can be organized into
a hierarchy. Physically, F1 stores each child table clustered
with and interleaved within the rows from its parent table.
Tables from the logical schema cannot be arbitrarily inter-
leaved: the child table must have a foreign key to its parent
table as a prefix of its primary key. For example, the Ad-
Words schema contains a table Customer with primary key
(CustomerId), which has a child table Campaign with pri-
mary key (CustomerId, CampaignId), which in turn has
a child table AdGroup with primary key (CustomerId,
CampaignId, AdGroupId). A row of the root table in the
hierarchy is called a root row. All child table rows corre-
sponding to a root row are clustered together with that root
row in a single Spanner directory, meaning that cluster is
normally stored on a single Spanner server. Child rows are
stored under their parent row ordered by primary key. Fig-
ure 2 shows an example.
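The key-prefix relationship in this example can be sketched as follows; the tuple encoding of keys is purely illustrative, not F1's physical format. Because each child key carries its root's CustomerId as a prefix, sorting by key clusters a root row and all of its descendants into one contiguous range.

    # Illustrative key layout: child primary keys carry the parent key as a
    # prefix, so sorting all rows by key interleaves each root row with its
    # descendants in one contiguous range.
    rows = {
        ("Customer", 1): {"name": "advertiser-a"},
        ("Customer", 1, "Campaign", 7): {"budget": 100},
        ("Customer", 1, "Campaign", 7, "AdGroup", 42): {"bid": 3},
        ("Customer", 2): {"name": "advertiser-b"},
        ("Customer", 2, "Campaign", 9): {"budget": 50},
    }

    def root_range(customer_id):
        """All data for one root (its row plus all interleaved children) is a
        single range over the prefix ("Customer", customer_id)."""
        prefix = ("Customer", customer_id)
        return [(k, v) for k, v in sorted(rows.items()) if k[:2] == prefix]

    for key, value in root_range(1):
        print(key, value)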
The hierarchically clustered physical schema has several
advantages over a flat relational schema. Consider the cor-
responding traditional schema, also depicted in Figure 2. In
this traditional schema, fetching all Campaign and AdGroup
records corresponding to a given CustomerId would take two
sequential steps, because there is no direct way to retrieve
AdGroup records by CustomerId. In the F1 version of the
schema, the hierarchical primary keys allow the fetches of
Campaign and AdGroup records to be started in parallel,
because both tables are keyed by CustomerId. The primary
key prefix property means that reading all AdGroups for
a particular Customer can be expressed as a single range
read, rather than reading each row individually using an
index. Furthermore, because the tables are both stored in
primary key order, rows from the two tables can be joined
using a simple ordered merge. Because the data is clustered
into a single directory, we can read it all in a single Spanner
request. All of these properties of a hierarchical schema help
mitigate the latency effects of having remote data.
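A minimal sketch of the ordered merge mentioned above, assuming both inputs arrive already sorted by their shared (CustomerId, CampaignId) key prefix; the function and row shapes are illustrative rather than F1's actual execution operators.

    # Merge join over two key-ordered inputs: because both tables share the
    # (CustomerId, CampaignId) key prefix and are stored in primary-key order,
    # one linear pass joins them with no index lookups or hashing.
    def merge_join(campaigns, ad_groups):
        """campaigns: sorted (customer_id, campaign_id, campaign_row) tuples;
        ad_groups: sorted (customer_id, campaign_id, ad_group_id, ad_group_row) tuples."""
        joined, j = [], 0
        for cust, camp, camp_row in campaigns:
            while j < len(ad_groups) and ad_groups[j][:2] < (cust, camp):
                j += 1                               # skip AdGroups of earlier campaigns
            k = j
            while k < len(ad_groups) and ad_groups[k][:2] == (cust, camp):
                joined.append((cust, camp, ad_groups[k][2], camp_row, ad_groups[k][3]))
                k += 1
        return joined

    campaigns = [(1, 7, {"budget": 100}), (2, 9, {"budget": 50})]
    ad_groups = [(1, 7, 42, {"bid": 3}), (1, 7, 43, {"bid": 5})]
    print(merge_join(campaigns, ad_groups))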
Hierarchical clustering is especially useful for updates,
since it reduces the number of Spanner groups involved in
a transaction. Because each root row and all of its descen-
dant rows are stored in a single Spanner directory, trans-
actions restricted to a single root will usually avoid 2PC
and the associated latency penalty, so most applications try
to use single-root transactions as much as possible. Even
when doing transactions across multiple roots, it is impor-
tant to limit the number of roots involved because adding
more participants generally increases latency and decreases
the likelihood of a successful commit.
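Using the same hypothetical key encoding as the earlier sketch, this rule of thumb amounts to counting the distinct roots a commit touches, under the simplifying assumption that each root's directory maps to its own Spanner group.

    # Counting 2PC participants for a commit: keys encoded as tuples and one
    # group per root directory are illustrative assumptions only.
    def roots_touched(write_keys):
        return {key[:2] for key in write_keys}      # ("Customer", customer_id) prefixes

    single_root = [("Customer", 1, "Campaign", 7), ("Customer", 1, "Campaign", 8)]
    multi_root = [("Customer", 1, "Campaign", 7), ("Customer", 2, "Campaign", 9)]
    print(len(roots_touched(single_root)))   # 1 root touched: commit can avoid 2PC
    print(len(roots_touched(multi_root)))    # 2 roots touched: 2PC across groups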
Hierarchical clustering is not mandatory in F1. An F1
schema often has several root tables, and in fact, a com-
pletely flat MySQL-style schema is still possible. Using hierarchy, however, to the extent that it matches data semantics, is highly beneficial. In AdWords, most transactions typically update data for a single advertiser at a time, so
we made the advertiser a root table (Customer) and clus-
tered related tables under it. This clustering was critical to
achieving acceptable latency.
3.2 Protocol Buffers
The F1 data model supports table columns that con-
tain structured data types. These structured types use the
schema and binary encoding format provided by Google’s
open source Protocol Buffer [16] library. Protocol Buffers
have typed fields that can be required, optional, or repeated;
fields can also be nested Protocol Buffers. At Google, Proto-
col Buffers are ubiquitous for data storage and interchange
between applications. When we still had a MySQL schema,
users often had to write tedious and error-prone transfor-
mations between database rows and in-memory data struc-
tures. Putting protocol buffers in the schema removes this
impedance mismatch and gives users a universal data struc-
ture they can use both in the database and in application
code.
Protocol Buffers allow the use of repeated fields. In F1
schema designs, we often use repeated fields instead of child
tables when the number of child records has a low upper
bound. By using repeated fields, we avoid the performance
overhead and complexity of storing and joining multiple
child records. The entire protocol buffer is effectively treated
as one blob by Spanner. Aside from performance impacts,
Protocol Buffer columns are more natural and reduce seman-
tic complexity for users, who can now read and write their
logical business objects as atomic units, without having to materialize them using joins across several tables.
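As a structural analogy, the sketch below shows a bounded set of child records living inside the parent row and being read or written with it as one unit; plain Python dataclasses stand in for generated Protocol Buffer classes so the sketch is self-contained, and the record and field names are invented.

    # Structural analogy only: the repeated child records are part of the parent
    # row's column value, which the storage layer treats as a single blob.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CampaignLabel:        # hypothetical child record with a small, bounded count
        label_id: int
        text: str

    @dataclass
    class Campaign:             # the parent row; labels are an inline repeated field,
        customer_id: int        # not a separate child table
        campaign_id: int
        labels: List[CampaignLabel] = field(default_factory=list)

    row = Campaign(customer_id=1, campaign_id=7,
                   labels=[CampaignLabel(10, "brand"), CampaignLabel(11, "promo")])
    # No join against a child table is needed to materialize the business object.
    print(row)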