• F1 users make heavy use of batching, parallelism and
asynchronous reads. We use a new ORM (object-
relational mapping) library that makes these concepts
explicit. This places an upper bound on the number of
RPCs required for typical application-level operations,
making those operations scale well by default.
The F1 system has been managing all AdWords advertis-
ing campaign data in production since early 2012. AdWords
is a vast and diverse ecosystem including 100s of applica-
tions and 1000s of users, all sharing the same database. This
database is over 100 TB, serves up to hundreds of thousands
of requests per second, and runs SQL queries that scan tens
of trillions of data rows per day. Availability reaches five
nines, even in the presence of unplanned outages, and ob-
servable latency on our web applications has not increased
compared to the old MySQL system.
We discuss the AdWords F1 database throughout this pa-
per as it was the original and motivating user for F1. Several
other groups at Google are now beginning to deploy F1.
2. BASIC ARCHITECTURE
Users interact with F1 through the F1 client library.
Other tools like the command-line ad-hoc SQL shell are im-
plemented using the same client. The client sends requests
to one of many F1 servers, which are responsible for reading
and writing data from remote data sources and coordinating
query execution. Figure 1 depicts the basic architecture and
the communication between components.
F1 Client
Load Balancer
...
Spanner
CFS
F1
Server
Slave
Pool
Slave
Pool
Spanner
CFS
F1
Server
Slave
Pool
Slave
Pool
F1
Master
F1
Master
Figure 1: The basic architecture of the F1 system,
with servers in two datacenters.
Because of F1’s distributed architecture, special care must
be taken to avoid unnecessarily increasing request latency.
For example, the F1 client and load balancer prefer to con-
nect to an F1 server in a nearby datacenter whenever possi-
ble. However, requests may transparently go to F1 servers
in remote datacenters in cases of high load or failures.
F1 servers are typically co-located in the same set of dat-
acenters as the Spanner servers storing the data. This co-
location ensures that F1 servers generally have fast access
to the underlying data. For availability and load balancing,
F1 servers can communicate with Spanner servers outside
their own datacenter when necessary. The Spanner servers
in each datacenter in turn retrieve their data from the Colos-
sus File System (CFS) [14] in the same datacenter. Unlike
Spanner, CFS is not a globally replicated service and there-
fore Spanner servers will never communicate with remote
CFS instances.
F1 servers are mostly stateless, allowing a client to com-
municate with a different F1 server for each request. The
one exception is when a client uses pessimistic transactions
and must hold locks. The client is then bound to one F1
server for the duration of that transaction. F1 transactions
are described in more detail in Section 5. F1 servers can
be quickly added (or removed) from our system in response
to the total load because F1 servers do not own any data
and hence a server addition (or removal) requires no data
movement.
An F1 cluster has several additional components that
allow for the execution of distributed SQL queries. Dis-
tributed execution is chosen over centralized execution when
the query planner estimates that increased parallelism will
reduce query processing latency. The shared slave pool con-
sists of F1 processes that exist only to execute parts of dis-
tributed query plans on behalf of regular F1 servers. Slave
pool membership is maintained by the F1 master, which
monitors slave process health and distributes the list of avail-
able slaves to F1 servers. F1 also supports large-scale data
processing through Google’s MapReduce framework [10].
For performance reasons, MapReduce workers are allowed to
communicate directly with Spanner servers to extract data
in bulk (not shown in the figure). Other clients perform
reads and writes exclusively through F1 servers.
The throughput of the entire system can be scaled up by
adding more Spanner servers, F1 servers, or F1 slaves. Since
F1 servers do not store data, adding new servers does not
involve any data re-distribution costs. Adding new Spanner
servers results in data re-distribution. This process is com-
pletely transparent to F1 servers (and therefore F1 clients).
The Spanner-based remote storage model and our geo-
graphically distributed deployment leads to latency char-
acteristics that are very different from those of regu-
lar databases. Because the data is synchronously repli-
cated across multiple datacenters, and because we’ve cho-
sen widely distributed datacenters, the commit latencies are
relatively high (50-150 ms). This high latency necessitates
changes to the patterns that clients use when interacting
with the database. We describe these changes in Section 7.1,
and we provide further detail on our deployment choices, and
the resulting availability and latency, in Sections 9 and 10.
2.1 Spanner
F1 is built on top of Spanner. Both systems were devel-
oped at the same time and in close collaboration. Spanner
handles lower-level storage issues like persistence, caching,
replication, fault tolerance, data sharding and movement,
location lookups, and transactions.
In Spanner, data rows are partitioned into clusters called
directories using ancestry relationships in the schema. Each
directory has at least one fragment, and large directories
can have multiple fragments. Groups store a collection of
directory fragments. Each group typically has one replica
tablet per datacenter. Data is replicated synchronously using
the Paxos algorithm [18], and all tablets for a group store
文档被以下合辑收录
评论