
based on the data volume; i.e., if the data volume is relatively small, a
centralized database with complete functionality is chosen. Further,
if the data volume is huge, a distributed database or distributed
storage system is selected, thus sacrificing the functionality and
stand-alone performance in order to address the issue by modifying
the business or adding machines.
We further enhanced OceanBase [47] to version 4.0, expecting
that it would better support small-scale enterprises. The system
integrates several storage shards with a shared log stream and
provides a high-availability service. Owing to the advancement
in technology, contemporary machines have come to feature multiple
cores, large amounts of DRAM, and high-speed storage devices. This
highlights the importance of considering both horizontal and vertical
scalability in the design of a distributed database system.
Accordingly, we have developed Paetica¹ as a hybrid
shared-nothing/shared-everything cloud database system that supports
a stand-alone and distributed integrated architecture. This paper
describes Paetica in detail and makes the following contributions.
• We propose Paetica, a stand-alone and distributed integrated
  architecture that is implemented in version 4.0 of the OceanBase
  system. Paetica features independent SQL, transaction, and storage
  engines in both the stand-alone and distributed systems, which
  enables users to switch configurations dynamically. The integrated
  architecture design allows OceanBase to operate efficiently in the
  stand-alone mode without incurring distributed interaction
  overhead. In the distributed mode, the system achieves high
  performance while also providing disaster tolerance.
• We have developed a stand-alone and distributed integrated SQL
  engine that is capable of processing SQL in diverse situations. The
  engine is designed to execute SQL both serially and in parallel to
  fully utilize the available CPU cores. Furthermore, in distributed
  execution scenarios, the engine can parallelize execution across
  multiple machines, which allows efficient processing of SQL
  commands.
• We have constructed a stand-alone and distributed integrated
  LSM-Tree storage engine that includes various compaction
  optimizations for both the stand-alone and distributed modes. These
  optimizations include techniques such as incremental major
  compaction and staggered round-robin compaction, which are intended
  to balance write performance against storage space utilization.
• For the stand-alone and distributed integrated transaction
  processing engine, we have proposed an optimized version of the
  2-Phase Commit (2PC) protocol. This optimization is intended to
  reduce message processing and log volume, and consequently to
  decrease transaction latency. In the stand-alone mode, Paetica does
  not require 2PC and instead uses a single log stream to process
  transactions without accessing the global time service (GTS).
  Consequently, the efficiency of the transaction engine is
  comparable to that of a stand-alone database.
¹ Paetica is OceanBase version 4.0.
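
The commit-path choice described in the last contribution can be illustrated with a minimal sketch. It is not OceanBase code: the names `Write`, `CommitPath`, `choose_commit_path`, and `needs_gts` are hypothetical, and the sketch assumes, for illustration only, that a transaction confined to a single log stream takes the stand-alone path, while any transaction spanning several log streams uses 2PC and consults the GTS.

```python
from dataclasses import dataclass
from enum import Enum


class CommitPath(Enum):
    # Stand-alone path: one log stream, no 2PC (as described in the text).
    SINGLE_LOG_STREAM = "single_log_stream"
    # Distributed path: commit coordinated across log streams with 2PC.
    TWO_PHASE_COMMIT = "two_phase_commit"


@dataclass(frozen=True)
class Write:
    key: str
    log_stream_id: int  # hypothetical: the log stream that owns this key


def choose_commit_path(writes: list[Write]) -> CommitPath:
    """Pick a commit path from the set of log streams a transaction touched."""
    streams = {w.log_stream_id for w in writes}
    if len(streams) <= 1:
        return CommitPath.SINGLE_LOG_STREAM
    return CommitPath.TWO_PHASE_COMMIT


def needs_gts(path: CommitPath) -> bool:
    """Assumption of this sketch: only the 2PC path consults the GTS."""
    return path is CommitPath.TWO_PHASE_COMMIT


if __name__ == "__main__":
    local_txn = [Write("a", 1), Write("b", 1)]
    cross_txn = [Write("a", 1), Write("c", 2)]
    for name, txn in (("local", local_txn), ("cross", cross_txn)):
        path = choose_commit_path(txn)
        print(name, path.value, "GTS" if needs_gts(path) else "no GTS")
```

Under these assumptions, the stand-alone case pays neither the 2PC message rounds nor a GTS round trip, which is the property the text credits for stand-alone-level transaction efficiency.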
We have conducted scalability experiments to demonstrate the linear
scalability of Paetica. Our OLTP (Online Transaction Processing)
experiments also demonstrate that Paetica exhibits high concurrency
and low latency in both stand-alone and distributed modes. In a
separate experiment, we compared OceanBase 4.0 with MySQL 8.0 and
found that OceanBase 4.0 performs better than MySQL 8.0 in
small-scale and stand-alone situations. Furthermore, we compared
Paetica with OceanBase version 3.1 and Greenplum [31] 6.22.1 on the
TPC-H [7] 100GB experiments, and observed that Paetica outperforms
OceanBase 3.1 by 5x on average. Compared with Greenplum 6.22.1,
Paetica demonstrates superior performance across all queries.
The paper is organized as follows. §2 presents the OceanBase
evolution. §3 provides an overview of the stand-alone and distributed
architecture. §4 and §5 describe the integrated engines for SQL and
transaction processing. We present the experiments in §6 to
demonstrate the efficacy and economy of Paetica. We discuss
polymorphism, dynamism, and native multi-tenancy in §7, and we review
the related work in §8. Finally, we give the conclusions in §9.
OceanBase is an open-source project under Mulan Public License 2.0
[2], and the source code referenced in this paper is available on
both gitee [3] and GitHub [4].
2 OCEANBASE EVOLUTION
In this section, we illustrate the evolution of OceanBase from
version 0.5 to version 4.0.
2.1 OceanBase 0.5
OceanBase [4] has been developed since 2010. Figure 1 is the overall
architecture diagram of OceanBase version 0.5. In this version,
OceanBase is divided into two layers, viz., storage and computing.
The upper layer is a service layer that provides SQL services
statelessly, and the lower layer is a storage cluster composed of two
kinds of servers: ChunkServer and UpdateServer. The ChunkServer is
characterized by the capability for automatic partitioning and
horizontal scalability of storage. The UpdateServer utilizes the
Paxos protocol [28] to attain strong consistency and availability.
However, the UpdateServer does not possess the capability for
distributed transactions. Such an architecture enables OceanBase to
better support businesses similar to Taobao favorites [1]. It also
offers a degree of scalability, particularly for reads, and the
stateless SQL layer can be scaled freely.
Despite the advantages of this architecture, a major issue is that
the UpdateServer node is a single-point write, multi-point read
architecture, which is similar to PolarDB [13][29] and makes it
difficult to expand when higher levels of concurrency become
necessary. Furthermore, as Figure 1 shows, splitting the storage and
SQL layers results in a high query delay. Network jitter is difficult
to control, which makes latency jitter hard to bound when latency
requirements are extremely strict.
2.2 OceanBase 1.0 ~ 3.0
To address the aforementioned issues, OceanBase abandoned its
previous architecture and developed versions 1.0 ~ 3.0, which are
characterized by a fully peer-to-peer (P2P) structure, as shown in
Figure 2. Each OBServer contains SQL, storage, and transaction