
the operation of one SQL statement may involve multiple
physical data tables.
(4)Distributed transaction
After the data is stored in multiple physical data
tables through data sharding, it is also necessary to ensure
the consistency and integrity of the data in a distributed
environment, which requires the use of distributed
transactions. Distributed transactions generally use the
two-phase commit (2PC) protocol.
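The two phases can be illustrated with a minimal sketch. It is
not the engine's implementation: the Participant class, its
can_commit flag and the in-memory log are hypothetical names
introduced here, and a real coordinator would also have to
handle timeouts, durable logging and crash recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One physical shard taking part in the transaction (hypothetical)."""
    name: str
    can_commit: bool = True          # simulated vote for the prepare phase
    state: str = "idle"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: vote. A real node would first persist its redo/undo records.
        self.state = "prepared" if self.can_commit else "aborted"
        self.log.append(self.state)
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"
        self.log.append(self.state)

    def rollback(self) -> None:
        self.state = "aborted"
        self.log.append(self.state)


def two_phase_commit(participants) -> bool:
    """Return True if the transaction committed on every participant."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (completion): the vote was unanimous, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # A single "no" vote aborts the transaction on every participant.
    for p in participants:
        p.rollback()
    return False
```

A write that touches two shards commits only if both vote yes
in the first phase; otherwise every shard rolls back, which is
what keeps the physical tables consistent with each other.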
III. SCHEME DESIGN AND REALIZATION
A. Design and Implementation of Sharding Scheme of
Distributed Database Storage Engine
1. Database distribution design
In a distributed database, distribution design comprises
fragmentation design and fragment allocation design, which
are closely related.
The database distribution design should consider the
following goals:
(1) Availability and reliability. A distributed
database system is composed of multiple nodes, and
multiple copies of the data are stored across these nodes
to ensure the reliability and availability of the system.
(2) Processing locality. Data should be distributed so
that operations can be satisfied locally as far as possible,
that is, so that most operations complete at the local site.
This requires dividing the data and placing each fragment at
the site that accesses it most often, or at the nearest site.
(3) Storage cost. The distribution of multiple data
copies is limited by the storage capacity of each node.
Although storage capacity matters less than the
application's CPU, I/O, and network transmission costs,
it is still a factor that should be considered during
design.
(4) Load distribution. Workloads should be distributed
sensibly across the sites of the network, so as to make
full use of the computers at each site and to increase the
parallelism with which applications execute. Load
distribution and processing locality may conflict, so a
comprehensive trade-off must be made when designing
data distribution.
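The trade-off between goals (2) and (4) can be made concrete
with a small sketch. The scoring rule below is an assumption
made for illustration and does not come from the design
described here: choose_site, access_counts, current_load and
load_weight are hypothetical names, and the linear score is
only one of many possible choices.

```python
def choose_site(access_counts, current_load, load_weight=1.0):
    """Pick the site for one fragment.

    access_counts: {site: number of accesses to this fragment from that site}
    current_load:  {site: work already assigned to that site}
    load_weight:   how strongly existing load counts against a site
    """
    def score(site):
        # Locality favours the site that uses the fragment most;
        # the load term pushes work away from sites that are already busy.
        return access_counts.get(site, 0) - load_weight * current_load.get(site, 0)

    # Sort the candidates so that ties are broken deterministically by name.
    return max(sorted(current_load), key=score)
```

With load_weight set to 0 the rule reduces to pure locality;
raising it moves fragments away from the busiest sites at the
cost of more remote accesses.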
2. Distributed storage design
To implement data sharding, the distributed
transactional database uses a three-tier architecture,
shown in Figure 1.
The three-layer model is an abstraction of data
nodes, divided into the data node layer, the data source
layer and the data space layer. When a SQL request is
received from the user, the engine first looks up the data
space in the data space layer, then obtains the data source
from that data space, and then, according to the read/write
type of the SQL and the characteristics of the data source,
selects the one or several back-end data nodes on which the
statement must run. Finally, it executes the specific SQL
statements on those nodes.
Figure 1 Three-layer model diagram
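The lookup chain described above can be sketched as follows.
This is a simplified illustration under stated assumptions:
the data space is keyed by table name, the data source holds
one master and a list of slaves, and SQL is classified by its
leading keyword. None of these details, nor the names
DataSpace, DataSource and route, come from the design itself.

```python
class DataSource:
    """Hypothetical data source with one writer and several readers."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def nodes_for(self, is_read):
        # Reads may use a replica; writes must go to the master.
        if is_read and self.slaves:
            return [self.slaves[0]]
        return [self.master]


class DataSpace:
    """Maps a table name to the data source that holds it."""

    def __init__(self):
        self._sources = {}

    def register(self, table, source):
        self._sources[table] = source

    def lookup(self, table):
        return self._sources[table]


def is_read_statement(sql):
    # Crude classification for illustration only; a real engine parses the SQL.
    return sql.lstrip().upper().startswith(("SELECT", "SHOW"))


def route(space, table, sql):
    """Data space -> data source -> data nodes."""
    source = space.lookup(table)                     # data space layer
    return source.nodes_for(is_read_statement(sql))  # data source picks nodes
```

For example, after registering a source with master "node_m"
and slave "node_s1" for a table, a SELECT on that table is
routed to the slave and an UPDATE to the master.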
3. Design of data node layer
A data node object is a specific back-end data node
instance. All SQL requests to the distributed database are
eventually executed on one or several specific data nodes,
that is, on one or several back-end data node instances.
The data node layer talks to the network layer directly
through the network protocol, so it must provide the
network interface and the information needed to establish
network connections. It is shown in Figure 2.
Figure 2 Data node design class diagram
The data node layer is the first level of abstraction
over a single back-end node instance of the distributed
transactional database. A data node stores the connection
information, master-slave information, and similar details
of a back-end data node instance.
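A data node object of this kind might look like the sketch
below. The field and method names (host, port, user, master,
slaves, add_slave) are illustrative assumptions and are not
taken from the class diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class DataNode:
    """One back-end database instance (field names are illustrative)."""
    name: str
    host: str
    port: int
    user: str = "root"
    # Set when this node is a slave; left out of repr to keep output short.
    master: Optional["DataNode"] = field(default=None, repr=False)
    slaves: List["DataNode"] = field(default_factory=list)

    @property
    def is_master(self) -> bool:
        return self.master is None

    def add_slave(self, slave: "DataNode") -> None:
        # Record the master-slave relationship on both ends.
        slave.master = self
        self.slaves.append(slave)

    def address(self) -> str:
        # Connection endpoint handed to the network layer.
        return f"{self.host}:{self.port}"
```

Keeping connection and replication details in one object lets
the upper layers refer to a node by name without knowing how
it is reached.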
4. Design of data source layer
The data source layer is the interaction layer
between the data space layer and the data node layer, that
is, it supplies the data sources of the data space layer.
This article uses five types of data sources: single-node
data sources, primary-backup data sources, read-write
splitting data sources, master-slave replication data
sources, and load balancing data sources. The latter four
are assembled from single-node data sources, and the overall
design adopts a tree structure. It is shown in Figure 3.
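The tree structure can be sketched with the composite pattern:
single-node sources are leaves and the other four types are
inner nodes built from other sources. The class names and the
leaves method are assumptions made for illustration, and the
selection policy of each composite type is left out.

```python
class DataSource:
    """Base class; every data source can list the single-node leaves below it."""

    def leaves(self):
        raise NotImplementedError


class SingleNodeSource(DataSource):
    """Leaf of the tree: wraps exactly one data node."""

    def __init__(self, node_name):
        self.node_name = node_name

    def leaves(self):
        return [self.node_name]


class CompositeSource(DataSource):
    """Inner tree node assembled from other data sources."""
    kind = "composite"

    def __init__(self, *children):
        self.children = list(children)

    def leaves(self):
        # Depth-first walk down to the single-node sources.
        return [name for child in self.children for name in child.leaves()]


class PrimaryBackupSource(CompositeSource):
    kind = "primary-backup"


class ReadWriteSplitSource(CompositeSource):
    kind = "read-write splitting"


class ReplicationSource(CompositeSource):
    kind = "master-slave replication"


class LoadBalanceSource(CompositeSource):
    kind = "load balancing"
```

For example, a load balancing source whose children are a
replication source over nodes "m1" and "s1" and a single-node
source "n3" reports the leaves m1, s1 and n3 in that order.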
[Three-layer diagram labels: SQL request, DBScale, data space layer, data source layer, data node layer, network protocol, database]