
environments, without compromising OLTP performance, and often improving it by alleviating the overhead of creating and maintaining analytic indexes [11]. The dual-format in-memory representation (Figure 1) allows an Oracle RDBMS object (table, table partition, table subpartition) to be simultaneously maintained in the traditional row format, logged and persisted in underlying storage, and in a column format maintained purely in memory without additional logging. The row format is maintained as a set of on-disk pages or blocks accessed through an in-memory buffer cache [12], while the columnar format is maintained as a set of compressed in-memory granules, called in-memory compression units or IMCUs [10][11], in an In-Memory Column Store [11] that is kept transactionally consistent with the row format [13]. Because the column store is built into the existing row-format-based database engine, the rich set of Oracle Database features [4][11], such as database recovery, disaster recovery, backup, replication, storage mirroring, and node clustering, works transparently with the IM column store enabled, without any change to mid-tier or application layers.
Figure 1. Dual-format representation of Oracle DBIM.
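As a concrete illustration of how the dual format is enabled, the following sketch shows the kind of declarative DDL involved; the sales table name and the column-store size are hypothetical, and the row format continues to be logged and persisted exactly as before:

    -- Reserve memory for the In-Memory Column Store (instance-level
    -- parameter; takes effect after a restart). The size is illustrative.
    ALTER SYSTEM SET INMEMORY_SIZE = 100G SCOPE=SPFILE;

    -- Mark a table for dual-format maintenance: its row format remains on
    -- disk and in the buffer cache as before, while a columnar copy is
    -- populated into IMCUs in the column store without additional logging.
    ALTER TABLE sales INMEMORY;

    -- Drop the columnar copy again; the row format is unaffected.
    ALTER TABLE sales NO INMEMORY;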
The dual-format representation is highly optimized for maximal utilization of main-memory capacity. The Oracle Database buffer cache used to access the row format has been optimized over decades to achieve extremely high hit rates even when it is very small relative to the database size. Since the In-Memory Column Store replaces analytic indexes, the buffer cache is better utilized for actual row-organized data pages. Besides compression schemes optimized for query performance, Oracle DBIM also allows the columnar format to be compressed using techniques suited for higher capacity utilization [11].
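As a rough sketch of this trade-off (table name hypothetical), the MEMCOMPRESS clause selects whether the columnar copy is compressed for scan performance or for memory capacity:

    -- Lighter compression, optimized for query/scan performance.
    ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW;

    -- Heavier compression, optimized for fitting more data in memory.
    ALTER TABLE sales INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;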
Unlike a pure in-memory database, the dual-format DBIM does not require the entire database to fit in the In-Memory Column Store in order to be operational. While the row format is maintained for all database objects, the user can specify whether an individual object (Oracle RDBMS table, partition, or subpartition) should be simultaneously maintained in the in-memory columnar format. Within an object, Oracle DBIM also allows users to specify a subset of columns to be maintained in memory. This allows for the highest levels of capacity utilization of the database through data storage tiering across main memory, flash cache, solid-state drives, high-capacity disk drives, etc.
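The sketch below illustrates this selectivity with hypothetical partition and column names: a hot partition is kept in the columnar format, an older partition stays row-format only on its storage tier, and a column that analytic queries never touch is excluded from the in-memory copy (the last statement follows the INMEMORY column clause form):

    -- Keep a recent, frequently queried partition in the column store.
    ALTER TABLE sales MODIFY PARTITION sales_2015_q1 INMEMORY;

    -- Leave an older partition in row format only, on disk or flash tiers.
    ALTER TABLE sales MODIFY PARTITION sales_2010_q1 NO INMEMORY;

    -- Within an in-memory table, exclude a column from the columnar copy.
    ALTER TABLE sales INMEMORY NO INMEMORY (order_notes);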
A detailed description of Oracle DBIM features is available in the Proceedings of the 31st International Conference on Data Engineering, 2015 [11]. In this paper, we concentrate primarily on various aspects of the distributed architecture of Oracle DBIM, its underlying components, and the methods behind its transparency and seamlessness.
2. NEED FOR A DISTRIBUTED ARCHITECTURE
As enterprises witness exponential growth in data ingestion volumes, the conventional wisdom across the industry has become that scaling out over a cluster of commodity servers is better suited for executing analytic workloads on large data sets [13]. There are several valid reasons for this perception. Scale-out aggregates the computational resources of multiple machines into a single virtual machine with the combined power of all its components, allowing for easier elastic expansion [14]. Furthermore, since each node handles only a part of the entire data set, it may not suffer the same contention for CPU and memory resources that characterizes a centralized DBMS [15]. This argument is particularly relevant for main-memory RDBMSs: with the increasing deluge of data volumes, the main memory of a single machine may no longer be sufficient.
However, in the last couple of years, several researchers and industry experts have asked whether it is time to reconsider scale-up versus scale-out [16]. They provide evidence that the majority of analytic workloads do not process huge data sets at any given time. For example, analytics production clusters at Microsoft and Yahoo have median job input sizes of less than 14 GB [17], and 90% of jobs on a Facebook cluster have input sizes under 100 GB [17]. Moreover, hardware price trends are beginning to shift the price-performance equation. Today's commodity servers can affordably hold hundreds of gigabytes of DRAM and 32 cores on a quad-socket motherboard with multiple high-bandwidth memory channels per socket, while high-end servers such as the M6 Oracle Sun SuperCluster [18], which provides up to 32 TB of DRAM and 1024 cores, are also becoming more commonplace.
As far as the implementation of scale-up parallelism in main-memory architectures is concerned, it may seem less complex because the memory address space is completely shared within a single server. However, current state-of-the-art multiprocessor systems employ Non-Uniform Memory Access (NUMA) [19], a memory design in which memory access time depends on the location of the memory relative to the processor. Accesses from a processor to its local memory incur lower latency than remote memory accesses, and keeping accesses local alleviates contention bottlenecks on the interconnect and remote memory controllers. As a result, a NUMA-aware distributed in-memory framework becomes necessary even within a single SMP server, and becomes especially relevant for larger SMPs like the SuperCluster.
Even if a monolithic server meets the capacity and performance requirements of a data processing system, scale-out architectures can be designed to offer the tangible benefits of high availability and minimal recovery time, features that are especially relevant for a non-persistent, volatile main-memory database [10]. A single main-memory database server poses the risk of a single point of failure. In addition, its recovery process (re-populating all data in memory) can be relatively long, leading to extended downtime. A distributed main-memory system can be made fault tolerant by replicating in-memory data so that it exists at more than one site. It also provides the scope to design extremely efficient redistribution mechanisms for fast recovery.