
environments, without compromising OLTP performance, and often improving it by alleviating the overhead of creating and maintaining analytic indexes [11]. The dual-format in-memory representation (Figure 1) allows an Oracle RDBMS object (table, table partition, table subpartition) to be simultaneously maintained in the traditional row format, logged and persisted in underlying storage, and in a column format maintained purely in memory without additional logging. The row format is maintained as a set of on-disk pages or blocks accessed through an in-memory buffer cache [12], while the columnar format is maintained as a set of compressed in-memory granules, called in-memory compression units or IMCUs [10][11], in an In-Memory Column Store [11] that is kept transactionally consistent with the row format [13]. Because the column store is built into the existing row-format-based database engine, the rich set of Oracle Database features [4][11], such as database recovery, disaster recovery, backup, replication, storage mirroring, and node clustering, works transparently with the IM column store enabled, without any change to mid-tier or application layers.
Figure 1. Dual-format representation of Oracle DBIM.
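As a concrete illustration of how the dual format is enabled, the following sketch shows the kind of declarative DDL involved; the sales table name and the column-store size are hypothetical, and the row format continues to be logged and persisted exactly as before:

    -- Reserve memory for the In-Memory Column Store (instance-level
    -- parameter; takes effect after a restart). The size is illustrative.
    ALTER SYSTEM SET INMEMORY_SIZE = 100G SCOPE=SPFILE;

    -- Mark a table for dual-format maintenance: its row format remains on
    -- disk and in the buffer cache as before, while a columnar copy is
    -- populated into IMCUs in the column store without additional logging.
    ALTER TABLE sales INMEMORY;

    -- Drop the columnar copy again; the row format is unaffected.
    ALTER TABLE sales NO INMEMORY;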
The dual-format representation is highly optimized for maximal utilization of main-memory capacity. The Oracle Database buffer cache used to access the row format has been optimized over decades to achieve extremely high hit rates even when it is very small relative to the database size. Since the In-Memory Column Store replaces analytic indexes, the buffer cache is better utilized for actual row-organized data pages. Besides compression schemes optimized for query performance, Oracle DBIM also allows the columnar format to be compressed using techniques suited for higher capacity utilization [11].
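As a rough sketch of this trade-off (table name hypothetical), the MEMCOMPRESS clause selects whether the columnar copy is compressed for scan performance or for memory capacity:

    -- Lighter compression, optimized for query/scan performance.
    ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW;

    -- Heavier compression, optimized for fitting more data in memory.
    ALTER TABLE sales INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;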
Unlike a pure in-memory database, the dual-format DBIM does not require the entire database to fit in the In-Memory Column Store in order to be operational. While the row format is maintained for all database objects, the user can specify whether an individual object (Oracle RDBMS table, partition, or subpartition) should be simultaneously maintained in the in-memory columnar format. Within an object, Oracle DBIM also allows users to specify a subset of columns to be maintained in memory. This allows for the highest levels of capacity utilization of the database through data storage tiering across main memory, flash cache, solid-state drives, high-capacity disk drives, etc.
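The sketch below illustrates this selectivity with hypothetical partition and column names: a hot partition is kept in the columnar format, an older partition stays row-format only on its storage tier, and a column that analytic queries never touch is excluded from the in-memory copy (the last statement follows the INMEMORY column clause form):

    -- Keep a recent, frequently queried partition in the column store.
    ALTER TABLE sales MODIFY PARTITION sales_2015_q1 INMEMORY;

    -- Leave an older partition in row format only, on disk or flash tiers.
    ALTER TABLE sales MODIFY PARTITION sales_2010_q1 NO INMEMORY;

    -- Within an in-memory table, exclude a column from the columnar copy.
    ALTER TABLE sales INMEMORY NO INMEMORY (order_notes);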
A detailed description of Oracle DBIM features is available in the Proceedings of the 31st International Conference on Data Engineering, 2015 [11]. In this paper, we concentrate primarily on various aspects of the distributed architecture of Oracle DBIM, its underlying components, and the methods behind its transparency and seamlessness.
2. NEED FOR A DISTRIBUTED ARCHITECTURE
As enterprises witness exponential growth in data ingestion volumes, the conventional wisdom across the industry has become that scaling out over a cluster of commodity servers is better suited for executing analytic workloads on large data sets [13]. There are several valid reasons for this perception. Scale-out aggregates the computational resources of multiple machines into a single virtual machine with the combined power of all its components, allowing for easier elastic expansion [14]. Furthermore, since each node handles only a part of the entire data set, it may not suffer the same contention for CPU and memory resources that characterizes a centralized DBMS [15]. This argument is particularly relevant for main-memory RDBMSs: with the increasing deluge of data volumes, the main memory of a single machine may no longer be sufficient.
However, in the last couple of years, several researchers and industry experts have asked whether it is time to reconsider scale-up versus scale-out [16]. They provide evidence that the majority of analytic workloads do not process huge data sets at any given time. For example, analytics production clusters at Microsoft and Yahoo have median job input sizes of less than 14 GB [17], and 90% of jobs on a Facebook cluster have input sizes under 100 GB [17]. Moreover, hardware price trends are beginning to shift the price-performance equation. Today's commodity servers can affordably hold hundreds of gigabytes of DRAM and 32 cores on a quad-socket motherboard with multiple high-bandwidth memory channels per socket, while high-end servers such as the M6 Oracle Sun SuperCluster [18], which provides up to 32 TB of DRAM and 1024 cores, are also becoming more commonplace.
As far as the implementation of scale-up parallelism in main-memory architectures is concerned, it may seem less complex because the memory address space is completely shared within a single server. However, current state-of-the-art multiprocessor systems employ Non-Uniform Memory Access (NUMA) [19], a memory design in which memory access time depends on the location of the memory relative to the processor. Accesses from a processor to its local memory incur lower latency than remote memory accesses, and keeping accesses local alleviates contention bottlenecks on the interconnect and remote memory controllers. As a result, a NUMA-aware distributed in-memory framework becomes necessary even within a single SMP server, and becomes especially relevant for larger SMPs like the SuperCluster.
Even if a monolithic server meets the capacity and performance requirements of a data processing system, scale-out architectures can be designed to offer the tangible benefits of high availability and minimal recovery time, features that are especially relevant for a non-persistent, volatile main-memory database [10]. A single main-memory database server poses the risk of a single point of failure. In addition, its recovery process (re-populating all data in memory) can be relatively long, leading to extended downtime. A distributed main-memory system can be made fault tolerant by replicating in-memory data so that it exists at more than one site. It also provides the scope to design extremely efficient redistribution mechanisms for fast recovery.