从表象来看,数据库的启动极其简单,只需要以SYSDBA/SYSOPER身份登陆,敲一条startup命令既可启动数据库。然而在这条命令之后,Oracle需要执行一系列复杂的操作,深入理解这些操作不仅有助于了解Oracle数据库的运行机制,还可以在故障发生时帮助大家快速的定位问题的根源所在,所以接下来让我们一起分析一下数据库的启动过程。
Oracle数据库的启动主要包含三个步骤:
- 启动数据库到Nomount状态
- 启动数据库到Mount状态
- 启动数据库到Open状态
完成这三个过程,数据库才能进入就绪状态,准备提供数据访问。图1-1描述了一个数据库从关闭状态(shutdown)到OPEN状态经历的这些步骤(关闭步骤与此相反,是一个逆向过程):
图1 数据库的起停步骤
1.启动数据库到Nomount状态
在启动的第一步骤,Oracle首先寻找参数文件(pfile / spfile),然后根据参数文件中的设置(如内存分配等设置),创建实例(INSTANCE),分配内存,启动后台进程。Nomount的过程也就是启动数据库实例的过程。这个过程在后台是启动Oracle可执行程序的过程,Windows上是执行oracle.exe文件进行初始化,在Unix/Linux上是执行oracle可执行文件进行初始化。
1.1 Oracle的可执行文件
Windows上Oracle 11g的执行文件大小约为86M,而Linux上Oracle 11g的执行文件达到145M左右,在Oracle 12c版本中,Oracle可执行文件的大小达到了约280M,由此可见不同版本源码复杂度的增加:
D:\oracle\product\11.1.0\BIN>dir oracle.exe 2007-10-03 17:42 89,702,400 oracle.exe [oracle@localhost bin]$ ls -al $ORACLE_HOME/bin/oracle -rwsr-s--x 1 oracle dba 151901909 Jul 4 15:13 /oracle/product/11.1.0/bin/oracle eygle-/home/oracle$ ls -l $ORACLE_HOME/bin/oracle -rwsr-s--x. 1 oracle oinstall 281911888 Mar 16 23:40 /oracle/product/12.1.0/bin/oracle
复制
在Unix/Linux上可以通过file命令查看oracle执行文件来判断Oracle是64位或是32位的,以下是Linux平台的一个示范输出,输出显示Oracle为32位:
[oracle@localhost bin]$ file $ORACLE_HOME/bin/oracle /opt/oracle/product/11.1.0/bin/oracle: setuid setgid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped
复制
在Windows上,也有可选的命令增强工具可以提供类似的功能。由于32位的Oracle数据库,最多只能使用约1.7G的内存,会极大约束Oracle的性能,在生产环境中,应当使用64位系统,部署64位的Oracle数据库。
了解Oracle可执行文件还有另外一个用途,在Unix/Linux上通过strings命令可以将oracle可执行文件中的字符文本转储出来,在转储的文本中可以找到很多有意思的信息,比如一些Oracle未公开的Hints信息、数据库字典基表创建信息等,类似如下一条命令可以完成这样的工作:
strings $ORACLE_HOME/bin/oracle > oracle.txt
复制
比如如下命令显示了源码中和Oracle Database 12c新特性Pluggable数据库相关的一些信息:
eygle-/home/oracle$ grep -i pluggable oracle.txt|more Pluggable Database Open/Close Pluggable Database Enqueue held by foreground or DBWR to synchronize database mount/open or pluggable database open with other operations Synchronize pluggable database open/close _enable_pluggable_database Enable Pluggable Database _pluggable_database_debug Enable checking of pluggable database ID in redo Debug flag for pluggable database related operations pluggable database open pluggable database close Pluggable database resetlogs Coordinate pluggable database operations Pluggable Database File Copy PLUGGABLE_DATABASE Pluggable database Unsupported Rollback Segment/Undo Tablespace operation issued (and ignored) in a Pluggable Database n
复制
在Oracle 10g的版本中,某位程序员甚至引入了一段英国Radiohead 乐队 Creep 这首歌的歌词,他说他不属于这里:
[eygle@hpserver2 ~]$ strings $ORACLE_HOME/bin/oracle |grep radiohead I'm a creep, I'm a winner, what the hell am I doing here.I don't belong here - radiohead
复制
当然在后续版本中,这段表白被去除了。
1.2 实例以及进程的创建
在Nomount初始化的过程中,只要拥有了一个参数文件,就可以凭之启动实例(INSTANCE),这一步骤并不需要任何控制文件或数据文件等的参与。以下是在Linux平台上正常启动实例到nomount状态的过程:
eygle-/home/oracle$ export ORACLE_SID=eygle eygle-/home/oracle$ echo "db_name=eygle" > $ORACLE_HOME/dbs/initeygle.ora eygle-/home/oracle$ sqlplus "/ as sysdba" SQL*Plus: Release 12.1.0.0.1 Beta on Thu Jun 14 16:59:39 2012 Copyright (c) 1982, 2012, Oracle. All rights reserved. Connected to an idle instance. SQL> startup nomount; ORACLE instance started. Total System Global Area 238034944 bytes Fixed Size 2272840 bytes Variable Size 180355512 bytes Database Buffers 50331648 bytes Redo Buffers 5074944 bytes
复制
注意观察,Oracle根据参数文件的内容,创建了instance,分配了相应的内存区域,启动了相应的后台进程。SGA的分配信息从以上输出中可以看到。
观察告警日志文件(alert_<ORACLE_SID>.log),可以看到这一阶段的启动过程:读取参数文件,应用参数启动实例。所有在参数文件中定义的非缺省参数都会记录在告警日志文件中,以下是这一过程的日志摘要示例:
Thu Jun 14 16:55:14 2012 WARNING: unknown state for DB spfile location resource Starting ORACLE instance (normal) CLI notifier numLatches:3 maxDescs:222 ****************** Large Pages Information ***************** Total System Global Area in large pages = 0 KB (0%) Large pages used by this instance: 0 (0 KB) Large pages unused system wide = 0 (0 KB) (alloc incr 4096 KB) Large pages configured system wide = 0 (0 KB) Large page size = 2048 KB RECOMMENDATION: Total System Global Area size is 230 MB. For optimal performance, prior to the next instance restart increase the number of unused large pages by at least 115 (page size 2048 KB, total size 230 MB) system wide to get 100% of the System Global Area allocated with large pages *********************************************************** LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Picked latch-free SCN scheme 3 Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/oracle/product/12.1.0/dbs/arch Autotune of undo retention is turned on. IMODE=BR ILAT =22 LICENSE_MAX_USERS = 0 SYS auditing is disabled kfvLoadLib can not be called in this instance type Starting up: Oracle Database 12c Enterprise Edition Release 12.1.0.0.1 - 64bit Beta With the Partitioning, OLAP, Data Mining and Real Application Testing options. ORACLE_HOME = /u01/oracle/product/12.1.0 System name: Linux Node name: eygle Release: 2.6.32-100.28.5.el6.x86_64 Version: #1 SMP Wed Feb 2 18:40:23 EST 2011 Machine: x86_64 VM name: VMWare Version: 6 Using parameter settings in server-side pfile /u01/oracle/product/12.1.0/dbs/initeygle.ora System parameters with non-default values: db_name = "eygle"
复制
应用参数创建实例之后,后台进程依次启动。
注意在以下日志输出中包含了PID信息以及OS ID两个信息,PID代表该进程在数据库内部的标识符编号,而OS ID则代表该进程在操作系统上的进程编号:
Thu Jun 14 16:59:43 2012 PMON started with pid=2, OS id=2609 Thu Jun 14 16:59:43 2012 PSP0 started with pid=3, OS id=2611 Thu Jun 14 16:59:44 2012 VKTM started with pid=4, OS id=2613 VKTM running at (100ms) precision Thu Jun 14 16:59:44 2012 GEN0 started with pid=5, OS id=2617 Thu Jun 14 16:59:44 2012 DIAG started with pid=6, OS id=2619 Thu Jun 14 16:59:44 2012 OFSD started with pid=7, OS id=2621 Thu Jun 14 16:59:44 2012 DBRM started with pid=8, OS id=2623 Thu Jun 14 16:59:44 2012 DIA0 started with pid=9, OS id=2625 Thu Jun 14 16:59:44 2012 MMAN started with pid=10, OS id=2627 Thu Jun 14 16:59:44 2012 DBW0 started with pid=11, OS id=2629 Thu Jun 14 16:59:44 2012 LGWR started with pid=12, OS id=2631 Thu Jun 14 16:59:44 2012 CKPT started with pid=13, OS id=2633 Thu Jun 14 16:59:44 2012 SMON started with pid=14, OS id=2635 Thu Jun 14 16:59:44 2012 RECO started with pid=15, OS id=2637 Thu Jun 14 16:59:44 2012 LREG started with pid=16, OS id=2639 Thu Jun 14 16:59:44 2012 MMON started with pid=17, OS id=2641 Thu Jun 14 16:59:44 2012 MMNL started with pid=18, OS id=2643 ORACLE_BASE from environment = /u01/oracle Using default pga_aggregate_limit of 1024 MB
复制
在这里提醒大家注意一下Oracle不同版本告警日志信息的变化,在Oracle 9i早期版本中,后台进程启动的日志信息里并不包含OS ID,以下是Oracle 9.2.0.4的日志信息(在Oracle 9.2.0.8中已经包含了OS ID信息):
PMON started with pid=2 DBW0 started with pid=3 LGWR started with pid=4 CKPT started with pid=5 SMON started with pid=6 RECO started with pid=7
复制
自Oracle Database 11g开始,这部分信息有了进一步的增强,输出中不仅包含了OS ID,而且每个后台进程的启动都有单独的时间标记(时间标记可以帮助判断每个后台进程启动时所消耗的时间,从而辅助进行问题诊断):
Sat Jul 05 09:53:55 2008 PMON started with pid=2, OS id=13898 Sat Jul 05 09:53:55 2008 VKTM started with pid=3, OS id=13900 at elevated priority VKTM running at (20)ms precision Sat Jul 05 09:53:55 2008 DIAG started with pid=4, OS id=13904 Sat Jul 05 09:53:55 2008 DBRM started with pid=5, OS id=13906 Sat Jul 05 09:53:55 2008 PSP0 started with pid=6, OS id=13908
复制
提示:从Oracle不同版本中的变化来体会Oracle的技术进步、甚至借鉴这些变化是学习Oracle的方法之一。任何细微的变化都值得注意,认真、细致、严谨是对DBA的基本素质要求。