Compared with the HA configuration on RHEL 6 / CentOS 6, RHEL 7 replaces the old ccs and luci tools with pcs. Combining pcs + pacemaker + corosync with Oracle, we can build an Oracle HA configuration and migrate the data-sharing-platform database onto it. corosync acts as the cluster messaging layer (Message Layer), pacemaker as the cluster resource manager (Cluster Resource Management), and pcs as the management interface to the CRM.
Edit the hosts file:
pcs01: 10.228.110.126
pcs02: 10.228.110.128
VIP:   192.168.110.133
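A minimal /etc/hosts sketch for both nodes, assuming the names and addresses above (add a VIP entry as well if you want it resolvable by name):
10.228.110.126   pcs01
10.228.110.128   pcs02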
Set up passwordless SSH trust between the nodes; the following script and command can be used:
./sshUserSetup.sh -user root -hosts "hacluster01 hacluster02" -advanced -noPromptPassphrase
Disable the firewall
Check the firewall status: systemctl status firewalld.service
Stop the firewall: systemctl stop firewalld.service
Check whether the firewall starts on boot: systemctl is-enabled firewalld.service
Do not start it on boot: systemctl disable firewalld.service
Alternatively, keep firewalld running and allow the high-availability service through it:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --add-service=high-availability
firewall-cmd --reload
Disable SELinux
vi /etc/selinux/config      (set SELINUX=disabled)
setenforce 0
Disable NetworkManager
systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl status NetworkManager
Mount the ISO
mount /home/patrol/rhel-server-7.4-x86_64-dvd.iso /mnt
Alternatively, attach the ISO image through the server's iBMC management port, then run: mount /dev/sr0 /mnt
Configure yum
[root@pcs02 yum.repos.d]# cat ha.repo
[rhel-yum]
name=ha2                                      --- this name must be the same
baseurl=file:///mnt/addons/HighAvailability   --- location of the HA packages inside the ISO
enabled=1
gpgcheck=0
[ha]
name=ha2
baseurl=file:///mnt
enabled=1
gpgcheck=0
Install the software
1. With the yum repository configured, corosync, pacemaker and pcs can be installed directly:
yum install pcs pacemaker fence-agents-all
-- or: yum install /iso/addons/HighAvailability/* -y
yum install lvm2-cluster gfs2-utils
-- In my case I copied the gfs2-utils rpm onto the system and installed it with rpm -ivh gfs2-utils.rpm; the lvm2-cluster package was not found, so it was not installed.
2. Alternatively, when installing the Linux OS, select the "HIGH AVAILABILITY ADD-ON" option in the software selection screen. The cluster software is then installed together with the OS, none of the steps above are needed, and the hacluster user is created automatically.
yum install pcs pacemaker fence-agents-all
Change the hacluster user's password
Change the hacluster password on both nodes:
passwd hacluster
Set the same password on both nodes.
Start the services
Start the pcsd service on both nodes:
systemctl start pcsd.service
systemctl enable pcsd.service
Authenticate the hacluster user
Syntax: pcs cluster auth node1 node2 ... nodeN
(On RHEL 8 the equivalent is pcs host auth, e.g. [root@z1 ~]# pcs host auth z1.example.com z2.example.com)

[root@pcs01 patrol]# pcs cluster auth pcs01 pcs02
Username: hacluster
Password:
pcs01: Authorized
pcs02: Authorized
[root@pcs01 patrol]#

[root@pcs02 Packages]# pcs cluster auth pcs01 pcs02
pcs01: Already authorized
pcs02: Already authorized
[root@pcs02 Packages]#

Create the cluster
Create a cluster named mycluster with both nodes as cluster members. This command creates and starts the cluster in one step. You only need to run this from one node in the cluster, because pcs configuration commands take effect for the entire cluster. On one node in the cluster, run:

# pcs cluster setup --name mycluster --start pcs01 pcs02
(The form [root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com is the newer RHEL 8 syntax; on RHEL 7 the --name option is required.)
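Once the cluster is created, it can be started and enabled on both nodes and then checked; a short sketch using standard pcs commands (not listed in the original steps):
pcs cluster start --all        # start corosync and pacemaker on all nodes
pcs cluster enable --all       # have the cluster start automatically at boot
pcs cluster status             # confirm both nodes are online
pcs status corosync            # check corosync membership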
Web UI login
https://nodename:2224 (nodename can be replaced with the node's IP address)
Add nodes and create the cluster
Add the two nodes in the web UI, then join them into a single cluster.
Or from the command line: pcs cluster setup --name mycluster pcs01 pcs02
Add the VIP resource
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s
Resource id: the resource name (arbitrary)
IP: the floating IP address
Nic: the NIC the floating IP is bound to (leave at the default, or specify one)
(A CLI sketch follows below.)
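A hedged CLI equivalent adapted to this deployment (the group name dsdb and resource name VIP come from the pcs status output shown later; the address, netmask and NIC name here are assumptions to be replaced with your own values):
pcs resource create VIP ocf:heartbeat:IPaddr2 ip=10.228.110.133 cidr_netmask=24 nic=eth0 op monitor interval=30s --group dsdb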
Add the LVM resource
Volgrpname: the VG name
Add the Filesystem resource
Device: the device backing the file system, e.g. /dev/ftpvg/lv_backup
Directory: the mount point, e.g. /reportdata.backup
Fstype: the file system type, e.g. ext4
(A CLI sketch follows below.)
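A hedged CLI sketch of the same LVM and Filesystem configuration (the resource names LVM and orahome appear in the pcs status output later in this document; the VG name, device, mount point and group are the examples above and should match your own layout):
pcs resource create LVM ocf:heartbeat:LVM volgrpname=ftpvg exclusive=true --group dsdb
pcs resource create orahome ocf:heartbeat:Filesystem device=/dev/ftpvg/lv_backup directory=/reportdata.backup fstype=ext4 --group dsdb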
Start/stop ordering dependencies between resources can also be configured at the same time.
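Resources placed in the same pcs group already start in the order they were added and stop in reverse, so explicit constraints are mainly needed for resources outside the group. A hedged example using the resource names above:
pcs constraint order start LVM then start orahome
pcs constraint colocation add orahome with LVM INFINITY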
Add the Oracle resources
The Oracle resources consist of two parts: the TNS listener and the RDBMS instance.
Add the listener resource
Add and test the listener resource as follows (added through the web UI):
type: select oralsnr
resource group: the group the resource is assigned to
resource_id: the resource name
sid: the alias in tnsnames.ora
Home: the database ORACLE_HOME
User: the owner of the database software installation
tns_admin: $ORACLE_HOME/network/admin
(A CLI sketch follows below.)
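A hedged CLI equivalent (the resource name lsnrctl, listener name listener_dsdb, SID and ORACLE_HOME are taken from output shown later in this document; adjust them to your installation):
pcs resource create lsnrctl ocf:heartbeat:oralsnr sid=dsdb home=/sjgxpt/oracle/product/11.2/db_1 user=oracle listener=listener_dsdb --group dsdb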
Example of the completed configuration:
Database resource
Add and test the database resource as follows (added through the web UI):
Resource_id: the resource name
SID: the database instance name
Home: the database ORACLE_HOME value
User: the owner of the database software installation
(A CLI sketch follows below.)
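A hedged CLI equivalent (the resource name database, SID dsdb and ORACLE_HOME are taken from output shown later in this document):
pcs resource create database ocf:heartbeat:oracle sid=dsdb home=/sjgxpt/oracle/product/11.2/db_1 user=oracle --group dsdb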
Below is the resource information as actually deployed:
Add the fence device
Configure each node's IMM management port IP, username and password.
A Red Hat High Availability cluster requires that you configure fencing for the cluster. The reasons for this requirement are described in Fencing in a Red Hat High Availability Cluster. For this introduction, however, which is intended to show only how to use the basic Pacemaker commands, disable fencing by setting the stonith-enabled cluster option to false.

WARNING: The use of stonith-enabled=false is completely inappropriate for a production cluster. It tells the cluster to simply pretend that failed nodes are safely powered off.

# pcs property set stonith-enabled=false
Port: the node name controlled by this fence device
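For a production cluster, a hedged sketch of creating one of the IPMI fence devices from the command line (the address, login and password match the fence_ipmilan test command listed later; the resource name ibmca appears in the pcs status output; the assumption that ibmca fences pcs01, and the location rule keeping a fence device off the node it fences, are common practice rather than statements from the original):
pcs stonith create ibmca fence_ipmilan ipaddr=10.228.230.98 login=USERID passwd=PASSW0RD lanplus=1 pcmk_host_list=pcs01 op monitor interval=60s
pcs constraint location ibmca avoids pcs01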
Cluster management
Checking resources
pcs status
pcs status --full
pcs status resources --full
pcs status resources arch
pcs status resources dsdb
pcs resource show
pcs resource show vip
Configure resource location preferences
pcs constraint location ora prefers pcs01=150
pcs constraint location ora prefers pcs02=50
pcs constraint location arch prefers pcs01=150
pcs constraint location arch prefers pcs02=50
pcs constraint location data prefers pcs01=150
pcs constraint location data prefers pcs02=50
pcs constraint location VIP prefers pcs01=150
pcs constraint location VIP prefers pcs02=50
pcs constraint location LVM prefers pcs01=150
pcs constraint location LVM prefers pcs02=50
pcs constraint location lsnrctl prefers pcs01=150
pcs constraint location lsnrctl prefers pcs02=50
pcs constraint location database prefers pcs01=150
pcs constraint location database prefers pcs02=50

Disable and enable resources
pcs resource enable resource_id
pcs resource disable resource_id    -- stop the resource and do not start it again

Switch the resource group
pcs resource clear dsdb
pcs resource move dsdb
pcs resource move dsdb pcs01
pcs resource move dsdb pcs02
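After setting the preferences, the configured constraints can be reviewed with a standard pcs command (not part of the original notes):
pcs constraint list --full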
Other commands
1. pcs resource move <resource/group name>    -- move a resource or resource group to the other node
2. pcs resource clear / pcs resource cleanup <resource/group name>    -- clear the location constraint left on a node by a move; until it is cleared, the resource or group cannot be switched back to that node, because the node will not accept it.
pcs status                     -- overall cluster status
pcs status nodes ftp2          -- show the status of node ftp2
pcs status groups ftpgroup     -- show the status of a resource group
pcs config                     -- show all configuration information
pcs resource show --full       -- show all resource information
pcs resource show VIP          -- show one resource
fence_ipmilan -a 10.228.230.98 -l USERID -p PASSW0RD -o status    -- test the fence device
pcs cluster stop [--all] [node] [...]    -- stop the cluster
pcs cluster start              -- start the cluster
pcs cluster standby z1.example.com       -- put z1.example.com into standby mode
pcs cluster unstandby z1.example.com     -- take z1 out of standby mode
Cluster service management:
pcs cluster status
To make sure corosync, pacemaker and pcsd are all enabled at boot:
systemctl enable corosync pacemaker pcsd
Failure simulation
Listener failure
To simulate a listener failure, stop the listener manually; after a short while (governed by the resource's monitor interval) the cluster software restarts the listener automatically.
[oracle@pcs01 trace]$ lsnrctl stop listener_dsdb        <-- simulate the listener failure
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 09-NOV-2018 17:53:44
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.228.110.133)(PORT=1521)))
The command completed successfully
[oracle@pcs01 trace]$ ps -ef|grep tns
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    4393 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    4468 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    4513 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    4516 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    4777 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns            <-- the listener has been restarted
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    5110     1  0 17:53 ?      00:00:00 /sjgxpt/oracle/product/11.2/db_1/bin/tnslsnr listener_dsdb -inherit
oracle    5233 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$ ps -ef|grep tns
root       183     2  0 11月06 ?      00:00:00 [netns]
oracle    5110     1  1 17:53 ?      00:00:00 /sjgxpt/oracle/product/11.2/db_1/bin/tnslsnr listener_dsdb -inherit
oracle    5376 46314  0 17:53 pts/0  00:00:00 grep --color=auto tns
[oracle@pcs01 trace]$
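How quickly the listener comes back is governed by the monitor operation on the listener resource; a hedged sketch of inspecting and tuning it (resource name lsnrctl as in the pcs status output, interval and timeout values are only examples):
pcs resource show lsnrctl
pcs resource update lsnrctl op monitor interval=30s timeout=60s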
Database failure
Simulate a database crash using SHUTDOWN ABORT
[oracle@pcs01 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Mon Nov 12 11:43:05 2018
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown abort
ORACLE instance shut down.
SQL> exit

The database alert log is as follows:

Mon Nov 12 11:44:05 2018
Shutting down instance (abort)
License high water mark = 33
USER (ospid: 8659): terminating the instance
Instance terminated by USER, pid = 8659
Mon Nov 12 11:44:07 2018
Instance shutdown complete
Mon Nov 12 11:46:03 2018
Adjusting the default value of parameter parallel_max_servers from 1280 to 970 due to the value of parameter processes (1000)
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 13 GB
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 20 GB. For optimal performance, prior to the next instance restart:
1. Increase the number of unused large pages by at least 10241 (page size 2048 KB, total size 20 GB) system wide to get 100% of the System Global Area allocated with large pages
2. Large pages are automatically locked into physical memory. Increase the per process memlock (soft) limit to at least 20 GB to lock 100% System Global Area's large pages into physical memory
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 32
Number of processor cores in the system is 16
Number of processor sockets in the system is 4
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =168
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NUMA system with 8 nodes detected
Oracle NUMA support not enabled
The parameter _enable_NUMA_support should be set to TRUE to enable Oracle NUMA support
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
ORACLE_HOME = /sjgxpt/oracle/product/11.2/db_1 System name: Linux Node name: pcs01 Release: 3.10.0-693.el7.x86_64 Version: #1 SMP Thu Jul 6 19:56:57 EDT 2017 Machine: x86_64 Using parameter settings in server-side spfile /sjgxpt/oracle/product/11.2/db_1/dbs/spfiledsdb.ora System parameters with non-default values: processes = 1000 sessions = 1536 sga_target = 20G control_files = "/sjgxpt/oracle/data/dsdb/control01.ctl" control_files = "/sjgxpt/oracle/data/dsdb/control02.ctl" db_block_size = 8192 compatible = "11.2.0.4.0" log_archive_dest_1 = "LOCATION=/sjgxpt/oracle/arch" log_archive_format = "%t_%s_%r.dbf" undo_tablespace = "UNDOTBS1" remote_login_passwordfile= "EXCLUSIVE" db_domain = "" dispatchers = "(PROTOCOL=TCP) (SERVICE=dsdbXDB)" local_listener = "(DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST =10.228.110.133)(PORT = 1521)) (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521)) ) )" java_jit_enabled = TRUE audit_file_dest = "/sjgxpt/oracle/product/admin/dsdb/adump" audit_trail = "DB" db_name = "dsdb" open_cursors = 300 pga_aggregate_target = 12875M diagnostic_dest = "/sjgxpt/oracle/product" Mon Nov 12 11:46:03 2018 PMON started with pid=2, OS id=16298 Mon Nov 12 11:46:04 2018 PSP0 started with pid=3, OS id=16300 Mon Nov 12 11:46:05 2018 VKTM started with pid=4, OS id=16309 at elevated priority VKTM running at (1)millisec precision with DBRM quantum (100)ms Mon Nov 12 11:46:05 2018 GEN0 started with pid=5, OS id=16313 Mon Nov 12 11:46:05 2018 DIAG started with pid=6, OS id=16315 Mon Nov 12 11:46:05 2018 DBRM started with pid=7, OS id=16317 Mon Nov 12 11:46:05 2018 DIA0 started with pid=8, OS id=16319 Mon Nov 12 11:46:05 2018 MMAN started with pid=9, OS id=16321 Mon Nov 12 11:46:05 2018 DBW0 started with pid=10, OS id=16323 Mon Nov 12 11:46:05 2018 DBW1 started with pid=11, OS id=16325 Mon Nov 12 11:46:05 2018 DBW2 started with pid=12, OS id=16327 Mon Nov 12 11:46:05 2018 DBW3 started with pid=13, OS id=16329 Mon Nov 12 11:46:05 2018 LGWR started with pid=14, OS id=16331 Mon Nov 12 11:46:05 2018 CKPT started with pid=15, OS id=16333 Mon Nov 12 11:46:05 2018 SMON started with pid=16, OS id=16335 Mon Nov 12 11:46:05 2018 RECO started with pid=17, OS id=16337 Mon Nov 12 11:46:05 2018 MMON started with pid=18, OS id=16339 Mon Nov 12 11:46:05 2018 MMNL started with pid=19, OS id=16341 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... starting up 1 shared server(s) ... ORACLE_BASE not set in environment. 
It is recommended that ORACLE_BASE be set in the environment Reusing ORACLE_BASE from an earlier startup = /sjgxpt/oracle/product Mon Nov 12 11:46:05 2018 ALTER DATABASE MOUNT Successful mount of redo thread 1, with mount id 605261437 Database mounted in Exclusive Mode Lost write protection disabled Completed: ALTER DATABASE MOUNT Mon Nov 12 11:46:09 2018 alter database open Beginning crash recovery of 1 threads parallel recovery started with 31 processes Started redo scan Completed redo scan read 47 KB redo, 30 data blocks need recovery Started redo application at Thread 1: logseq 250, block 12222 Recovery of Online Redo Log: Thread 1 Group 1 Seq 250 Reading mem 0 Mem# 0: /sjgxpt/oracle/data/dsdb/redo01.log Completed redo application of 0.02MB Completed crash recovery at Thread 1: logseq 250, block 12317, scn 4277695 30 data blocks read, 30 data blocks written, 47 redo k-bytes read LGWR: STARTING ARCH PROCESSES Mon Nov 12 11:46:10 2018 ARC0 started with pid=54, OS id=16641 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thread 1 advanced to log sequence 251 (thread open) Mon Nov 12 11:46:10 2018 ARC1 started with pid=55, OS id=16667 Mon Nov 12 11:46:10 2018 ARC2 started with pid=56, OS id=16683 Mon Nov 12 11:46:10 2018 ARC3 started with pid=57, OS id=16697 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Thread 1 opened at log sequence 251 Current log# 2 seq# 251 mem# 0: /sjgxpt/oracle/data/dsdb/redo02.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Archived Log entry 244 added for thread 1 sequence 250 ID 0x23e8dee0 dest 1: [16498] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:501291664 end:501291704 diff:40 (0 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is AL32UTF8 No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Mon Nov 12 11:46:10 2018 QMNC started with pid=58, OS id=16794 Completed: alter database open Starting background process CJQ0 Mon Nov 12 11:46:11 2018 CJQ0 started with pid=59, OS id=16892
Host failure
Simulate by rebooting the host:
[root@pcs01 ~]# reboot
Connection to 10.228.110.126 closed by remote host.
Connection to 10.228.110.126 closed.
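To confirm the failover, check the cluster from the surviving node; the dsdb resource group should report as Started on pcs02 (a hedged check, assuming pcs02 is the surviving node):
[root@pcs02 ~]# pcs status
[root@pcs02 ~]# pcs status resources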
Resource maintenance
Applying PSU and JVM patches
When the Oracle database needs maintenance that involves stopping the database or the listener, put the resources into maintenance (unmanaged) mode first, so that the cluster does not restart or fail them over:
Enter maintenance mode: pcs resource unmanage resource1
Leave maintenance mode: pcs resource manage resource1
Check resource status with pcs status
[root@pcs01 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: pcs02 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu May 30 10:07:00 2019
Last change: Wed Apr 3 15:40:08 2019 by hacluster via crmd on pcs01

2 nodes configured
9 resources configured

Online: [ pcs01 pcs02 ]

Full list of resources:

 ibmca      (stonith:fence_ipmilan):        Stopped
 ibmcb      (stonith:fence_ipmilan):        Started pcs01
 Resource Group: dsdb
     VIP        (ocf::heartbeat:IPaddr):        Started pcs01
     LVM        (ocf::heartbeat:LVM):           Started pcs01
     orahome    (ocf::heartbeat:Filesystem):    Started pcs01
     arch       (ocf::heartbeat:Filesystem):    Started pcs01
     data       (ocf::heartbeat:Filesystem):    Started pcs01
     database   (ocf::heartbeat:oracle):        Started pcs01
     lsnrctl    (ocf::heartbeat:oralsnr):       Started pcs01

Failed Actions:
* ibmca_start_0 on pcs02 'unknown error' (1): call=239, status=Timed Out, exitreason='none',
    last-rc-change='Wed Apr 3 14:42:27 2019', queued=0ms, exec=20177ms
* ibmcb_monitor_60000 on pcs01 'unknown error' (1): call=182, status=Timed Out, exitreason='none',
    last-rc-change='Wed May 29 19:30:59 2019', queued=0ms, exec=20136ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcs01 ~]#
Put the resources into maintenance (unmanaged) mode, then carry out the maintenance:
[oracle@pcs01 20190415psu_11.2.0.4]$ lsnrctl stop listener_dsdb
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 30-MAY-2019 10:10:11
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.228.110.133)(PORT=1521)))
The command completed successfully
[oracle@pcs01 20190415psu_11.2.0.4]$
[oracle@pcs01 29141056]$ opatch apply
-- then apply the JVM patch with opatch apply, and bring the database up with startup upgrade
Exit maintenance mode (pcs resource manage), then relocate the resource group as needed:
pcs resource clear dsdb
pcs resource move dsdb pcs01
pcs resource move dsdb pcs02