
PCS + Oracle HA: A Practical Installation and Configuration Reference

Original article by jieguo, 2022-08-16

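1. Objective:

The Oracle database for the two hosts resides on a shared disk, and pcs provides automatic active/standby failover for Oracle HA. If either host goes down, the other takes over automatically and quickly, keeping the business running with as little interruption as possible.
(You no longer need complex HA software such as Linux Cluster or RoseHA; what PCS delivers is enough for typical HA requirements. Although this article covers Oracle HA, the same approach applies to other databases such as MySQL or PostgreSQL and to other application software. The key points are that both hosts are configured with the same users and environment variables, and that the database or application is installed on shared storage.)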

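2. Environment preparation:

Operating system: Oracle Linux 7.9
root password: secure_password
hacluster password: secure_password
Database version: 11.2.0.4
Database name: orcl
system/sys password: oracle

The shared disk /dev/sdb is carved with LVM into the logical volume /dev/vg01/lvol01, which carries the xfs file system mounted at /u01 where the Oracle database is installed.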

[root@pcs01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.52.191 pcs01
192.168.52.192 pcs02

systemctl disable firewalld
systemctl stop firewalld
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0

Time synchronization:

yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
systemctl status chronyd

Add a time synchronization server:
vi /etc/chrony.conf
server <time_server_IP> iburst
Restart the chronyd service:
systemctl restart chronyd.service
Check that synchronization is working:
chronyc sources -v
timedatectl
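
The timedatectl check above can be scripted. The following is a minimal sketch, assuming the timedatectl output contains either an "NTP synchronized: yes" line (as on EL7) or a "System clock synchronized: yes" line (as on newer systemd releases). The function name clock_synced is hypothetical and not part of chrony or systemd.

```shell
#!/bin/bash
# clock_synced: read timedatectl output on stdin and succeed only if it
# reports the clock as synchronized. Hypothetical helper for illustration.
clock_synced() {
    grep -Eq '^[[:space:]]*(NTP synchronized|System clock synchronized):[[:space:]]*yes'
}

# Usage on each node (reads live system state):
#   timedatectl | clock_synced && echo "clock is synchronized"
```

Running it on both nodes before creating the cluster is a cheap way to catch a node whose clock has not synchronized yet.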

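3. Installation steps:

3.1 Install the pcs software: (all nodes)

Here pcs is the management interface tool for the CRM, pacemaker is the cluster resource manager (CRM), and corosync is the cluster messaging layer.
yum -y install pcs
When working offline, configure a local yum repository (for example in a .repo file under /etc/yum.repos.d/):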

[redhat7.9]
name = redhat 7.9
baseurl=file:///mnt
gpgcheck=0
enabled=1

[HighAvailability]
name=HighAvailability
baseurl=file:///mnt/addons/HighAvailability
gpgcheck=0
enabled=1

[ResilientStorage]
name=ResilientStorage
baseurl=file:///mnt/addons/ResilientStorage
gpgcheck=0
enabled=1

systemctl start pcsd.service
systemctl enable pcsd.service

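3.2 Configure the cluster user and node authentication:

Set the hacluster password (on all nodes, because pcs cluster auth logs in to every node as hacluster):
echo secure_password | passwd --stdin hacluster
Authenticate the nodes (on node 1):
pcs cluster auth pcs01 pcs02 -u hacluster -p secure_password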

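3.3 Create the cluster: (node 1)

pcs cluster setup --start --name cluster01 pcs01 pcs02
Check the status; the stonith warning can be ignored for now.
pcs status
Enable the cluster services at boot, then disable stonith until fencing is configured (section 3.11):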

pcs cluster enable --all
pcs cluster status
pcs property set stonith-enabled=false
pcs status

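3.4 Create the virtual service IP: (node 1)

pcs resource create virtualip IPaddr2 ip=192.168.52.190 cidr_netmask=24 nic=eth0 op monitor interval=10s
pcs status
Confirm that the IP is attached and reachable as follows. Check the actual NIC name first (for example eth0 or ens32) and use it for nic= above:
ip a
ping -c 2 192.168.52.190
ip addr show dev ens32
Test moving the IP: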
pcs resource move virtualip pcs02
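
After a move, it helps to confirm which node actually holds the virtual IP. The following is a minimal sketch, assuming the one-line output format of ip -o -4 addr show, in which the fourth field is ADDRESS/PREFIX. The function name has_ip is hypothetical.

```shell
#!/bin/bash
# has_ip IP: read `ip -o -4 addr show` output on stdin and succeed if IP is
# configured on any interface. Hypothetical helper, not part of iproute2 or pcs.
has_ip() {
    local ip="$1"
    # Field 4 looks like 192.168.52.190/24; compare only the address part.
    awk -v ip="$ip" '{ split($4, a, "/"); if (a[1] == ip) found = 1 }
                     END { exit found ? 0 : 1 }'
}

# Usage on a cluster node (reads live interface data):
#   ip -o -4 addr show | has_ip 192.168.52.190 && echo "VIP is on $(hostname)"
```

Comparing the whole address rather than using a plain grep avoids a false match on a longer address that merely starts with the same digits.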

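3.5 Create the shared-disk volume group: (node 1)

vgcreate vg01 /dev/sdb
vgdisplay|grep Free
Create the logical volume with one of the following two commands:
lvcreate -n lvol01 -l 2598 vg01    (size given as a number of physical extents, taken from the Free PE value above)
lvcreate -n lvol01 -L 9G vg01    (size given directly; may leave space unused)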
mkfs -t xfs /dev/vg01/lvol01
mkdir /u01
systemctl daemon-reload
mount -t xfs /dev/vg01/lvol01 /u01
df -Th /u01
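
The extent count passed to lvcreate -l above comes from the Free PE line of vgdisplay. The following is a minimal sketch of reading that value in a script, assuming vgdisplay prints a line of the form "Free  PE / Size       2598 / 10.15 GiB". The function names free_extents and extents_for_mib are hypothetical.

```shell
#!/bin/bash
# free_extents: read vgdisplay output on stdin and print the free
# physical-extent count (the fifth whitespace-separated field of the
# "Free  PE / Size" line). Hypothetical helper for illustration.
free_extents() {
    awk '/Free +PE/ { print $5 }'
}

# extents_for_mib SIZE_MIB PE_MIB: number of whole extents that fit in
# SIZE_MIB when each extent is PE_MIB in size.
extents_for_mib() {
    echo $(( $1 / $2 ))
}

# Usage on node 1 (reads the live volume group):
#   lvcreate -n lvol01 -l "$(vgdisplay vg01 | free_extents)" vg01
```

As an alternative, lvcreate also accepts -l 100%FREE, which allocates all remaining extents without needing the count.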

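Edit /etc/lvm/lvm.conf (vi /etc/lvm/lvm.conf), find the volume_list = line and set volume_list = []. You may ultimately need volume_list = [ "ol" ], where ol is the local volume group (use vgs to identify the volume groups on local disks; those are the ones to list so that they stay outside cluster control).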
egrep -v "#|^$" /etc/lvm/lvm.conf
 
lvmconf --enable-halvm --services --startstopservices

3.5.1 Create the volume group resource:

pcs resource create vg01 LVM volgrpname=vg01 exclusive=true
pcs resource show
pcs status
pcs resource move vg01 pcs02
pcs status

3.6 Create the file system resource:

pcs resource create u01 Filesystem device="/dev/vg01/lvol01" directory="/u01" fstype="xfs"
pcs status
Add the resources to the oracle group:
pcs resource group add oracle virtualip vg01 u01
pcs status
Test stopping and starting the resources:
pcs cluster standby pcs01
pcs cluster unstandby pcs01

3.6.1 Install the Oracle database on the /u01 file system:

Kernel parameters: (all nodes)
vi /etc/sysctl.conf

fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
# kernel.shmmax must be less than physical memory
kernel.shmmax = 64424509440
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500

Apply the settings: sysctl -p
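
The kernel.shmmax entry above carries the note that it must be less than physical memory. The following is a minimal sketch of checking that condition, assuming memory is read from the MemTotal line of /proc/meminfo (reported in kB). The function name shmmax_fits is hypothetical.

```shell
#!/bin/bash
# shmmax_fits SHMMAX_BYTES MEMTOTAL_KB: succeed if the shmmax value (bytes)
# is below physical memory (given in kB, as /proc/meminfo reports it).
# Hypothetical helper for illustration.
shmmax_fits() {
    local shmmax="$1" mem_kb="$2"
    [ "$shmmax" -lt $(( mem_kb * 1024 )) ]
}

# Usage on each node (reads live kernel values):
#   shmmax_fits "$(sysctl -n kernel.shmmax)" \
#               "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" \
#     || echo "kernel.shmmax is not below physical memory"
```

The sample value 64424509440 is 60 GiB, so on a smaller host the check fails and the value should be lowered to suit that machine.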

vi /etc/profile (all nodes)


if [ "$USER" = "oracle" ]; then
        if [ "$SHELL" = "/bin/ksh" ]; then
                ulimit -p 16384
                ulimit -n 65536
        else
                ulimit -u 16384 -n 65536
        fi
fi

Apply the settings: source /etc/profile

Add the following to /etc/security/limits.conf: (all nodes)

oracle soft nofile 10240
oracle hard nofile 65536
oracle soft nproc 16384
oracle hard nproc 16384
oracle soft stack 10240
oracle hard stack 32768
oracle hard memlock unlimited
oracle soft memlock unlimited

Install packages: (all nodes)

yum -y install binutils compat-libstdc++-33 gcc gcc-c++ glibc glibc-common glibc-devel ksh libaio libaio-devel libgcc libstdc++ libstdc++-devel make sysstat openssh-clients compat-libcap1 xorg-x11-utils xorg-x11-xauth elfutils unixODBC unixODBC-devel libXp elfutils-libelf elfutils-libelf-devel smartmontools unzip

Create users and groups: (all nodes)

groupadd -g 54321 oinstall
groupadd -g 54322 dba
groupadd -g 54323 oper
useradd -u 54321 -g oinstall -G dba,oper oracle

Directory permissions: (node 1)

mkdir -p /u01/db
mkdir -p /u01/soft
chown -R oracle:oinstall /u01
chmod -R 755 /u01

Environment variables: (all nodes)

su - oracle
vi .bash_profile
export ORACLE_BASE=/u01/db/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/dbhome_1
export ORACLE_SID=orcl
export LANG=en_US.UTF-8
export NLS_LANG=american_america.ZHS16GBK
export NLS_DATE_FORMAT="yyyy-mm-dd hh24:mi:ss"
export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin:$ORACLE_HOME/OPatch
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=${PATH}:$ORACLE_BASE/common/oracle/bin:/u01/oracle/run
export ORACLE_TERM=xterm
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
umask 022
export TMOUT=0

Install the software: (node 1)

vi /etc/oraInst.loc
inventory_loc=/u01/db/oraInventory
inst_group=oinstall
./runInstaller -silent -debug -force -noconfig -IgnoreSysPreReqs \
FROM_LOCATION=/u01/soft/database/stage/products.xml \
oracle.install.option=INSTALL_DB_SWONLY \
UNIX_GROUP_NAME=oinstall \
INVENTORY_LOCATION=/u01/db/oraInventory \
ORACLE_HOME=/u01/db/oracle/product/11.2.0/dbhome_1 \
ORACLE_HOME_NAME="Oracle11g" \
ORACLE_BASE=/u01/db/oracle \
oracle.install.db.InstallEdition=EE \
oracle.install.db.isCustomInstall=false \
oracle.install.db.DBA_GROUP=dba \
oracle.install.db.OPER_GROUP=dba \
DECLINE_SECURITY_UPDATES=true

Create the database: (node 1)

cd /u01/db/oracle/product/11.2.0/dbhome_1/assistants/dbca/templates
dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbname orcl -sid orcl -sysPassword oracle -systemPassword oracle -responseFile NO_VALUE -datafileDestination /u01/db/oracle/oradata -redoLogFileSize 200 -recoveryAreaDestination NO_VALUE -storageType FS -characterSet ZHS16GBK -nationalCharacterSet AL16UTF16 -sampleSchema false -memoryPercentage 60 -databaseType OLTP -emConfiguration NONE

Create the listener: (node 1)

netca -silent -responsefile /u01/db/oracle/product/11.2.0/dbhome_1/assistants/netca/netca.rsp

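Edit the service name and the static listener registration (note the virtual IP in HOST and the static SID_LIST_LISTENER entry): (node 1)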

[oracle@pcs02 ~]$ cd $ORACLE_HOME/network/admin
[oracle@pcs02 admin]$ more listener.ora 
# listener.ora Network Configuration File: /u01/db/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.52.190)(PORT = 1521))
    )
  )
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = orcl)
      (ORACLE_HOME = /u01/db/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = orcl)
    )
  )

ADR_BASE_LISTENER = /u01/db/oracle

[oracle@pcs02 admin]$ more tnsnames.ora 
# tnsnames.ora Network Configuration File: /u01/db/oracle/product/11.2.0/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.52.190)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )

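Note that the net service name in tnsnames.ora (ORCL) must match the SID. Otherwise the pcs resources on the current node fail and the cluster switches to the other node, where the listener and database resources end up stopped as well.
After correcting it, run systemctl restart pacemaker on both nodes to return to normal.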
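
The name match described above can be checked before the cluster resources are created. The following is a minimal sketch, assuming each alias in tnsnames.ora starts at the beginning of a line and is followed by "=", as in the file shown earlier. The function name tns_has_alias is hypothetical, and the check does not handle aliases that carry a domain suffix.

```shell
#!/bin/bash
# tns_has_alias FILE SID: succeed if FILE defines a net service alias whose
# name equals SID, ignoring case. Hypothetical helper for illustration.
tns_has_alias() {
    local file="$1" sid="$2"
    grep -Eiq "^${sid}[[:space:]]*=" "$file"
}

# Usage as the oracle user (assumes ORACLE_HOME and ORACLE_SID are set):
#   tns_has_alias "$ORACLE_HOME/network/admin/tnsnames.ora" "$ORACLE_SID" \
#     || echo "no tnsnames.ora alias matches ORACLE_SID=$ORACLE_SID"
```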

Tune basic parameters: (node 1)

alter profile default limit failed_login_attempts unlimited;
alter profile default limit password_life_time unlimited;
alter system set audit_trail=none scope=spfile sid='*';
alter system set recyclebin=off scope=spfile sid='*';
alter system set sga_target=2000M scope=spfile sid='*';
alter system set pga_aggregate_target=500M sid='*';

Copy files from node 1 to node 2:

scp -p /etc/oraInst.loc pcs02:/etc/
scp -p /etc/oratab pcs02:/etc/
scp -p /usr/local/bin/coraenv pcs02:/usr/local/bin/
scp -p /usr/local/bin/dbhome pcs02:/usr/local/bin/
scp -p /usr/local/bin/oraenv pcs02:/usr/local/bin/

3.7 Create the listener resource: (node 1)

pcs resource create listener oralsnr sid="orcl" listener="listener" --group=oracle
pcs status

3.8 Create the Oracle database resource: (node 1)

pcs resource create orcl oracle sid="orcl" --group=oracle
pcs status

3.9 Define resource colocation: (node 1)

pcs constraint colocation add vg01 with virtualip
pcs constraint colocation add u01 with vg01
pcs constraint colocation add listener with u01
pcs constraint colocation add orcl with listener

3.10 Define the resource start order: (node 1)

pcs constraint order start virtualip then vg01
pcs constraint order start vg01 then start u01
pcs constraint order start u01 then start listener
pcs constraint order start listener then start orcl
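
The colocation and ordering commands in sections 3.9 and 3.10 follow one pattern along the chain virtualip, vg01, u01, listener, orcl. The following is a minimal sketch that prints those commands from an ordered resource list so they can be reviewed before being run. The function name chain_constraints is hypothetical; it only prints text and does not call pcs itself.

```shell
#!/bin/bash
# chain_constraints RES...: for each adjacent pair in the given start order,
# print the matching colocation and ordering commands.
# Hypothetical helper for illustration; it never executes pcs.
chain_constraints() {
    local prev="" res
    for res in "$@"; do
        if [ -n "$prev" ]; then
            echo "pcs constraint colocation add $res with $prev"
            echo "pcs constraint order start $prev then start $res"
        fi
        prev="$res"
    done
}

# Usage (prints eight commands for the five resources):
#   chain_constraints virtualip vg01 u01 listener orcl
```

Printing rather than executing keeps the helper safe to run anywhere; the output can be pasted into a root shell on node 1 once it looks right.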

View all constraints:

[root@pcs01 ~]# pcs constraint show --full
Location Constraints:
  Resource: vg01
    Enabled on: pcs02 (score:INFINITY) (role: Started) (id:cli-prefer-vg01)
  Resource: virtualip
    Enabled on: pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-virtualip)
Ordering Constraints:
  start virtualip then start vg01 (kind:Mandatory) (id:order-virtualip-vg01-mandatory)
  start vg01 then start u01 (kind:Mandatory) (id:order-vg01-u01-mandatory)
  start u01 then start listener (kind:Mandatory) (id:order-u01-listener-mandatory)
  start listener then start orcl (kind:Mandatory) (id:order-listener-orcl-mandatory)
Colocation Constraints:
  vg01 with virtualip (score:INFINITY) (id:colocation-vg01-virtualip-INFINITY)
  u01 with vg01 (score:INFINITY) (id:colocation-u01-vg01-INFINITY)
  listener with u01 (score:INFINITY) (id:colocation-listener-u01-INFINITY)
  orcl with listener (score:INFINITY) (id:colocation-orcl-listener-INFINITY)
Ticket Constraints:

3.11 Install fence devices

3.11.1 sbd method

This setup uses shared-disk sbd. A 100 MB disk is more than enough (in theory anything over 4 MB works).

#pcs property | grep stonith-enabled

#pcs property set stonith-enabled=true

#yum install fence-agents-ipmilan fence-agents-sbd fence-agents-drac5  (all nodes)
 
Install the watchdog and sbd packages (all nodes):
# yum install -y watchdog sbd

Configure softdog as the watchdog device and load it automatically at boot (all nodes):
# echo softdog > /etc/modules-load.d/softdog.conf
# /sbin/modprobe softdog

Change the SBD configuration so that SBD_DEVICE points to the shared disk (all nodes):
# vi /etc/sysconfig/sbd
Change:
SBD_DEVICE="/dev/sdc"    # /dev/sdc is the shared disk
SBD_OPTS="-n node1"      # only needed if the cluster node name differs from the hostname
Reference:
[root@pcs01 ~]# cat /etc/sysconfig/sbd|egrep -v "#|^$"
SBD_DEVICE="/dev/sdc"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=no
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_OPTS=

Create the SBD device (on one node only):
#pcs stonith sbd device setup --device=/dev/sdc

Enable the SBD service (all nodes):
# systemctl enable --now sbd
SBD is now configured.

Create the Pacemaker STONITH fence resource (on one node only):
# pcs stonith create sbd_fencing fence_sbd devices=/dev/sdc
To test whether SBD is working:
# pcs stonith fence pcs02
pcs02 should be rebooted.

3.11.2 iDRAC method

This article explains how to configure fencing on a Dell physical server, which is the most commonly used server in NetEye 4 installations. A fencing configuration is not required for voting-only cluster nodes or for elastic-only nodes as they are not part of the PCS cluster.
Configuring iDRAC
The Integrated Dell Remote Access Controller (iDRAC) is a hardware component located on the motherboard which provides both a web interface and a command line interface to perform remote management tasks.
Before beginning, you should properly configure IPMI settings (Intelligent Platform Management Interface) and create a new account.
You can access the iDRAC web interface and enable IPMI access Over Lan at: iDRAC Settings > Connectivity > Network > IPMI Settings:
 
Then create a new user with the username and password of your choice, read-only privileges for the console, and administrative privileges on IPMI.
  
Please note that you must replicate this configuration on each physical server.
Install Fence Devices
Next you need to install ipmilan fence devices on each server in order to use fencing on Dell servers:
yum install fence-agents-ipmilan
Now you will be able to find several new fence devices, including fence_idrac, and show its properties:
pcs stonith list
pcs stonith describe fence_idrac
Test that the iDRAC interface is reachable using the default port 623:
nmap -sU -p623 <idrac_ip>
Finally you can safely test your configuration by printing the chassis status on each node remotely.
ipmitool -I lanplus -H <iDRAC IP> -U <your_IPMI_username> -P <your_IPMI_password> -y <your_encryption_key> -v chassis status
Configuring PCS
Fencing can be enabled by setting the property called stonith, which is an acronym for Shoot-The-Other-Node-In-The-Head. Disable stonith until fencing is correctly configured in order to avoid any issues during the procedure:
pcs property set stonith-enabled=false
pcs stonith cleanup
At this point you can create a stonith resource for each node. In a 2-node cluster it may happen that both nodes are unable to contact each other and then each node tries to fence the other one. But you can't reboot both nodes at the same time since that will result in downtime and possibly harm cluster integrity. To avoid this you need to configure a different delay (e.g., one without delay, and the other one with at least a 5 second delay). To ensure the safety of your cluster, you should set the reboot method to "cycle" instead of "onoff".
pcs stonith create fence_node1 fence_idrac ipaddr="<iDRAC ip or fqdn>" "delay=0" lanplus="1" login="IPMI_username" passwd_script="IPMI_password" method="cycle" pcmk_host_list="node1.neteyelocal"
pcs stonith create fence_node2 fence_idrac ipaddr="<iDRAC ip or fqdn>" "delay=5" lanplus="1" login="IPMI_username" passwd_script="IPMI_password" method="cycle" pcmk_host_list="node2.neteyelocal"
You should set up a password script instead of directly using your password, for instance with a very simple bash script like the one below. The script should be readable only by the root user, preventing your iDRAC password from being extracted from the PCS resource. You should place this script in /usr/local/bin/ allowing you to invoke it as a regular command:
#! /bin/bash
echo "my_secret_psw"
If everything has been properly configured, then running pcs status should show the fence device with status Started.
To prevent unwanted fencing in the event of minor network outages, increase the totem token timeout to at least 5 seconds by editing /etc/corosync/corosync.conf as follows:
totem {
    version: 2
    cluster_name: neteye
    secauth: off
    transport: udpu
    token: 5000  
}
then sync this config file to all other cluster nodes and reload corosync:
pcs cluster sync
pcs cluster reload corosync
Unwanted fencing might also happen when a node "commits suicide", i.e., shuts itself down because it was not able to contact the other node of the cluster. This is unwanted because all nodes of a cluster might be fenced at the same time. To avoid this, set a constraint that prevents each node's stonith resource from running on that same node:
pcs constraint location fence_node1 avoids node1.neteyelocal
pcs constraint location fence_node2 avoids node2.neteyelocal
Now that fencing is configured, you only need to set the stonith property to true to enable it:
pcs property set stonith-enabled=true
pcs stonith cleanup
Always remember to temporarily disable fencing during updates/upgrades.

3.12 Web console:

netstat -tunlp|grep LISTEN|grep 2224
https://192.168.52.191:2224 (Google Chrome is recommended)
Log in as hacluster/secure_password

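3.13 Host failure test

Watch with crm_mon or pcs status
Reboot one of the hosts with reboot or shutdown -h now
Watch with pcs status again
df -h
su - oracle
sqlplus system/oracle@orcl to test the connection

Whichever single machine is rebooted, the pcs resources fail over normally.
However, if both hosts are shut down together and then only one is started (the other stays down, simulating a host that cannot be repaired), the resources on the host that came up remain in the Stopped state.
In that case the resources have to be force-started by hand.
The procedure is:
pcs resource
Start the resources one by one, following the dependency order shown in that output: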
pcs resource debug-start xxx
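
The manual recovery above runs debug-start once per resource, in dependency order. The following is a minimal sketch of that loop which stops at the first failure. The function name start_in_order and the PCS_CMD override (for dry runs) are hypothetical; only pcs resource debug-start comes from the procedure above.

```shell
#!/bin/bash
# start_in_order RES...: run `pcs resource debug-start` for each resource in
# the order given and stop at the first failure.
# PCS_CMD is a hypothetical override; set PCS_CMD=echo for a dry run.
start_in_order() {
    local res
    for res in "$@"; do
        ${PCS_CMD:-pcs} resource debug-start "$res" || {
            echo "debug-start failed for $res" >&2
            return 1
        }
    done
}

# Usage on the surviving node, as root:
#   start_in_order virtualip vg01 u01 listener orcl
```

Stopping at the first failure matters here because each later resource depends on the previous one, for example u01 cannot mount before vg01 is active.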
Solution: enable stonith, as below:
[root@jycdb01 ~]# pcs property set stonith-enabled=true
[root@jycdb01 ~]# pcs property show
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: jycdb_cluster
dc-version: 1.1.19-8.el7-c3c624ea3d
have-watchdog: true
last-lrm-refresh: 1727425883
maintenance-mode: false
no-quorum-policy: stop
stonith-enabled: true
Related references:
https://docs.redhat.com/zh_hans/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_high_availability_clusters/ref_general-fence-device-properties-configuring-fencing
https://docs.oracle.com/en/operating-systems/oracle-linux/8/availability/availability-AboutFencingConfigurationstonith.html#fencing-examples
https://blog.csdn.net/jycjyc/article/details/142621356
A reference operation log follows:

[root@pcs01 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: NONE
Last updated: Wed Aug 17 10:03:39 2022
Last change: Wed Aug 17 09:49:18 2022 by root via cibadmin on pcs01

2 nodes configured
6 resource instances configured

Node pcs01: UNCLEAN (offline)
Node pcs02: UNCLEAN (offline)

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Stopped
     vg01       (ocf::heartbeat:LVM):   Stopped
     u01        (ocf::heartbeat:Filesystem):    Stopped
     listener   (ocf::heartbeat:oralsnr):       Stopped
     orcl       (ocf::heartbeat:oracle):        Stopped
 sbd_fencing    (stonith:fence_sbd):    Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled
[root@pcs01 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: pcs01 (version 1.1.23-1.0.1.el7_9.1-9acf116022) - partition WITHOUT quorum
Last updated: Thu Aug 18 09:14:13 2022
Last change: Wed Aug 17 10:05:26 2022 by root via cibadmin on pcs01

2 nodes configured
6 resource instances configured

Node pcs02: UNCLEAN (offline)
Online: [ pcs01 ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Stopped
     vg01       (ocf::heartbeat:LVM):   Stopped
     u01        (ocf::heartbeat:Filesystem):    Stopped
     listener   (ocf::heartbeat:oralsnr):       Stopped
     orcl       (ocf::heartbeat:oracle):        Stopped
 sbd_fencing    (stonith:fence_sbd):    Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled
[root@pcs01 ~]# journalctl |grep -i error
Aug 18 09:06:08 localhost.localdomain kernel: RAS: Correctable Errors collector initialized.
Aug 18 09:06:13 pcs01 corosync[1267]:  [QB    ] Error in connection setup (/dev/shm/qb-1267-1574-30-pR9ltr/qb): Broken pipe (32)
[root@pcs01 ~]# corosync-cmapctl |grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.52.191) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
[root@pcs01 ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pcs01 (local)
[root@pcs01 ~]# pcs status pcsd
  pcs01: Online
  pcs02: Offline
[root@pcs01 ~]# pcs resource debug-start virtualip
Operation start for virtualip (ocf:heartbeat:IPaddr2) returned: 'ok' (0)
 >  stderr: Aug 17 10:03:56 INFO: Adding inet address 192.168.52.190/24 with broadcast address 192.168.52.255 to device ens32
 >  stderr: Aug 17 10:03:56 INFO: Bringing device ens32 up
 >  stderr: Aug 17 10:03:56 INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.52.190 ens32 192.168.52.190 auto not_used not_used
[root@pcs01 ~]# pcs resource debug-start vg01
Operation start for vg01 (ocf:heartbeat:LVM) returned: 'ok' (0)
 >  stdout: volume_list=[]
 >  stdout:   Volume group "vg01" successfully changed
 >  stdout: volume_list=[]
 >  stderr: Aug 17 10:04:05 WARNING: Disable lvmetad in lvm.conf. lvmetad should never be enabled in a clustered environment. Set use_lvmetad=0 and kill the lvmetad process
 >  stderr: Aug 17 10:04:05 INFO: Activating volume group vg01
 >  stderr: Aug 17 10:04:06 INFO:  Reading volume groups from cache. Found volume group "ol" using metadata type lvm2 Found volume group "vg01" using metadata type lvm2 
 >  stderr: Aug 17 10:04:06 INFO: New tag "pacemaker" added to vg01
 >  stderr: Aug 17 10:04:06 INFO:  1 logical volume(s) in volume group "vg01" now active 
[root@pcs01 ~]# pcs resource debug-start u01
Operation start for u01 (ocf:heartbeat:Filesystem) returned: 'ok' (0)
 >  stderr: Aug 17 10:04:13 INFO: Running start for /dev/vg01/lvol01 on /u01
[root@pcs01 ~]# pcs resource debug-start listener
Operation start for listener (ocf:heartbeat:oralsnr) returned: 'ok' (0)
 >  stderr: Aug 17 10:04:20 INFO: Listener listener running: 
 >  stderr: LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 17-AUG-2022 10:04:18
 >  stderr: 
 >  stderr: Copyright (c) 1991, 2013, Oracle.  All rights reserved.
 >  stderr: 
 >  stderr: Starting /u01/db/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
 >  stderr: 
 >  stderr: TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 >  stderr: System parameter file is /u01/db/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
 >  stderr: Log messages written to /u01/db/oracle/diag/tnslsnr/pcs01/listener/alert/log.xml
 >  stderr: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
 >  stderr: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.52.190)(PORT=1521)))
 >  stderr: 
 >  stderr: Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
 >  stderr: STATUS of the LISTENER
 >  stderr: ------------------------
 >  stderr: Alias                     listener
 >  stderr: Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 >  stderr: Start Date                17-AUG-2022 10:04:19
 >  stderr: Uptime                    0 days 0 hr. 0 min. 0 sec
 >  stderr: Trace Level               off
 >  stderr: Security                  ON: Local OS Authentication
 >  stderr: SNMP                      OFF
 >  stderr: Listener Parameter File   /u01/db/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
 >  stderr: Listener Log File         /u01/db/oracle/diag/tnslsnr/pcs01/listener/alert/log.xml
 >  stderr: Listening Endpoints Summary...
 >  stderr:   (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
 >  stderr:   (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.52.190)(PORT=1521)))
 >  stderr: Services Summary...
 >  stderr: Service "orcl" has 1 instance(s).
 >  stderr:   Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
 >  stderr: The command completed successfully
 >  stderr: Last login: Wed Aug 17 09:58:46 CST 2022
[root@pcs01 ~]# pcs resource debug-start orcl
Operation start for orcl (ocf:heartbeat:oracle) returned: 'ok' (0)
 >  stderr: Aug 17 10:04:31 INFO: Oracle instance orcl started: 
[root@pcs01 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: pcs01 (version 1.1.23-1.0.1.el7_9.1-9acf116022) - partition WITHOUT quorum
Last updated: Wed Aug 17 10:04:37 2022
Last change: Wed Aug 17 09:49:18 2022 by root via cibadmin on pcs01

2 nodes configured
6 resource instances configured

Node pcs02: UNCLEAN (offline)
Online: [ pcs01 ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Stopped
     vg01       (ocf::heartbeat:LVM):   Stopped
     u01        (ocf::heartbeat:Filesystem):    Stopped
     listener   (ocf::heartbeat:oralsnr):       Stopped
     orcl       (ocf::heartbeat:oracle):        Stopped
 sbd_fencing    (stonith:fence_sbd):    Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled
[root@pcs01 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 3.8G     0  3.8G   0% /dev
tmpfs                    3.8G   65M  3.7G   2% /dev/shm
tmpfs                    3.8G  8.7M  3.8G   1% /run
tmpfs                    3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/mapper/ol-root       26G  7.1G   19G  28% /
/dev/sda1               1014M  184M  831M  19% /boot
tmpfs                    768M     0  768M   0% /run/user/0
/dev/mapper/vg01-lvol01   10G  6.2G  3.9G  62% /u01
[root@pcs01 ~]# su - oracle
Last login: Wed Aug 17 10:04:31 CST 2022
[oracle@pcs01 ~]$ sqlplus system/oracle@orcl

SQL*Plus: Release 11.2.0.4.0 Production on Wed Aug 17 10:04:55 2022

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> exit

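4. Possible problems:

Problem: if storage multipathing is involved, the paths must be aggregated into a single disk; otherwise creating the LV fails.
Fix: install multipath software to aggregate the disks.
See: https://blog.csdn.net/weixin_41607523/article/details/126540525?spm=1001.2014.3001.5502
Fix for an lvm.conf-related error:
Write volume_list = [] with no spaces inside the brackets.
Problem: every status is normal, but historical error messages remain and need clearing.
Fix: pcs stonith cleanup did not help. In the end, running systemctl stop pacemaker on both hosts and then systemctl start pacemaker on both cleared the messages. See https://www.suse.com/support/kb/doc/?id=000019816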

5. References:

Pacemaker configuration for an Oracle database and its listener
https://blog.yannickjaquier.com/linux/pacemaker-configuration-oracle-database.html
Configuring Fencing on Dell Servers
https://www.neteye-blog.com/2020/06/configuring-fencing-on-dell-servers/
Building Oracle HA with pacemaker
https://cdn.modb.pro/db/66956
Time synchronization:
https://www.xiexianbin.cn/linux/softwares/2016-02-08-chrony/index.html?to_index=1
Exclusive activation of a volume group in a cluster
https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/s1-exclusiveactivenfs-haaa
Oracle 11g quick installation reference:
https://blog.csdn.net/jycjyc/article/details/103198741
Building a highly available database cluster with CentOS 7.6 and pcs
https://www.cnblogs.com/monkey6/p/14890292.html
Common pcs commands:
https://blog.csdn.net/hhhh2012/article/details/48313909
Dell Drac 5
https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/6/html/fence_configuration_guide/s1-software-fence-drac5-ca#tb-software-fence-drac5-CA
UnionTech HA setup and partial command manual (not original)
https://blog.csdn.net/m0_47670786/article/details/123382132
Active-Passive Cluster for Near HA Using Pacemaker, DRBD, Corosync and MySQL
https://houseofbrick.com/blog/active-passive-cluster-for-near-ha-using-pacemaker-drbd-corosync-and-mysql/
[Command] Pacemaker command pcs resource (managing resources)
https://eternalcenter.com/pcs-resource/
Oracle 12c high availability with pcs on RHEL 7
https://blog.csdn.net/solore/article/details/106492348
SSH trust setup script reference (taken from the Oracle 12c software package; SSH trust is not actually required here, it only makes password-free file copying between the two hosts more convenient):
Command reference: ./sshUserSetup.sh -user root -hosts "pcs01 pcs02" -advanced -noPromptPassphrase
Enter the password and yes when prompted, then verify:
ssh pcs01 date
ssh pcs02 date

#!/bin/sh
# Nitin Jerath - Aug 2005
#Usage sshUserSetup.sh  -user <user name> [ -hosts \"<space separated hostlist>\" | -hostfile <absolute path of cluster configuration file> ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile <desired absolute path of logfile> ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]
#eg. sshUserSetup.sh -hosts "host1 host2" -user njerath -advanced
#This script is used to setup SSH connectivity from the host on which it is
# run to the specified remote hosts. After this script is run, the user can use
# SSH to run commands on the remote hosts or copy files between the local host
# and the remote hosts without being prompted for passwords or confirmations.
# The list of remote hosts and the user name on the remote host is specified as 
# a command line parameter to the script. Note that in case the user on the 
# remote host has its home directory NFS mounted or shared across the remote 
# hosts, this script should be used with -shared option. 
#Specifying the -advanced option on the command line would result in SSH 
# connectivity being setup among the remote hosts which means that SSH can be 
# used to run commands on one remote host from the other remote host or copy 
# files between the remote hosts without being prompted for passwords or 
# confirmations.
#Please note that the script would remove write permissions on the remote hosts
#for the user home directory and ~/.ssh directory for "group" and "others". This
# is an SSH requirement. The user would be explicitly informed about this by teh script and prompted to continue. In case the user presses no, the script would exit. In case the user does not want to be prompted, he can use -confirm option.
# As a part of the setup, the script would use SSH to create files within ~/.ssh
# directory of the remote node and to setup the requisite permissions. The 
#script also uses SCP to copy the local host public key to the remote hosts so
# that the remote hosts trust the local host for SSH. At the time, the script 
#performs these steps, SSH connectivity has not been completely setup  hence
# the script would prompt the user for the remote host password. 
#For each remote host, for remote users with non-shared homes this would be 
# done once for SSH and  once for SCP. If the number of remote hosts are x, the 
# user would be prompted  2x times for passwords. For remote users with shared 
# homes, the user would be prompted only twice, once each for SCP and SSH.
#For security reasons, the script does not save passwords and reuse it. Also, 
# for security reasons, the script does not accept passwords redirected from a 
#file. The user has to key in the confirmations and passwords at the prompts.
#The -verify option means that the user just wants to verify whether SSH has 
#been set up. In this case, the script would not setup SSH but would only check
# whether SSH connectivity has been setup from the local host to the remote 
# hosts. The script would run the date command on each remote host using SSH. In
# case the user is prompted for a password or sees a warning message for a 
#particular host, it means SSH connectivity has not been setup correctly for
# that host.
#In case the -verify option is not specified, the script would setup SSH and 
#then do the verification as well.
#In case the user speciies the -exverify option, an exhaustive verification would be done. In that case, the following would be checked:
# 1. SSH connectivity from local host to all remote hosts.
# 2. SSH connectivity from each remote host to itself and other remote hosts.

#echo Parsing command line arguments
numargs=$#

ADVANCED=false
HOSTNAME=`hostname`
CONFIRM=no
SHARED=false
i=1
USR=$USER

if  test -z "$TEMP"
then
  TEMP=/tmp
fi

IDENTITY=id_rsa
LOGFILE=$TEMP/sshUserSetup_`date +%F-%H-%M-%S`.log
VERIFY=false
EXHAUSTIVE_VERIFY=false
HELP=false
PASSPHRASE=no
RERUN_SSHKEYGEN=no
NO_PROMPT_PASSPHRASE=no

while [ $i -le $numargs ]
do
  j=$1 
  if [ $j = "-hosts" ] 
  then
     HOSTS=$2
     shift 1
     i=`expr $i + 1`
  fi
  if [ $j = "-user" ] 
  then
     USR=$2
     shift 1
     i=`expr $i + 1`
   fi
  if [ $j = "-logfile" ] 
  then
     LOGFILE=$2
     shift 1
     i=`expr $i + 1`
   fi
  if [ $j = "-confirm" ] 
  then
     CONFIRM=yes
   fi
  if [ $j = "-hostfile" ] 
  then
     CLUSTER_CONFIGURATION_FILE=$2
     shift 1
     i=`expr $i + 1`
   fi
  if [ $j = "-usePassphrase" ] 
  then
     PASSPHRASE=yes
   fi
  if [ $j = "-noPromptPassphrase" ] 
  then
     NO_PROMPT_PASSPHRASE=yes
   fi
  if [ $j = "-shared" ] 
  then
     SHARED=true
   fi
  if [ $j = "-exverify" ] 
  then
     EXHAUSTIVE_VERIFY=true
   fi
  if [ $j = "-verify" ] 
  then
     VERIFY=true
   fi
  if [ $j = "-advanced" ] 
  then
     ADVANCED=true
   fi
  if [ $j = "-help" ] 
  then
     HELP=true
   fi
  i=`expr $i + 1`
  shift 1
done


if [ $HELP = "true" ]
then
  echo "Usage $0 -user <user name> [ -hosts \"<space separated hostlist>\" | -hostfile <absolute path of cluster configuration file> ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile <desired absolute path of logfile> ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]"
echo "This script is used to setup SSH connectivity from the host on which it is run to the specified remote hosts. After this script is run, the user can use  SSH to run commands on the remote hosts or copy files between the local host and the remote hosts without being prompted for passwords or confirmations.  The list of remote hosts and the user name on the remote host is specified as a command line parameter to the script. "
echo "-user : User on remote hosts. " 
echo "-hosts : Space separated remote hosts list. " 
echo "-hostfile : The user can specify the host names either through the -hosts option or by specifying the absolute path of a cluster configuration file. A sample host file contents are below: " 
echo
echo  "   stacg30 stacg30int 10.1.0.0 stacg30v  -"
echo  "   stacg34 stacg34int 10.1.0.1 stacg34v  -"
echo 
echo " The first column in each row of the host file will be used as the host name."
echo 
echo "-usePassphrase : The user wants to set up passphrase to encrypt the private key on the local host. " 
echo "-noPromptPassphrase : The user does not want to be prompted for passphrase related questions. This is for users who want the default behavior to be followed." 
echo "-shared : In case the user on the remote host has its home directory NFS mounted or shared across the remote hosts, this script should be used with -shared option. " 
echo "  It is possible for the user to determine whether a user's home directory is shared or non-shared. Let us say we want to determine that user user1's home directory is shared across hosts A, B and C."
echo " Follow the following steps:"
echo "    1. On host A, touch ~user1/checkSharedHome.tmp"
echo "    2. On hosts B and C, ls -al ~user1/checkSharedHome.tmp" 
echo "    3. If the file is present on hosts B and C in ~user1 directory and"
echo "       is identical on all hosts A, B, C, it means that the user's home "
echo "       directory is shared."
echo "    4. On host A, rm -f ~user1/checkSharedHome.tmp"
echo " In case the user accidentally passes -shared option for non-shared homes or vice versa, SSH connectivity would only be set up for a subset of the hosts. The user would have to re-run the setup script with the correct option to rectify this problem."
echo "-advanced :  Specifying the -advanced option on the command line would result in SSH  connectivity being setup among the remote hosts which means that SSH can be used to run commands on one remote host from the other remote host or copy files between the remote hosts without being prompted for passwords or confirmations."
echo "-confirm: The script would remove write permissions on the remote hosts for the user home directory and ~/.ssh directory for \"group\" and \"others\". This is an SSH requirement. The user would be explicitly informed about this by the script and prompted to continue. In case the user presses no, the script would exit. In case the user does not want to be prompted, he can use -confirm option."
echo  "As a part of the setup, the script would use SSH to create files within ~/.ssh directory of the remote node and to setup the requisite permissions. The script also uses SCP to copy the local host public key to the remote hosts so that the remote hosts trust the local host for SSH. At the time, the script performs these steps, SSH connectivity has not been completely setup  hence the script would prompt the user for the remote host password.  "
echo "For each remote host, for remote users with non-shared homes this would be done once for SSH and  once for SCP. If the number of remote hosts is x, the user would be prompted  2x times for passwords. For remote users with shared homes, the user would be prompted only twice, once each for SCP and SSH.  For security reasons, the script does not save passwords and reuse them. Also, for security reasons, the script does not accept passwords redirected from a file. The user has to key in the confirmations and passwords at the prompts. "
echo "-verify : -verify option means that the user just wants to verify whether SSH has been set up. In this case, the script would not setup SSH but would only check whether SSH connectivity has been setup from the local host to the remote hosts. The script would run the date command on each remote host using SSH. In case the user is prompted for a password or sees a warning message for a particular host, it means SSH connectivity has not been setup correctly for that host.  In case the -verify option is not specified, the script would setup SSH and then do the verification as well. "
echo "-exverify : In case the user specifies the -exverify option, an exhaustive verification for all hosts would be done. In that case, the following would be checked: "
echo "   1. SSH connectivity from local host to all remote hosts. "
echo "   2. SSH connectivity from each remote host to itself and other remote hosts.  "
echo The -exverify option can be used in conjunction with the -verify option as well to do an exhaustive verification once the setup has been done.  
echo "Taking some examples: Let us say local host is Z, remote hosts are A,B and C. Local user is njerath. Remote users are racqa(non-shared), aime(shared)."
echo "$0 -user racqa -hosts \"A B C\" -advanced -exverify -confirm"
echo "Script would set up connectivity from Z -> A, Z -> B, Z -> C, A -> A, A -> B, A -> C, B -> A, B -> B, B -> C, C -> A, C -> B, C -> C."
echo "Since user has given -exverify option, all these scenarios would be verified too."
echo
echo "Now the user runs : $0 -user racqa -hosts \"A B C\" -verify"
echo "Since -verify option is given, no SSH setup would be done, only verification of existing setup. Also, since -exverify or -advanced options are not given, script would only verify connectivity from Z -> A, Z -> B, Z -> C"

echo "Now the user runs : $0 -user racqa -hosts \"A B C\" -verify -advanced"
echo "Since -verify option is given, no SSH setup would be done, only verification of existing setup. Also, since the -advanced option is given, script would verify connectivity from Z -> A, Z -> B, Z -> C, A -> A, A -> B, A -> C"

echo "Now the user runs:"
echo "$0 -user aime -hosts \"A B C\" -confirm -shared"
echo "Script would set up connectivity between  Z->A, Z->B, Z->C only since advanced option is not given."
echo "All these scenarios would be verified too."

exit
fi

if test -z "$HOSTS"
then
   if test -n "$CLUSTER_CONFIGURATION_FILE" && test -f "$CLUSTER_CONFIGURATION_FILE"
   then
      HOSTS=`awk '$1 !~ /^#/ { str = str " " $1 } END { print str }' $CLUSTER_CONFIGURATION_FILE` 
   elif ! test -f "$CLUSTER_CONFIGURATION_FILE"
   then
     echo "Please specify a valid and existing cluster configuration file."
   fi
fi

if  test -z "$HOSTS" || test -z $USR
then
echo "Either user name or host information is missing"
echo "Usage $0 -user <user name> [ -hosts \"<space separated hostlist>\" | -hostfile <absolute path of cluster configuration file> ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile <desired absolute path of logfile> ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]" 
exit 1
fi

if [ -d $LOGFILE ]; then
    echo $LOGFILE is a directory, setting logfile to $LOGFILE/ssh.log
    LOGFILE=$LOGFILE/ssh.log
fi

echo The output of this script is also logged into $LOGFILE | tee -a $LOGFILE

if [ `echo $?` != 0 ]; then
    echo Error writing to the logfile $LOGFILE, Exiting
    exit 1
fi

echo Hosts are $HOSTS | tee -a $LOGFILE
echo user is  $USR | tee -a $LOGFILE
SSH="/usr/bin/ssh"
SCP="/usr/bin/scp"
SSH_KEYGEN="/usr/bin/ssh-keygen"
calculateOS()
{
    platform=`uname -s`
    case "$platform"
    in
       "SunOS")  os=solaris;;
       "Linux")  os=linux;;
       "HP-UX")  os=hpunix;;
         "AIX")  os=aix;;
             *)  echo "Sorry, $platform is not currently supported." | tee -a $LOGFILE
                 exit 1;;
    esac

    echo "Platform:- $platform " | tee -a $LOGFILE
}
calculateOS
BITS=1024
ENCR="rsa"

deadhosts=""
alivehosts=""
if [ $platform = "Linux" ]
then
    PING="/bin/ping"
else
    PING="/usr/sbin/ping"
fi
#bug 9044791
if [ -n "$SSH_PATH" ]; then
    SSH=$SSH_PATH
fi
if [ -n "$SCP_PATH" ]; then
    SCP=$SCP_PATH
fi
if [ -n "$SSH_KEYGEN_PATH" ]; then
    SSH_KEYGEN=$SSH_KEYGEN_PATH
fi
if [ -n "$PING_PATH" ]; then
    PING=$PING_PATH
fi
PATH_ERROR=0
if test ! -x $SSH ; then
    echo "ssh not found at $SSH. Please set the variable SSH_PATH to the correct location of ssh and retry."
    PATH_ERROR=1
fi 
if test ! -x $SCP ; then
    echo "scp not found at $SCP. Please set the variable SCP_PATH to the correct location of scp and retry."
    PATH_ERROR=1
fi 
if test ! -x $SSH_KEYGEN ; then
    echo "ssh-keygen not found at $SSH_KEYGEN. Please set the variable SSH_KEYGEN_PATH to the correct location of ssh-keygen and retry."
    PATH_ERROR=1
fi 
if test ! -x $PING ; then
    echo "ping not found at $PING. Please set the variable PING_PATH to the correct location of ping and retry."
    PATH_ERROR=1
fi 
if [ $PATH_ERROR = 1 ]; then
    echo "ERROR: one or more of the required binaries not found, exiting"
    exit 1
fi
#9044791 end
echo Checking if the remote hosts are reachable | tee -a $LOGFILE
for host in $HOSTS
do
   if [ $platform = "SunOS" ]; then
       $PING -s $host 5 5
   elif [ $platform = "HP-UX" ]; then
       $PING $host -n 5 -m 5
   else
       $PING -c 5 -w 5 $host
   fi
  exitcode=`echo $?`
  if [ $exitcode = 0 ]
  then
     alivehosts="$alivehosts $host"
  else
     deadhosts="$deadhosts $host"
  fi
done

if test -z "$deadhosts"
then
   echo Remote host reachability check succeeded.  | tee -a $LOGFILE
   echo The following hosts are reachable: $alivehosts.  | tee -a $LOGFILE
   echo The following hosts are not reachable: $deadhosts.  | tee -a $LOGFILE
   echo All hosts are reachable. Proceeding further...  | tee -a $LOGFILE
else
   echo Remote host reachability check failed.  | tee -a $LOGFILE
   echo The following hosts are reachable: $alivehosts.  | tee -a $LOGFILE
   echo The following hosts are not reachable: $deadhosts.  | tee -a $LOGFILE
   echo Please ensure that all the hosts are up and re-run the script.  | tee -a $LOGFILE
   echo Exiting now...  | tee -a $LOGFILE
   exit 1
fi

firsthost=`echo $HOSTS | awk '{print $1}; END { }'`
echo firsthost $firsthost
numhosts=`echo $HOSTS | awk '{ }; END {print NF}'`
echo numhosts $numhosts

if [ $VERIFY = "true" ]
then
   echo Since user has specified -verify option, SSH setup would not be done. Only, existing SSH setup would be verified. | tee -a $LOGFILE
   :  # verify-only mode: nothing to set up, fall through to the verification below
else
echo The script will setup SSH connectivity from the host ''`hostname`'' to all  | tee -a $LOGFILE 
echo the remote hosts. After the script is executed, the user can use SSH to run  | tee -a $LOGFILE 
echo commands on the remote hosts or copy files between this host ''`hostname`'' | tee -a $LOGFILE 
echo and the remote hosts without being prompted for passwords or confirmations. | tee -a $LOGFILE 
echo  | tee -a $LOGFILE 
echo NOTE 1: | tee -a $LOGFILE 
echo As part of the setup procedure, this script will use 'ssh' and 'scp' to copy | tee -a $LOGFILE 
echo files between the local host and the remote hosts. Since the script does not  | tee -a $LOGFILE 
echo store passwords, you may be prompted for the passwords during the execution of  | tee -a $LOGFILE 
echo the script whenever 'ssh' or 'scp' is invoked. | tee -a $LOGFILE 
echo  | tee -a $LOGFILE 
echo NOTE 2: | tee -a $LOGFILE 
echo "AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY" | tee -a $LOGFILE 
echo AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEGES TO THESE  | tee -a $LOGFILE 
echo "DIRECTORIES." | tee -a $LOGFILE 
echo  | tee -a $LOGFILE 
echo "Do you want to continue and let the script make the above mentioned changes (yes/no)?" | tee -a $LOGFILE 

if [ "$CONFIRM" = "no" ] 
then 
  read CONFIRM 
else
  echo "Confirmation provided on the command line" | tee -a $LOGFILE
fi 
   
echo  | tee -a $LOGFILE 
echo The user chose ''$CONFIRM'' | tee -a $LOGFILE 

if [ -z "$CONFIRM" -o "$CONFIRM" != "yes" -a "$CONFIRM" != "no" ]
then
  echo "You haven't specified proper input. Please enter 'yes' or 'no'. Exiting...."
  exit 0
fi
if [ "$CONFIRM" = "no" ] 
then 
  echo "SSH setup is not done." | tee -a $LOGFILE 
  exit 1 
else 
  if [ $NO_PROMPT_PASSPHRASE = "yes" ]
  then
    echo "User chose to skip passphrase related questions."  | tee -a $LOGFILE
  else
    if [ $SHARED = "true" ]
    then
	  hostcount=`expr ${numhosts} + 1`
	  PASSPHRASE_PROMPT=`expr 2 \* $hostcount`
    else
	  PASSPHRASE_PROMPT=`expr 2 \* ${numhosts}`
    fi
    echo "Please specify if you want to specify a passphrase for the private key this script will create for the local host. Passphrase is used to encrypt the private key and makes SSH much more secure. Type 'yes' or 'no' and then press enter. In case you press 'yes', you would need to enter the passphrase whenever the script executes ssh or scp. $PASSPHRASE " | tee -a $LOGFILE
    echo "The estimated number of times the user would be prompted for a passphrase is $PASSPHRASE_PROMPT. In addition, if the private-public files are also newly created, the user would have to specify the passphrase on one additional occasion. " | tee -a $LOGFILE
    echo "Enter 'yes' or 'no'." | tee -a $LOGFILE
    if [ "$PASSPHRASE" = "no" ]
    then
      read PASSPHRASE
    else
      echo "Confirmation provided on the command line" | tee -a $LOGFILE
    fi 

    echo  | tee -a $LOGFILE 
    echo The user chose ''$PASSPHRASE'' | tee -a $LOGFILE 
    if [ -z "$PASSPHRASE"  -o "$PASSPHRASE" != "yes" -a "$PASSPHRASE" != "no" ]
    then
      echo "You haven't specified whether to use Passphrase or not. Please specify 'yes' or 'no'. Exiting..."
      exit 0
    fi

    if [ "$PASSPHRASE" = "yes" ] 
    then 
       RERUN_SSHKEYGEN="yes"
#Checking for existence of ${IDENTITY} file
       if test -f  $HOME/.ssh/${IDENTITY}.pub && test -f  $HOME/.ssh/${IDENTITY} 
       then
	     echo "The files containing the client public and private keys already exist on the local host. The current private key may or may not have a passphrase associated with it. In case you remember the passphrase and do not want to re-run ssh-keygen, press 'no' and enter. If you press 'no', the script will not attempt to create any new public/private key pairs. If you press 'yes', the script will remove the old private/public key files existing and create new ones prompting the user to enter the passphrase. If you enter 'yes', any previous SSH user setups would be reset. If you press 'change', the script will associate a new passphrase with the old keys." | tee -a $LOGFILE
	     echo "Press 'yes', 'no' or 'change'" | tee -a $LOGFILE
             read RERUN_SSHKEYGEN 
             echo The user chose ''$RERUN_SSHKEYGEN'' | tee -a $LOGFILE 
	     if [ -z "$RERUN_SSHKEYGEN" -o "$RERUN_SSHKEYGEN" != "yes" -a "$RERUN_SSHKEYGEN" != "no" -a "$RERUN_SSHKEYGEN" != "change" ]
	     then
	       echo "You haven't specified whether to re-run 'ssh-keygen' or not. Please enter 'yes' , 'no' or 'change'. Exiting..."
	       exit 0;
	     fi
       fi 
     else
       if test -f  $HOME/.ssh/${IDENTITY}.pub && test -f  $HOME/.ssh/${IDENTITY} 
       then
         echo "The files containing the client public and private keys already exist on the local host. The current private key may have a passphrase associated with it. In case you find using a passphrase inconvenient (although it is more secure), you can change it to empty through this script. Press 'change' if you want the script to change the passphrase for you. Press 'no' if you want to use your old passphrase, if you had one."
         read RERUN_SSHKEYGEN 
         echo The user chose ''$RERUN_SSHKEYGEN'' | tee -a $LOGFILE 
	 if [ -z "$RERUN_SSHKEYGEN" -o "$RERUN_SSHKEYGEN" != "yes" -a "$RERUN_SSHKEYGEN" != "no" -a "$RERUN_SSHKEYGEN" != "change" ]
	 then
	   echo "You haven't specified whether to re-run 'ssh-keygen' or not. Please enter 'yes' , 'no' or 'change'. Exiting..."
	   exit 0
	 fi
       fi
     fi
  fi
  echo Creating .ssh directory on local host, if not present already | tee -a $LOGFILE
  mkdir -p $HOME/.ssh | tee -a $LOGFILE
echo Creating authorized_keys file on local host  | tee -a $LOGFILE
touch $HOME/.ssh/authorized_keys  | tee -a $LOGFILE
echo Changing permissions on authorized_keys to 644 on local host  | tee -a $LOGFILE
chmod 644 $HOME/.ssh/authorized_keys  | tee -a $LOGFILE
mv -f $HOME/.ssh/authorized_keys  $HOME/.ssh/authorized_keys.tmp | tee -a $LOGFILE
echo Creating known_hosts file on local host  | tee -a $LOGFILE
touch $HOME/.ssh/known_hosts  | tee -a $LOGFILE
echo Changing permissions on known_hosts to 644 on local host  | tee -a $LOGFILE
chmod 644 $HOME/.ssh/known_hosts  | tee -a $LOGFILE
mv -f $HOME/.ssh/known_hosts $HOME/.ssh/known_hosts.tmp | tee -a $LOGFILE


echo Creating config file on local host | tee -a $LOGFILE
echo If a config file exists already at $HOME/.ssh/config, it would be backed up to $HOME/.ssh/config.backup.
echo "Host *" > $HOME/.ssh/config.tmp | tee -a $LOGFILE
echo "ForwardX11 no" >> $HOME/.ssh/config.tmp | tee -a $LOGFILE

if test -f $HOME/.ssh/config 
then
  cp -f $HOME/.ssh/config $HOME/.ssh/config.backup
fi

mv -f $HOME/.ssh/config.tmp $HOME/.ssh/config  | tee -a $LOGFILE
chmod 644 $HOME/.ssh/config

if [ "$RERUN_SSHKEYGEN" = "yes" ]
then
  echo Removing old private/public keys on local host | tee -a $LOGFILE
  rm -f $HOME/.ssh/${IDENTITY} | tee -a $LOGFILE
  rm -f $HOME/.ssh/${IDENTITY}.pub | tee -a $LOGFILE
  echo Running SSH keygen on local host | tee -a $LOGFILE
  $SSH_KEYGEN -t $ENCR -b $BITS -f $HOME/.ssh/${IDENTITY}   | tee -a $LOGFILE

elif [ "$RERUN_SSHKEYGEN" = "change" ]
then
    echo Running SSH Keygen on local host to change the passphrase associated with the existing private key | tee -a $LOGFILE
    $SSH_KEYGEN -p -t $ENCR -b $BITS -f $HOME/.ssh/${IDENTITY} | tee -a $LOGFILE
elif test -f  $HOME/.ssh/${IDENTITY}.pub && test -f  $HOME/.ssh/${IDENTITY} 
then
    :  # key pair already exists and no change was requested; keep it
else
    echo Removing old private/public keys on local host | tee -a $LOGFILE
    rm -f $HOME/.ssh/${IDENTITY} | tee -a $LOGFILE
    rm -f $HOME/.ssh/${IDENTITY}.pub | tee -a $LOGFILE
    echo Running SSH keygen on local host with empty passphrase | tee -a $LOGFILE
    $SSH_KEYGEN -t $ENCR -b $BITS -f $HOME/.ssh/${IDENTITY} -N ''  | tee -a $LOGFILE
fi

if [ $SHARED = "true" ]
then
  if [ $USER = $USR ]
  then
#No remote operations required
    echo Remote user is same as local user | tee -a $LOGFILE
    REMOTEHOSTS=""
    chmod og-w $HOME $HOME/.ssh | tee -a $LOGFILE
  else    
    REMOTEHOSTS="${firsthost}"
  fi
else
  REMOTEHOSTS="$HOSTS"
fi

for host in $REMOTEHOSTS
do
     echo Creating .ssh directory and setting permissions on remote host $host | tee -a $LOGFILE
     echo "THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR \"group\" AND \"others\" ON THE HOME DIRECTORY FOR $USR. THIS IS AN SSH REQUIREMENT." | tee -a $LOGFILE
     echo The script would create ~$USR/.ssh/config file on remote host $host. If a config file exists already at ~$USR/.ssh/config, it would be backed up to ~$USR/.ssh/config.backup. | tee -a $LOGFILE
     echo The user may be prompted for a password here since the script would be running SSH on host $host. | tee -a $LOGFILE
     $SSH -o StrictHostKeyChecking=no -x -l $USR $host "/bin/sh -c \"  mkdir -p .ssh ; chmod og-w . .ssh;   touch .ssh/authorized_keys .ssh/known_hosts;  chmod 644 .ssh/authorized_keys  .ssh/known_hosts; cp  .ssh/authorized_keys .ssh/authorized_keys.tmp ;  cp .ssh/known_hosts .ssh/known_hosts.tmp; echo \\"Host *\\" > .ssh/config.tmp; echo \\"ForwardX11 no\\" >> .ssh/config.tmp; if test -f  .ssh/config ; then cp -f .ssh/config .ssh/config.backup; fi ; mv -f .ssh/config.tmp .ssh/config\""  | tee -a $LOGFILE
     echo Done with creating .ssh directory and setting permissions on remote host $host. | tee -a $LOGFILE
done

for host in $REMOTEHOSTS
do
  echo Copying local host public key to the remote host $host | tee -a $LOGFILE
  echo The user may be prompted for a password or passphrase here since the script would be using SCP for host $host. | tee -a $LOGFILE

  $SCP $HOME/.ssh/${IDENTITY}.pub  $USR@$host:.ssh/authorized_keys | tee -a $LOGFILE
  echo Done copying local host public key to the remote host $host | tee -a $LOGFILE
done

cat $HOME/.ssh/${IDENTITY}.pub >> $HOME/.ssh/authorized_keys | tee -a $LOGFILE

for host in $HOSTS
do
  if [ "$ADVANCED" = "true" ] 
  then
    echo Creating keys on remote host $host if they do not exist already. This is required to setup SSH on host $host. | tee -a $LOGFILE
    if [ "$SHARED" = "true" ]
    then
      IDENTITY_FILE_NAME=${IDENTITY}_$host
      COALESCE_IDENTITY_FILES_COMMAND="cat .ssh/${IDENTITY_FILE_NAME}.pub >> .ssh/authorized_keys"
    else
      IDENTITY_FILE_NAME=${IDENTITY}
    fi

   $SSH  -o StrictHostKeyChecking=no -x -l $USR $host " /bin/sh -c \"if test -f  .ssh/${IDENTITY_FILE_NAME}.pub && test -f  .ssh/${IDENTITY_FILE_NAME}; then echo; else rm -f .ssh/${IDENTITY_FILE_NAME} ;  rm -f .ssh/${IDENTITY_FILE_NAME}.pub ;  $SSH_KEYGEN -t $ENCR -b $BITS -f .ssh/${IDENTITY_FILE_NAME} -N '' ; fi; ${COALESCE_IDENTITY_FILES_COMMAND} \"" | tee -a $LOGFILE
  else 
#At least get the host keys from all hosts for shared case - advanced option not set
    if test  $SHARED = "true" && test $ADVANCED = "false"
    then
      if [ "$PASSPHRASE" = "yes" ]
      then
	 echo "The script will fetch the host keys from all hosts. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase." | tee -a $LOGFILE
      fi
      $SSH  -o StrictHostKeyChecking=no -x -l $USR $host "/bin/sh -c true"
    fi
  fi
done

for host in $REMOTEHOSTS
do
  if test $ADVANCED = "true" && test $SHARED = "false"  
  then
      $SCP $USR@$host:.ssh/${IDENTITY}.pub $HOME/.ssh/${IDENTITY}.pub.$host | tee -a $LOGFILE
      cat $HOME/.ssh/${IDENTITY}.pub.$host >> $HOME/.ssh/authorized_keys | tee -a $LOGFILE
      rm -f $HOME/.ssh/${IDENTITY}.pub.$host | tee -a $LOGFILE
    fi
done

for host in $REMOTEHOSTS
do
   if [ "$ADVANCED" = "true" ]
   then
      if [ "$SHARED" != "true" ]
      then
         echo Updating authorized_keys file on remote host $host | tee -a $LOGFILE
         $SCP $HOME/.ssh/authorized_keys  $USR@$host:.ssh/authorized_keys | tee -a $LOGFILE
      fi 
     echo Updating known_hosts file on remote host $host | tee -a $LOGFILE
     $SCP $HOME/.ssh/known_hosts $USR@$host:.ssh/known_hosts | tee -a $LOGFILE
   fi
   if [ "$PASSPHRASE" = "yes" ]
   then
	 echo "The script will run SSH on the remote machine $host. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase." | tee -a $LOGFILE
   fi
     $SSH -x -l $USR $host "/bin/sh -c \"cat .ssh/authorized_keys.tmp >> .ssh/authorized_keys; cat .ssh/known_hosts.tmp >> .ssh/known_hosts; rm -f  .ssh/known_hosts.tmp  .ssh/authorized_keys.tmp\"" | tee -a $LOGFILE
done

cat  $HOME/.ssh/known_hosts.tmp >> $HOME/.ssh/known_hosts | tee -a $LOGFILE
cat  $HOME/.ssh/authorized_keys.tmp >> $HOME/.ssh/authorized_keys | tee -a $LOGFILE
#Added chmod to fix BUG NO 5238814
chmod 644 $HOME/.ssh/authorized_keys
#Fix for BUG NO 5157782
chmod 644 $HOME/.ssh/config
rm -f  $HOME/.ssh/known_hosts.tmp $HOME/.ssh/authorized_keys.tmp | tee -a $LOGFILE
echo SSH setup is complete. | tee -a $LOGFILE
fi
fi

echo                                                                          | tee -a $LOGFILE
echo ------------------------------------------------------------------------ | tee -a $LOGFILE
echo Verifying SSH setup | tee -a $LOGFILE
echo =================== | tee -a $LOGFILE
echo The script will now run the 'date' command on the remote nodes using ssh | tee -a $LOGFILE
echo to verify if ssh is setup correctly. IF SSH IS CORRECTLY SET UP,  | tee -a $LOGFILE
echo THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR | tee -a $LOGFILE
echo PASSWORDS. If you see any output other than date or are prompted for the | tee -a $LOGFILE
echo password, ssh is not setup correctly and you will need to resolve the  | tee -a $LOGFILE
echo issue and set up ssh again. | tee -a $LOGFILE
echo The possible causes for failure could be:  | tee -a $LOGFILE
echo   1. The server settings in /etc/ssh/sshd_config file do not allow ssh | tee -a $LOGFILE
echo      for user $USR. | tee -a $LOGFILE
echo   2. The server may have disabled public key based authentication. | tee -a $LOGFILE
echo   3. The client public key on the server may be outdated. | tee -a $LOGFILE
echo   4. ~$USR or  ~$USR/.ssh on the remote host may not be owned by $USR.  | tee -a $LOGFILE
echo   5. User may not have passed -shared option for shared remote users or | tee -a $LOGFILE
echo     may be passing the -shared option for non-shared remote users.  | tee -a $LOGFILE
echo   6. If there is output in addition to the date, but no password is asked, | tee -a $LOGFILE
echo   it may be a security alert shown as part of company policy. Append the | tee -a $LOGFILE
echo   "additional text to the <OMS HOME>/sysman/prov/resources/ignoreMessages.txt file." | tee -a $LOGFILE
echo ------------------------------------------------------------------------ | tee -a $LOGFILE
#read -t 30 dummy
  for host in $HOSTS
  do
    echo --$host:-- | tee -a $LOGFILE

     echo Running $SSH -x -l $USR $host date to verify SSH connectivity has been setup from local host to $host.  | tee -a $LOGFILE
     echo "IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR." | tee -a $LOGFILE
     if [ "$PASSPHRASE" = "yes" ]
     then
       echo "The script will run SSH on the remote machine $host. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase." | tee -a $LOGFILE
     fi
     $SSH -l $USR $host "/bin/sh -c date"  | tee -a $LOGFILE
echo ------------------------------------------------------------------------ | tee -a $LOGFILE
  done


if [ "$EXHAUSTIVE_VERIFY" = "true" ]
then
   for clienthost in $HOSTS
   do

      if [ "$SHARED" = "true" ]
      then
         REMOTESSH="$SSH -i .ssh/${IDENTITY}_${clienthost}"
      else
         REMOTESSH=$SSH
      fi

      for serverhost in  $HOSTS
      do
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
         echo Verifying SSH connectivity has been setup from $clienthost to $serverhost  | tee -a $LOGFILE
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
         echo "IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL."  | tee -a $LOGFILE
         $SSH -l $USR $clienthost "$REMOTESSH $serverhost \"/bin/sh -c date\""  | tee -a $LOGFILE
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
      done  
       echo -Verification from $clienthost complete- | tee -a $LOGFILE
   done
else
   if [ "$ADVANCED" = "true" ]
   then
      if [ "$SHARED" = "true" ]
      then
         REMOTESSH="$SSH -i .ssh/${IDENTITY}_${firsthost}"
      else
         REMOTESSH=$SSH
      fi
     for host in $HOSTS
     do
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
        echo Verifying SSH connectivity has been setup from $firsthost to $host  | tee -a $LOGFILE
        echo "IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL." | tee -a $LOGFILE
       $SSH -l $USR $firsthost "$REMOTESSH $host \"/bin/sh -c date\"" | tee -a $LOGFILE
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
    done
    echo -Verification from $firsthost complete- | tee -a $LOGFILE
  fi
fi
echo "SSH verification complete." | tee -a $LOGFILE
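
The -hostfile option described in the help text takes the first column of each non-comment row as a host name. A minimal sketch of that extraction follows, reusing the awk expression from the script. The temporary file and the pcs01/pcs02 rows (including the private names, subnets and VIP names) are illustrative assumptions that only mimic the column layout of the script's own sample.

```shell
#!/bin/sh
# Hypothetical host file in the layout of the help text sample:
# the first column is the host name, rows starting with '#' are ignored.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
# public private subnet vip -
pcs01 pcs01int 10.1.0.0 pcs01v -
pcs02 pcs02int 10.1.0.1 pcs02v -
EOF

# Same awk expression the script uses to build $HOSTS from -hostfile.
HOSTS=$(awk '$1 !~ /^#/ { str = str " " $1 } END { print str }' "$hostfile")
rm -f "$hostfile"

# The unquoted expansion drops the leading space awk leaves in front.
echo Hosts are $HOSTS
```

Because awk prepends a space before every name, the script always expands $HOSTS unquoted so that word splitting yields one host per word.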

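The verification phase above relies on the operator noticing a password prompt. A non-interactive variant is sketched below, assuming the OpenSSH client options BatchMode and ConnectTimeout are available on the host. The function name check_ssh and the SSH_BIN override are hypothetical and not part of the script.

```shell
#!/bin/sh
# SSH_BIN is a hypothetical override so the client binary can be swapped.
SSH_BIN=${SSH_BIN:-/usr/bin/ssh}

# check_ssh USER HOST...
# Runs 'date' on each host with BatchMode=yes, so ssh fails instead of
# prompting when key-based login is not in place.
# Returns non-zero if any host failed.
check_ssh() {
    user=$1
    shift
    failed=0
    for host in "$@"; do
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 \
               -l "$user" "$host" date >/dev/null 2>&1; then
            echo "$host: OK"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return $failed
}
```

A call such as `check_ssh oracle pcs01 pcs02` would print one line per host; the user name oracle is an assumption here. A key protected by a passphrase also fails under BatchMode unless an agent already holds it.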

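With -exverify the script checks the local host against every remote host and then every ordered pair of remote hosts, as in the Z, A, B, C example from the help text. The loop below only prints those pairs and opens no connections. The function name list_pairs is a hypothetical helper used for illustration.

```shell
#!/bin/sh
# list_pairs LOCAL REMOTE...
# Prints the client -> server checks an exhaustive verification covers:
# LOCAL to every remote host, then every remote host to every remote host
# (itself included).
list_pairs() {
    local_name=$1
    shift
    for server in "$@"; do
        echo "$local_name -> $server"
    done
    for client in "$@"; do
        for server in "$@"; do
            echo "$client -> $server"
        done
    done
}

list_pairs Z A B C
```

For n remote hosts this gives n + n*n checks, so the three-host example produces 12 lines.
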
Last modified: 2024-09-29 15:56:47