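Someone in a chat group asked today about heartbeat (interconnect) NIC redundancy for Oracle RAC on Linux, so I simulated it in my own VM environment. Below are the configuration and test of
heartbeat NIC redundancy in a VM-based 10gR2 RAC environment. For reference only.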
----Stop the CRS resources
(steps omitted)
----Edit the IP configuration files
rac1:
[root@rac1 network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@rac1 network-scripts]# cp ifcfg-eth1 ifcfg-bond0
[root@rac1 network-scripts]# cat ifcfg-bond0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.73.10 --heartbeat IP
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
USERCTL=no
BONDING_MASTER=yes
TYPE=Ethernet
rac2:
[root@rac2 network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@rac2 network-scripts]# cp ifcfg-eth1 ifcfg-bond0
[root@rac2 network-scripts]# cat ifcfg-bond0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.73.11
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
USERCTL=no
BONDING_MASTER=yes
TYPE=Ethernet
---Edit the NIC configuration files so that they contain the following (same on both nodes):
[root@rac1 devices]# pwd
/etc/sysconfig/networking/devices
[root@rac1 devices]#
[root@rac1 devices]# cat ifcfg-eth1
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=ethernet
[root@rac1 devices]# cat ifcfg-eth2
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth2
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=ethernet
----Add the bond0 entries to /etc/modprobe.conf (on both nodes):
[root@rac1 devices]# cat /etc/modprobe.conf
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi
alias scsi_hostadapter2 ata_piix
alias snd-card-0 snd-ens1371
options snd-card-0 index=0
options snd-ens1371 index=0
remove snd-ens1371 { /usr/sbin/alsactl store 0 >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-ens1371
# Added by VMware Tools
install pciehp /sbin/modprobe -q --ignore-install acpiphp; /bin/true
install pcnet32 (/sbin/modprobe -q --ignore-install vmxnet || /sbin/modprobe -q --ignore-install pcnet32 $CMDLINE_OPTS);/bin/true
alias eth0 vmxnet
alias eth1 vmxnet
###add by Roger
alias bond0 bonding
options bond0 mode=1 miimon=100 downdelay=200 primary=eth1 primary_reselect=1
[root@rac1 devices]#
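A note here: the two bonding modes relevant to this test are 0 and 1 (the driver supports seven modes, 0-6, in total). mode=0 is active/active, i.e. load balancing, where both NICs carry traffic at the same time.
mode=1 is active/standby (active-backup): if eth1 fails, eth2 takes over immediately, with almost no impact on RAC.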
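To confirm which mode a bond is actually running in and which NIC is currently active, the bonding driver's status file can be read. A minimal sketch, assuming the bond is named bond0 as in this article; the function name bond_summary is made up for illustration:

```shell
# Print the bonding mode and the currently active slave of a bond.
# The optional first argument is the status file; it defaults to the
# kernel's /proc/net/bonding/bond0 (bond0 is the bond name used here).
bond_summary() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # Keep only the two lines of interest from the driver's report.
    grep -E '^(Bonding Mode|Currently Active Slave):' "$status_file"
}

# Usage on a node with the bond configured:
#   bond_summary
```

For mode=1 the second line should name whichever of eth1/eth2 is carrying the heartbeat traffic at that moment.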
----Run the following command on each of the two nodes:
[root@rac1 devices]# modprobe bonding
[root@rac1 devices]#
---check network
[root@rac1 devices]# ifconfig
bond0 Link encap:Ethernet HWaddr 00:0C:29:A7:65:F8
inet addr:192.168.73.10 Bcast:192.168.73.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fea7:65f8/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:248296 errors:0 dropped:0 overruns:0 frame:0
TX packets:166368 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:198712779 (189.5 MiB) TX bytes:83112558 (79.2 MiB)
eth0 Link encap:Ethernet HWaddr 00:0C:29:A7:65:EE
inet addr:192.168.0.128 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fea7:65ee/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1460 Metric:1
RX packets:8910 errors:0 dropped:0 overruns:0 frame:0
TX packets:6416 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:961007 (938.4 KiB) TX bytes:897232 (876.2 KiB)
Interrupt:75 Base address:0x2424
eth1 Link encap:Ethernet HWaddr 00:0C:29:A7:65:F8
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:248290 errors:0 dropped:0 overruns:0 frame:0
TX packets:166368 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:198712419 (189.5 MiB) TX bytes:83112558 (79.2 MiB)
Interrupt:67 Base address:0x24a4
eth2 Link encap:Ethernet HWaddr 00:0C:29:A7:65:F8
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:360 (360.0 b) TX bytes:0 (0.0 b)
Interrupt:59 Base address:0x28a4
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:43711 errors:0 dropped:0 overruns:0 frame:0
TX packets:43711 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9908534 (9.4 MiB) TX bytes:9908534 (9.4 MiB)
[root@rac2 ~]# ifconfig
bond0 Link encap:Ethernet HWaddr 00:0C:29:68:6B:52
inet addr:192.168.73.11 Bcast:192.168.73.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe68:6b52/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:25 errors:0 dropped:0 overruns:0 frame:0
TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4853 (4.7 KiB) TX bytes:7044 (6.8 KiB)
eth0 Link encap:Ethernet HWaddr 00:0C:29:68:6B:48
inet addr:192.168.0.129 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe68:6b48/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:68 errors:0 dropped:0 overruns:0 frame:0
TX packets:93 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6205 (6.0 KiB) TX bytes:12528 (12.2 KiB)
Interrupt:75 Base address:0x2424
eth1 Link encap:Ethernet HWaddr 00:0C:29:68:6B:52
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:25 errors:0 dropped:0 overruns:0 frame:0
TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4853 (4.7 KiB) TX bytes:7044 (6.8 KiB)
Interrupt:67 Base address:0x24a4
eth2 Link encap:Ethernet HWaddr 00:0C:29:68:6B:5C
inet6 addr: fe80::20c:29ff:fe68:6b5c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:20 errors:0 dropped:0 overruns:0 frame:0
TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3425 (3.3 KiB) TX bytes:5321 (5.1 KiB)
Interrupt:59 Base address:0x2824
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:3517 errors:0 dropped:0 overruns:0 frame:0
TX packets:3517 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4765745 (4.5 MiB) TX bytes:4765745 (4.5 MiB)
----Fix the Oracle cluster configuration
---rac1
[root@rac1 bin]# cd /home/oracle/app/oracle/product/10.2.0/crs/bin
[root@rac1 bin]# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@rac1 bin]# ./oifcfg iflist
eth0 192.168.0.0
bond0 192.168.73.0
[root@rac1 bin]# ./oifcfg delif
[root@rac1 bin]#
[root@rac1 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
[root@rac1 bin]#
[root@rac1 bin]# ./oifcfg setif -global bond0/192.168.73.0:cluster_interconnect
[root@rac1 bin]#
---rac2
[root@rac2 bin]# ./oifcfg delif
[root@rac2 bin]#
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
PRIF-50: duplicate interface is given in the input
[root@rac2 network-scripts]# service network start
Bringing up loopback interface: [ OK ]
Bringing up interface bond0: [ OK ]
Bringing up interface eth0: [ OK ]
Bringing up interface eth2:
Determining IP information for eth2... failed.
[FAILED]
Delete the leftover copied NIC configuration file, restart the network service, and run the oifcfg settings again, as follows:
[root@rac2 bin]# ./oifcfg iflist
eth0 192.168.0.0
bond0 192.168.73.0
[root@rac2 bin]# ./oifcfg delif
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
[root@rac2 bin]# ./oifcfg setif -global bond0/192.168.73.0:cluster_interconnect
---Start the CRS resources
[root@rac1 bin]# ./crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1
ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1
ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1
ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2
ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2
ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2
ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2
ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2
ora.roger.db application 0/0 0/1 ONLINE ONLINE rac1
ora....lldb.cs application 0/0 0/1 ONLINE ONLINE rac1
ora....er1.srv application 0/0 0/0 ONLINE ONLINE rac1
ora....er2.srv application 0/0 0/0 ONLINE ONLINE rac2
ora....r1.inst application 0/5 0/0 ONLINE ONLINE rac1
ora....r2.inst application 0/5 0/0 ONLINE ONLINE rac2
Finally, a quick test:
[root@rac1 bin]# ifconfig eth1 down
[root@rac1 bin]# ifconfig
bond0 Link encap:Ethernet HWaddr 00:0C:29:A7:65:F8
inet addr:192.168.73.10 Bcast:192.168.73.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fea7:65f8/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:320916 errors:0 dropped:0 overruns:0 frame:0
TX packets:212511 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:258401205 (246.4 MiB) TX bytes:102844897 (98.0 MiB)
eth0 Link encap:Ethernet HWaddr 00:0C:29:A7:65:EE
inet addr:192.168.0.128 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fea7:65ee/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1460 Metric:1
RX packets:10976 errors:0 dropped:0 overruns:0 frame:0
TX packets:8114 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1239956 (1.1 MiB) TX bytes:1140393 (1.0 MiB)
Interrupt:75 Base address:0x2424
eth0:1 Link encap:Ethernet HWaddr 00:0C:29:A7:65:EE
inet addr:192.168.0.130 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1460 Metric:1
Interrupt:75 Base address:0x2424
eth2 Link encap:Ethernet HWaddr 00:0C:29:A7:65:F8
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:63 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:13911 (13.5 KiB) TX bytes:0 (0.0 b)
Interrupt:59 Base address:0x28a4
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:55766 errors:0 dropped:0 overruns:0 frame:0
TX packets:55766 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:12093653 (11.5 MiB) TX bytes:12093653 (11.5 MiB)
[root@rac1 bin]# ./crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1
ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1
ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1
ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2
ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2
ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2
ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2
ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2
ora.roger.db application 0/0 0/1 ONLINE ONLINE rac1
ora....lldb.cs application 0/0 0/1 ONLINE ONLINE rac1
ora....er1.srv application 0/0 0/0 ONLINE ONLINE rac1
ora....er2.srv application 0/0 0/0 ONLINE ONLINE rac2
ora....r1.inst application 0/5 0/0 ONLINE ONLINE rac1
ora....r2.inst application 0/5 0/0 ONLINE ONLINE rac2
[root@rac1 bin]#
[root@rac1 bin]# ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE OFFLINE
ora....C2.lsnr application ONLINE OFFLINE
ora.rac2.gsd application ONLINE OFFLINE
ora.rac2.ons application ONLINE OFFLINE
ora.rac2.vip application ONLINE ONLINE rac1
ora.roger.db application ONLINE ONLINE rac1
ora....lldb.cs application ONLINE ONLINE rac1
ora....er1.srv application ONLINE ONLINE rac1
ora....er2.srv application ONLINE OFFLINE
ora....r1.inst application ONLINE ONLINE rac1
ora....r2.inst application ONLINE OFFLINE
Shortly afterwards rac2 rebooted; investigation showed a heartbeat problem. Further checking revealed that the test method itself was at fault.
[root@rac1 bin]# ./oifcfg iflist
eth0 192.168.0.0
bond0 192.168.73.0
[root@rac1 bin]#
[root@rac1 bin]# ping 192.168.73.11
PING 192.168.73.11 (192.168.73.11) 56(84) bytes of data.
From 192.168.73.10 icmp_seq=2 Destination Host Unreachable
From 192.168.73.10 icmp_seq=3 Destination Host Unreachable
From 192.168.73.10 icmp_seq=4 Destination Host Unreachable
--- 192.168.73.11 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3663ms, pipe 3
[root@rac1 bin]# ifconfig eth1 up
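Once eth1 on rac1 was brought back up, the heartbeat address of rac2 could be pinged again. Repeated tests showed that with either mode=0 or mode=1,
running ifconfig eth1 down on rac1 always ended with node rac2 rebooting. ocssd.log shows messages like the following: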
[ CSSD]2013-02-01 01:02:13.065 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 29.930 seconds seedhbimpd 0
[ CSSD]2013-02-01 01:02:13.065 [3063929744] >TRACE: clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1039, misstime 30070
[ CSSD]2013-02-01 01:02:13.065 [3063929744] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2013-02-01 01:02:13.219 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:13.219 [3053439888] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:02:14.343 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 28.920 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:02:18.257 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:18.257 [3053439888] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:02:23.434 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:23.435 [3053439888] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:02:28.454 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:28.454 [3053439888] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:02:32.181 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 14.900 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:02:33.413 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 13.900 seconds seedhbimpd 1
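When reading a long ocssd.log it can help to pull out only the eviction countdown lines. A minimal sketch, assuming the log line format shown above; the function name eviction_warnings and the path in the usage comment are placeholders:

```shell
# Print "date time node seconds-left" for every eviction countdown warning.
eviction_warnings() {
    log_file="$1"
    # Select the countdown warnings, then keep the timestamp, the node name
    # and the remaining seconds from each matching line.
    grep 'heartbeat fatal, eviction in' "$log_file" |
        sed -E 's/^.*\]([0-9-]+ [0-9:.]+) .*node ([^ ]+) .*eviction in ([0-9.]+) seconds.*$/\1 \2 \3/'
}

# Usage (replace the path with the real location of ocssd.log):
#   eviction_warnings /path/to/ocssd.log
```

A steadily shrinking last column means the surviving node has stopped receiving network heartbeats and the countdown to eviction is running.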
Finally, following the official MOS note Configure Ethernet Bonding Interface on EL5 or RHEL5 [ID 877012.1], I changed the configuration as follows. This is
the configuration method the Oracle MOS note recommends for Linux 5 and later:
1.configure bonding driver
# grep bond0 /etc/modprobe.conf
alias bond0 bonding
2.configure under-layer interfaces
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
3.configure bonding interface with bonding parameters
# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
MASTER=yes
BOOTPROTO=dhcp
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"
Step 3 did not look right to me: it should use a static IP, so I changed it to the following:
rac1:
DEVICE=bond0
MASTER=yes
#BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR=192.168.73.10
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"
rac2:
DEVICE=bond0
MASTER=yes
#BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR=192.168.73.11
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"
4.activate bonding interface
# ifup bond0
After applying the changes from the MOS note and testing again, node rac2 was still evicted and rebooted, as shown below:
[root@rac1 bin]# ./crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1
ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1
ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1
ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2
ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2
ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2
ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2
ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2
ora.roger.db application 0/0 0/1 ONLINE ONLINE rac2
ora....lldb.cs application 0/0 0/1 ONLINE ONLINE rac1
ora....er1.srv application 0/0 0/0 ONLINE ONLINE rac1
ora....er2.srv application 0/0 0/0 ONLINE ONLINE rac2
ora....r1.inst application 0/5 0/0 ONLINE ONLINE rac1
ora....r2.inst application 0/5 0/0 ONLINE ONLINE rac2
[root@rac1 bin]#
[root@rac1 bin]# ifconfig eth1 down
ocssd.log on rac2:
[ CSSD]2013-02-01 01:31:53.698 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:31:54.897 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 29.650 seconds seedhbimpd 0
[ CSSD]2013-02-01 01:31:54.897 [3042950032] >TRACE: clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1039, misstime 30350
[ CSSD]2013-02-01 01:31:54.897 [3042950032] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2013-02-01 01:31:56.137 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 28.650 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:31:58.679 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:31:58.679 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:32:03.664 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:03.664 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:32:08.640 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:08.640 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:32:13.682 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 14.620 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:14.941 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 13.620 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:14.975 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:14.975 [3032460176] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:32:21.230 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:21.230 [3032460176] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:32:21.350 [145910672] >TRACE: clssgmAllocateRPCIndex: allocated rpc 262 (0x19ddd0)
[ CSSD]2013-02-01 01:32:21.350 [145910672] >TRACE: clssgmRPC: rpc 0x19ddd0 (RPC#262) tag(106002a) sent to node 1
[ CSSD]2013-02-01 01:32:25.001 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 5.610 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:26.253 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 4.610 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:27.496 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 3.600 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:27.532 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:27.532 [3032460176] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:32:28.783 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 2.600 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:30.043 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 1.600 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:31.327 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 0.600 seconds seedhbimpd 1
[ CSSD]------- Begin Dump -------
Judging from these tests, Linux NIC bonding looks unreliable, but on investigation the problem was my test method: ifconfig eth1 down is not a valid way to test it.
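While pulling the cable, the per-slave state reported by the bonding driver shows whether the link failure was actually registered. A minimal sketch; slave_link_status is a hypothetical helper name and bond0 is the bond from this article:

```shell
# Print "slave mii-status link-failure-count" for each slave of a bond.
# The optional first argument is the status file (default: bond0's).
slave_link_status() {
    status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '
        /^Slave Interface:/             { slave = $2 }   # start of a slave section
        slave && /^MII Status:/         { mii = $2 }     # skip the bond-level MII line
        slave && /^Link Failure Count:/ { print slave, mii, $2 }
    ' "$status_file"
}

# Usage, before and after unplugging the cable of the active NIC:
#   slave_link_status
```

If bonding noticed the failure, the unplugged NIC should report a down status and a link failure count that has gone up by one.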
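Additional notes:
With bonding mode=1 miimon=100, miimon controls link monitoring. For example, with miimon=100 the system checks the link state every 100 ms, and if one link is down it switches to the other.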
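As a rough worked example of the timing: with miimon=100 and downdelay=200 (the values from the /etc/modprobe.conf line earlier), a dead link should be abandoned after about miimon + downdelay milliseconds in the worst case. That is far below the roughly 60 seconds that the ocssd.log countdown implies before eviction (misstime of about 30 seconds at the halfway warning). The sum is an approximation for illustration, not a figure taken from the bonding documentation, and the function name is made up:

```shell
# Approximate worst-case time (ms) before bonding stops using a failed slave:
# up to one polling interval to notice the failure, plus the down delay.
failover_delay_ms() {
    miimon_ms="$1"      # link polling interval in ms
    downdelay_ms="$2"   # extra wait before disabling a failed slave, in ms
    echo $(( miimon_ms + downdelay_ms ))
}

failover_delay_ms 100 200   # prints 300
```

So when a real link failure is detected, the switch to the standby NIC should finish long before CSS starts counting down.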
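mode selects the working mode. There are seven modes, 0 through 6; the commonly used ones are 0, 1 and 6.
mode=0: load balancing with automatic failover; requires switch support and configuration.
mode=1: automatic failover; if one link goes down, another link takes over automatically.
mode=6: load balancing with automatic failover; requires no switch support or configuration.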
mode=0 (balance-rr)
Round-robin policy: Transmit packets in sequential order from the first available slave through the last.
This mode provides load balancing and fault tolerance.
mode=1 (active-backup)
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if,
the active slave fails. The bond’s MAC address is externally visible on only one port (network adapter) to
avoid confusing the switch. This mode provides fault tolerance. The primary option affects the behavior of this mode.
mode=2 (balance-xor)
XOR policy: Transmit based on [(source MAC address XOR'd with destination MAC address) modulo slave count].
This selects the same slave for each destination MAC address. This mode provides load balancing and fault tolerance.
mode=3 (broadcast)
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.
mode=4 (802.3ad)
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings.
Utilizes all slaves in the active aggregator according to the 802.3ad specification. Pre-requisites:
1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link aggregation. Most switches will require some type of configuration to enable 802.3ad mode.
mode=5 (balance-tlb)
Adaptive transmit load balancing: channel bonding that does not require any special switch support.
The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave.
Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC
address of the failed receiving slave. Prerequisite: Ethtool support in the base drivers for retrieving the speed of each slave.
mode=6 (balance-alb)
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not
require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver
intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address
with the unique hardware address of one of the slaves in the bond such that different peers use different
hardware addresses for the server.
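Summary:
1. In an Oracle RAC environment, use mode=1 for heartbeat redundancy; modes 0, 6 and the others are not recommended. For example, mode=6 can cause the VIP to float (fail over) unexpectedly.
2. Do not test NIC bonding with ifconfig down; pull the network cable instead. Presumably, after ifconfig down the
NIC's information is cleared from the bond0 configuration (/etc/sysconfig/network-scripts/ifcfg-bond0),
which in turn gets the CRS node evicted.
3. On other platforms, AIX can use EtherChannel and HP-UX can use APA for bonding.
4. Starting with 11.2.0.2, HAIP is supported; OS-level bonding and similar techniques are of course still supported.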
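For reference, a possible ifcfg-bond0 for rac1 that combines the BONDING_OPTS style from the MOS note with the mode=1 recommendation in point 1. This is an untested sketch, not the content of the MOS note; the option values are carried over from the /etc/modprobe.conf line used earlier:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 on rac1 (sketch, not from the MOS note)
DEVICE=bond0
BOOTPROTO=static
IPADDR=192.168.73.10
NETMASK=255.255.255.0
ONBOOT=yes
USERCTL=no
# mode=1 is active-backup; eth1 is preferred while its link is up
BONDING_OPTS="mode=1 miimon=100 downdelay=200 primary=eth1"
```

rac2 would use the same file with IPADDR=192.168.73.11.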