Oracle 10gR2 RAC for Linux -- Configuring and Testing Redundant Heartbeat NICs

Original post by Roger, 2013-02-01


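Today someone in a chat group asked about heartbeat (interconnect) NIC redundancy for RAC on Linux, so I simulated it in my own VM environment. What follows is the configuration and testing of redundant heartbeat NICs on a VM-based 10gR2 RAC cluster.
For reference only!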


----Stop the CRS resources
(steps omitted)
----Edit the IP configuration files

rac1:
[root@rac1 network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@rac1 network-scripts]# cp  ifcfg-eth1 ifcfg-bond0
[root@rac1 network-scripts]# cat ifcfg-bond0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=bond0   
BOOTPROTO=static  
ONBOOT=yes      
IPADDR=192.168.73.10  --heartbeat IP
NETWORK=192.168.73.0    
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
USERCTL=no       
BONDING_MASTER=yes 
TYPE=Ethernet 

rac2:
[root@rac2 network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@rac2 network-scripts]# cp  ifcfg-eth1 ifcfg-bond0
[root@rac2 network-scripts]# cat ifcfg-bond0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=bond0   
BOOTPROTO=static  
ONBOOT=yes      
IPADDR=192.168.73.11
NETWORK=192.168.73.0    
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
USERCTL=no       
BONDING_MASTER=yes 
TYPE=Ethernet 

---Edit the NIC configuration files so that they contain the following (required on both nodes):


[root@rac1 devices]# pwd
/etc/sysconfig/networking/devices
[root@rac1 devices]# 
[root@rac1 devices]# cat ifcfg-eth1
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=ethernet
[root@rac1 devices]# cat ifcfg-eth2
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth2
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=ethernet

----Add the bond0 settings to /etc/modprobe.conf (required on both nodes):

[root@rac1 devices]# cat /etc/modprobe.conf
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi
alias scsi_hostadapter2 ata_piix
alias snd-card-0 snd-ens1371
options snd-card-0 index=0
options snd-ens1371 index=0
remove snd-ens1371 { /usr/sbin/alsactl store 0 >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-ens1371
# Added by VMware Tools
install pciehp /sbin/modprobe -q --ignore-install acpiphp; /bin/true
install pcnet32 (/sbin/modprobe -q --ignore-install vmxnet || /sbin/modprobe -q --ignore-install pcnet32 $CMDLINE_OPTS);/bin/true
alias eth0 vmxnet
alias eth1 vmxnet

###add by Roger
alias bond0 bonding 
options bond0 mode=1 miimon=100 downdelay=200 primary=eth1 primary_reselect=1
[root@rac1 devices]# 

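A note on modes: the bonding driver supports several modes, and the two tried in this post are 0 and 1. mode=0 is active/active, i.e. load balancing, where both NICs carry traffic at the same time.
mode=1 is active/standby: if eth1 fails, eth2 takes over immediately, with almost no impact on RAC.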
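Once the bond is up, the mode actually in effect and the NIC currently carrying traffic can be read from the bonding driver's status file. The following is a minimal sketch, assuming the status file is at /proc/net/bonding/bond0; the function names and the `BOND_STATUS` override are illustrative and not part of the original setup.

```shell
# Sketch: report the bonding mode and the currently active slave.
# BOND_STATUS is a hypothetical override so the functions can also be run
# against a saved copy of the status file; by default the live file is read.
BOND_STATUS="${BOND_STATUS:-/proc/net/bonding/bond0}"

bond_mode() {
    # The line looks like: "Bonding Mode: fault-tolerance (active-backup)"
    sed -n 's/^Bonding Mode: //p' "$BOND_STATUS"
}

bond_active_slave() {
    # Only reported by modes that have a single active slave, such as mode=1.
    sed -n 's/^Currently Active Slave: //p' "$BOND_STATUS"
}
```

Calling `bond_mode; bond_active_slave` before and after a failover test shows whether traffic really moved from eth1 to eth2.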

----Run the following command (on both nodes):


[root@rac1 devices]# modprobe bonding
[root@rac1 devices]# 

---check network
[root@rac1 devices]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8  
          inet addr:192.168.73.10  Bcast:192.168.73.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65f8/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:248296 errors:0 dropped:0 overruns:0 frame:0
          TX packets:166368 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:198712779 (189.5 MiB)  TX bytes:83112558 (79.2 MiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:EE  
          inet addr:192.168.0.128  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65ee/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:8910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6416 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:961007 (938.4 KiB)  TX bytes:897232 (876.2 KiB)
          Interrupt:75 Base address:0x2424 

eth1      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:248290 errors:0 dropped:0 overruns:0 frame:0
          TX packets:166368 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:198712419 (189.5 MiB)  TX bytes:83112558 (79.2 MiB)
          Interrupt:67 Base address:0x24a4 

eth2      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:360 (360.0 b)  TX bytes:0 (0.0 b)
          Interrupt:59 Base address:0x28a4 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:43711 errors:0 dropped:0 overruns:0 frame:0
          TX packets:43711 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:9908534 (9.4 MiB)  TX bytes:9908534 (9.4 MiB)
          


[root@rac2 ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:68:6B:52  
          inet addr:192.168.73.11  Bcast:192.168.73.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe68:6b52/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:4853 (4.7 KiB)  TX bytes:7044 (6.8 KiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:68:6B:48  
          inet addr:192.168.0.129  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe68:6b48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:68 errors:0 dropped:0 overruns:0 frame:0
          TX packets:93 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:6205 (6.0 KiB)  TX bytes:12528 (12.2 KiB)
          Interrupt:75 Base address:0x2424 

eth1      Link encap:Ethernet  HWaddr 00:0C:29:68:6B:52  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4853 (4.7 KiB)  TX bytes:7044 (6.8 KiB)
          Interrupt:67 Base address:0x24a4 

eth2      Link encap:Ethernet  HWaddr 00:0C:29:68:6B:5C  
          inet6 addr: fe80::20c:29ff:fe68:6b5c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3425 (3.3 KiB)  TX bytes:5321 (5.1 KiB)
          Interrupt:59 Base address:0x2824 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3517 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3517 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:4765745 (4.5 MiB)  TX bytes:4765745 (4.5 MiB)
          
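In the rac2 output above, eth2 carries its own MAC address and no SLAVE flag, which suggests it had not joined bond0 at that point. A quick way to confirm bond membership is to read the per-slave sections of the bonding status file. This is a sketch only: the function names are illustrative, and the status file path is passed in as an argument.

```shell
# Sketch: list each slave of a bond with its MII status, then verify that
# the expected NICs are really enslaved.
# Usage: check_slaves STATUS_FILE NIC...

list_slaves() {
    # Prints lines such as "eth1 up": the slave name and the MII status
    # reported in that slave's own section of the status file.
    awk '
        /^Slave Interface:/ { slave = $3 }
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "$1"
}

check_slaves() {
    status_file=$1; shift
    rc=0
    for nic in "$@"; do
        if list_slaves "$status_file" | grep -q "^$nic "; then
            echo "$nic: enslaved"
        else
            echo "$nic: NOT in bond"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_slaves /proc/net/bonding/bond0 eth1 eth2` returns non-zero if either NIC is missing from the bond.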

----Fix the Oracle cluster configuration


---rac1
[root@rac1 bin]# cd /home/oracle/app/oracle/product/10.2.0/crs/bin
[root@rac1 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly
[root@rac1 bin]#  ./oifcfg iflist 
eth0  192.168.0.0
bond0  192.168.73.0
[root@rac1 bin]# ./oifcfg delif
[root@rac1 bin]# 
[root@rac1 bin]# ./oifcfg setif -global eth0/192.168.0.0:public 
[root@rac1 bin]# 
[root@rac1 bin]# ./oifcfg setif -global bond0/192.168.73.0:cluster_interconnect
[root@rac1 bin]# 

---rac2
[root@rac2 bin]# ./oifcfg delif
[root@rac2 bin]# 
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.0.0:public 
PRIF-50: duplicate interface is given in the input
[root@rac2 network-scripts]# service network start
Bringing up loopback interface:  [  OK  ]
Bringing up interface bond0:  [  OK  ]
Bringing up interface eth0:  [  OK  ]
Bringing up interface eth2:  
Determining IP information for eth2... failed.
[FAILED]

Delete the copied NIC configuration file, restart the network service, and then run the oifcfg settings again, as follows:


[root@rac2 bin]# ./oifcfg iflist
eth0  192.168.0.0
bond0  192.168.73.0
[root@rac2 bin]# ./oifcfg delif
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
[root@rac2 bin]# ./oifcfg setif -global bond0/192.168.73.0:cluster_interconnect 
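After `setif`, the stored configuration can be checked with `oifcfg getif`. The sketch below filters that output for the interconnect interface. It assumes each line ends with the interface type (for example `bond0  192.168.73.0  global  cluster_interconnect`); the helper name is illustrative.

```shell
# Sketch: print the interface(s) registered as cluster_interconnect.
# Reads `oifcfg getif` style output on stdin, one interface per line,
# with the interface type assumed to be the last field.
interconnect_ifs() {
    awk '$NF == "cluster_interconnect" { print $1 }'
}
```

Used as `./oifcfg getif | interconnect_ifs`, it should print bond0 once the change above has taken effect.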


---Start the CRS resources
[root@rac1 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    0/0    0/0    ONLINE    ONLINE    rac1        
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    0/0    0/0    ONLINE    ONLINE    rac2        
ora.roger.db   application    0/0    0/1    ONLINE    ONLINE    rac1        
ora....lldb.cs application    0/0    0/1    ONLINE    ONLINE    rac1        
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1        
ora....er2.srv application    0/0    0/0    ONLINE    ONLINE    rac2        
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....r2.inst application    0/5    0/0    ONLINE    ONLINE    rac2        

Finally, a simple test:


[root@rac1 bin]# ifconfig eth1 down       
[root@rac1 bin]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8  
          inet addr:192.168.73.10  Bcast:192.168.73.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65f8/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:320916 errors:0 dropped:0 overruns:0 frame:0
          TX packets:212511 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:258401205 (246.4 MiB)  TX bytes:102844897 (98.0 MiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:EE  
          inet addr:192.168.0.128  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65ee/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:10976 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8114 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1239956 (1.1 MiB)  TX bytes:1140393 (1.0 MiB)
          Interrupt:75 Base address:0x2424 

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:A7:65:EE  
          inet addr:192.168.0.130  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          Interrupt:75 Base address:0x2424 

eth2      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:63 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13911 (13.5 KiB)  TX bytes:0 (0.0 b)
          Interrupt:59 Base address:0x28a4 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:55766 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55766 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:12093653 (11.5 MiB)  TX bytes:12093653 (11.5 MiB)

[root@rac1 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    0/0    0/0    ONLINE    ONLINE    rac1        
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    0/0    0/0    ONLINE    ONLINE    rac2        
ora.roger.db   application    0/0    0/1    ONLINE    ONLINE    rac1        
ora....lldb.cs application    0/0    0/1    ONLINE    ONLINE    rac1        
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1        
ora....er2.srv application    0/0    0/0    ONLINE    ONLINE    rac2        
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....r2.inst application    0/5    0/0    ONLINE    ONLINE    rac2        
[root@rac1 bin]# 
[root@rac1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    ONLINE    ONLINE    rac1        
ora....SM2.asm application    ONLINE    OFFLINE               
ora....C2.lsnr application    ONLINE    OFFLINE               
ora.rac2.gsd   application    ONLINE    OFFLINE               
ora.rac2.ons   application    ONLINE    OFFLINE               
ora.rac2.vip   application    ONLINE    ONLINE    rac1        
ora.roger.db   application    ONLINE    ONLINE    rac1        
ora....lldb.cs application    ONLINE    ONLINE    rac1        
ora....er1.srv application    ONLINE    ONLINE    rac1        
ora....er2.srv application    ONLINE    OFFLINE               
ora....r1.inst application    ONLINE    ONLINE    rac1        
ora....r2.inst application    ONLINE    OFFLINE               

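Shortly afterwards rac2 rebooted. Investigation showed that the heartbeat had failed, and further checking showed that the test method itself was at fault.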


[root@rac1 bin]# ./oifcfg iflist
eth0  192.168.0.0
bond0  192.168.73.0
[root@rac1 bin]# 
[root@rac1 bin]# ping 192.168.73.11
PING 192.168.73.11 (192.168.73.11) 56(84) bytes of data.
From 192.168.73.10 icmp_seq=2 Destination Host Unreachable
From 192.168.73.10 icmp_seq=3 Destination Host Unreachable
From 192.168.73.10 icmp_seq=4 Destination Host Unreachable

--- 192.168.73.11 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3663ms
, pipe 3
[root@rac1 bin]# ifconfig eth1 up

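Once eth1 on rac1 was brought back up, the rac2 heartbeat address could be pinged again. Repeated tests showed that, with either mode=0 or mode=1,
running ifconfig eth1 down on rac1 eventually caused the rac2 node to reboot. ocssd.log showed messages like the following: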
[    CSSD]2013-02-01 01:02:13.065 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 29.930 seconds seedhbimpd 0
[    CSSD]2013-02-01 01:02:13.065 [3063929744] >TRACE:   clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1039, misstime 30070
[    CSSD]2013-02-01 01:02:13.065 [3063929744] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2013-02-01 01:02:13.219 [3053439888] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:02:13.219 [3053439888] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-02-01 01:02:14.343 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 28.920 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:02:18.257 [3053439888] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:02:18.257 [3053439888] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:02:23.434 [3053439888] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:02:23.435 [3053439888] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:02:28.454 [3053439888] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:02:28.454 [3053439888] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:02:32.181 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 14.900 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:02:33.413 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 13.900 seconds seedhbimpd 1
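In the excerpt above, the first warning reports a misstime of 30070 ms with 29.930 seconds left before eviction, which adds up to a 60-second heartbeat timeout. To follow the countdown without reading the whole log, the warnings can be reduced to one short line each. This is a sketch that assumes the log lines have exactly the layout shown above; the function name is illustrative.

```shell
# Sketch: summarize missed-heartbeat warnings from an ocssd.log.
# Prints: date, time, node name, percentage of the timeout used,
# and seconds left before eviction.
heartbeat_warnings() {
    awk '/clssnmPollingThread: node .* heartbeat fatal, eviction in/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "node")     node = $(i + 1)
            if ($i == "at")       pct  = $(i + 1)
            if ($i == "eviction") secs = $(i + 2)
        }
        split($2, t, "]")          # field 2 looks like CSSD]2013-02-01
        print t[2], $3, node, pct, secs
    }' "$1"
}
```

Running `heartbeat_warnings ocssd.log` on the surviving node during a test shows how close the cluster came to evicting its peer.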

Finally I consulted the official MOS note Configure Ethernet Bonding Interface on EL5 or RHEL5 [ID 877012.1] and changed the configuration as shown below.
This is the approach the Oracle MOS note recommends for Linux 5 and later:


1.configure bonding driver

# grep bond0 /etc/modprobe.conf
alias bond0 bonding

2.configure under-layer interfaces

# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes


3.configure bonding interface with bonding parameters

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
MASTER=yes
BOOTPROTO=dhcp
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

Step 3 did not look right to me: the address should be static. So I changed it to the following:
rac1:

DEVICE=bond0
MASTER=yes
#BOOTPROTO=dhcp
BOOTPROTO=static 
IPADDR=192.168.73.10
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

rac2:

DEVICE=bond0
MASTER=yes
#BOOTPROTO=dhcp
BOOTPROTO=static 
IPADDR=192.168.73.11
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"


4.activate bonding interface

# ifup bond0

After making the changes from the MOS note I tested again, and the rac2 node was still evicted and rebooted, as shown below:


[root@rac1 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    0/0    0/0    ONLINE    ONLINE    rac1        
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    0/0    0/0    ONLINE    ONLINE    rac2        
ora.roger.db   application    0/0    0/1    ONLINE    ONLINE    rac2        
ora....lldb.cs application    0/0    0/1    ONLINE    ONLINE    rac1        
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1        
ora....er2.srv application    0/0    0/0    ONLINE    ONLINE    rac2        
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....r2.inst application    0/5    0/0    ONLINE    ONLINE    rac2        
[root@rac1 bin]# 
[root@rac1 bin]# ifconfig eth1 down

ocssd.log on rac2:
[    CSSD]2013-02-01 01:31:53.698 [3032460176] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:31:54.897 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 29.650 seconds seedhbimpd 0
[    CSSD]2013-02-01 01:31:54.897 [3042950032] >TRACE:   clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1039, misstime 30350
[    CSSD]2013-02-01 01:31:54.897 [3042950032] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2013-02-01 01:31:56.137 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 28.650 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:31:58.679 [3032460176] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:31:58.679 [3032460176] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:32:03.664 [3032460176] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:32:03.664 [3032460176] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:32:08.640 [3032460176] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:32:08.640 [3032460176] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-02-01 01:32:13.682 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 14.620 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:14.941 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 13.620 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:14.975 [3032460176] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:32:14.975 [3032460176] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-02-01 01:32:21.230 [3032460176] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:32:21.230 [3032460176] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-02-01 01:32:21.350 [145910672] >TRACE:   clssgmAllocateRPCIndex: allocated rpc 262 (0x19ddd0)
[    CSSD]2013-02-01 01:32:21.350 [145910672] >TRACE:   clssgmRPC: rpc 0x19ddd0 (RPC#262) tag(106002a) sent to node 1
[    CSSD]2013-02-01 01:32:25.001 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 5.610 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:26.253 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 4.610 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:27.496 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 3.600 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:27.532 [3032460176] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-02-01 01:32:27.532 [3032460176] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-02-01 01:32:28.783 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 2.600 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:30.043 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 1.600 seconds seedhbimpd 1
[    CSSD]2013-02-01 01:32:31.327 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 0.600 seconds seedhbimpd 1
[    CSSD]------- Begin Dump -------


Judging from these tests, Linux NIC bonding looks unreliable. It turned out, however, that my test method was wrong: ifconfig eth1 down is not a valid way to test it.

Supplement:

bonding mode=1 miimon=100: miimon controls link monitoring. For example, with miimon=100 the system checks the link state every 100 ms, and if one link is down it switches to another.
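To get a feel for how quickly a dead link is noticed, the polling interval and the downdelay option used earlier in /etc/modprobe.conf can be combined into a rough estimate. The sketch below assumes the behaviour described in the kernel bonding documentation, where downdelay is rounded down to a multiple of miimon; the function name is illustrative, and the result is an approximation rather than a guaranteed bound.

```shell
# Sketch: rough worst-case time (ms) before the bonding driver treats a
# failed link as down: one polling interval to notice the failure, plus
# downdelay rounded down to a multiple of miimon.
# Assumes miimon is greater than zero.
link_down_delay_ms() {
    miimon=$1
    downdelay=${2:-0}
    effective=$(( downdelay / miimon * miimon ))
    echo $(( miimon + effective ))
}
```

With the values from this post, `link_down_delay_ms 100 200` gives 300, i.e. roughly a third of a second, which is far shorter than the eviction countdown seen in ocssd.log.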
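The mode value selects the operating mode. There are seven modes, 0 through 6; the commonly used ones are 0, 1 and 6.
mode=0: load balancing with automatic failover, but it requires switch support and configuration.
mode=1: automatic failover; if one link goes down, another link takes over.
mode=6: load balancing with automatic failover; no switch support or configuration is required.
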
mode=0 (balance-rr)

Round-robin policy: Transmit packets in sequential order from the first available slave through the last.
This mode provides load balancing and fault tolerance.

mode=1 (active-backup)

Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if,
the active slave fails. The bond’s MAC address is externally visible on only one port (network adapter) to
avoid confusing the switch. This mode provides fault tolerance. The primary option affects the behavior of this mode.

mode=2 (balance-xor)

XOR policy: Transmit based on [(source MAC address XOR'd with destination MAC address) modulo slave count].
This selects the same slave for each destination MAC address. This mode provides load balancing and fault tolerance.

mode=3 (broadcast)
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.

mode=4 (802.3ad)
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings.
Utilizes all slaves in the active aggregator according to the 802.3ad specification. Prerequisites:
1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link aggregation. Most switches will require some type of configuration to enable 802.3ad mode.

mode=5 (balance-tlb)
Adaptive transmit load balancing: channel bonding that does not require any special switch support.
The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave.
Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC
address of the failed receiving slave. Prerequisite: Ethtool support in the base drivers for retrieving the speed of each slave.

mode=6 (balance-alb)
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not
require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver
intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address
with the unique hardware address of one of the slaves in the bond such that different peers use different
hardware addresses for the server.

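Summary:

1. In an Oracle RAC environment, mode=1 is recommended for heartbeat redundancy; mode 0, mode 6 and the other modes are not recommended. For example, mode=6 may cause the VIP to drift.

2. Do not test the bond with ifconfig down; test it by unplugging the network cable. My understanding is that after ifconfig down the NIC is no longer treated as part of bond0
(I believed its information was cleared from /etc/sysconfig/network-scripts/ifcfg-bond0, although I did not verify this),
which in turn causes the CRS node to be evicted.

3. On other platforms, AIX can use EtherChannel and HP-UX can use APA for bonding.
4. Starting with 11.2.0.2, HAIP is supported; OS-level bonding and similar techniques remain supported as well.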
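Putting the first recommendation together with the BONDING_OPTS style from the MOS note, an active-backup ifcfg-bond0 might look like the output of the sketch below. This combination was not tested in the post: the function name, the choice of fields, and printing to stdout are assumptions made for illustration.

```shell
# Sketch: print an ifcfg-bond0 for an active-backup (mode=1) interconnect bond.
# Arguments: IP address and netmask of the private interconnect.
# Output goes to stdout so it can be reviewed before it is written anywhere.
print_ifcfg_bond0() {
    cat <<EOF
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=$1
NETMASK=$2
USERCTL=no
BONDING_OPTS="mode=1 miimon=100"
EOF
}
```

For rac1 this would be called as `print_ifcfg_bond0 192.168.73.10 255.255.255.0`, with the output redirected to /etc/sysconfig/network-scripts/ifcfg-bond0 once it looks right.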
