1. Problem Overview
On a freshly installed Oracle RAC database, logging in through a node VIP works:
[oracle@zdb001 ~]$ sqlplus system/********@10.*.*.202:1521/ora19cdb
SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 13 16:00:03 2023
Version 19.21.0.0.0
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
Last successful login time: Mon Nov 13 2023 15:56:38 +08:00
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0
SQL>
However, logging in through a SCAN IP fails:
[oracle@zdb001 ~]$ sqlplus system/********@10.*.*.205:1521/ora19cdb
SQL*Plus: Release 19.0.0.0.0 - Production on Mon Nov 13 16:00:13 2023
Version 19.21.0.0.0
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Enter user-name:
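2. Problem Analysis
(1) Check the SCAN listener status
All three SCAN listeners are up and running.
However, the listener status output shows that no services are registered with the listener: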
[grid@zdb002 ~]$ lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 13-NOV-2023 16:07:16
Copyright (c) 1991, 2023, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                13-NOV-2023 14:13:14
Uptime                    0 days 1 hr. 54 min. 1 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/zdb002/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.*.*.205)(PORT=1521)))
The listener supports no services
The command completed successfully
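When several SCAN listeners have to be checked, the "supports no services" condition can be detected from saved status output. The following is a minimal sketch, not an Oracle tool: the function name listener_has_services is hypothetical, and it assumes the status text has already been captured (for example by redirecting lsnrctl status to a file).

```python
def listener_has_services(status_output: str) -> bool:
    """Return True if captured `lsnrctl status` text lists at least one service."""
    lines = [line.strip() for line in status_output.splitlines()]
    # This exact line is what the listener prints when nothing has registered.
    if any(line == "The listener supports no services" for line in lines):
        return False
    # Registered services appear as lines such as:
    #   Service "ora19cdb" has 2 instance(s).
    return any(line.startswith('Service "') for line in lines)


# Tail of the output shown above, used as sample input.
sample = """\
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.*.*.205)(PORT=1521)))
The listener supports no services
The command completed successfully
"""

print(listener_has_services(sample))  # False
```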
(2) Check the listener parameters
The local_listener and remote_listener parameters are both configured correctly.
SQL> show parameter listener
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
forward_listener string
listener_networks string
local_listener string (ADDRESS=(PROTOCOL=TCP)(HOST=
10.*.*.202)(PORT=1521))
remote_listener string zdbscan:1521
(3) Resolve the SCAN name
Resolving the SCAN name (zdbscan) fails.
[root@zdb001 ~]# nslookup zdbscan
;; Got SERVFAIL reply from 192.*.*.12, trying next server
Server: 192.*.*.19
Address: 192.*.*.19#53
** server can't find zdbscan: SERVFAIL
(4) Check the DNS resolver configuration
[root@zdb001 ~]# cat /etc/resolv.conf
nameserver 192.*.*.12
nameserver 192.*.*.19
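The cause turned out to be that the contents of /etc/resolv.conf had been overwritten.
I say "overwritten" because the file no longer matches what was configured at installation time. The original configuration, which included a search line, was: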
[root@zdb001 ~]# cat /etc/resolv.conf
search *****group.com
nameserver 192.*.*.12
nameserver 192.*.*.19
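The difference between the two versions of the file can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, and example.com and the 192.0.2.x addresses are placeholders standing in for the masked values above.

```python
def parse_resolv_conf(text: str) -> dict:
    """Collect the search domains and name servers from resolv.conf text."""
    conf = {"search": [], "nameserver": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":    # blank lines and comments
            continue
        key, *values = line.split()
        if key == "search":
            conf["search"] = values        # the last search line wins
        elif key == "nameserver":
            conf["nameserver"].extend(values[:1])
    return conf


def missing_search_domains(expected_text: str, current_text: str) -> list:
    """Return search domains present in the expected file but not the current one."""
    expected = parse_resolv_conf(expected_text)["search"]
    current = parse_resolv_conf(current_text)["search"]
    return [domain for domain in expected if domain not in current]


# Placeholder contents modelled on the two listings above.
installed = "search example.com\nnameserver 192.0.2.12\nnameserver 192.0.2.19\n"
after_reboot = "nameserver 192.0.2.12\nnameserver 192.0.2.19\n"

print(missing_search_domains(installed, after_reboot))  # ['example.com']
```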
3. Environment
A short description of this RAC environment: the SCAN name is resolved through DNS to three SCAN IPs.
[root@zdb001 ~]# nslookup zdbscan.*****group.com
Server: 192.*.*.12
Address: 192.*.*.12#53
Name: zdbscan.*****group.com
Address: 10.*.*.207
Name: zdbscan.*****group.com
Address: 10.*.*.205
Name: zdbscan.*****group.com
Address: 10.*.*.206
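For convenience, the SCAN can also be reached through a short DNS name.
The search entry in /etc/resolv.conf completes short names. Fully qualified names can be long, so a short name is used for convenience; DNS itself resolves the full name, and the search setting in resolv.conf supplies the missing domain suffix.
The name resolution order is:
- Look in /etc/hosts
- Query the name servers listed in the nameserver entries
- If the name servers cannot find the name, append the search suffix and repeat steps 1 and 2
Strictly speaking, for a name with no dots such as zdbscan, the glibc resolver (default ndots:1) tries the search-suffixed name first and the bare name last. Either way, without a search entry only the bare short name is queried, and that lookup fails.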
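How the search list turns a short name into query candidates can be sketched as follows. This is a simplified model that assumes the behavior documented in resolv.conf(5) for the glibc resolver with the default ndots:1; it ignores /etc/hosts and nsswitch.conf. The function name candidate_names is hypothetical and example.com is a placeholder for the masked domain.

```python
def candidate_names(name: str, search_domains: list, ndots: int = 1) -> list:
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        return [name]                      # absolute name: search list is not applied
    suffixed = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        return [name] + suffixed           # enough dots: try the name as given first
    return suffixed + [name]               # short name: try the search list first


# With the search entry, the short name expands to the full SCAN name.
print(candidate_names("zdbscan", ["example.com"]))  # ['zdbscan.example.com', 'zdbscan']

# Without it, only the bare short name is left to query.
print(candidate_names("zdbscan", []))               # ['zdbscan']
```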
The actual effect is shown below: zdbscan is equivalent to zdbscan.*****group.com. This is exactly what the overwritten search entry provided.
[root@zdb001 ~]# cat /etc/resolv.conf
search *****group.com
nameserver 192.*.*.12
nameserver 192.*.*.19
[root@zdb001 ~]#
[root@zdb001 ~]# nslookup zdbscan
Server: 192.*.*.12
Address: 192.*.*.12#53
Name: zdbscan.*****group.com
Address: 10.*.*.206
Name: zdbscan.*****group.com
Address: 10.*.*.207
Name: zdbscan.*****group.com
Address: 10.*.*.205
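4. Further Analysis
The next step was to find out what overwrote the file. Shortly before the problem appeared, the cluster had gone through high-availability testing that involved rebooting the hosts, so /etc/resolv.conf was most likely overwritten during a reboot.
Most cases found online blame the NetworkManager service. NetworkManager manages network connections dynamically, detecting and configuring them automatically, and it may rewrite /etc/resolv.conf.
On these hosts, however, NetworkManager is disabled and not running: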
● NetworkManager.service - Network Manager
Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:NetworkManager(8)
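A check of the network interface configuration turned up two suspicious lines, DNS1 and DNS2. After commenting them out and rebooting the host, /etc/resolv.conf was no longer overwritten.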
[root@zdb001 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
TYPE=Bond
BOOTPROTO=none
DEVICE=bond0
ONBOOT=yes
IPADDR=10.*.*.201
PREFIX=16
GATEWAY=10.*.*.1
DNS1=192.*.*.12
DNS2=192.*.*.19
BONDING_MASTER=yes
BONDING_OPTS="miimon=100 mode=1"
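5. Root Cause
With the interface configuration identified as the trigger, a matching case turned up on My Oracle Support (MOS): /etc/resolv.conf keep restoring after reboot (Doc ID 1511082.1).
The key passage from that note: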
Briefly, the above codes mean that: if we don’t set PEERDNS=no (by default, PEERDNS and RESOLV_MODS are null) in the network interface configuration file(/etc/sysconfig/network-scripts/ifcfg-xxx), system will check the DNS1 and DNS1 values of /etc/sysconfig/network-scripts/ifcfg-xxx, if these values are set, ifup-post script will replace the /etc/resolv.conf with DNS1 and DNS2, then create a backup file which names /etc/resolv.conf.save.
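In other words: unless PEERDNS=no is set in the interface configuration file (/etc/sysconfig/network-scripts/ifcfg-xxx; PEERDNS and RESOLV_MODS are empty by default), the system checks the DNS1 and DNS2 values in that file. If they are set, the ifup-post script replaces /etc/resolv.conf using DNS1 and DNS2 and saves a backup as /etc/resolv.conf.save.
So the file is replaced by ifup-post when the interface is brought up. The fix given in the note is to add PEERDNS=no to the interface configuration.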
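The condition described in the note can be turned into a quick audit of an ifcfg file. The sketch below only encodes the rule as quoted (DNS1/DNS2 set and PEERDNS not equal to no). It does not reproduce the ifup-post script and ignores RESOLV_MODS. The function names are hypothetical and the addresses are placeholders.

```python
def parse_ifcfg(text: str) -> dict:
    """Parse KEY=value lines of an ifcfg file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def may_overwrite_resolv_conf(ifcfg_text: str) -> bool:
    """True if DNS1/DNS2 is set and PEERDNS is not 'no' (rule from the MOS note)."""
    cfg = parse_ifcfg(ifcfg_text)
    has_dns = any(cfg.get(key) for key in ("DNS1", "DNS2"))
    return has_dns and cfg.get("PEERDNS") != "no"


# Placeholder configuration modelled on ifcfg-bond0.
ifcfg = "DEVICE=bond0\nBOOTPROTO=none\nDNS1=192.0.2.12\nDNS2=192.0.2.19\n"

print(may_overwrite_resolv_conf(ifcfg))                   # True
print(may_overwrite_resolv_conf(ifcfg + "PEERDNS=no\n"))  # False
```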
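The /etc/sysconfig/network-scripts/ifup-post script does contain logic that overwrites /etc/resolv.conf using the DNS1/DNS2 parameters.
In addition, the tail of the ifup-eth script shows that ifup-post is executed as the last step of bringing up an interface. This is the root cause of /etc/resolv.conf being overwritten.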
[root@zdb001 ~]# tail -10 /etc/sysconfig/network-scripts/ifup-eth
            echo $" failed."
            if [ "${dhcpipv4}" = "good" -o -n "${IPADDR}" ]; then
                net_log "Unable to obtain IPv6 DHCP address ${DEVICE}." warning
            else
                exit 1
            fi
        fi
    fi
exec /etc/sysconfig/network-scripts/ifup-post ${CONFIG} ${2}
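6. Solution
The name servers only need to be specified in /etc/resolv.conf; they are not needed in the interface configuration.
So the fix is to edit the interface configuration and remove the DNS1/DNS2 entries.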
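If the same change has to be made on several nodes, the edit can be scripted. The sketch below only transforms the text of an ifcfg file and comments out the DNSn lines rather than deleting them. Reading and writing the real file, keeping a backup, and restarting the network are left to the operator. The function name comment_out_dns is hypothetical and the addresses are placeholders.

```python
import re

# Matches DNS1=, DNS2=, ... at the start of a line (leading spaces allowed).
DNS_KEY = re.compile(r"^\s*DNS\d+\s*=")


def comment_out_dns(ifcfg_text: str) -> str:
    """Return the ifcfg text with every DNSn= line commented out."""
    lines = []
    for line in ifcfg_text.splitlines():
        lines.append("#" + line if DNS_KEY.match(line) else line)
    return "\n".join(lines) + "\n"


before = "DEVICE=bond0\nDNS1=192.0.2.12\nDNS2=192.0.2.19\nBONDING_MASTER=yes\n"
print(comment_out_dns(before), end="")
```

Lines that are already commented out are left alone, so running the function twice gives the same result as running it once.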
7. References
/etc/resolv.conf keep restoring after reboot (Doc ID 1511082.1)