暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

ORA-16191 Primary log shipping client not logged on standby

原创 不吃草的牛_Nick 2023-02-08
2149

oerr ora 16191
16191, 0000, "Primary log shipping client not logged on standby"
// *Cause: An attempt to ship redo to standby without logging on
// to standby or with invalid user credentials.
// *Action: Check that primary and standby are using password files and that
// both primary and standby have the same SYS password.
// Restart primary and/or standby after ensuring that
// password file is accessible and REMOTE_LOGIN_PASSWORDFILE
// initialization parameter is set to SHARED or EXCLUSIVE.


set line 300
col dest_name for a30
col status for a20
select dest_name,status,error from v$archive_dest where rownum<5;
-- ORA-16191: Primary log shipping client not logged on standby


DATAGUARD的数据传输方式有两种:LGWR和ARCH。其中前者传送的是联机日志的redo,后者传输的是归档。
FAL是9i推出的DATAGUARD的一大改进,它可以自动传送备库缺失的日志,一般情况下,它是不需要手工干预的。

主库的日志要传输到备库,两者的密码文件中设置的口令必须一致。然而,如果我们不小心设置了不一致的密码,
则需要重新生成备库的密码文件,但重置密码文件后,FAL还并不能自动处理日志的GAP,需要DBA做一些额外的工作。


一、ARCH方式传送日志
--主库参数
show parameter dest_2 SERVICE=standby LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby


--在备库上查询,用LGWR传送日志,如果备库没有standby redo log,则oracle自动会变成用ARCH传送归档。
select * from v$standby_log;
--no rows selected


为了模拟主备库密码文件不一致,我们重新生成备库密码文件,并置一个与主库不一样的密码。
su - oracle
mv orapwprimary orapwprimary.bak
orapwd file=orapwprimary password=aaa entries=10


查看主库的alert日志,可以发现下面的信息:

Tue Jul 10 22:10:38 2007
Errors in file /u01/oracle/admin/primary/bdump/primary_arc1_3338.trc:
ORA-16055: FAL request rejected
ARCH: FAL archive failed. Archiver continuing
Tue Jul 10 22:10:38 2007
ORACLE Instance primary - Archival Error. Archiver continuing.
Tue Jul 10 22:10:47 2007
FAL[server]: Fail to queue the whole FAL gap
GAP - thread 1 sequence 68-68
DBID 1463588919 branch 626106231
Tue Jul 10 22:15:50 2007
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191
------------------------------------------------------------
Tue Jul 10 22:15:50 2007
Errors in file /u01/oracle/admin/primary/bdump/primary_arc1_3338.trc:
ORA-16191: Primary log shipping client not logged on standby
PING[ARC1]: Heartbeat failed to connect to standby 'standby'. Error is 16191.


从上面的信息不难看出,oracle的确换成了ARCH的方式传送日志,并且因为密码文件的原因不能传送到备库中。
查看备库的日志,也可以发现类似如下的信息:
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 68-68
DBID 1463588919 branch 626106231
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------


在备库还原密码文件,并重启备库:
su - oracle
rm orapwprimary
mv orapwprimary.bak orapwprimary

shutdown immediate.
startup mount
alter database recover managed standby database disconnect from session;


主库enable
$ORACLE_HOME/bin/sqlplus sys/xxxxxx@ces as sysdba<<EOF
alter system set log_archive_dest_state_2=enable;
alter system set log_archive_dest_state_2=defer;
alter system set log_archive_dest_state_2=enable;
alter system archive log current;
exit;
EOF


分别查看主、备库alert:
--主库
Tue Jul 10 22:41:57 2007
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************

--备库
Tue Jul 10 22:31:57 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[5]: Assigned to RFS process 16058
RFS[5]: Identified database type as 'physical standby'
Primary database is in MAXIMUM PERFORMANCE mode
Primary database is in MAXIMUM PERFORMANCE mode
RFS[5]: No standby redo logfiles created
Tue Jul 10 22:32:26 2007
Media Recovery Log /u01/archivelog/1_70_626106231.dbf
Media Recovery Waiting for thread 1 sequence 71 (in transit)


二、LGWR方式传送日志
为了让oracle可以用LGWR方式传送日志,我们先在备库上建立几个standby redo log。
alter database add standby logfile group 4 '/u01/oracle/oradata/primary/standbyredo04.log' size 50m;
alter database add standby logfile group 5 '/u01/oracle/oradata/primary/standbyredo05.log' size 50m;
alter database add standby logfile group 6 '/u01/oracle/oradata/primary/standbyredo06.log' size 50m;
alter database add standby logfile group 7 '/u01/oracle/oradata/primary/standbyredo07.log' size 50m;


重新生成一个与主库不一致的密码文件:
su - oracle
mv orapwprimary orapwprimary.bak
orapwd file=orapwprimary password=aaa entries=10


重新生成一个密码文件后,必须重启备库才能模拟错误(不重启的话是不会影响日志传输的)
shutdown immediate
startup mount
alter database recover managed standby database disconnect from session;


主库alert日志报错如下:
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191
------------------------------------------------------------
Tue Jul 10 23:06:30 2007
Errors in file /u01/oracle/admin/primary/bdump/primary_arc0_3457.trc:
ORA-16191: Primary log shipping client not logged on standby
PING[ARC0]: Heartbeat failed to connect to standby 'standby'. Error is 16191.


备库alert报错:
Tue Jul 10 22:53:06 2007
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 79-79
DBID 1463588919 branch 626106231
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------


在备库还原密码文件,并重启备库:
su - oracle
rm orapwprimary
mv orapwprimary.bak orapwprimary

shutdown immediate.
startup mount
alter database recover managed standby database disconnect from session;


主库enable
$ORACLE_HOME/bin/sqlplus sys/xxxxxx@ces as sysdba<<EOF
alter system set log_archive_dest_state_2=enable;
alter system set log_archive_dest_state_2=defer;
alter system set log_archive_dest_state_2=enable;
alter system archive log current;
exit;
EOF


1、ARCH方式传送日志时,如果主备库密码文件不一致,在重新设置密码文件后,必须重启主库才能使日志传送恢复正常。
2、LGWR方式传送日志时,如果主备库密码文件不一致,在重新设置密码文件后,只需可以重启主库或重启备库就可使日志传送恢复正常。


---------------------
ORA-16191 ORA-01017 and ORA-16000 shipping the logs to Standby database (文档ID 2129339.1)

正好这个环境这几天加了个登录审计的trigger,难道真是这个trigger影响?
怪不得刚刚重启备库的时候,备库alertlog中有ORA-16000: database open for read-only access报错。
Thu Jul 26 17:52:53 2018
Errors in file /oracle/diag/rdbms/PRODx/PROD1/trace/PROD1_ora_8978826.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
ORA-06512: at line 29


停备库,在参数文件中增加 _system_trig_enabled=false。起备库。
startup mount
alter database open
alter database recover managed standby database using current logfile disconnect from session;


查看alertlog,日志已经开始传输了。
Fri Jul 27 09:43:05 2018
alter database recover managed standby database using current logfile disconnect from session
Attempt to start background Managed Standby Recovery process (PROD1)
Fri Jul 27 09:43:05 2018
MRP0 started with pid=62, OS id=11796536
MRP0: Background Managed Standby Recovery process started (PROD1)
Fri Jul 27 09:43:10 2018
Archived Log entry 6489 added for thread 1 sequence 76934 rlc 801185643 ID 0xf2d8f210 dest 2:
RFS[6]: Opened log for thread 1 sequence 76949 dbid -395156247 branch 801185643
started logmerger process
Fri Jul 27 09:43:10 2018
Managed Standby Recovery starting Real Time Apply


深度分析
还记得背景分析那部分让大家关注报错时间么?有没有发现主库节点一和节点二的报错时间并不一致?
主库节点一:
Thu Jul 26 01:21:29 2018
主库节点二:
Thu Jul 26 10:00:32 2018

查看主库trigger创建时间
竟然跟上面两个时间都不一致?到底是为什么呢?

SR工程师给了如下解释:
From primary alert logs, we can see before it reported Error 1017. They were using standby redo log to do real time apply, but when applying service was not able to keep up with redo rate, they automatically went to the archive log files to do recovery on standby. Then this time they found issue to transport the archive logs to standby and reported error. Please refer to doc "Data Guard Real-Time Apply FAQ ( Doc ID 828274.1 )" for more details.
===============================================
What will happen if Log apply Service cannot cope with the Real Time Apply ?

If for some Reason the Apply Service is unable to keep up with the Redo Rate then the Apply Service it will automatically go to the Archive log Files as needed.

However, the Apply Service will automatically resume Real-Time Apply (reading the Standby Redo log Files) as soon as the Problem is resolved
===============================================

因为我们主备库之间同步用的是“standby redo log to do real time apply”,只有在备库apply赶不上redo的速度的时候,才会去找归档日志,此时才会触发ORA-16191的报错,所以才造成了上面三个节点报错时间不一致。

---------------------

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论