Yet Another Oracle DBA Blog: Oracle DataGuard Trouble Shooting

I have implemented a couple of DataGuard environments, and most of the time they tick along just fine. However they do sometimes break, and this is a brief guide to some of the errors I've encountered and how to fix them.

Data Guard Trouble shooting.

Dataguard Broker errors (I’ve removed the text to just show the errors):

DGMGRL> show configuration;
.
.
Warning: ORA-16607: one or more databases have failed

This is a general error showing there is a problem with either the primary or secondary database.

To determine which is causing the issue:

DGMGRL> show database verbose PROD_PRIM;
.
.
Error: ORA-16778: redo transport error for one or more databases

This indicates the logs are not getting to the secondary database.

DGMGRL> show database PROD_SBY;
.
.
Error: ORA-12541: TNS:no listener

This shows the listener on the secondary server is not started.

DGMGRL> show database PROD_SBY;
.
.
Error: ORA-16766: Redo Apply unexpectedly offline

This shows that the redo apply has not been enabled. You need to connect to the PROD_SBY database and issue the
“ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;” command.

DGMGRL> show database PROD_SBY;
.
.
Warning: ORA-16826: apply service state is inconsistent with the DelayMins property

This shows that the redo apply command was issued without the “NODELAY” parameter. You need to connect to the PROD_SBY database, stop the redo apply and then re-enable it with the “NODELAY” parameter:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;

Database altered.

Messages may appear in the alert log on the primary:

PING[ARC1]: Error 16146 when pinging standby EAM_SBY
ARC3: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16040)
ARC3: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file /u01/app/oracle/admin/PRD/bdump/prd_arc3_10159.trc:
ORA-16040: standby destination archive log file is locked.

ORA-16011: Archivelog Remote File Server process in Error state
FAL[server, ARC3]: Error 16011 creating remote archivelog file 'EAM_SBY'
FAL[server, ARC3]: FAL archive failed, see trace file.
Wed Feb 18 09:10:33 2009
ORA-16055: FAL request rejected
ARCH: FAL archive failed. Archiver continuing
Wed Feb 18 09:10:33 2009
ORACLE Instance PROD_PRIM - Archival Error. Archiver continuing.

And on the standby:

RFS[240]: Possible network disconnect with primary database
Aborting archivelog file creation:
/s01/oraarch/PROD_SBY/PROD_PRIM19019645361152.arc
If this a network disconnect, then this archivelog will be fetched again
by GAP resolution mechanism.

Not sure why this happens – maybe network activity causing time-outs, or some kind of timing conflict. It seems to resolve itself. Test it by “alter system switch logfile;” on the primary and making sure the new log gets applied on the standby. If it does, all is OK. If not, you may need to re-build the standby or do more investigation.

Wednesday, May 5, 2010

Oracle DataGuard Trouble Shooting