I have implemented a couple of Data Guard environments, and most of the time they tick along just fine. However, they do sometimes break, and this is a brief guide to some of the errors I've encountered and how to fix them.
Data Guard troubleshooting
Data Guard Broker errors (I’ve trimmed the output to show just the errors):
DGMGRL> show configuration;
.
.
Warning: ORA-16607: one or more databases have failed
This is a general error showing there is a problem with either the primary or the standby database.
To determine which is causing the issue:
DGMGRL> show database verbose PROD_PRIM;
.
.
Error: ORA-16778: redo transport error for one or more databases
This indicates that redo from the primary is not reaching the standby database.
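The primary can also report why transport is failing. The query below is a minimal sketch, assuming the standby is served by LOG_ARCHIVE_DEST_2 (dest_id = 2), as it is in the alert log excerpts later in this post. Adjust the destination number if your configuration differs.

```sql
-- Run on the primary (PROD_PRIM) as a privileged user.
-- Assumption: the standby destination is LOG_ARCHIVE_DEST_2.
SELECT dest_id,
       status,   -- VALID when transport is healthy, ERROR otherwise
       error     -- last error recorded for this destination, if any
  FROM v$archive_dest
 WHERE dest_id = 2;
```

A status of ERROR together with an ORA- message in the error column usually points at the same cause the broker reports for the standby.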
DGMGRL> show database PROD_SBY;
.
.
Error: ORA-12541: TNS:no listener
This shows that the listener on the standby server is not running. Start it on the standby host (lsnrctl start) and then re-check the configuration.
DGMGRL> show database PROD_SBY;
.
.
Error: ORA-16766: Redo Apply unexpectedly offline
This shows that Redo Apply is not running on the standby. Connect to the PROD_SBY database and issue the following command to restart it:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;
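To confirm that the apply process actually came up, you can look for the managed recovery process (MRP) on the standby. This is a sketch rather than a definitive check. It uses the v$managed_standby view, which is present in the 10g/11g releases this post was written against. Later releases deprecate it in favour of v$dataguard_process.

```sql
-- Run on the standby (PROD_SBY).
-- No rows returned means Redo Apply (MRP) is not running.
SELECT process,
       status,      -- e.g. APPLYING_LOG or WAIT_FOR_LOG when healthy
       thread#,
       sequence#    -- log sequence currently being applied or awaited
  FROM v$managed_standby
 WHERE process LIKE 'MRP%';
```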
DGMGRL> show database PROD_SBY;
.
.
Warning: ORA-16826: apply service state is inconsistent with the DelayMins property
This shows that Redo Apply was started without the “NODELAY” parameter, so its state does not match the DelayMins property held by the broker. Connect to the PROD_SBY database, stop Redo Apply and then restart it with the “NODELAY” parameter:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;
Database altered.
Messages like the following may also appear in the alert log on the primary:
PING[ARC1]: Error 16146 when pinging standby EAM_SBY
ARC3: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16040)
ARC3: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file /u01/app/oracle/admin/PRD/bdump/prd_arc3_10159.trc:
ORA-16040: standby destination archive log file is locked.
ORA-16011: Archivelog Remote File Server process in Error state
FAL[server, ARC3]: Error 16011 creating remote archivelog file 'EAM_SBY'
FAL[server, ARC3]: FAL archive failed, see trace file.
Wed Feb 18 09:10:33 2009
ORA-16055: FAL request rejected
ARCH: FAL archive failed. Archiver continuing
Wed Feb 18 09:10:33 2009
ORACLE Instance PROD_PRIM - Archival Error. Archiver continuing.
And on the standby:
RFS[240]: Possible network disconnect with primary database
Aborting archivelog file creation:
/s01/oraarch/PROD_SBY/PROD_PRIM19019645361152.arc
If this a network disconnect, then this archivelog will be fetched again
by GAP resolution mechanism.
I'm not sure why this happens. It may be network activity causing time-outs, or some kind of timing conflict. It usually resolves itself. To test it, run “alter system switch logfile;” on the primary and make sure the new log gets applied on the standby. If it does, all is OK. If not, you may need to investigate further or rebuild the standby.
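The log-switch test can be sketched as follows. It assumes a single-instance (single-thread) primary with no recent RESETLOGS, so that sequence numbers can be compared directly between the two sites. The window of five sequences is an arbitrary choice for illustration.

```sql
-- On the primary: force a log switch, then note the newest archived sequence.
ALTER SYSTEM SWITCH LOGFILE;
SELECT MAX(sequence#) AS latest_archived
  FROM v$archived_log;

-- On the standby: check that the same sequence has arrived and been applied.
-- Shows the most recent few logs known to the standby.
SELECT sequence#, applied
  FROM v$archived_log
 WHERE sequence# >= (SELECT MAX(sequence#) - 5 FROM v$archived_log)
 ORDER BY sequence#;

-- On the standby: any rows here indicate a gap that has not been resolved.
SELECT thread#, low_sequence#, high_sequence#
  FROM v$archive_gap;
```

Allow a short while between the switch and the standby queries, as shipping and applying the log is not instantaneous. If the latest sequence shows as applied and v$archive_gap returns no rows, the standby has caught up.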