Thursday, April 8, 2010

RMAN duplicate - RMAN-06025 errors

We do RMAN duplicates of some PROD databases every day, and for the most part they are pretty reliable.

Recently we started to get these errors:

RMAN-06025: no backup of log thread 1 seq 51996 lowscn 5982564789686 found to restore

Looking it up, we found that the Duplicate process restores the database files from the backup, connects to the live PROD database to find the latest archived log, then restores the archived logs generated since the backup and applies them. The problem is that the backup finishes around 02:00 but the Duplicates start around 05:00. I don't know why it has started happening. Maybe there is more activity on the database in the early hours than there used to be, so it's generating logs where it didn't before. Those logs haven't been backed up yet when the Duplicate runs, so the error message is issued.

I looked into fixing it and had a few thoughts. I could put a "SET UNTIL TIME" clause into the Duplicate script, but since the backups could finish at any time I couldn't guarantee that I wouldn't hit the same problem. I could also use a "SET UNTIL SCN" clause, but I would need to amend the script to connect to the PROD database, run a query to determine the SCN of the last backed-up archived log, store that in a variable and pass it to the Duplicate script.
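For what it's worth, here is a rough sketch of what the "SET UNTIL SCN" approach might have looked like. It is not what we ended up running. The function name build_duplicate_script and the query are my own illustration: the query assumes a single-instance database (thread 1) with no gaps in the backed-up logs, and you would run it against PROD with whatever client you normally use, then pass the result in.

```python
import re

# Assumption: single instance (thread 1) and no gaps in the backed-up logs.
# NEXT_CHANGE# of the newest backed-up archived log can serve as the
# UNTIL SCN because RMAN recovers up to, but not including, that SCN.
LAST_BACKED_UP_SCN_SQL = """
SELECT MAX(next_change#)
  FROM v$backup_redolog
 WHERE thread# = 1
"""


def build_duplicate_script(until_scn: int, dup_name: str) -> str:
    """Return the text of an RMAN command file that duplicates up to until_scn.

    Hypothetical helper: a real script would also carry the file-renaming
    and channel settings that the existing Duplicate script already has.
    """
    if not isinstance(until_scn, int) or until_scn <= 0:
        raise ValueError("until_scn must be a positive integer")
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_$#]*", dup_name):
        raise ValueError("dup_name is not a plausible database name")

    return (
        "RUN {\n"
        f"  SET UNTIL SCN {until_scn};\n"
        f"  DUPLICATE TARGET DATABASE TO {dup_name};\n"
        "}\n"
    )


if __name__ == "__main__":
    # The SCN here is only a placeholder; the real one comes from the query.
    print(build_duplicate_script(5982564789686, "dup1"))
```

The output would be saved to a command file and handed to RMAN in place of the fixed script. The point is only that the stopping SCN comes from what has actually been backed up, not from what PROD has most recently archived.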

I was about to sit down and write it when I decided there was a better, lazier way: I changed the Duplicate job to start soon after the backup completes. Although not an exact science, it's good enough that we haven't seen the RMAN-06025 errors since I changed it three weeks ago.

The users don't need up-to-the-minute copies of the databases, so a few hours older makes no difference.

We were also hitting another error which we couldn't figure out. We do Duplicates of the one PROD database to a few different databases on different servers. In one of the logs we were seeing error messages naming files that belonged to a different duplicate database. For example, on server1 we duplicate PROD to dup1, and on server2 we duplicate PROD to dup2. On server2 we were seeing errors in the log to the effect of

"'/u03/oradata/dup1/dup1_data01.dbf' directory doesn't exist"

Which meant that even though the Duplicate script on server2 specifically set AUXNAME to

"/u04/oradata/dup2/dup2_data01.dbf" etc

the job was trying to restore the dup1 file.

We could only surmise that there were 2 Duplicate jobs running around the same time, and RMAN was getting confused as to which files were supposed to be restored where. We rescheduled them to run an hour apart and that fixed it.

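My guess is that RMAN doesn't keep the script's settings entirely within the session. AUXNAME settings in particular appear to be stored against the source database rather than per job, so if 2 Duplicates run at the same time against the same source database, one can pick up the other's file names.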
