Sunday, April 11, 2010

Flash Recovery Area full - ORA-19815

I got alerted over the weekend from a client site with the message "ORA-19815: WARNING: db_recovery_file_dest_size.."

It turns out that there was a new application release going in over the weekend so they had decided to suspend the database backups (without telling me).

This is a fairly easy fix, but it took a while due to the number of archived logs.

In case you're not familiar with the Flash Recovery Area (FRA), the location and size is defined in the init.ora file. Oracle will keep archived logs and disk backups there for as long as it can while maintaining the space. For example, if you give it 200Gb, Oracle will keep old archived logs in the FRA (in case a recovery is needed - doesn't have to go to tape) unless new ones need the space, in which case it will delete them if they have already been backed up. You will see messages in the alert log to the affect "Deleted Oracle managed file....".

What happened this weekend is that because the backups hadn't run, the FRA filled up and there were no old logs that hadn't been backed up for Oracle to delete, so the messages were issued, and the database hangs.

To allow the database to continue, the first thing to do is to issue the command "ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=400G scope=both;" - making sure the 400G is larger than the current setting (show parameter recovery).

This will start the database writing new archived logs again, but the disk will fill up and you'll be back to where you started unless you backup the existing archived logs.

Connect to rman, and run an archived log backup. If the site uses a media manager like Netbackup you'll need to find the parameters, they should be documented (yeah, right!), or find an existing backup script which should have them. If you can't find them you'll need to ask the Netbackup admin for them, but it's something you should know if you support a site.

rman catalog rman/xxxxx@rcat10.world

RMAN> connect target

connected to target database: PROD (DBID=99999999)


RMAN> run {
2> sql 'alter system archive log current';
3> ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE' ;
4> SEND 'NB_ORA_POLICY=Oracle_PROD, NB_ORA_SERV=nbu_server01';
5> BACKUP
6> filesperset 200
7> FORMAT 'al_%s_%p_%t'
8> ARCHIVELOG ALL;
9> RELEASE CHANNEL ch00;
10> }

sql statement: alter system archive log current

allocated channel: ch00
channel ch00: sid=352 devtype=SBT_TAPE
channel ch00: VERITAS NetBackup for Oracle - Release 6.0 (2008012304)

sent command to channel: ch00

Starting backup at 2010-04-11 17:48:12
current log archived
channel ch00: starting archive log backupset
channel ch00: specifying archive log(s) in backup set
input archive log thread=1 sequence=130197 recid=212695 stamp=715864266
input archive log thread=1 sequence=130198 recid=212696 stamp=715864428
input archive log thread=1 sequence=130199 recid=212697 stamp=715864599

etc.

Once the backup has been completed, these logs are flagged as having been backed up by RMAN so Oracle will consider them for deletion. However, this won't happen until you reset the DB_RECOVERY_FILE_DEST_SIZE back to what it was, otherwise Oracle thinks there's still plenty of space.

SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=300G scope=both;

Once this is done, the delete messages will show up in the alert log:

Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130013_5vwfhln1_.arc
Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130014_5vwflfy1_.arc
Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130015_5vwfo4fy_.arc
Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130016_5vwfr3v9_.arc

It's a good idea to tell the site that it's better practice to reschedule rather than suspend backups, or at least let the archived log backups run because they won't impact the database files.