Thursday, November 13, 2014

Resolving a CDC process due to a missing log file - WAITING FOR DICTIONARY REDO

CDC on a database was stopped and restarted twice a day by an automated process. Apparently this was because it would send false-positive SMS alerts otherwise.

There must be a long-running transaction, because every time it restarted, it started to mine the same logfile (seq# 252207). This went on for a couple of days, until eventually this log was deleted by a clean-up job.

Then the CDC failed with this:

State        : WAITING FOR DICTIONARY REDO: FILE /apps/oracle/admin/PROD/PROD_arch/PROD_1_708192618_0000252207.arc

Luckily this file was still on ASM disk, so I copied it back down to the file system, and it started to mine it.

However, for some reason it wouldn't move on, it just sat there, even though the other files (252208, 252209) were also there on disk.

In the meantime, the older logs were getting deleted by the clean up job. I identified the logs that were missing from disk (252210 – 252243) copied what I could from ASM, and also had to restore some from tape.

The destination file system didn't have enough space, so I had to borrow some and moved them to a staging file system (I'll call it /cdc_arch_staging)

I then registered them with the database and voila – the mining kicked in.

I still don’t know why it just stopped after I’d copied 252207 – 252210.

Here is a summary of the commands I used:

To copy files from ASM:

for i in $(asmcmd ls +ARCH01/PROD/ARCHIVELOG/2014_11_04); do
  asmcmd cp +ARCH01/PROD/ARCHIVELOG/2014_11_04/$i /cdc_arch_staging/PROD/arch
done

Note that this copied them to the file system with names like “thread_1_seq_252369.6617.862722301”, and I had to rename them to “PROD_1_708192618_0000252369.arc” with a combination of awk and manually editing.


I had to restore archived logs that had been deleted from ASM by the clean up job:

connect target /
connect catalog rman/xxx@RMAN;
run {

set archivelog destination to '/cdc_arch_staging/PROD/arch';

  allocate channel T1 type SBT_TAPE;
  allocate channel T2 type SBT_TAPE;
  allocate channel T3 type SBT_TAPE;
  allocate channel T4 type SBT_TAPE;

restore (archivelog from logseq=252208 until logseq=252423);
}


Registering files with CDC:

alter database register or replace logical logfile '/cdc_arch_staging/PROD/arch/PROD_1_708192618_0000252491.arc' for 'CDC$C_CS_IEE';
alter database register or replace logical logfile '/cdc_arch_staging/PROD/arch/PROD_1_708192618_0000252492.arc' for 'CDC$C_CS_IEE';
alter database register or replace logical logfile '/cdc_arch_staging/PROD/arch/PROD_1_708192618_0000252493.arc' for 'CDC$C_CS_IEE';

I created a script with awk to do that, using an 'ls' to list the logs to a file and then wrapping the command around it.


Once I’d run that, I checked the alert log and it was mining away.