Yet Another Oracle DBA Blog: December 2013

Sunday, December 29, 2013

Working with Virtualbox - Installing Oracle 12c

If you read my last post (and I know you did), hopefully you'll have Oracle Virtualbox running an OEL6 virtual machine.

As an Oracle DBA, you'll probably want to install 12C to have a play, so go and download it from the Oracle edelivery website - and you'll need an ID. http://edelivery.oracle.com

Obviously the best place to look is the Oracle documentation - here's the Linux installation guide:

http://docs.oracle.com/cd/E16655_01/install.121/e17720/toc.htm

This page is a condensed version to (hopefully) get you up and running.

First, Oracle have very handily added a yum file that has all the necessary packages. Assuming you've added the public yum repository as in my previous post, type this to get it

yum install oracle-rdbms-server-12cR1-preinstall

This will install the required packages for the database, as well as create the oracle user, oinstall and dba groups etc (a neat trick).

Since this is an install on a VM, you can skip a lot of the pre-installation tasks.

Make the directories as root:

mkdir -p /u01
mkdir -p /u01/app
mkdir -p /u01/app/oracle
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01/app/oracle

Change the oracle user password

passwd oracle

Copy the 12c database software to a temporary directory and unzip it as oracle.

At this point I ran out of space in the VM so had to add a disk, so here's a slight detour on how to do that.

Shutdown the VM, then in Virtualbox go to Settings and select "Storage". Click on the "Controller" item and the "+" icon. Add a new disk with a reasonable size and restart the VM.

In an ideal world the disk would appear when you restarted the VM, but it's not that easy...

So, login as root, and do an "fdisk -l" and you should see the new disk, probably at "/dev/sdb".

Next, do "fdisk /dev/sdb" and you should see a command prompt. Enter "n" to create a new disk, then "p" to make it a primary partition. Choose "2", then enter, then enter at the cylinder prompts and it should come back to the command prompt. Enter 'w' to create it.

However, it needs to be formatted, so exit from the fdisk command if not already, then type

mkfs -t ext4 /dev/sdb

If it prompts you to proceed, enter 'y' (it's only a VM, what's the worst that could happen?)

Remove the "/u01 directory created earlier, and mount this new disk as /u01:

mkdir /u01
mount /dev/sdb /u01

It should be there:

df -h

Add the directories again

mkdir -p /u01/app
mkdir -p /u01/app/oracle
chown -R oracle:oinstall /u01/app/oracle
chmod -R 775 /u01/app/oracle

OK, back on track. Download Xming, because the installer uses Xwindows - http://sourceforge.net/projects/xming/

Install it and start the Xming server.

Open a CMD Prompt on your PC where Xming is running and find the IP address using 'ipconfig' - you may have a few show up, it's the one with "Virtualbox" in the name.

Next, sign on to the Linux VM as root and download the Xhost packages

yum install xhost

Enter 'y' when prompted.

Still as root, type export DISPLAY=ipaddress_from_command_prompt:0.0
Then, "xhost +"

You should see "access control disabled, clients can connect from any host"

Nearly there....

Still as root, type 'ifconfig' and note the IP address.
Edit the /etc/hosts file to look like this:

127.0.0.1 localhost.localdomain localhost
192.168.0.5 oel6.localdomain oel6

Where the second IP address is that found from the 'ifconfig' command and the 'oel6' is what you called your VM.

Save the file.

Log in to the VM as oracle, export the DISPLAY variable again, then cd to the place you extracted the software.

Run "./runInstaller"

Hopefully you'll see a few "Passed" messages, then the GUI will pop up. Let it think about itself for a minute, then you should see the first installer window.

You should be able to take it from here, because the responses will be specific to your set up.

Thursday, December 26, 2013

Working with Virtualbox - getting OEL6 up and running

Like most DBAs, I've used VMWare and VMWare Player for a while to create virtual machines as play pens to test out various Oracle versions, features and generally play around.

I've recently switched to using Oracle Virtualbox, and although I like it I have found it not as easy to get working as VMWare.

I've just created an OEL6 VM and had a few issues, especially around the networking side, so decided to document it. I've done some google searches for the resolution of the issues, so thanks to all the blogs I found.

First, get the latest copy of Oracle Virtual Box - https://www.virtualbox.org/wiki/Downloads

Next, get the latest version of Oracle Enterprise Linux from http://edelivery.oracle.com/linux - note you'll need an Oracle ID. If you don't have one, then get CentOS - http://www.centos.org/modules/tinycontent/index.php?id=30

This is a copy of RedHat, so is OK to install Oracle.

Once downloaded, open Virtualbox and create a new VM. Note that if you type "OEL" into the name field it will automatically select "Oracle" - be careful, I got caught out with this - use the drop-down box to select 'Oracle (64)' otherwise you'll get a message when you try to start it to this effect - "kernel requires an x86_64 cpu, but only detected an i686 cpu". Assuming you have a 64-bit CPU and I expect you do.

I selected the Database Server option, so no GUI desktop. When (if?) OEL6 installs, try a ping to google and see if you can connect to the internet. If you can, then congratulations, you did better than me.

If not, you'll need to go back to the Virtualbox settings and make sure the Network is set to "NAT". Assuming it is, you may need to reconfigure the network in the OS.

Type "ifconfig" and if you only see an entry for "lo" it means you'll have to edit a file.

cd to "/etc/sysconfig/network-scripts" and edit the "ifcfg-eth0" file to look like this:

DEVICE="eth0"
BOOTPROTO=dhcp
HWADDR= (leave this as it is)
NM_CONTROLLED="yes"
ONBOOT="yes"

Save it, then restart the network or the box - "service network restart" or "reboot".

If all goes OK, then try pinging Google again. Mine worked at this stage, if yours didn't then you'll need to look at someone else's blog.

Assuming you can now ping stuff, you'll want to get the latest packages. Also assuming you don't have a subscription to the Oracle Network, you need to set the yum repository to the public oracle one. This changes, so check http://public-yum.oracle.com/ for the latest info.

This page also tells you how to add the repository:

cd /etc/yum.repos.d
wget http://public-yum.oracle.com/public-yum-ol6.repo

This should download a file and then show "saved".

Next, update the packages using "yum update"

It should download the latest stuff. Depending on the version of OEL you installed, it could take a while, and you'll need to enter "y" a couple of times. Once it has downloaded the packages it will update and install them, which could also take a while.

When this has finished, it's probably a good idea to reboot the VM.

Next, you'll want to install the VirtualBox additions (think VMWare tools). You need to do some more yum installs (yeah, I know, thanks Oracle).

yum install kernel-uek-devel

And enter 'y' when prompted.

and

yum install gcc

Then set an environment variable:

KERN_DIR=/usr/src/kernels/`uname -r`-`uname-m`

Click on the menu item "Devices" and then on "Install Guest Additions CD Image". Nothing will happen, you need to mount it on the VM and run the installer.

cd /mnt
mkdir cdrom
mount /dev/cdrom /mnt/cdrom

cd /mnt/cdrom
ls

You should see a list of files. Type

./VBoxLinuxAdditions.run

Hopefully it will install. Mine issued a "FAILED" at the Windows System Drivers but I expected that.

Easy, hey?

Next you'll probably want to enable shared folders so you can copy files into the VM. Shutdown the guest first, then go to the settings, and at the bottom is the "shared folders" item. Click on that, then on the little folder icon with the "+" sign on it and browse to the folder you want to share and add it. Make it auto-mount and give full access. Restart the guest and if you do a "df -h" you should see it listed.

This is not for the VM, but for a 'bare-metal' Linux install. I've put it here because I had to do this a couple of times.

I installed OEL onto a spare PC, then connected an external HDD but got an 'unable to mount' message. To get it to work, I had to add another repository and install an ntfs package:

Go here : http://mirror.optus.net/epel/6/i386/repoview/epel-release.html
and download the "epel-release-6-8.noarch" file (or whichever is the latest). When the system prompts, opt to use the package installer to install it.

When that has been done, open a terminal as root and type

yum install ntfs-3g

This should install the required packages to allow the ntfs drive to be seen.

I'm still struggling with screen resolution, so I'll poke around to see if I have a solution to that.

Wednesday, December 18, 2013

Migrate a database from UFS to ASM

Earlier this month I posted a, er, post on migrating a Standby database from UFS to ASM. This week I had to migrate a 'normal' database to ASM from UFS. This was even easier than the standby, and here's how to do it.

Note, if the database is large, you should set block change tracking on. You can also perform a full backup the day before which will minimise the database down time.

Make a Level 0 backup of the database

rman
connect target /
run
{
      allocate channeL dev1 device type disk;
      allocate channel dev2 device type disk;
      allocate channel dev3 device type disk;
      allocate channel dev4 device type disk;
      allocate channel dev5 device type disk;
     allocate channel dev6 device type disk;
     backup as copy incremental level 0
     database format '+UATDB_DATA01' tag ‘ORA_ASM_MIGRATION_UATDB';
}

This will copy the database datafiles to the ASM disks.

If the database is large, nearer the migration time backup the database with a level 1:

connect target /

###
### If a new datafile is added after the last backup, you have run it again or the ASM switch will fail.
###
### select name, to_char(CREATION_TIME,'dd/mm hh24:mi' ) as CT from v$datafile where CREATION_TIME > sysdate - 1;
###

run
{
        allocate channeL dev1 device type disk;
        allocate channel dev2 device type disk;
        allocate channel dev3 device type disk;
        allocate channel dev4 device type disk;

        backup incremental level 1 for recover of copy datafilecopy FORMAT '+UATDB_DATA01'
        with tag 'ORA_ASM_MIGRATION_UATDB' database FORMAT '/apps/oracle/admin/UATDB/RMAN_BACK_DISK/%d.data.%U';

        recover copy of database with tag 'ORA_ASM_MIGRATION_UATDB';

}

This will update any changed files onto the ASM disks.

Stop the database.

Copy some files as a backup just in case

cp -p /oracle/product/10.2.0/db_2/dbs/spfileUATDB.ora /apps/oracle/admin/UATDB/SAVED_FILES/
cp -p /oracle/eu_data_1/data/UATDB/UATDB_ctrl_1.ctl /apps/oracle/admin/UATDB/SAVED_FILES/
cp -p /oracle/ora_redo/data/UATDB/redo_*.log /apps/oracle/admin/UATDB/SAVED_FILES/

Restart the database again in nomount mode

Set the create destinations

NOLOG> ALTER SYSTEM SET DB_CREATE_FILE_DEST='+UATDB_DATA01' SID='*';

System altered.

NOLOG> ALTER SYSTEM SET DB_CREATE_ONLINE_LOG_DEST_1 = '+UATDB_DATA01' SID='*';

System altered.

Restart the database in nomount mode

NOLOG> startup force nomount

Set the controlfile destination

NOLOG> ALTER SYSTEM SET CONTROL_FILES='+UATDB_DATA01' SCOPE=SPFILE SID='*';

System altered.

NOLOG> exit;

Now restore the controlfile to ASM:

RMAN> connect target /

connected to target database: UATDB (not mounted)

RMAN> RESTORE CONTROLFILE FROM '/oracle/eu_data_1/data/UATDB/UATDB_ctrl_1.ctl';

Starting restore at 19-DEC-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=254 devtype=DISK

channel ORA_DISK_1: copied control file copy
output filename=/oracle/eu_data_1/data/UATDB/UATDB_ctrl_1.ctl
Finished restore at 19-DEC-13

This will put the controlfile onto the ASM disk

Mount the database and run the command to switch the datafiles to use the ASM copies

RMAN> ALTER DATABASE MOUNT;

database mounted
released channel: ORA_DISK_1

RMAN> SWITCH DATABASE TO COPY;

datafile 41 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_idx03.341.834494307"
datafile 42 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_idx02.292.834493627"
datafile 43 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_tab03.419.834495445"
datafile 44 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_idx03.342.834494311"
datafile 45 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_idx02.293.834493627"
datafile 46 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/temp.415.834495405"
datafile 47 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/temp.416.834495429"
datafile 48 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_tab03.420.834495445"
datafile 49 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_idx02.267.834492661"
datafile 50 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/rpt_tab01.278.834493253"
datafile 51 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_tab02.268.834492687"
datafile 52 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_aud_tab01.528.834496019"
datafile 53 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_tab03.421.834495461"
datafile 54 switched to datafile copy "+UATDB_DATA01/UATDB_base/datafile/cis_tab02.307.834493799"

Once all these have been done, recover the database

RMAN> RUN
2> {
3> ALLOCATE CHANNEL dev1 DEVICE TYPE DISK;
4> ALLOCATE CHANNEL dev2 DEVICE TYPE DISK;
5> ALLOCATE CHANNEL dev3 DEVICE TYPE DISK;
6> ALLOCATE CHANNEL dev4 DEVICE TYPE DISK;
7> RECOVER DATABASE;
8> }

Finished recover at 19-DEC-13
released channel: dev1
released channel: dev2
released channel: dev3
released channel: dev4

RMAN> exit

Open the database

SQL> alter database open;

You also need to change the tempfiles

select NAME from v$tempfile;

ALTER DATABASE TEMPFILE '/oracle/129/data/UATDB/temp_01.dbf' DROP;
ALTER DATABASE TEMPFILE '/oracle/129/data/UATDB/temp_02.dbf' DROP;
ALTER DATABASE TEMPFILE '/oracle/129/data/UATDB/temp_03.dbf' DROP;
ALTER DATABASE TEMPFILE '/oracle/129/data/UATDB/temp_04.dbf' DROP;

ALTER TABLESPACE TEMP ADD TEMPFILE ;
ALTER TABLESPACE TEMP ADD TEMPFILE ;
ALTER TABLESPACE TEMP ADD TEMPFILE ;

Switch the logfiles a few times, then move the online redo log files with the following code

alter system switch logfile; - do it 3 or 4 times

Run this

SET SERVEROUTPUT ON;
DECLARE
CURSOR rlc IS
SELECT GROUP# GRP, THREAD# THR, BYTES, 'NO' SRL
FROM V$LOG
UNION
SELECT GROUP# GRP, THREAD# THR, BYTES, 'YES' SRL
FROM V$STANDBY_LOG
ORDER BY 1;
stmt VARCHAR2(2048);
BEGIN
FOR rlcRec IN rlc LOOP
IF (rlcRec.srl = 'YES') THEN
stmt := 'ALTER DATABASE ADD STANDBY LOGFILE THREAD ' ||
rlcRec.thr || ' SIZE ' || rlcRec.bytes;
EXECUTE IMMEDIATE stmt;
stmt := 'ALTER DATABASE DROP STANDBY LOGFILE GROUP ' || rlcRec.grp;
EXECUTE IMMEDIATE stmt;
ELSE
stmt := 'ALTER DATABASE ADD LOGFILE THREAD ' ||
rlcRec.thr || ' SIZE ' || rlcRec.bytes;
EXECUTE IMMEDIATE stmt;
BEGIN
stmt := 'ALTER DATABASE DROP LOGFILE GROUP ' || rlcRec.grp;
DBMS_OUTPUT.PUT_LINE(stmt);
EXECUTE IMMEDIATE stmt;
EXCEPTION
WHEN OTHERS THEN
EXECUTE IMMEDIATE 'ALTER SYSTEM SWITCH LOGFILE';
EXECUTE IMMEDIATE 'ALTER SYSTEM CHECKPOINT GLOBAL';
EXECUTE IMMEDIATE stmt;
END;
END IF;
END LOOP;
END;
/

This should move the log files

sys.UATDB> select member from v$logfile;

MEMBER
----------------------------------------------------------------------------------------------------
+UATDB_DATA01/UATDB_base/onlinelog/group_3.544.834579079
+UATDB_DATA01/UATDB_base/onlinelog/group_2.543.834579073
+UATDB_DATA01/UATDB_base/onlinelog/group_1.542.834579067
+UATDB_DATA01/UATDB_base/onlinelog/group_5.541.834579063

Stop and restart the database, and check all the files are now on ASM disk.

+ERCISU_DATA01/ercisu_base/datafile/rpt_idx01.538.834496023
+ERCISU_DATA01/ercisu_base/datafile/cis_idx01.525.834496015
+ERCISU_DATA01/ercisu_base/datafile/cis_tab01.509.834495999
+ERCISU_DATA01/ercisu_base/datafile/cis_tab03.256.834491477
+ERCISU_DATA01/ercisu_base/datafile/cis_idx03.261.834491477
+UATDB_DATA01/UATDB_base/onlinelog/group_3.544.834579079
+UATDB_DATA01/UATDB_base/onlinelog/group_2.543.834579073
+UATDB_DATA01/UATDB_base/onlinelog/group_1.542.834579067
+UATDB_DATA01/UATDB_base/onlinelog/group_5.541.834579063
+UATDB_DATA01/UATDB_base/controlfile/current.545.834579799

Maybe wait a day or two (and check all the last used dates) before deleting the old UFS files.

Taaa Daaaa.

Thursday, December 5, 2013

Migrating a standby database from UFS to ASM

I was tasked to migrate a standby database from a Unix File System to ASM. Never done it before. Nearly 30 years in IT and 20+ as an Oracle DBA and this was my first one.

It was relatively easy. The technique I used was an RMAN backup as copy, then a switch of the files. This assumes you already have ASM installed and a disk group available.

Here we go.

Make a note of the existing controlfile location:

select name from v$controlfile;

NAME
-----------------------------------
/oracle/eu_data_1/data/EMSBY/EMSBY_ctrl_1.ctl

Being a standby, the database is in mount mode:

sys.EMSBY> select open_mode from v$database;

OPEN_MODE

----------

MOUNTED

1 row selected.

These are some of the files:

sys.EMSBY> select name from v$datafile;

NAME
----------------------------------------------------------------------------------------------------
/oracle/eu_data_1/data/EMSBY/system_01.dbf
/oracle/eu_data_1/data/EMSBY/cis_idx03_67.dbf
/oracle/eu_data_1/data/EMSBY/undo_rbs_02.dbf

I need to stop the apply from the primary, so checked to see if it was using dg broker or not

sys.EMSBY> show parameter broker_s

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_start                      boolean     FALSE

Nope.

Turn off the apply:

sys.EMSBY> alter DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

Connect to RMAN and issue the backup as copy command, point the destination to the ASM disk.

rman

RMAN> connect target /

connected to target database: EMSBY (DBID=2614687418, not open)

RMAN> RUN
{
ALLOCATE CHANNEL dev1 DEVICE TYPE DISK;
ALLOCATE CHANNEL dev2 DEVICE TYPE DISK;
ALLOCATE CHANNEL dev3 DEVICE TYPE DISK;
ALLOCATE CHANNEL dev4 DEVICE TYPE DISK;
BACKUP AS COPY
INCREMENTAL LEVEL 0
DATABASE
FORMAT '+EMSBY_DATA01'
TAG 'ORA_ASM_MIGRATION_EMSBY';
}

allocated channel: dev1
channel dev1: sid=114 devtype=DISK

allocated channel: dev2
channel dev2: sid=249 devtype=DISK

allocated channel: dev3
channel dev3: sid=120 devtype=DISK

allocated channel: dev4
channel dev4: sid=116 devtype=DISK

Starting backup at 05-DEC-13
channel dev1: starting datafile copy
input datafile fno=00036 name=/oracle/eu_data_1/data/EMSBY/cis_idx03_80.dbf
channel dev2: starting datafile copy
input datafile fno=00285 name=/oracle/eu_data_1/data/EMSBY/rpt_tab02_04.dbf
channel dev3: starting datafile copy
input datafile fno=00248 name=/oracle/eu_data_1/data/EMSBY/cis_idx03_70.dbf
channel dev4: starting datafile copy
input datafile fno=00035 name=/oracle/eu_data_1/data/EMSBY/cis_tab03_43.dbf
.
.
.
output filename=+EMSBY_DATA01/EMSBY_base/datafile/rpt_idx01.538.833376955 tag=ORA_ASM_MIGRATION_EMSBY recid=300 stamp=833376957
channel dev3: datafile copy complete, elapsed time: 00:00:04
output filename=+EMSBY_DATA01/EMSBY_base/controlfile/backup.539.833376955 tag=ORA_ASM_MIGRATION_EMSBY recid=298 stamp=833376956
channel dev4: datafile copy complete, elapsed time: 00:00:04
including current SPFILE in backupset
channel dev1: starting piece 1 at 05-DEC-13
channel dev1: finished piece 1 at 05-DEC-13
piece handle=+EMSBY_DATA01/EMSBY_base/backupset/2013_12_05/nnsnn0_ora_asm_migration_EMSBY_0.540.833376961 tag=ORA_ASM_MIGRATION_EMSBY comment=NONE
channel dev1: backup set complete, elapsed time: 00:00:03
Finished backup at 05-DEC-13
For record type DATAFILE COPY recids from 17 to 18 are re-used before resync
released channel: dev1
released channel: dev2
released channel: dev3
released channel: dev4

Just to be on the safe side, I Copied spfile, init.ora and control file to /apps/oracle/work/akh/EMSBY

Now logon to the database and do some fiddling:

Restart it in nomount mode:

/apps/oracle> sqlplus "/ as sysdba"

sys.EMSBY> startup force nomount;
ORACLE instance started.

Change the file creation destinations to the ASM disk group:

sys.EMSBY> ALTER SYSTEM SET DB_CREATE_FILE_DEST='+EMSBY_DATA01' sid='*';

System altered.

sys.EMSBY> ALTER SYSTEM SET DB_CREATE_ONLINE_LOG_DEST_1 = '+EMSBY_DATA01' sid='*';

System altered.

sys.EMSBY> ALTER SYSTEM SET CONTROL_FILES='+EMSBY_DATA01' scope=spfile sid='*';

System altered.

Exit, then connect to RMAN

/apps/oracle> rman target /

connected to target database: EMSBY (not mounted)

Restore the controlfile, using the location noted at the top:

RMAN> RESTORE CONTROLFILE FROM '/oracle/eu_data_1/data/EMSBY/EMSBY_ctrl_1.ctl';

This will put it into the ASM disk group:

Starting restore at 05-DEC-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=123 devtype=DISK

channel ORA_DISK_1: copied control file copy
output filename=+EMSBY_DATA01/EMSBY_base/controlfile/current.541.833378337
Finished restore at 05-DEC-13

Now mount the database (can be done within RMAN), and run the switch command. This will cause the database to start to use the ASM files:

RMAN> ALTER DATABASE MOUNT;

database mounted
released channel: ORA_DISK_1

RMAN> SWITCH DATABASE TO COPY;

datafile 1 switched to datafile copy "+EMSBY_DATA01/EMSBY_base/datafile/system.476.833376833"
datafile 2 switched to datafile copy "+EMSBY_DATA01/EMSBY_base/datafile/cis_idx03.263.833372371"
datafile 3 switched to datafile copy "+EMSBY_DATA01/EMSBY_base/datafile/undo_rbs.488.833376881"
.
.
.
datafile 284 switched to datafile copy "+EMSBY_DATA01/EMSBY_base/datafile/rpt_idx02.261.833372145"
datafile 285 switched to datafile copy "+EMSBY_DATA01/EMSBY_base/datafile/rpt_tab02.256.833371465"
datafile 286 switched to datafile copy "+EMSBY_DATA01/EMSBY_base/datafile/rpt_tab02.483.833376865"

Now all the files are there, you need to RECOVER the database to apply all the logs that have been generated during the backup:

RMAN> run
{
ALLOCATE CHANNEL dev1 DEVICE TYPE DISK;
ALLOCATE CHANNEL dev2 DEVICE TYPE DISK;
ALLOCATE CHANNEL dev3 DEVICE TYPE DISK;
ALLOCATE CHANNEL dev4 DEVICE TYPE DISK;
RECOVER DATABASE;
}

allocated channel: dev1
channel dev1: sid=123 devtype=DISK

allocated channel: dev2
channel dev2: sid=254 devtype=DISK

allocated channel: dev3
channel dev3: sid=120 devtype=DISK

allocated channel: dev4
channel dev4: sid=116 devtype=DISK

Starting recover at 05-DEC-13

.
archive log thread 1 sequence 427121 is already on disk as file /apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427121.arc
archive log thread 1 sequence 427122 is already on disk as file /apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427122.arc
archive log thread 1 sequence 427123 is already on disk as file /apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427123.arc
archive log thread 1 sequence 427124 is already on disk as file /apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427124.arc
archive log filename=/apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427014.arc thread=1 sequence=427014
.
.
.
archive log filename=/apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427128.arc thread=1 sequence=427128
archive log filename=/apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427129.arc thread=1 sequence=427129
unable to find archive log
archive log thread=1 sequence=427130
released channel: dev1
released channel: dev2
released channel: dev3
released channel: dev4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/05/2013 15:20:54
RMAN-06054: media recovery requesting unknown log: thread 1 seq 427130 lowscn 7619059397730

Note that you'll always get this error as it tries to find the latest log. If you check the alert log you'll see it's OK:

Media Recovery Log /apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427129.arc
ORA-279 signalled during: alter database recover logfile '/apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427129.arc'...
Thu Dec 5 15:20:54 2013
alter database recover cancel
Thu Dec 5 15:20:58 2013
Media Recovery Canceled
Completed: alter database recover cancel

Log back into the database and check the file locations:

select name from v$datafile;

NAME
------------------------------------------------------------
+EMSBY_DATA01/EMSBY_base/datafile/cis_idx03.403.833375941
+EMSBY_DATA01/EMSBY_base/datafile/cis_idx02.459.833376693
+EMSBY_DATA01/EMSBY_base/datafile/cis_tab03.457.833376693
+EMSBY_DATA01/EMSBY_base/datafile/cis_tab02.333.833374681

sys.EMSBY> select name from v$controlfile;

NAME
------------------------------------------------------------
+EMSBY_DATA01/EMSBY_base/controlfile/current.541.833378337

1 row selected.

sys.EMSBY> select member from v$logfile;

MEMBER
----------------------------------------------------------------------------------------------------
/oracle/ora_redo/data/EMSBY/redo_04a.log
/oracle/ora_redo/data/EMSBY/redo_03a.log
/oracle/ora_redo/data/EMSBY/redo_02a.log
/oracle/ora_redo/data/EMSBY/redo_01a.log

4 rows selected.

Hmm, the redo logs are still on UFS. I need to move those as well.

sys.EMSBY> Select group#,status from V$LOG;

    GROUP# STATUS
---------- ----------------
         1 CLEARING
         4 CLEARING
         3 CLEARING_CURRENT
         2 CLEARING

sys.EMSBY> alter database clear logfile group 1;

Database altered.

sys.EMSBY> alter database clear logfile group 4;

Database altered.

sys.EMSBY> alter database clear logfile group 3;

Database altered.

sys.EMSBY> alter database clear logfile group 2;

Database altered.

sys.EMSBY> Select group#,status from V$LOG;

    GROUP# STATUS
---------- ----------------
         1 UNUSED
         4 UNUSED
         3 CLEARING_CURRENT
         2 UNUSED

4 rows selected.

sys.EMSBY> alter database drop logfile group 1;

Database altered

sys.EMSBY> alter database drop logfile group 2;

Database altered

sys.EMSBY> alter database drop logfile group 3;
alter database drop logfile group 3
*
ERROR at line 1:
ORA-01623: log 3 is current log for instance EMSBY (thread 1) - cannot drop
ORA-00312: online log 3 thread 1: '/oracle/ora_redo/data/EMSBY/redo_03a.log'

I'll come back to this

sys.EMSBY> alter database drop logfile group 4;
alter database drop logfile group 4
*
ERROR at line 1:
ORA-01567: dropping log 4 would leave less than 2 log files for instance EMSBY (thread 1)
ORA-00312: online log 4 thread 1: '/oracle/ora_redo/data/EMSBY/redo_04a.log'

OK, need to create the new ones first.

sys.EMSBY> alter database add logfile group 1 size 500m;

Database altered.

sys.EMSBY> alter database add logfile group 2 size 500m;

Database altered.

Back to the log group 3 that was in use:

sys.EMSBY> Select group#,status from V$LOG;

    GROUP# STATUS
---------- ----------------
         1 UNUSED
         4 UNUSED
         3 CLEARING
         2 CLEARING_CURRENT

4 rows selected.

sys.EMSBY> alter database clear logfile group 3;

Database altered.

sys.EMSBY> Select group#,status from V$LOG;

    GROUP# STATUS
---------- ----------------
         1 UNUSED
         4 UNUSED
         3 UNUSED
         2 CLEARING_CURRENT

4 rows selected.

Now it's UNUSED, it can be dropped and recreated:

sys.EMSBY> alter database drop logfile group 3;

Database altered.

sys.EMSBY> alter database add logfile group 3 size 500m;

Database altered.

Check the files:

sys.EMSBY> select member from v$logfile;

MEMBER
----------------------------------------------------------------------------------------------------
+EMSBY_DATA01/EMSBY_base/onlinelog/group_4.544.833384439
+EMSBY_DATA01/EMSBY_base/onlinelog/group_3.545.833384617
+EMSBY_DATA01/EMSBY_base/onlinelog/group_1.542.833384411
+EMSBY_DATA01/EMSBY_base/onlinelog/group_2.543.833384419

4 rows selected.

Looks OK, now to turn the log apply back on:

sys.EMSBY> alter system set standby_file_management=auto;

System altered.

sys.EMSBY> alter database recover managed standby database disconnect from session;

Database altered.

Go to PROD and switch some logs, and check the alert log on the standby:

sys.EMSBY> alter system switch logfile;

System altered.

sys.EMSBY> /

System altered.

sys.EMSBY> /

System altered.

Standby alert Log:

RFS[1]: Archived Log: '/apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427132.arc'
Thu Dec 5 15:27:31 2013
RFS[1]: Archive log thread 1 sequence 427132 available in 1439 minute(s)
Thu Dec 5 15:27:32 2013
Media Recovery Delayed for 1439 minute(s) (thread 1 sequence 427130)
Thu Dec 5 15:27:32 2013
RFS[1]: Archived Log: '/apps/oracle/admin/EMSBY/arch/EMSBY_1_479644347_0000427131.arc'
Thu Dec 5 15:27:32 2013

Looks OK, all done!

Monday, December 2, 2013

GoldenGate Lag Issue

Someone recently implemented GoldenGate here, and it was happily chugging along when one day GG started issuing some nasty messages:

    2013-12-02 11:38:15 WARNING OGG-00947 Oracle GoldenGate Manager     for Oracle, mgr.prm: Lag for EXTRACT PRODP is 06:45:46     (checkpoint updated 00:00:04 ago).
    2013-12-02 11:38:15 WARNING OGG-00947 Oracle GoldenGate Manager     for Oracle, mgr.prm: Lag for EXTRACT PRODP is 06:45:46         (checkpoint updated 00:00:04 ago).

I haven't used GG much, so had to poke around in the documentation.

First off, finding where the files are. At this site, they are here :

    /apps/oracle/product/gg_11.2/ora10g_PRODP_PRODGP

Your mileage may vary. I don't know if there is a standard location, but going to the $ORACLE_HOME and navigating up to 'product' may help.

An 'ls -alrt' will show the latest log files to have a look at. The most obvious file is the error log:

    ggserr.log

This should show what the problem is. In my case it had some interesting entries:

    2013-12-02 13:41:29 WARNING OGG-01834 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: Failed setting IPv6 socket to     dual stack mode (error: 99, Option not supported by protocol).
    2013-12-02 13:41:29 WARNING OGG-01834 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: Failed setting IPv6 socket to     dual stack mode (error: 99, Option not supported by protocol).
    2013-12-02 13:41:29 WARNING OGG-01834 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: Failed setting IPv6 socket to     dual stack mode (error: 99, Option not supported by protocol).
    2013-12-02 13:41:29 INFO    OGG-03035 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: Operating system character     set identified as ISO-8859-1. Locale: en_AU, LC_ALL:.
    2013-12-02 13:41:30 WARNING OGG-00254 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: CACHEBUFFERSIZE 64KB is a     deprecated parameter.
    2013-12-02 13:41:30 INFO    OGG-01815 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: Virtual Memory Facilities     for: COM     anon alloc: mmap(MAP_ANON) anon free: munmap
        file alloc: mmap(MAP_SHARED) file free: munmap
        target directories:
        /oracle/003/admin/PRODGP/PRODP_2_swap.
    2013-12-02 13:41:30 INFO    OGG-00996 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: REPLICAT aprodp_2 started.
    2013-12-02 13:41:30 INFO    OGG-01020 Oracle GoldenGate     Delivery for Oracle, aprodp_2.prm: Processed extract process     RESTART_ABEND record at seq 47832, rba 1051 (aborted 0 records).

At first glance, you might think "Aha! It's a network error with IPV6". But it's not. That's only a warning and (in this case at least) can be ignored.

The "RESTART_ABEND" also looks dodgy, but again, this is not the case. This line is telling you (and me) that GoldenGate was restarted after an abend.

I had to go back further into the log, and found this:

    2013-12-01 08:06:49 ERROR   OGG-00664 Oracle GoldenGate Capture for Oracle, PRODP_2.prm: OCI Error during OCIServerAttach (status = 1
    2547-ORA-12547: TNS:lost contact).
    2013-12-01 08:06:49 ERROR   OGG-01668 Oracle GoldenGate Capture for Oracle, PRODP_2.prm: PROCESS ABENDING.

This happened a few times, so while it is a network issue of some kind it tends to fix itself.

However, we were still getting the "Lag for EXTRACT PRODP is 06:45:46" messages, and the lag time was increasing, so it looked as if it was still stuck.

I did some more digging around.

Looking at the "ls -alrt" results, there is a directory that has also been updated recently - dirrpt . Navigating to that showed some recent files:

    -rw-rw-rw-   1 oracle   dba        12497 Dec 3 09:05 PRODP.rpt
    -rw-rw-rw-   1 oracle   dba        14812 Dec 3 09:05 aprodp.rpt
    -rw-rw-rw-   1 oracle   dba        28256 Dec 3 09:08 PRODP_2.rpt
    -rw-rw-rw-   1 oracle   dba        42520 Dec 3 09:08 aprodp_2.rpt

These report files are really useful. They show exactly what the GG parameters are, the tables that are being replicated, the directories involved and
other messages.

The PRODP.rpt showed this:

    2013-12-02 13:41:00 INFO    OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p13152_extr.

    2013-12-02 13:41:00 INFO    OGG-01640 BOUNDED RECOVERY: recovery start XID: 23.3.2889152.

    2013-12-02 13:41:00 INFO    OGG-01641 BOUNDED RECOVERY: recovery start position: SeqNo: 172754, RBA: 1182883856, SCN: 1773.3471757875     (7618448773683), Timestamp: 2013-12-02 02:42:55.000000.

    2013-12-02 13:41:00 INFO    OGG-01642 BOUNDED RECOVERY: recovery end position: SeqNo: 172780, RBA: 237458548, SCN: 1773.3480433166 (7618457448974),     Timestamp: 2013-12-02 03:39:00.000000, Thread: 1.

    2013-12-02 13:41:00 INFO    OGG-01579 BOUNDED RECOVERY: VALID BCP: CP.PRODP.000002188.

    2013-12-02 13:41:00 INFO    OGG-01629 BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED: 1.

Hmmm. The contents of this report file usually look like this:

               27562 records processed as of 2013-12-03 02:00:29 (rate 0,delta 0)
               27577 records processed as of 2013-12-03 02:30:33 (rate 0,delta 0)
               27585 records processed as of 2013-12-03 03:02:06 (rate 0,delta 0)
               27595 records processed as of 2013-12-03 04:19:41 (rate 0,delta 0)
               27603 records processed as of 2013-12-03 05:15:29 (rate 0,delta 0)

So it still looked like GG wasn't doing anything.

There is also a log file in this directory called MGR.rpt which is worth having a look at, it shows the activity of the GG manager:

    2013-12-02 13:35:53 INFO    OGG-00983 Manager started (port 7809).

    2013-12-02 13:35:53 INFO    OGG-00967 Manager performing AUTOSTART of ER processes.

    2013-12-02 13:35:53 INFO    OGG-00975 EXTRACT PRODP starting.

    2013-12-02 13:35:54 INFO    OGG-00975 EXTRACT PRODP_2 starting.

    2013-12-02 13:35:54 INFO    OGG-00975 REPLICAT aprodp starting.

    2013-12-02 13:35:54 INFO    OGG-00975 REPLICAT aprodp_2 starting.

    2013-12-02 13:35:54 INFO    OGG-00979 REPLICAT aprodp is down (gracefully).

    2013-12-02 13:35:54 INFO    OGG-00979 REPLICAT aprodp_2 is down (gracefully).

OK, so the log and report files are informative but still haven't explained what's happening (unless you're clever and have spotted what's going on).

Time to run some commands.

Make sure you're in the directory where 'ggsci' lives and run it:

    ./ggsci

    Oracle GoldenGate Command Interpreter for Oracle
    Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
    Solaris, sparc, 64bit (optimized), Oracle 10g on Apr 24 2012 09:06:57

    Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

    GGSCI (prodhost) 1>

These are some of the commands that you can use - I won't show the output because it's quite lengthy in some cases:

   info *

info *,detail

info *,showch

The most useful ones -

    send PRODP,status

Which showed this:

    Sending STATUS request to EXTRACT PRODP ...

    EXTRACT PRODP (PID 16605)
    Current status: In recovery[1]: Reading from data source

    Current read position:
    Redo thread #: 1
    Sequence #: 172790
    RBA: 775155132
    Timestamp: 2013-12-02 04:02:25.000000
    SCN: 1773.3484235828
    Current write position:
    Sequence #: 164
    RBA: 1110
    Timestamp: 2013-12-02 15:00:12.415826
    Extract Trail: /apps/oracle/product/gg_11.2/ora10g_PRODP_PRODGP/dirdat/pt

Note the "In Recovery[1]"

and

    send PRODP, showtrans

    Sending SHOWTRANS request to EXTRACT PRODP ...

    Extract is currently in recovery mode (reading transactions from trail file). Please try again in a few minutes.

OK, now we're getting somewhere. Time to read the manual (I know, I should have done this in the beginning).

Looking at this document - http://docs.oracle.com/cd/E15881_01/doc.104/gg_troubleshooting_v104.pdf

I found this:

    Extract freezes during recovery
    If Extract abends due to an error or system failure, it might appear to be stalled when it is restarted. The cause of this is probably because Extract had to search a long way back into the transaction logs to find the beginning of a long-running transaction that was open at the time of the failure.
    To find out whether a prolonged recovery is the cause of the delay, use the SEND EXTRACT command with the STATUS option. If one of the following messages is displayed on the Current status line, this indicates that recovery is in progress (or being completed) and that Extract is operating normally:

    ❍ In recovery[1] – Extract is recovering to its input checkpoint.
    ❍ In recovery[2] – Extract is recovering to its output checkpoint.
    ❍ Recovery complete – The recovery is finished, and normal processing will resume.

Ah. So even though there is a lag, and even though the error messages are getting generated, GG is actually performing as expected.

If the extract was for a long time, it will take time recovering. It needs to read from (data from info , showch):
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 172699
to
Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 172817
RBA: 725406320

Before it starts processing anything.

The send status shows

    GGSCI (prodhost) 4> send extract PRODP,status
    Sending STATUS request to EXTRACT PRODP ...
    EXTRACT PRODP (PID 16605)
    Current status: In recovery[1]: Reading from data source
    Current read position:
    Redo thread #: 1
    Sequence #: 172803
    RBA: 1024377380

So 172803 is slowly moving towards 172817, when it gets there it will complete recovery and start processing.

Later, this was the result of the

    send PRODP, showtrans

    Sending SHOWTRANS request to EXTRACT PRODP ...

    Oldest redo log file necessary to restart Extract is:

    Redo Log Sequence Number 172935, RBA 654089232

    ------------------------------------------------------------
    XID:                  150.1.40097
    Items:                0
    Extract:              PRODP
    Redo Thread:          1
    Start Time:           2013-12-02:17:14:56
    SCN:                  1773.3576958226 (7618553974034)
    Redo Seq:             172935
    Redo RBA:             654089232
    Status:               Running

And this was the status:

    GGSCI (prodhost) 3> send PRODP,status

    Sending STATUS request to EXTRACT PRODP ...

    EXTRACT PRODP (PID 16605)
    Current status: Recovery complete: At EOF

    Current read position:
    Redo thread #: 1
    Sequence #: 173092
    RBA: 1033393152
    Timestamp: 2013-12-03 09:57:01.000000
    SCN: 1773.3710127915
    Current write position:
    Sequence #: 164
    RBA: 17461664
    Timestamp: 2013-12-03 09:56:56.021460
    Extract Trail: /apps/oracle/product/gg_11.2/ora10g_PRODP_PRODGP/dirdat/pt

So, although it took over 10 hours to recover, it was doing what was expected.