Thursday, November 29, 2018

Cascading Standby Databases with the data guard broker - new V12 feature

I'm currently involved in a project migrating databases to the cloud, and am looking at using cascading standby database here to assist with the migration.

Note this is V12 only.

Basically -

On premises database - Golden Gate to staging database - data guard to new database (cloud)

For the migration, the On premises database and the staging database will be stopped, and the data guard will be made the primary. 

But this leaves the cloud database without a data guard, and rebuilding it will take days.

So someone suggested cascading standbys:

On premises database - Golden Gate to staging database - data guard to new database (cloud)  - cascading data guard (cloud)

However, when you make the standby the primary, it does a resetlogs and renders the cascading data guard database invalid (it's still waiting for the logs from staging).

Enter V12, the data guard broker, and the new RedoRoutes parameter.

Using these, you can set the redoroutes parameter to say "this is where my logs are going, and if there's a failover or switchover, change it".

So I set it up, shut down the primary (staging), did a failover and the first standby became the primary and the cascaded one became the standby to that primary. 

Exactly what I wanted. This is how it's done.

Set up the primary with the correct parameters

Primary, run these commands

ALTER DATABASE FORCE LOGGING;
ALTER SYSTEM SET db_unique_name='AH1' scope=spfile;
ALTER SYSTEM SET log_archive_config='dg_config=(AH1,AH1_STDBY1,AH1_STDBY2)' SCOPE=SPFILE;
ALTER SYSTEM SET log_archive_dest_2='service=AH1_STDBY1 async valid_for=(online_logfiles,primary_role) db_unique_name=AH1_STDBY1' SCOPE=SPFILE;
ALTER SYSTEM SET fal_server='AH1_STDBY1' SCOPE=SPFILE;
ALTER SYSTEM SET fal_client='AH1' SCOPE=SPFILE;
ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=SPFILE;
ALTER SYSTEM SET db_file_name_convert='AH1','AH1_STDBY1' SCOPE=SPFILE;
ALTER SYSTEM SET log_file_name_convert='AH1','AH1_STDBY1' SCOPE=SPFILE;

SELECT GROUP#,BYTES FROM V$LOG; 

   GROUP#      BYTES
---------- ----------
         1   104857600
         3   104857600
         2   104857600

ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 SIZE 104857600;
ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 SIZE 104857600;
ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 SIZE 104857600;

Create pfile that will be used as the basis of the new standbys.

create pfile='/home/oracle/initAH1_STDBY1.ora' from spfile;


Standby 1

pfile contents for standby1

*.audit_file_dest='/oracle/AH1/admin/adump'
*.audit_trail='db'
*.control_files='+DATA'
*.db_block_size=8192
*.db_domain=''
*.db_file_name_convert='AH1','AH1_STDBY1'
*.db_name='AH1'
*.db_recovery_file_dest='+ARCH'
*.db_recovery_file_dest_size=2G
*.db_securefile='PREFERRED'
*.db_unique_name='AH1_STDBY1'
*.diagnostic_dest='/oracle/AH1'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ORCLXDB)'
*.fal_client='AH1_STDBY1'
*.fal_server='AH1'
*.log_archive_config='dg_config=(AH1,AH1_STDBY1,AH1_STDBY2)'
*.log_archive_dest_2='service=AH1 async valid_for=(online_logfiles,primary_role) db_unique_name=AH1'
*.log_archive_dest_3='service=AH1_STDBY2 async valid_for=(standby_logfiles,standby_role) db_unique_name=AH1_STDBY2'
*.log_archive_dest_state_2='DEFER'
*.log_archive_dest_state_3='ENABLE'
*.log_file_name_convert='AH1','AH1_STDBY1'
*.memory_target=2G
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.standby_file_management='AUTO'
*.undo_tablespace='UNDOTBS1'



pfile contents for standby2

*.audit_file_dest='/oracle/AH1/admin/adump'
*.audit_trail='db'
*.control_files='+DATA'
*.db_block_size=8192
*.db_domain=''
*.db_file_name_convert='AH1','AH1_STDBY2'
*.db_name='AH1'
*.db_recovery_file_dest='+ARCH'
*.db_recovery_file_dest_size=2G
*.db_securefile='PREFERRED'
*.db_unique_name='AH1_STDBY2'
*.diagnostic_dest='/oracle/AH1'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ORCLXDB)'
*.fal_client='AH1_STDBY2'
*.fal_server='AH1'
*.log_archive_config='dg_config=(AH1,AH1_STDBY1,AH1_STDBY2)'
*.log_archive_dest_2='service=AH1 async valid_for=(online_logfiles,primary_role) db_unique_name=AH1'
*.log_archive_dest_state_2='DEFER'
*.log_archive_dest_state_3='ENABLE'
*.log_file_name_convert='AH1','AH1_STDBY2'
*.memory_target=2G
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.standby_file_management='AUTO'
*.undo_tablespace='UNDOTBS1'


create directories on standby hosts

mkdir -p /oracle/AH1/diag
mkdir -p /oracle/AH1/admin/audit


set up listener and tnsnames

standby1 

listener.ora - add

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = AH1_STDBY1)
      (SID_NAME = AH1)
      (ORACLE_HOME = /oracle/AH1/121)
    )
  )

standby2

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = AH1_STDBY2)
      (SID_NAME = AH1)
      (ORACLE_HOME = /oracle/AH1/121)
    )
  )


tnsnames (all 3 hosts)

AH1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =
        (COMMUNITY = SAP.WORLD)
        (PROTOCOL = TCP)
        (HOST = staginghost)
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SID = AH1)
      (SERVICE_NAME = AH1)
    )
  )

AH1_STDBY1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =
        (COMMUNITY = SAP.WORLD)
        (PROTOCOL = TCP)
        (HOST = standby1)
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SID = AH1)
      (SERVICE_NAME = AH1_STDBY1)
    )
  )

AH1_STDBY2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =
        (COMMUNITY = SAP.WORLD)
        (PROTOCOL = TCP)
        (HOST = standby2)
        (PORT = 1521)
      )
    )
    (CONNECT_DATA =
      (SID = AH1)
      (SERVICE_NAME = AH1_STDBY2)
    )
  )



Copy the orapwAH1 file from $ORACLE_HOME/dbs to the standby hosts

start the standby databases in nomount mode

tandby1 and standby2

sqlplus / as sysdba
startup nomount pfile='initAH1.ora'

Test the connections 

tnsping AH1
tnsping AH1_STDBY1
tnsping AH1_STDBY2

create the rman duplicate job on the standby1 and standby2 hosts, change the connection accordingly depending on which host you are on


#!/bin/bash
export DIR=/home/oracle/scripts
export LOG=$DIR/duplicate.log
rman target sys@AH1/password auxiliary sys@AH1_STDBY1/password << EOF > $LOG
#rman target sys@AH1/password auxiliary sys@AH1_STDBY2/password << EOF > $LOG
run {
 allocate channel pri1 type disk;
 allocate channel pri2 type disk;
 allocate channel prmy4 type disk;
 allocate auxiliary channel stby type disk;

 duplicate target database
  for standby
  from active database;
  }
EOF

Test sqlplus connections from all hosts

sqlplus sys@AH1/password as sysdba
sqlplus sys@AH1_STDBY1/password as sysdba
sqlplus sys@AH1_STDBY2/password as sysdba

All should connect

Run the rman duplicate on standby1

Run the rman duplicate on standby2

On both standbys - 

alter database recover managed standby database using current logfile disconnect;

Tail the alert logs of the standbys

switch a logfile on the primary - both standbys should apply the logs.

Check the gap on both standbys

select al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied"
from (select thread# thrd, max(sequence#) almax
      from v$archived_log
      where resetlogs_change#=(select resetlogs_change# from v$database)
      group by thread#) al,
     (select thread# thrd, max(sequence#) lhmax
      from v$log_history
      where first_time=(select max(first_time) from v$log_history)
      group by thread#) lh
where al.thrd = lh.thrd;

14:47:56 SYS@AH1_STDBY1> @dg_gap

    Thread Last Seq Received Last Seq Applied
---------- ----------------- ----------------
         1                29               29

14:48:36 SYS@AH1_STDBY2> @dg_gap

    Thread Last Seq Received Last Seq Applied
---------- ----------------- ----------------
         1                29               29



Set up the data guard broker

on all 3 databases

alter system set dg_broker_start=false;

remove old configuration files 

show parameter broker - then rm the broker files from the OS - they may not exists

Set the log_archive dests to null

primary and standby1
alter system set log_archive_dest_2='';

standby2
alter system set log_archive_dest_3='';

On all databases

alter system set dg_broker_start=true;

On primary

dgmgrl /
create configuration 'AH1_DG' as primary database is AH1 connect identifier is AH1;
add database 'AH1_STDBY1' as connect identifier is AH1_STDBY1;
add database 'AH1_STDBY2' as connect identifier is AH1_STDBY2;


enable configuration
show configuration

Configuration - AH1_DG

  Protection Mode: MaxPerformance
  Members:
  ah1        - Primary database
    AH1_STDBY1 - Physical standby database
    AH1_STDBY2 - Physical standby database (disabled)

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 46 seconds ago)

So now you have 2 standbys from the primary, but they are not cascading - you need to add the redoroutes parameters from dgmgrl


DGMGRL> show database AH1 redoroutes
  RedoRoutes = ''
DGMGRL> show database 'AH1_STDBY1' redoroutes
  RedoRoutes = ''
DGMGRL> show database 'AH1_STDBY2' redoroutes
  RedoRoutes = ''

Changed the redoroutes to test failover

DGMGRL> edit database AH1 set property RedoRoutes='(AH1:AH1_STDBY1)(AH1_STDBY1:AH1_STDBY2)';
Property "redoroutes" updated

DGMGRL> edit database 'AH1_STDBY1' set property RedoRoutes='(AH1:AH1_STDBY2)(AH1_STDBY1:AH1_STDBY2)';
Property "redoroutes" updated

DGMGRL> show configuration;

Configuration - AH1_DG

  Protection Mode: MaxPerformance
  Members:
  ah1        - Primary database
    AH1_STDBY1 - Physical standby database
      AH1_STDBY2 - Physical standby database (receiving archived redo)

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 41 seconds ago)

Now you can see they are off-set, which shows they are cascading

Test the failover

on primary

Shutdown the primary

On standby1

Connect to dgmgrl as sys

dgmgrl sys/mgadba

[oracle@orenjt199 ~]$ dgmgrl sys/mgadba

Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.

DGMGRL> show configuration;

Configuration - AH1_DG

  Protection Mode: MaxPerformance
  Members:
  ah1        - Primary database
    Error: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

    AH1_STDBY1 - Physical standby database
      AH1_STDBY2 - Physical standby database (receiving archived redo)

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 0 seconds ago)

Issue failover command

DGMGRL> failover to 'AH1_STDBY1';
Performing failover NOW, please wait...

Failover succeeded, new primary is "AH1_STDBY1"
DGMGRL> show configuration;

Configuration - AH1_DG

  Protection Mode: MaxPerformance
  Members:
  AH1_STDBY1 - Primary database

  Members Not Receiving Redo:
  ah1        - Physical standby database (disabled)
    ORA-16661: the standby database needs to be reinstated

  AH1_STDBY2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 158 seconds ago)

On standby1

Check database status

09:45:57 SYS@AH1_STDBY1> select open_mode from v$database;

OPEN_MODE
------------------------------------------------------------
READ WRITE

Switch a log and check the standby2 is applying

09:46:52 SYS@AH1_STDBY1> /

System altered.


On standby2 alert log

Media Recovery Waiting for thread 1 sequence 6 (in transit)
Wed Nov 28 09:46:55 2018
Recovery of Online Redo Log: Thread 1 Group 5 Seq 6 Reading mem 0
  Mem# 0: +ARCH/AH1_STDBY2/ONLINELOG/group_5.1682.993372329


DGMGRL> show configuration;

Configuration - AH1_DG

  Protection Mode: MaxPerformance
  Members:
  AH1_STDBY1 - Primary database
    AH1_STDBY2 - Physical standby database

  Members Not Receiving Redo:
  ah1        - Physical standby database (disabled)
    ORA-16661: the standby database needs to be reinstated

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 53 seconds ago)


Remove the original primary from the configuration - redoroutes no longer required

DGMGRL> edit database 'AH1_STDBY1' set property redoroutes='';
Property "redoroutes" updated
DGMGRL> edit database 'AH1_STDBY2' set property redoroutes='';
Property "redoroutes" updated
DGMGRL> edit database ah1 set property redoroutes='';
Property "redoroutes" updated
DGMGRL> remove database ah1;
Warning: ORA-16620: one or more databases could not be reached for a delete operation

Removed database "ah1" from the configuration
DGMGRL> show configuration;

Configuration - AH1_DG

  Protection Mode: MaxPerformance
  Members:
  AH1_STDBY1 - Primary database
    AH1_STDBY2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 25 seconds ago)


All good - you now have the standby1 as a primary, and the standby2 is the new standby, the logs are consistent.