Sunday, September 7, 2014

Applying a rolling patch on an Exadata box

Another day, another new thing to do.

Never applied a rolling patch on RAC. Never applied a patch on Exadata.

So. The first thing to do, obviously, is to read the aptly named "README.txt" file.

Looked quite straightforward, no nasty surprises in store (I hoped).

One passage did confuse me:

For a RAC environment, shut down all the services (database, ASM, listeners, nodeapps, and CRS daemons) running from the Oracle home of the node you want to patch. After you patch this node, start the services on this node. Repeat this process for each of the other nodes of the Oracle RAC system. OPatch is used on only one node at a time.

So did this mean I needed to shut down ASM, the listeners and CRS on the host? Or just the databases in the HOME I was patching?

I decided to just stop the databases in the HOME being patched; if OPatch actually wanted everything shut down it would fail, and I'd revisit it then.

Here we go.

Check the databases in the ORACLE_HOME to be patched:

cat /etc/oratab | grep home_1

FSDB1:/u01/app/oracle/product/11.2.0.4/dbhome_1:N
TESTD1:/u01/app/oracle/product/11.2.0.4/dbhome_1:N
DEVD1:/u01/app/oracle/product/11.2.0.4/dbhome_1:N
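
Another way to see which databases are registered against a given home (a sketch, not something from the original run; it assumes the databases are registered with Clusterware, which they are here) is to ask srvctl:

# List every database resource, then show the Oracle home each one uses
for db in $(srvctl config database); do
  echo "== $db =="
  srvctl config database -d $db | grep -i "oracle home"
done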

Check to see if they are running

ps -ef | grep pmon

oracle    13742      1  0 Jun13 ?        00:22:19 asm_pmon_+ASM1
oracle    38331      1  0 Jun13 ?        00:21:28 ora_pmon_TESTD1
oracle    38343      1  0 Jun13 ?        00:27:10 ora_pmon_FSDB1
oracle    52291      1  0 Aug15 ?        00:07:28 ora_pmon_DEVD1

Stop the databases on the first node

srvctl stop instance -d TESTD -i TESTD1
srvctl stop instance -d DEVD -i DEVD1
srvctl stop instance -d FSDB -i FSDB1
PRCR-1133 : Failed to stop database FSDB and its running services
PRCR-1132 : Failed to stop resources using a filter
CRS-2529: Unable to act on 'ora.fsdb.db' because that would require stopping or relocating 'dbfs_mount', but the force option was not specified
CRS-2529: Unable to act on 'ora.fsdb.db' because that would require stopping or relocating 'dbfs_mount', but the force option was not specified

Ah. At this point I put the keyboard down and backed away slowly.

I checked with my colleagues, and they said this is normal on Exadata: there's a shared file system (DBFS) mounted on every node, and this FSDB database is the one that backs it.

Apparently it's OK to stop it on a node as long as nothing on that node is using the mount. You can check with fuser:

/sbin/fuser /dbfs_direct/

If this comes back blank, all is OK.
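
If it doesn't come back blank, it prints the PIDs holding the mount. Something like this (a sketch, not from the original session) shows what those processes are before you decide whether to stop them:

# Verbose output shows the user, PID, access type and command holding the mount
/sbin/fuser -v /dbfs_direct/

# Or map the PIDs back to full process details
for pid in $(/sbin/fuser /dbfs_direct/ 2>/dev/null); do ps -fp $pid; done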

You need to set the environment to the ASM instance (via oraenv), then stop the dbfs_mount resource on this node:

. oraenv
ORACLE_SID = [DEVD1] ? +ASM1
oracle@xd1db01[+ASM1]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04

oracle@xd1db01[+ASM1]/home/oracle> crsctl stop resource dbfs_mount -n xd1db01
CRS-2673: Attempting to stop 'dbfs_mount' on 'xd1db01'
CRS-2677: Stop of 'dbfs_mount' on 'xd1db01' succeeded

oracle@xd1db01[+ASM1]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               OFFLINE OFFLINE      xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04


Try stopping the FSDB1 instance again

oracle@xd1db01[+ASM1]/home/oracle> . oraenv
ORACLE_SID = [+ASM1] ? FSDB1
oracle@xd1db01[FSDB1]/home/oracle> srvctl stop instance -d FSDB -i FSDB1
oracle@xd1db01[FSDB1]/home/oracle>


Check that the oracle executable in the HOME isn't in use:

cd $ORACLE_HOME/bin
/sbin/fuser oracle

This should come back blank.
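
If it isn't blank, something is still running out of the home. A broader check (a sketch; the path is the home being patched here) is to look for any process whose command line references the home:

# Anything still running from this ORACLE_HOME?
ps -eo pid,user,args | grep /u01/app/oracle/product/11.2.0.4/dbhome_1 | grep -v grep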

Set the PATH to include OPatch:

export PATH=/u01/app/oracle/product/11.2.0.4/dbhome_1/OPatch:$PATH

Run opatch lsinventory so you can compare it with the post-patch inventory:

opatch lsinventory
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0.4/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/opatch2014-09-08_08-46-27AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2014-09-08_08-46-27AM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.4.0
There are 1 products installed in this Oracle Home.


Interim patches (3) :

Patch  17839474     : applied on Thu Feb 27 09:22:39 EST 2014
Unique Patch ID:  17071278
Patch description:  "DISKMON PATCH FOR EXADATA (JAN2014 - 11.2.0.4.3) : (17839474)"
   Created on 22 Dec 2013, 21:51:42 hrs PST8PDT
   Bugs fixed:
     17839474

Patch  17984784     : applied on Thu Feb 27 09:22:21 EST 2014
Unique Patch ID:  17154638
Patch description:  "CRS PATCH FOR EXADATA (JAN2014 - 11.2.0.4.3) : (17984784)"
   Created on 8 Jan 2014, 22:53:15 hrs PST8PDT
   Bugs fixed:
     16613232, 14525998, 17065496, 17551223, 16346413

Patch  17943261     : applied on Thu Feb 27 09:21:42 EST 2014
Unique Patch ID:  17071278
Patch description:  "DATABASE PATCH FOR EXADATA (JAN2014 - 11.2.0.4.3) : (17943261)"
   Created on 1 Jan 2014, 21:20:56 hrs PST8PDT
Sub-patch  17741631; "DATABASE PATCH FOR EXADATA (DEC 2013 - 11.2.0.4.2) : (17741631)"
Sub-patch  17628006; "DATABASE PATCH FOR EXADATA (NOV 2013 - 11.2.0.4.1) : (17628006)"
   Bugs fixed:
     17288409, 13944971, 16450169, 17265217, 16180763, 16220077, 17465741
     17614227, 16069901, 14010183, 16285691, 17726838, 13364795, 17088068
     17612828, 17443671, 17080436, 17761775, 16721594, 16043574, 16837842
     17446237, 16863422, 17332800, 13609098, 17610798, 17501491, 17239687
     17468141, 17752121, 17602269, 16850630, 17346671, 17313525, 14852021
     17783588, 17437634, 13866822, 12905058, 17546761



Rac system comprising of multiple nodes
  Local node = xd1db01
  Remote node = xd1db02
  Remote node = xd1db03
  Remote node = xd1db04
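
To make the before/after comparison easier, the output can be captured to a file and diffed once the patch is on (a sketch; the file names are just examples):

opatch lsinventory > /tmp/lsinv_before_$(hostname -s).txt
# ...and after the patch has been applied:
opatch lsinventory > /tmp/lsinv_after_$(hostname -s).txt
diff /tmp/lsinv_before_$(hostname -s).txt /tmp/lsinv_after_$(hostname -s).txt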


This assumes you've already extracted the patch to a directory and are working from that directory.
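
If you haven't, it's just a case of staging and unzipping it somewhere (a sketch; the staging directory and zip file name are examples only):

cd /u01/app/oracle/patches
unzip p18841764_112040_Linux-x86-64.zip
cd 18841764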

Check that the patch doesn't conflict with anything already in the home:

opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /u01/app/oracle/product/11.2.0.4/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/opatch2014-09-08_08-47-04AM_1.log

Invoking prereq "checkconflictagainstohwithdetail"

Prereq "checkConflictAgainstOHWithDetail" passed.

OPatch succeeded.

Looks good to go, so apply the patch:

opatch apply
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0.4/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/18841764_Sep_08_2014_08_47_26/apply2014-09-08_08-47-26AM_1.log

Applying interim patch '18841764' to OH '/u01/app/oracle/product/11.2.0.4/dbhome_1'
Verifying environment and performing prerequisite checks...
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Y



This node is part of an Oracle Real Application Cluster.
Remote nodes: 'xd1db02' 'xd1db03' 'xd1db04'
Local node: 'xd1db01'
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/11.2.0.4/dbhome_1')


Is the local system ready for patching? [y|n]

OK, enter 'Y' to apply the patch

y
User Responded with: Y
Backing up files...

Patching component oracle.network.rsf, 11.2.0.4.0...

Patching component oracle.rdbms.rsf, 11.2.0.4.0...

Patching component oracle.rdbms, 11.2.0.4.0...

Verifying the update...

The local system has been patched.  You can restart Oracle instances on it.


Patching in rolling mode.

Remaining nodes to be patched:
'xd1db02' 'xd1db03' 'xd1db04'
What is the next node to be patched?


OK, that was painless.

I left this session open, opened another PuTTY session to the first node, and restarted the databases:

oracle@xd1db01[FSDB1]/home/oracle> . oraenv
ORACLE_SID = [FSDB1] ? +ASM1
oracle@xd1db01[+ASM1]/home/oracle>
oracle@xd1db01[+ASM1]/home/oracle> crsctl start resource dbfs_mount -n xd1db01
CRS-2672: Attempting to start 'ora.fsdb.db' on 'xd1db01'
CRS-2676: Start of 'ora.fsdb.db' on 'xd1db01' succeeded
CRS-2672: Attempting to start 'dbfs_mount' on 'xd1db01'
CRS-2676: Start of 'dbfs_mount' on 'xd1db01' succeeded
oracle@xd1db01[+ASM1]/home/oracle>
oracle@xd1db01[+ASM1]/home/oracle> srvctl start instance -d FSDB -i FSDB1
PRCC-1015 : FSDB was already running on xd1db01
PRCR-1004 : Resource ora.fsdb.db is already running

oracle@xd1db01[+ASM1]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04


oracle@xd1db01[DEVD1]/home/oracle> 
oracle@xd1db01[DEVD1]/home/oracle> srvctl start instance -d DEVD -i DEVD1
oracle@xd1db01[DEVD1]/home/oracle> srvctl start instance -d TESTD -i TESTD1

All started OK.
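
A quick way to confirm everything is back and registered as running (a sketch, not from the original session):

srvctl status instance -d FSDB -i FSDB1
srvctl status instance -d DEVD -i DEVD1
srvctl status instance -d TESTD -i TESTD1
ps -ef | grep pmon | grep -v grep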

Opened a session on the second node to be patched and shut down the databases there

oracle@xd1db02[+ASM2]/home/oracle> crsctl stop resource dbfs_mount -n xd1db02
CRS-2673: Attempting to stop 'dbfs_mount' on 'xd1db02'
CRS-2677: Stop of 'dbfs_mount' on 'xd1db02' succeeded
oracle@xd1db02[+ASM2]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               OFFLINE OFFLINE      xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04
oracle@xd1db02[+ASM2]/home/oracle>
oracle@xd1db02[DEVD2]/home/oracle> srvctl stop instance -d DEVD -i DEVD2
oracle@xd1db02[DEVD2]/home/oracle> srvctl stop instance -d FSDB -i FSDB2
oracle@xd1db02[DEVD2]/home/oracle> srvctl stop instance -d TESTD -i TESTD2

All stopped OK. Back to the session on the first node, where the patch apply is waiting for a response:

Remaining nodes to be patched:
'xd1db02' 'xd1db03' 'xd1db04'
What is the next node to be patched?
xd1db02
You have selected 'xd1db02' from 'xd1db02' 'xd1db03' 'xd1db04'

The node 'xd1db02' will be patched next.


Please shutdown Oracle instances running out of this ORACLE_HOME on 'xd1db02'.
(Oracle Home = '/u01/app/oracle/product/11.2.0.4/dbhome_1')

Is the node ready for patching? [y|n]
y
User Responded with: Y
Updating nodes 'xd1db02'
   Apply-related files are:
     FP = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_files.txt"
     DP = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_dirs.txt"
     MP = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/make_cmds.txt"
     RC = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/remote_cmds.txt"

Instantiating the file "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_files.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_files.txt" with actual path.
Propagating files to remote nodes...
Instantiating the file "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_dirs.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_dirs.txt" with actual path.
Propagating directories to remote nodes...
Instantiating the file "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/make_cmds.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/make_cmds.txt" with actual path.
Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/network/lib; /usr/bin/make -f ins_net_client.mk client_sharedlib ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2

Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk liborasdksh ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2

Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk client_sharedlib ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2

Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2


The node 'xd1db02' has been patched.  You can restart Oracle instances on it.

I wasn't sure at first what these "REMOTE_MAKE_FAILED" strings were. Looking at the commands more closely, they're just the error trap OPatch appends to each remote make (|| echo REMOTE_MAKE_FAILED:: >&2); the text would only be printed if a make actually failed, and none of them did.
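
If you want extra reassurance that the remote relink actually happened, one quick check (a sketch, not something I ran at the time) is to look at the timestamp on the oracle binary on the remote node; it should have just been rebuilt:

# The oracle binary on xd1db02 should show a modification time of a few minutes ago
ssh xd1db02 "ls -l /u01/app/oracle/product/11.2.0.4/dbhome_1/bin/oracle"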

Went back to the second node and restarted the databases

oracle@xd1db02[+ASM2]/home/oracle> crsctl start resource dbfs_mount -n xd1db02
CRS-2672: Attempting to start 'ora.fsdb.db' on 'xd1db02'
CRS-2676: Start of 'ora.fsdb.db' on 'xd1db02' succeeded
CRS-2672: Attempting to start 'dbfs_mount' on 'xd1db02'
CRS-2676: Start of 'dbfs_mount' on 'xd1db02' succeeded
oracle@xd1db02[+ASM2]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04

oracle@xd1db02[+ASM2]/home/oracle> srvctl start instance -d DEVD -i DEVD2
oracle@xd1db02[+ASM2]/home/oracle> srvctl start instance -d TESTD -i TESTD2
oracle@xd1db02[+ASM2]/home/oracle> ps -ef | grep pmon

oracle    13690      1  0 Jun24 ?        00:19:37 asm_pmon_+ASM2
oracle    23768      1  0 Aug13 ?        00:09:43 ora_pmon_DEVD2
oracle    41299      1  0 Aug15 ?        00:08:13 ora_pmon_TESTD2

Back to the first node

Remaining nodes to be patched:
'xd1db03' 'xd1db04'
What is the next node to be patched?

I didn't want to patch these nodes, so I did a Ctrl-C to exit.

OPatch failed with error code 130

I ignored this; exit code 130 is just the standard shell status for a process interrupted with Ctrl-C (128 + SIGINT), so it's only reporting the interrupt.
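
If the remaining nodes do need the patch later, one option (a sketch of the approach, not something I did here) is to repeat the same shutdown steps on each of them and apply the patch locally from the staged patch directory:

# On xd1db03 / xd1db04, after stopping dbfs_mount and the instances there
cd /u01/app/oracle/patches/18841764        # example staging path
export PATH=/u01/app/oracle/product/11.2.0.4/dbhome_1/OPatch:$PATH
opatch apply -local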

Check that the patch was applied:

Run opatch lsinventory on both nodes:

xd1db01:

Patch  18841764     : applied on Mon Sep 08 08:47:50 EST 2014
Unique Patch ID:  17779414
   Created on 26 Jun 2014, 03:07:07 hrs PST8PDT
   Bugs fixed:
     18841764

xd1db02:

Patch  18841764     : applied on Mon Sep 08 08:47:50 EST 2014
Unique Patch ID:  17779414
   Created on 26 Jun 2014, 03:07:07 hrs PST8PDT
   Bugs fixed:
     18841764
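
If you just want a yes/no answer rather than the full listing, grepping the inventory for the patch number does the job (a sketch):

opatch lsinventory | grep 18841764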


So those "REMOTE_MAKE_FAILED" strings were nothing to worry about: they only get printed if a remote make fails, and the patch applied successfully on the second node.

That was it; not too hard. If I'd known about the FSDB/DBFS dependency beforehand, it wouldn't have thrown me.








