Monday, September 29, 2014

Script to list sessions on a RAC instance by node

As part of the Exadata Patching, I needed to provide the application owners with a list of sessions running on the node.

This script does that - it uses the oratab to grab the database names then loops through them and runs a couple of SQL scripts.

I thought I'd put it up here because it can be adapted to run a SQL command against every database on a host, something I often have cause to do and end up having to write the script again.  You may need to change the 'dbhome_1' to 'db_1' depending on how you installed.

#!/bin/sh
#
# This script lists all sessions connected to the specified hostname in the query.
# It ignores the SYS, DBSNMP and PUBLIC users
#
# Hard code the ORACLE_HOME and a valid SID, and the host_name
#
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export ORACLE_SID=PROD1
export PATH=$ORACLE_HOME/bin:$PATH
rm sessions.lst
db_name=`grep dbhome_1 /etc/oratab | cut -f 1 -d ':' | grep "[1-9]\b" | grep -v "#"`

for db in $db_name
do
export ORACLE_SID=$db
sqlplus -s "/ as sysdba" <
col inst_id form 99
col sid form 99999
col serial# form 9999999
col program form a40
col username form a20
col host_name form a10
set lines 160
set feedback off
spool ${db}_sessions.txt
select instance_name from v\$instance;
set feedback on
select a.inst_id,b.host_name,a.sid,a.serial#,a.program,a.username
from gv\$session a, gv\$instance b where username is not null and a.inst_id=b.inst_id and b.host_name='xu1db06' and username
not in ('SYS','DBSNMP','PUBLIC') order by 1;
spool off
exit
EOF
done
for f in `ls *.txt`
do
cat $f >> sessions.lst
done
rm *sessions.txt

#
# End of Script
#

While I'm at it, here's a script that sets up the patching process - it lists the services running, and generates the scripts to stop the services, the instances, and then restart them at the end.


#!/bin/sh
#
# This script lists all services running on the node
# and generates scripts to stop and start them
#
# Hard code the node name in the line below where it says "NODENAME"
#

db_name=`grep dbhome_1 /etc/oratab | cut -f 1 -d ':' | grep -v "[1-9]\b" | grep -v "#"`

echo "Make sure you've changed the node name in the script - ctrl-c to exit this if you didn't, otherwise hit Enter"
read input

rm service_status.lst
rm stop_services.sh
rm services_before_patch.lst

echo Scanning for Services...will take a minute
for db in $db_name
do
srvctl status service -d $db >> service_status.tmp
tail -1 service_status.tmp
done

cp service_status.tmp services_before_patch.lst
grep -v not service_status.tmp > service_status.lst
rm service_status.tmp

echo
echo
echo "This has been saved in services_before_patch.lst"
#
# Hard code the node name here (replace NODENAME)
awk '{print $2" "$7}' service_status.lst | awk -F, '{print $1}' | sed 's/[0-9]$//' | awk '{print "srvctl stop service -d "$2" -s "$1" -n NODENAME "}' > stop_services.sh

sed 's/stop/start/g' stop_services.sh > start_services.sh

inst_name=`ps -ef | grep pmon | grep -v grep | grep -v perl |awk '{print $8}' | grep -v ASM | cut -c 10-`

for i_name in $inst_name
do
echo "srvctl stop instance -d "$i_name" -i "$i_name >> temp_si
done

cat temp_si | sed 's/[0-9]//' > temp_si_1
grep -v XUFSDB temp_si_1 > stop_instances.sh

rm temp_si temp_si_1

sed 's/stop/start/g' stop_instances.sh > start_instances.sh

echo
echo "You need to do the following:"
echo
echo "chmod +x stop_services.sh"
echo
echo "chmod +x start_services.sh"
echo
echo "chmod +x stop_instances.sh"
echo
echo "chmod +x start_instances.sh"
echo
echo "Then run them in this order once the application owners have been advised:"
echo
echo "./stop_services.sh"
echo
echo "./stop_instances.sh"
echo
echo "Stop the DBFS mount service and the XU1FSDB instance"
echo
echo "APPLY THE PATCH"
echo
echo "./start_services.sh"
echo
echo "Restart the DBFS mount service"
echo
echo "./start_instances.sh"
echo
echo "Compare the running services and instances before and after the stop/start"

#
# End of Script
#



Sunday, September 7, 2014

Applying a rolling patch on an Exadata box

Another day, another new thing to do.

Never applied a rolling patch on RAC. Never applied a patch on Exadata.

So. The first thing to do, obviously, is to read the aptly named "README.txt" file.

Looked quite straightforward, no nasty surprises in store (I hoped).

One passage did confuse me:

For a RAC environment, shut down all the services (database, ASM, listeners, nodeapps, and CRS daemons) running from the Oracle home of the node you want to patch. After you patch this node, start the services on this node.Repeat this process for each of the other nodes of the Oracle RAC system. OPatch is used on only one node at a time.

So did this mean I needed to shutdown ASM, listeners and the CRS on the host? Or just the databases in the HOME I was patching?

I decided to just do the databases in the HOME that was to be patched since it would fail if it wanted everything shut down and I'd revisit it if that was the case.

Here we go.

Check the databases in the ORACLE_HOME to be patched:

cat /etc/oratab | grep home_1

FSDB1:/u01/app/oracle/product/11.2.0.4/dbhome_1:N
TESTD1:/u01/app/oracle/product/11.2.0.4/dbhome_1:N
DEVD1:/u01/app/oracle/product/11.2.0.4/dbhome_1:N

Check to see if they are running

ps -ef | grep pmon

oracle    13742      1  0 Jun13 ?        00:22:19 asm_pmon_+ASM1
oracle    38331      1  0 Jun13 ?        00:21:28 ora_pmon_TESTD1
oracle    38343      1  0 Jun13 ?        00:27:10 ora_pmon_FSDB1
oracle    52291      1  0 Aug15 ?        00:07:28 ora_pmon_DEVD1

Stop the databases on the first node

srvctl stop instance -d TESTD -i TESTD1
srvctl stop instance -d DEVD -i DEVD1
srvctl stop instance -d FSDB -i FSDB1
PRCR-1133 : Failed to stop database FSDB and its running services
PRCR-1132 : Failed to stop resources using a filter
CRS-2529: Unable to act on 'ora.fsdb.db' because that would require stopping or relocating 'dbfs_mount', but the force option was not specified
CRS-2529: Unable to act on 'ora.fsdb.db' because that would require stopping or relocating 'dbfs_mount', but the force option was not specified

Ah. At this point I put the keyboard down and backed away slowly.

I checked with my colleagues, they said this is normal on an Exadata - there is a shared file system that is mounted on every node and this FSDB database controls it (or something)

Apparently it's OK to stop it on the nodes as long as nothing is using it.

/sbin/fuser /dbfs_direct/

If this comes back blank, all is OK.

You need to set the environment to the ASM database, then unmount the dbfs

ORACLE_SID = [DEVD1] ? +ASM1
oracle@xd1db01[+ASM1]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04

oracle@xd1db01[+ASM1]/home/oracle> crsctl stop resource dbfs_mount -n xd1db01
CRS-2673: Attempting to stop 'dbfs_mount' on 'xd1db01'
CRS-2677: Stop of 'dbfs_mount' on 'xd1db01' succeeded

oracle@xd1db01[+ASM1]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               OFFLINE OFFLINE      xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04


Try stopping the FSDB1 instance again

oracle@xd1db01[+ASM1]/home/oracle> . oraenv
ORACLE_SID = [+ASM1] ? FSDB1
oracle@xd1db01[FSDB1]/home/oracle> srvctl stop instance -d FSDB -i FSDB1
oracle@xd1db01[FSDB1]/home/oracle>


Check the oracle executable in the HOME isn't being used

cd $ORACLE_HOME/bin
/sbin/fuser oracle

This should come back blank

Set the PATH to include OPATCH

export PATH=/u01/app/oracle/product/11.2.0.4/dbhome_1/OPatch:$PATH

run opatch lsinventory so you can compare after the patch

opatch lsinventory
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0.4/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/opatch2014-09-08_08-46-27AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2014-09-08_08-46-27AM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.4.0
There are 1 products installed in this Oracle Home.


Interim patches (3) :

Patch  17839474     : applied on Thu Feb 27 09:22:39 EST 2014
Unique Patch ID:  17071278
Patch description:  "DISKMON PATCH FOR EXADATA (JAN2014 - 11.2.0.4.3) : (17839474)"
   Created on 22 Dec 2013, 21:51:42 hrs PST8PDT
   Bugs fixed:
     17839474

Patch  17984784     : applied on Thu Feb 27 09:22:21 EST 2014
Unique Patch ID:  17154638
Patch description:  "CRS PATCH FOR EXADATA (JAN2014 - 11.2.0.4.3) : (17984784)"
   Created on 8 Jan 2014, 22:53:15 hrs PST8PDT
   Bugs fixed:
     16613232, 14525998, 17065496, 17551223, 16346413

Patch  17943261     : applied on Thu Feb 27 09:21:42 EST 2014
Unique Patch ID:  17071278
Patch description:  "DATABASE PATCH FOR EXADATA (JAN2014 - 11.2.0.4.3) : (17943261)"
   Created on 1 Jan 2014, 21:20:56 hrs PST8PDT
Sub-patch  17741631; "DATABASE PATCH FOR EXADATA (DEC 2013 - 11.2.0.4.2) : (17741631)"
Sub-patch  17628006; "DATABASE PATCH FOR EXADATA (NOV 2013 - 11.2.0.4.1) : (17628006)"
   Bugs fixed:
     17288409, 13944971, 16450169, 17265217, 16180763, 16220077, 17465741
     17614227, 16069901, 14010183, 16285691, 17726838, 13364795, 17088068
     17612828, 17443671, 17080436, 17761775, 16721594, 16043574, 16837842
     17446237, 16863422, 17332800, 13609098, 17610798, 17501491, 17239687
     17468141, 17752121, 17602269, 16850630, 17346671, 17313525, 14852021
     17783588, 17437634, 13866822, 12905058, 17546761



Rac system comprising of multiple nodes
  Local node = xd1db01
  Remote node = xd1db02
  Remote node = xd1db03
  Remote node = xd1db04


This assumes you've already extracted the patch to a directory and are in that directory

Check the patch doesn't conflict

opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.

PREREQ session

Oracle Home       : /u01/app/oracle/product/11.2.0.4/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/opatch2014-09-08_08-47-04AM_1.log

Invoking prereq "checkconflictagainstohwithdetail"

Prereq "checkConflictAgainstOHWithDetail" passed.

OPatch succeeded.

Looks good to go - apply the patch

opatch apply
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0.4/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/dbhome_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/dbhome_1/cfgtoollogs/opatch/18841764_Sep_08_2014_08_47_26/apply2014-09-08_08-47-26AM_1.log

Applying interim patch '18841764' to OH '/u01/app/oracle/product/11.2.0.4/dbhome_1'
Verifying environment and performing prerequisite checks...
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Y



This node is part of an Oracle Real Application Cluster.
Remote nodes: 'xd1db02' 'xd1db03' 'xd1db04'
Local node: 'xd1db01'
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/11.2.0.4/dbhome_1')


Is the local system ready for patching? [y|n]

OK, enter 'Y' to apply the patch

y
User Responded with: Y
Backing up files...

Patching component oracle.network.rsf, 11.2.0.4.0...

Patching component oracle.rdbms.rsf, 11.2.0.4.0...

Patching component oracle.rdbms, 11.2.0.4.0...

Verifying the update...

The local system has been patched.  You can restart Oracle instances on it.


Patching in rolling mode.

Remaining nodes to be patched:
'xd1db02' 'xd1db03' 'xd1db04'
What is the next node to be patched?


OK, that was painless

I left this session open and opened another putty session to the first node and restarted the databases

oracle@xd1db01[FSDB1]/home/oracle> . oraenv
ORACLE_SID = [FSDB1] ? +ASM1
oracle@xd1db01[+ASM1]/home/oracle>
oracle@xd1db01[+ASM1]/home/oracle> crsctl start resource dbfs_mount -n xd1db01
CRS-2672: Attempting to start 'ora.fsdb.db' on 'xd1db01'
CRS-2676: Start of 'ora.fsdb.db' on 'xd1db01' succeeded
CRS-2672: Attempting to start 'dbfs_mount' on 'xd1db01'
CRS-2676: Start of 'dbfs_mount' on 'xd1db01' succeeded
oracle@xd1db01[+ASM1]/home/oracle>
oracle@xd1db01[+ASM1]/home/oracle> srvctl start instance -d FSDB -i FSDB1
PRCC-1015 : FSDB was already running on xd1db01
PRCR-1004 : Resource ora.fsdb.db is already running

oracle@xd1db01[+ASM1]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04


oracle@xd1db01[DEVD1]/home/oracle> 
oracle@xd1db01[DEVD1]/home/oracle> srvctl start instance -d DEVD -i DEVD1
oracle@xd1db01[DEVD1]/home/oracle> srvctl start instance -d TESTD -i TESTD1

All started OK.

Opened a session on the second node to be patched and shut down the databases there

oracle@xd1db02[+ASM2]/home/oracle> crsctl stop resource dbfs_mount -n xd1db02
CRS-2673: Attempting to stop 'dbfs_mount' on 'xd1db02'
CRS-2677: Stop of 'dbfs_mount' on 'xd1db02' succeeded
oracle@xd1db02[+ASM2]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               OFFLINE OFFLINE      xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04
oracle@xd1db02[+ASM2]/home/oracle>
oracle@xd1db02[DEVD2]/home/oracle> srvctl stop instance -d DEVD -i DEVD2
oracle@xd1db02[DEVD2]/home/oracle> srvctl stop instance -d FSDB -i FSDB2
oracle@xd1db02[DEVD2]/home/oracle> srvctl stop instance -d TESTD -i TESTD2

All stopped OK, back to the session on the first node where the patch apply is waiting for a response

Remaining nodes to be patched:
'xd1db02' 'xd1db03' 'xd1db04'
What is the next node to be patched?
xd1db02
You have selected 'xd1db02' from 'xd1db02' 'xd1db03' 'xd1db04'

The node 'xd1db02' will be patched next.


Please shutdown Oracle instances running out of this ORACLE_HOME on 'xd1db02'.
(Oracle Home = '/u01/app/oracle/product/11.2.0.4/dbhome_1')

Is the node ready for patching? [y|n]
y
User Responded with: Y
Updating nodes 'xd1db02'
   Apply-related files are:
     FP = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_files.txt"
     DP = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_dirs.txt"
     MP = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/make_cmds.txt"
     RC = "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/remote_cmds.txt"

Instantiating the file "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_files.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_files.txt" with actual path.
Propagating files to remote nodes...
Instantiating the file "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_dirs.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/copy_dirs.txt" with actual path.
Propagating directories to remote nodes...
Instantiating the file "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/make_cmds.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0.4/dbhome_1/.patch_storage/18841764_Jun_26_2014_03_07_07/rac/make_cmds.txt" with actual path.
Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/network/lib; /usr/bin/make -f ins_net_client.mk client_sharedlib ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2

Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk liborasdksh ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2

Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk client_sharedlib ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2

Running command on remote node 'xd1db02':
cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/lib; /usr/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1 || echo REMOTE_MAKE_FAILED::>&2


The node 'xd1db02' has been patched.  You can restart Oracle instances on it.

Not sure what these "REMOTE_MAKE_FAILED" messages were. 

Went back to the second node and restarted the databases

oracle@xd1db02[+ASM2]/home/oracle> crsctl start resource dbfs_mount -n xd1db02
CRS-2672: Attempting to start 'ora.fsdb.db' on 'xd1db02'
CRS-2676: Start of 'ora.fsdb.db' on 'xd1db02' succeeded
CRS-2672: Attempting to start 'dbfs_mount' on 'xd1db02'
CRS-2676: Start of 'dbfs_mount' on 'xd1db02' succeeded
oracle@xd1db02[+ASM2]/home/oracle> crsctl status resource dbfs_mount -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
dbfs_mount
               ONLINE  ONLINE       xd1db01
               ONLINE  ONLINE       xd1db02
               ONLINE  ONLINE       xd1db03
               ONLINE  ONLINE       xd1db04

oracle@xd1db02[+ASM2]/home/oracle> srvctl start instance -d DEVD -i DEVD2
oracle@xd1db02[+ASM2]/home/oracle> srvctl start instance -d TESTD -i TESTD2
oracle@xd1db02[+ASM2]/home/oracle> ps -ef | grep pmon

oracle    13690      1  0 Jun24 ?        00:19:37 asm_pmon_+ASM2
oracle    23768      1  0 Aug13 ?        00:09:43 ora_pmon_DEVD2
oracle    41299      1  0 Aug15 ?        00:08:13 ora_pmon_TESTD2

Back to the first node

Remaining nodes to be patched:
'xd1db03' 'xd1db04'
What is the next node to be patched?

I don't want to patch these nodes, so did a  ctrl-c to exit

OPatch failed with error code 130

Ignored this.

Check the patches were applied

opatch lsinventory on both nodes

xd1db01:

Patch  18841764     : applied on Mon Sep 08 08:47:50 EST 2014
Unique Patch ID:  17779414
   Created on 26 Jun 2014, 03:07:07 hrs PST8PDT
   Bugs fixed:
     18841764

xd1db02:

Patch  18841764     : applied on Mon Sep 08 08:47:50 EST 2014
Unique Patch ID:  17779414
   Created on 26 Jun 2014, 03:07:07 hrs PST8PDT
   Bugs fixed:
     18841764


So I'm not sure what the "REMOTE_MAKE_FAILED" messages were referring to - the patch applied successfully on the second node.

That was it, not too hard. If I'd known about the FSDB before it wouldn't have thrown me.