Tuesday, October 9, 2012

Using OS Watcher Black Box


Using OS Watcher Black Box


OS Watcher Black Box (formerly OS Watcher) is an Oracle provided tool which monitors the host hardware resources and can be presented as charts to identify potential bottle-necks outside of the database. It is relatively easy to set up, but since it is polling the CPU, memory and disk (using vmstat, iostat etc) it is essential that you advise the client if you intend to use it in Production in case it has an impact.


Go to  supporthtml.oracle.com and search for 301137.1 . 

This is the OSWatcher BB main document.

Download the latest oswbbxxx.tar , where xxx is the release number.

Extract to a directory on the Unix host.

There is an OSWbb_README.txt file in the docs directory which has detailed information.

Use the JAVA installation that comes with Oracle, put this in your PATH for V11:

export PATH=$ORACLE_HOME/jdk/jre/bin:$PATH

If you are using V10 of Oracle, then this should work:

export PATH=$ORACLE_HOME/jre/1.4.2/bin:$PATH

Navigate to the directory where you extracted the oswbb files and run the OSW start script.

I usually use

nohup ./startOSWbb.sh &

otherwise it holds the terminal session. It will advise that it is using defaults, then will start to log the data.

You can check to see if it’s running:

ps –ef | grep –i osw
/usr/bin/ksh ./OSWatcher.sh

/usr/bin/ksh ./OSWatcherFM.sh 48


The files are generated every 20 seconds and kept for 48 hours. If you want to keep them longer you need to specify this when you run startOSWbb.sh –

./startOSWbb.sh 60 72

This would start the tool and collect data at 60 second intervals and keep the last 72 hours of data.

To view the output as charts, you will need to have an X-server session on your PC or enable it on the linux host.
To enable it on the linux host, you may need to log in as root and type

xhost +

If you have an X-Server client installed on your PC (I use Xming), then you either need to set the DISPLAY variable to your ip address (with the suffix :0.0):

export DISPLAY=192.168.3.81:0.0

or in putty you can enable x-forwarding which doesn't need the DISPLAY variable to be set:



Remember to save the session settings.

Once done, you can start the OS Watcher session using this:

java –jar oswbba.jar –i

I usually create a shell script to do this, uncomment the line depending on the database version:

# V11

export PATH=$ORACLE_HOME/jdk/jre/bin:$PATH

# V10

#export PATH=$ORACLE_HOME/jre/1.4.2/bin:$PATH

java –jar oswbba.jar –i /home/oracle/mga/akh/oswbb/archive



The program should read the files in the archive folder, scroll through and parse them, then you should see a menu:

Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter T to Specify Different Time Scale
Enter D to Return to Default Time scale
Enter R to Remove Currently Displayed Graphs
Enter P to Generate A Profile
Enter A to Analyze Data
Enter Q to Quit Program

Please Select an Option:




These options are fairly self-explanatory.

One thing to note is that if you display the charts, you can’t close them with the “X” – you need to go back into the menu and enter “R”.

To exit, type “Q”.


You can  use the “P” option to generate a profile which creates a folder with the gif files and an OSW_profile.htm file, which presents the charts in a page with some hints.

You can then either zip the files and copy it to a windows PC and open the file in a browser, or use firefox on the host if it is installed:



There is a new feature in this version called “Analyzer”. It can be accessed by entering “A” in the menu and it generates a text file with advice on where bottle-necks may be occurring. It places the file in the analyse directory. Here is an example of the first few lines: 

OSWatcher Black Box Analyzer
Input Archive:  /home/oracle/dba/work/akh/oswbb/archive
Hostname:       LNFSND92
OS Version:     Linux
Snapshot Freq:  30
CPU COUNT:      8

#####################################################################
Section 1: Overall Status
Subsystem       Status
------------------------
CPU             OK
MEMORY          UNKNOWN
I/O             CRITICAL
#####################################################################
Section 2.0: System Slowdown Summary Ordered By Impact
No System Wide Slowdowns Detected
#####################################################################
Section 3: Other General Findings
CRITICAL : Disk high service time observed.
#####################################################################
Section 4: CPU Detailed Findings
CPU RUN QUEUE:
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                  88   100.00
High (>3)                                   0        0
Very High (>6)                              0        0
High spanning multiple snaps                0        0

CPU UTILIZATION: PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 88   100.00
High (>95%)                                0        0
Very High (100%)                           0        0
High spanning multiple snaps               0        0




You can add this to the index.html file with a bit of manual intervention.

If you want to narrow down the time span of the sample, for example if a user says they noticed a slow-down of the system for a 1 hour period, you can specify this in the menu by selecting “T”:

Enter T to Specify Different Time Scale

      Please Select an Option:T

      Specify Chart Start Time. Valid entry between May 30 09:16:19 2012      and May 30 11:24:22 2012

      Example Format To Enter Time: May 30 09:16:19 2012  :May 30 10:00:00    2012

      Specify Chart End Time. Valid entry between May 30 09:16:19 2012 and    May 30 11:24:22 2012

      Example Format To Enter Time: May 30 11:24:22 2012  :May 30 11:00:00    2012
      Dates accepted. Recalibrating charts...


It will return you to the menu, and any charts or analysis generated will be for the specified time period.


Generate the charts again (6,7,8 from the menu) and create the index.html file the same way as before – the charts will now just show the events through the specified start and end times:




That should get you started with OSW BB.



Thursday, October 4, 2012

Automating management of Oracle Database generated logs on linux

Automating management of Oracle Database generated logs on linux

As you know, Oracle 11G generates *.trc and *.trm log files as well as constantly generating alert log and listener log entries. Instead of writing shell scripts, you can use the 11G ADRCI utility and the linux logrotate command to manage these logs automatically.

Using ADRCI

ADRCI is fairly simple. Just fire it up and set the parameters accordingly. Note that you need to do this for each database on the host because they have separate ADRCI Homes.

Set the database environment as usual:

. oraenv
ORACLE_SID = [PRODDB] ?
The Oracle base remains unchanged with value /u01/app/oracle

Start up adrci:

adrci

ADRCI: Release 11.2.0.2.0 - Production on Fri Oct 5 08:07:40 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

ADR base = "/u01/app/oracle/admin/PRODDB"

Show the current settings for the log management:

adrci> show control

ADR Home = /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB: ************************************************************************* ADRID SHORTP_POLICY LONGP_POLICY LAST_MOD_TIME LAST_AUTOPRG_TIME LAST_MANUPRG_TIME ADRDIR_VERSION ADRSCHM_VERSION ADRSCHMV_SUMMARY ADRALERT_VERSION CREATE_TIME -------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- 1117245929 720 8760 2012-06-06 09:08:17.132534 +10:00 2012-10-01 00:02:57.274304 +10:00 1 2 80 1 2012-06-06 09:08:17.132534 +10:00

1 rows fetched

Note the figures in red – these are the defaults, in hours.

This means that trace and dump files are kept for 30 days and incidents and health reports are kept for a year.

To change these to a more reasonable level (7 days for trace and dump files, 1 month for the others):

adrci> set control (SHORTP_POLICY=168);
adrci> set control (LONGP_POLICY=720);

The SHORTP_POLICY determines how long things like core dumps and trace files are kept for.

The LONGP_POLICY determines how long incidents and some health warnings are kept for.

Once set, you need to issue a 'purge' command to enable it (see below) otherwise it will wait for the next time it checks, which could be hours.

adrci> show control

ADR Home = /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB: ************************************************************************* ADRID SHORTP_POLICY LONGP_POLICY LAST_MOD_TIME LAST_AUTOPRG_TIME LAST_MANUPRG_TIME ADRDIR_VERSION ADRSCHM_VERSION ADRSCHMV_SUMMARY ADRALERT_VERSION CREATE_TIME -------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- 349985628 168 720 2012-09-12 12:47:41.695214 +10:00 2012-09-30 00:09:18.714054 +10:00 2012-09-12 12:48:37.311103 +10:00 1 2 80 1 2012-07-26 09:12:35.866728 +10:00

1 rows fetched

That’s it for the dump and trace files. They will now be automatically deleted after the specified time.

Note you can also delete existing ones if there are too many using the purge command:

adrci> purge -age 10080 -type trace

Note that this figure is in minutes, not hours (wonderful!). This will delete any trace files older than 7 days.



If there is more than one database on the host, you must set the environment and adcri base accordingly before running the commands:


. oraenv

PRODDB1
adrci> set base /u01/app/oracle/admin/PRODDB1

adrci> show homes
ADR Homes:
diag/rdbms/proddb/PRODDB
diag/rdbms/proddb1/PRODDB1

adrci> show control
DIA-48448: This command does not support multiple ADR homes

If you see this, you need to set the homepath to run commands:

adrci> set homepath diag/rdbms/proddb1/PRODDB1
adrci> show control

ADR Home = /u01/app/oracle/admin/PRODDB1/diag/rdbms/proddb1/PRODDB1:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------
3297102354           168                  720                  2012-10-30 11:56:36.369326 +11:00        2012-10-30 11:16:13.086140 +11:00                                                 1                    2                    80                   1                    2012-07-20 15:01:07.750391 +10:00
1 rows fetched



Using linux logrotate

This is a bit more involved and needs some configuration files and a cron entry.

First, determine the name and location of the alert log and the listener log files:

sqlplus “/ as sysdba” 

show parameter back 
background_dump_dest string /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace

lsnrctl stat 

Listener Log File /u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml 

Check to see the sizes:

ls -al /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace/alert_PRODDB.log

-rw-r----- 1 oracle oinstall 8759629 Oct 5 08:01 /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace/alert_PRODDB.log

ls -al /u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml

-rw-r----- 1 oracle oinstall 4064218 Oct 5 08:01 /u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml


Create a logrotate.status file:

vi /home/oracle/dba/ora_logs/logrotate.status

with these contents:

logrotate state -- version 2 "/u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace/alert_PRODDB.log" 2012-09-10

Make the date about a month in the past. This will be updated when the job runs.

Now make a oracle_logrotate.conf file:

vi /home/oracle/dba/ora_shells/oracle_logrotate.conf

With these contents:

# Logrotate configuration file for Oracle Alert Logs
#
# missingok - do not issue an error message if file is missing
# copytruncate - copy the file and truncate the original instead of deleting it
# rotate 7 - keep 7 copies, then delete old ones
# compress - compress old logs
# notifempty - don't do anything if the file is empty
#
/u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace/alert_PRODDB.log {
missingok
copytruncate
rotate 7
compress
notifempty
}

This should be all that is needed – now run it and check the log sizes again:

/usr/sbin/logrotate -f -s /home/oracle/dba/ora_logs/logrotate.status /home/oracle/dba/ora_shells/oracle_logrotate.conf

ls -al /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace/alert_PRODDB.log

-rw-r----- 1 oracle oinstall 0 Oct 5 08:01 /u01/app/oracle/admin/PRODDB/diag/rdbms/PRODDB/PRODDB/trace/alert_PRODDB.log

The size should be 0.

Add this as a regular cron job.

This runs once a week at 09:00:

00 09 * * 0 /usr/sbin/logrotate -f -s /home/oracle/dba/ora_logs/logrotate.status /home/oracle/dba/ora_shells/oracle_logrotate.conf 1>/home/oracle/dba/ora_logs/logrotate.log 2>&1

Repeat the process for the listener log:

Create the status file:

vi /home/oracle/dba/ora_logs/logrotate_listener.status

logrotate state -- version 2 “/u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml” 2012-9-10

Create the conf file:

vi /home/oracle/dba/ora_shells/oracle_listener_logrotate.conf

/u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml
{
missingok
copytruncate
rotate 7
compress
notifempty
}

Run the command:

/usr/sbin/logrotate -f -s /home/oracle/dba/ora_logs/logrotate_listener.status /home/oracle/dba/ora_shells/oracle_listener_logrotate.conf

Check the size:

ls -al /u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml

-rw-r----- 1 oracle oinstall 0 Oct 5 08:01 /u01/app/oracle/product/11.2.0.2/db_1/log/diag/tnslsnr/linux01/listener/alert/log.xml

Create a cron job:

00 09 * * 0 /usr/sbin/logrotate -f -s /home/oracle/dba/ora_logs/logrotate_listener.status /home/oracle/dba/ora_shells/oracle_listener_logrotate.conf 1>/home/oracle/dba/ora_logs/logrotate_listener.log 2>&1

Finally (?) delete the audit logs, using the old-fashioned find and remove :

00 12 * * 0 /usr/bin/find /u01/app/oracle/admin/PRODDB/adump -name '*.aud' -mtime +33 -exec rm {} \;

 The logs should now rotate and be managed automatically.