Sunday, September 27, 2015

Creating an APEX Database Status Monitor Based on Email alerts

As mentioned in my previous article, we have jobs set up at client sites that monitor the environments - the usual suspects like alert logs, disk space, tablespace space, backups etc. These jobs send us email and SMS alerts depending on the severity.

We all monitor the emails, but I wanted to create a 'Management Dashboard' that would show the current status of each client using the email subjects as input, since they are (fairly) standard.

The obvious candidate was an APEX application, and this meant reading the emails into an Oracle database that I could create pages against. As with all my APEX stuff, it starts out with an idea that gets built on as I think of things to add. There is no design involved so once you've read this you may want to put it together differently.

I did some research and it is possible using Gmail and PHP, so this is how I did it.
(Google are notorious for discontinuing services as they see fit, so one day this might stop working).

You'll need an Oracle database on a linux host with APEX installed.

This was the result (I've had to obscure the client names for obvious reasons):


It also has some pages with charts on how many warnings and failures each client has experienced over the previous month and what category (host space, alert log errors etc). This information allows you to see if there is a recurring problem or if (more likely) the thresholds need changing.




The reason these charts are large and there are no buttons for navigation is because this is designed to cycle around each page in turn. This is done with a Chrome or Firefox plug in that I'll describe later.

I have another app that has the same information but also has buttons for historical reporting and showing more details such as the host and database that the alert came from, which I'll detail later as well.






The process is this:

Clients have monitoring scripts installed that scan the alert logs, databases, report on backups, space etc.
The scripts email a central email address with any issues as well as status (backup reports).
The emails get forwarded to a newly created gmail account.
A php script running on a linux host reads the emails in the gmail account, loads the subject information into an oracle database and deletes the mail from gmail.
A SQL script formats the data (changes lower case to upper case for example) and shifts the information into different tables for reporting.

An APEX application acts as a GUI front end to the database to report on the client status.

Set up a new GMAIL account

It's a good idea to create a new gmail account dedicated to this purpose. So go and do that.
Gmail by default has security that prevents script access, so you need to go into the settings and change this. Note that these settings weaken the account security, so make sure that this email account will not contain sensitive mail.

Log into google as the user you created, then allow access:


Then go here and turn on ‘less secure apps’:


This allows a script to connect to the account and read the mail.



Make sure the email subjects are standard

We have a standard subject format, and if you don't, now is the time to make one. This is crucial to the process, it's how the scripts read the information into the database.

Our format is like this:

CLIENT:HOST:DATABASE:METRIC:STATUS

For example:

ACME:PRODHOST:PRODDB:RMAN BACKUP:SUCCESS

ACME:PRODHOST:PRODDB:ALERT LOG:WARNING

ACME:PRODHOST:PRODDB:HOST SPACE:FAILURE

Create email rules

Assuming each client sends a mail to the same address, you need to create a rule for each client that forwards the emails to the new GMAIL account you created. It could be as easy as creating a rule that says "Forward every mail sent from ACME to the_new_account@gmail.com". Or it may need some tailoring. The key thing is to make sure every alert is also forwarded to the new gmail account. Log into the account after you've received some emails to check they are getting forwarded.

Install php and create the script

Once that is set up, you need to log into your linux host and install some php packages:

yum install php
yum install php*-imap


Once installed, create a php script to connect to the Gmail account and read the email subjects into a flat file that can be loaded into the database. The parameter in use in this script is 'UNSEEN' - i.e only get the unread emails. 
There is a command at the end - expunge - which deletes the mail once it's been read. This makes sure that you don't get duplicates, but the UNSEEN should ignore old emails anyway. Feel free to experiment.

gmail_sub.sh:

#!/usr/bin/php
set_time_limit(4000);

// Connect to gmail
$imapPath = '{imap.gmail.com:993/imap/ssl}INBOX';
$username = 'the_new_account@gmail.com';
$password = 'the_account_password';

// try to connect
$inbox = imap_open($imapPath,$username,$password) or die('Cannot connect to Gmail: ' . imap_last_error());

$emails = imap_search($inbox,'UNSEEN');

$output = '';

foreach($emails as $mail) {

        $headerInfo = imap_headerinfo($inbox,$mail);

        $output .= $headerInfo->subject.'
';

        $emailStructure = imap_fetchstructure($inbox,$mail);

   $output = quoted_printable_decode($output);

   echo $output;
   echo "\r\n";
   $output = '';
}

// close the connection
imap_expunge($inbox);
imap_close($inbox);

Make the script executable:

chmod +x gmail_sub.sh


Test the script is reading the gmail inbox and generating the correct format:

./gmail_sub.sh > gmail_sub.txt

Hopefully the gmail_sub.txt should look something like this:



Create the Oracle schema

You'll need to create an oracle user, tablespace and tables in the database to hold the mail subjects:

SQL> create tablespace support_mail datafile '/u02/oradata/MGADB1/support_mail01.dbf' size 2000m;

Tablespace created.

SQL> create user support_mail identified by a_password default tablespace support_mail;

User created.

SQL> grant dba to support_mail;

Grant succeeded.

SQL> connect support_mail/a_password
Connected.

SQL> create table support_mail(client varchar2(50),hostname varchar2(100),database_name varchar2(20),metric varchar2(50),message varchar2(50),message_date date default sysdate) tablespace support_mail;

Table created.

SQL> alter user support_mail default tablespace support_mail;

User altered.

After some experimentation I changed it around a bit, and created tables to hold current information as well as one table to hold everything. They all look the same:

CREATE TABLE "SUPPORT_MAIL"."ALL_SUBJECTS"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;

CREATE TABLE "SUPPORT_MAIL"."CURRENT_FAILURES"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;

CREATE TABLE "SUPPORT_MAIL"."CURRENT_SUCCESS"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;

CREATE TABLE "SUPPORT_MAIL"."CURRENT_WARNINGS"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;


Create some scripts to pull it all together.

There is the main script to call the php script and load the data into the database. This is scheduled to run every 15 minutes in cron:

read_gmail_sub.sh

#/bin/sh
#
# Script to read email subjects into a database table for reporting
# Andy Horne 18/04/2015
#

#
# Set the environment
#

export ORACLE_SID=MGADB2
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export PATH=$ORACLE_HOME/bin:$PATH

#
# Call the php script to read the emails and put them to a file for loading
#
/home/oracle/mga/akh/gmail_sub.php > /home/oracle/mga/akh/gmail_sub.txt

#
# Tidy the file to remove unwanted characters
#
sed 's\FW: \\g' /home/oracle/mga/akh/gmail_sub.txt > /home/oracle/mga/akh/xx
sed 's?
??g' /home/oracle/mga/akh/xx > /home/oracle/mga/akh/gmail_sub.dat

#
# Call sqlloader to load the data into the database
#
sqlldr support_mail/a_password control=/home/oracle/mga/akh/load_gmail_sub.ctl

#
# Remove the old files
#
export LOGDIR=/home/oracle/mga/akh
rm $LOGDIR/gmail_sub.txt $LOGDIR/gmail_sub.dat $LOGDIR/xx

#
# Tidy the data so all subjects are standardised
#
sqlplus support_mail/a_password <
@/home/oracle/mga/akh/tidy.sql
commit;
exit
EOF


This is the SQL Loader Control file:

--
-- SQL Loader control file to load gmail subjects into database
--
-- Andy Horne 16/04/2015
--
LOAD DATA
infile '/home/oracle/mga/akh/gmail_sub.dat'
append
into table support_mail
trailing nullcols
(client CHAR terminated by ':',
hostname CHAR terminated by ':',
database_name CHAR terminated by ':',
metric CHAR terminated by ':',
message terminated by whitespace)


This is the SQL that tidies the data - it changes everything to upper case, renames some clients to fit with the tables, deletes some rubbish and moves data into current tables. For example, the host space email has 'Over 85%' as a message, but the APEX application shows 'CRITICAL', 'WARNING' or 'FAILURE', so I change that. 'Standby is in sync' is changed to 'SUCCESS', 'Blocking Lock' is changed to 'WARNING' etc.


It should be easy to follow. 

set echo on
--
-- Change message field to upper case
--
update support_mail set message='SUCCESS' where message = 'Success';
update support_mail set message='FAILURE' where message = 'Failed';
--
-- Change message for non-standard reports
--
update support_mail set message='WARNING' where message like 'Ove%';
update support_mail set message='WARNING' where message ='Errors';
update support_mail set message='SUCCESS' where metric like 'Standby is in%';
update support_mail set message='WARNING' where metric like 'Block%';
update support_mail set metric='Host Space' where metric like 'Filesys%';
update support_mail set metric='RMAN Backup' where metric like 'DiskHot%';
update support_mail set metric='Standby Sync' where metric like 'Standby%';
update support_mail set metric='Tablespace Space' where metric like 'Tables%';
update support_mail set metric='Alert Log' where metric like 'Alert%';
update support_mail set metric='RMAN Backup' where metric like '%Backup Report';
update support_mail set metric='RMAN Backup' where metric = 'Rman Backup';
update support_mail set metric='RMAN Backup' where metric = 'RMAN-backup';
update support_mail set message='WARNING' where message='Blocked';
update support_mail set metric='Status Change' where metric like 'Statu%';
update support_mail set message='WARNING' where message = 'Change';
--
-- Change client names to upper case
--
update support_mail set client='NAME' where client like 'Name%';
--
-- Change non-standard host names to standard
--
update support_mail set hostname='oracle01' where hostname='oracle01.au.clientname.com';
update support_mail set hostname='clyora06' where hostname='clyora06.clientname.com.au';
update support_mail set hostname='hkrhel33' where hostname='hkrhel33.clientname.hk';
--
-- Delete non-standard data
--
delete from support_mail where client like 'Old_client%';
delete from support_mail where client = 'EM Incident';
delete from support_mail where client = 'RE';
delete from support_mail where client = 'Re';
delete from support_mail where client = 'Fwd';
delete from support_mail where client is null;
delete from support_mail where database_name like 'Space%';
delete from support_mail where message like 'DataQ%';
delete from support_mail where message = 'status';
delete from support_mail where metric like 'Weekly Ch%';
delete from support_mail where message is null;
commit;
--
-- Move data into current tables
--
insert into current_failures (select * from support_mail where message='FAILURE');
insert into current_warnings (select * from support_mail where message='WARNING');
insert into current_success (select * from support_mail where message='SUCCESS');
--
-- Move all data into history table and create a backup
--
insert into all_subjects select * from support_mail;
insert into all_subjects_backups as select * from support_mail;
commit;
--
-- Remove latest data ready for next load
--
truncate table support_mail;
--
-- delete duplicates
--
delete from all_subjects
         where rowid not in ( select min(rowid)
       from all_subjects
   group by client,hostname,database_name,metric,message,to_char(message_date,'dd-mm-yy'));

Job to delete old entries from current tables, run at 17:00 each day:

delete_old_mail.sh

#!/bin/sh
#
# Remove yesterday's entries from current tables
# Andy Horne 26/04/2015
#
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export ORACLE_SID=MGADB2
export PATH=$ORACLE_HOME/bin:$PATH
sqlplus support_mail/a_password <
@/home/oracle/mga/akh/del.sql
commit;
exit
EOF

del.sql

delete from current_warnings where message_date <= sysdate-1;
delete from current_failures where message_date <= sysdate-1;
delete from current_success where message_date <= sysdate-1;

So now you should have 3 tables that show the latest status of all the clients - current_success, current_warnings and current_failures. Data is deleted from these at 17:00 each day so that the issues for the current day are shown.

You also have a table that contains every email ever loaded - called all_subjects. This allows for historic reporting.

There is also a table called support_mail that is loaded directly from the emails but gets truncated at the end of the process so is empty most of the time.

Now you need the APEX application to show the information. It's a really basic page that has 3 report regions - 1 for success (or clients with no current issues), one for warnings and one for critical issues.

I won't go into details on creating the main APEX app page, you can go through some previous articles of mine or trawl the web, but this is the SQL I use to show the status with colours depending on the status (Red, Orange and Green):

select distinct client,metric,'' details,
case when client is not null
then 'red'
end as fcolor
 from current_failures
where to_char(message_date) > sysdate - 1

select distinct client,metric,'' Details,
case when client is not null
then 'DarkOrange'
end as fcolor
 from current_warnings
where to_char(message_date) >= sysdate - 1

select distinct client, 
case when client is not null
then 'Green'
end as fcolor
from all_subjects
where client not in (select client from current_failures)
and client not in (select client from current_warnings)
order by 1


Orange shows up better than yellow.

So that will show the current warnings, failures for clients, as well as the clients that are OK.

For the next page, you need to create an APEX blank page, then add a chart region and 2 report regions. This page shows a pie chart for both warnings and failures and 2 reports with the client names and how many warnings and failures they've had.

The chart SQL is :

select null link,
decode(message,'WARNING','WARNINGS','FAILURE','FAILURES') label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message in ('WARNING','FAILURE')
group by null,message

Change the time scale if you only want to see the past week.

The reports SQL is:

select client,count(message) from all_subjects
where message ='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,message

order by count(metric) desc

and

select client,count(message) from all_subjects
where message ='FAILURE'
and MESSAGE_DATE >= sysdate-31
group by client,message

order by count(metric) desc


To show this:




The next page shows a pie chart with how many warnings each client had, and also a break down of the type.

The chart SQL:

select null link,
client label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message ='WARNING'

group by client,message

The report SQL:

select client,metric,count(metric) from all_subjects
where message='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,metric

order by count(metric) desc

This is the result:





Create another page, and substitute 'WARNING' for 'FAILURE' in the SQL.

So now you have an APEX application with 4 pages:

A page listing the clients with issues or no issues.
A page with a chart showing all warnings and failures and 2 tables with the number of issues
A page with a chart showing just warnings by client and a table with the type of warning.
A page with a chart showing just failures by client and a table with the type of failure.

You can use a browser plug-in to cycle around these pages. I use the Tab Slideshow plug-in which is available for Chrome : https://chrome.google.com/webstore/detail/tab-slideshow/loepeenhjndiclafjgoackjblfhonogb?hl=en

and Firefox : https://addons.mozilla.org/en-us/firefox/addon/tab-slideshow/

Install these, then open each of the 4 APEX pages in a browser and change the settings of Tab Slideshow to cycle through the tabs as often as you like - 30 seconds seems to work OK.

If the APEX session times out and prompts for a login, you need to login as the admin user and change the default timeout setting to 0:

http://apex_host:port/apex/apex_admin

Click on 'Manage Instance', then 'Security', then under 'Session Timeout' set both to '0':



You can then either run this off a laptop / PC that is connected to a TV in the office, or use a Chromecast to send it to a TV from a PC.


This is useful as a display to show the current status, but I also wanted to be able to drill-down a bit and show some history for clients to be able to show trends or see which issues each client had. 

So using the same tables and basic APEX pages, I created some new pages and buttons for navigation.





Tuesday, September 22, 2015

Creating a Database Monitoring Dashboard in APEX

This was copied and pasted from MS Word, so apologies for the formatting.

This is where to get all the files : http://1drv.ms/1OtWys4

The Dashboard

A lot of our clients are small, have standard edition or a few databases, so don't install Cloud Control. This means virtually all of the monitoring is performed using cron jobs, scripts and emails.
While this is usually good enough, I wanted to create a dashboard that showed the status of the databases, and it sort of evolved as I thought of things I'd like to see, and what I thought a client might like to see.
So what started as a page to show if the database was up, turned into this:












Because this evolved as I went along rather than being designed, there are probably some decisions I made that may not be the best – so you may want to read through this and re-think it yourself.


Assumptions:

You have Oracle installed.
You know how to manage and administer Oracle.
You have APEX (at least V4) installed and know how to manage that (ie create users, workspaces and a bit of development knowledge).
You know shell scripting.

How it works

There is a repository database which houses all the dashboard tables and the APEX application.
There are unix shell scripts (can be windows batch jobs if that's your platform) which are scheduled by cron (or Windows task scheduler) which connect to each database using a tnsnames.ora file and run SQL to collect the data. The data is written to flat files on the host where the monitoring database sits. SQL Loader jobs then load the data into the dashboard tables.

If, for some reason, you cannot connect to the databases using tnsnames, then an alternative would be to run the scripts locally and ftp or scp the flat files to the host that is running the repository database, but that is more complex and you have less control.

One thing to note is that this is not a live display – that would require more complexity and would place load on the database – this is not an agent-based monitor, and it's not Cloud Control. It's designed to give an overview of the databases so that the DBA can login to the database to have a closer look if necessary, and to store some historical information.

The scripts are scheduled to run at different times depending on how recent you want the data – the most frequent scripts run every 20 minutes to collect information on locks, alert log, session details etc. You can schedule them as frequently as you think necessary, but be cautious in case you cause load (although the scripts are fairly benign, remember that you are probably running them on a production database so be careful. And of course, make sure you follow the site change control procedures…).

The data I am collecting

This is what I gather, you may decide you want other information or that you don't need some of these, but I think it is a good base.

Account Status                      -              Shows if any application accounts are locked.
Alert Log                              -              (V11R2 and above) – shows ORA- errors.
Database Waits                 -              Chart of database wait events.
File I/O                                 -              Charts file reads and writes history.
DDL Changes                      -              Shows application objects that have changed.
File Latency                        -              Charts I/O latency (not yet shown on dashboard).
Host CPU                             -              Charts Host CPU usage (not yet shown on dashboard).
Host Details                        -              Shows OS, version, database, version, hostname.
Host Memory                    -              Charts host memory usage (not yet shown on dashboard).
Host Space                          -              Charts host space history (not yet shown on dashboard).
Invalid Objects                  -              Shows application invalid objects.
Locks                                     -              Shows blocking locks.
Long Operations               -              Shows any LONGOPS.
Backups                               -              Shows RMAN backup status and information.
Sessions                               -              Charts and lists database sessions.
Stale Stats                           -              Lists objects that have stale statistics.
Tablespace Space            -              Charts tablespace space with history.
Transactions                       -              Charts transactions by hour.
Uptime                                 -              Shows when database was started.

Some of these have data in the tables but I haven't got around to creating a page in APEX. By setting them up and collecting the data, I'll have the history available when I eventually get around to it.

The tables

I'll upload the DDL for all the objects, but this is a sample:

DASHBOARD_ACCOUNT_STATUS

CREATE TABLE MONITOR.DASHBOARD_ACCOUNT_STATUS
   (   DB_NAME VARCHAR2(20 BYTE),
       USERNAME VARCHAR2(50 BYTE),
       ACCOUNT_STATUS VARCHAR2(20 BYTE),
       SAMPLE_DATE DATE DEFAULT sysdate NOT NULL ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE MONITOR_DAT ;

So, the first thing to do is create a tablespace, and a schema. Because this is in a database I've set up specifically, I give the schema DBA access so I don't have to worry about those pesky permissions. If you need to create this in an existing database you need to sort out the permissions yourself, but the schema shouldn't need DBA.

The common columns in all the tables are the DB_NAME and the SAMPLE_DATE columns – these allow you to drill down in the APEX application. The other columns are determined by the sort of information you're collecting.

These are the tables I have, you may not want them all, or you may want to create your own.

DASHBOARD_ACCOUNT_STATUS
DASHBOARD_ALERT_LOG
DASHBOARD_ALERT_LOG_HISTORY
DASHBOARD_DATABASE_WAITS
DASHBOARD_DB_FILE_IO
DASHBOARD_DB_FILE_IO_HISTORY
DASHBOARD_DDL_CHANGES
DASHBOARD_FILE_LATENCY
DASHBOARD_HOST_CPU
DASHBOARD_HOST_CPU_HISTORY
DASHBOARD_HOST_DETAILS
DASHBOARD_HOST_INFORMATION
DASHBOARD_HOST_MEM
DASHBOARD_HOST_SPACE
DASHBOARD_INVALID_OBJECTS
DASHBOARD_LOCK_DETAILS
DASHBOARD_LOCK_HISTORY
DASHBOARD_LONGOPS
DASHBOARD_RMAN_BACKUPS
DASHBOARD_SESSIONS_HISTORY
DASHBOARD_SESSIONS_HWM
DASHBOARD_SESSION_DETAILS
DASHBOARD_STALE_STATS
DASHBOARD_TBS_SPACE
DASHBOARD_TRANSACTIONS
DASHBOARD_TRANSACTIONS_HISTORY
DASHBOARD_UPTIME

I created a couple of history tables, the idea being to move old data out of the current ones, but I haven't got round to it yet.

How the data is collected

The data is collected through shell scripts that connect to each database in the tnsnames.ora, run SQL statements, spool to flat files and then the data is loaded into the database using SQL Loader.
The cron job runs a script based on the frequency required. This script contains calls to the data gathering scripts. This one runs every hour:

#!/bin/sh
#
# Get data each hour
#

export TNS_ADMIN=/home/oracle/mga/akh
export ORACLE_HOME=/opt/oracle/product/11.2.0.3/db1
export ORACLE_SID=MONITOR
export PATH=$ORACLE_HOME/bin:$PATH

for i in AWATST EAREPTST AWS1DEV AWAUAT AREGUAT AREGSUP AWASUP AWAPRD AREGPRD
do
 /home/oracle/mga/akh/dashboard/script/get_io.sh $i
 /home/oracle/mga/akh/dashboard/script/get_trans.sh $i
 /home/oracle/mga/akh/dashboard/script/get_backup.sh $i
 /home/oracle/mga/akh/dashboard/script/get_waits.sh $i
 /home/oracle/mga/akh/dashboard/script/get_uptime.sh $i
 /home/oracle/mga/akh/dashboard/script/get_accounts.sh $i
done

Note that the database names are hard-coded here and are used as input into the scripts. You could create a parameter file, or use a sed / awk process to read the tnsnames.ora instead.

This is one of the scripts called by the cronjob – note I use the system user to connect. If you can't do that, you will need to create a user on the target database(s) with the correct permissions.

#!/bin/bash
# Script to get account status
# Andy Horne 25/07/2015
#
export ORACLE_HOME=/opt/oracle/product/11.2.0.3/db1
export ORACLE_SID=$1
export PATH=$ORACLE_HOME/bin:$PATH

sqlplus -S system/xxxxxx@${ORACLE_SID} <
set pages 0 feed off verify off
spool /home/oracle/mga/akh/dashboard/script/output/${ORACLE_SID}_accounts.out
SELECT '$ORACLE_SID',
username, account_status
from dba_users where username in ('DFC_RO_USER',
'AWRIS2',
'ARCHREPO',
'DFC_USER',
'DSTFIX',
'DWH_COMMON',
'DWH_ETL_USER',
'DWH_INTEGRATION',
'DWH_MAPPING',
'DWH_RO_USER',
'DWH_STAGING',
'DWH_TEMPORAL',
'EAREPTST',
'REGISTRY',
'WADI',
'WADI2',
'WDM')
order by 1 asc;
EOF

#
# Load the data
/home/oracle/mga/akh/dashboard/script/load_accounts.sh ${ORACLE_SID}

So in the cron job script (get_hourly.sh) there is the variable 'i' which is assigned the name of the databases in turn (eg AREGPRD). This variable is fed into the ORACLE_SID variable in the called script (get_accounts.sh). So the get_accounts.sh script runs for each database, connects to it, and spools the output to a file with the database name as the prefix (eg AREGPRD_accounts.out).

This file gets read in by the sql loader job which looks like this:

#!/bin/sh
# Load the data into the database
export ORACLE_HOME=/opt/oracle/product/11.2.0.3/db1
export ORACLE_SID=MONITOR
export PATH=$ORACLE_HOME/bin:$PATH
export DBNAME=$1

#
# Load the data
#
sqlldr monitor/pwd_goes_here control=/home/oracle/mga/akh/dashboard/script/load_accounts.ctl

This is the control file for that job:
--
-- SQL Loader control file to load sessions data into the monitor.dashboard_sessions_history table
--
-- Andy Horne 25/07/2014
--
LOAD DATA
infile '/home/oracle/mga/akh/dashboard/script/output/${DBNAME}_accounts.out'
append
into table DASHBOARD_ACCOUNT_STATUS
trailing nullcols
(db_name terminated by whitespace,
username terminated by whitespace,
account_status terminated by whitespace)

And this is what the data file looks like:

AREGPRD REGISTRY                       OPEN
AREGPRD DWH_ETL_USER                   OPEN
AREGPRD DWH_RO_USER                    OPEN
AREGPRD DFC_RO_USER                    OPEN
AREGPRD DFC_USER                       OPEN

The sample_date column exists in all the tables, but it is default sysdate so doesn't need to be loaded.
This is what the data in the table looks like:

AREGPRD       DFC_USER      OPEN   21/SEP/15
AREGPRD       DFC_RO_USER   OPEN   21/SEP/15
AREGPRD       DWH_RO_USER   OPEN   21/SEP/15
AREGPRD       DWH_ETL_USER  OPEN   21/SEP/15
AREGPRD       REGISTRY      OPEN   21/SEP/15
AWAPRD        WDM           OPEN   21/SEP/15
AWAPRD        DWH_COMMON    OPEN   21/SEP/15
AWAPRD        DWH_TEMPORAL  OPEN   21/SEP/15

And this is what the dashboard region looks like:



I'll upload all the scripts, control files and sample data.

The APEX Application

I found it best to get the scripts working and loading data before turning to APEX – it's a lot easier to design the APEX application if you have decent data.
So, assuming you have the data, you can start to create the application.
I'll upload my application, but you can design yours from scratch if you like, depending on what you want on the front page.

In my case, I have this:

Date and time on hosts (my current site uses UTC on the hosts, which throws out a lot of the time stats, so I have this to be able to see what is happening when).

Database information – dbname, host name, startup time. Note, I hard code the host and db name because it's only needed once. I type the info into the table manually.

Locked Accounts

Database Locks

Database Session Count

Alert Log Errors

DDL Changes

Invalid Objects Count

RMAN Backup Status

Long Operations

Tablespaces over 50% full

Most of these have links or buttons which drill-down to other pages. It looks like this:




Let's get started

So assuming you have a database that you can use as a repository, we can start.
Set the environment to your database (. oraenv).

Run the cr_monitor.sql to create the tablespace, user and tables – edit the path to the directory as required:

SQL> @cr_monitor.sql
SQL> CREATE TABLESPACE MONITOR_DAT DATAFILE
  2    '/data/oracle/db07/oradata/MONITOR/monitor_dat01.dbf' SIZE 524288000
  3    LOGGING ONLINE PERMANENT BLOCKSIZE 8192
  4    EXTENT MANAGEMENT LOCAL AUTOALLOCATE DEFAULT
  5   NOCOMPRESS  SEGMENT SPACE MANAGEMENT AUTO;

Tablespace created.

SQL>
SQL> CREATE USER MONITOR IDENTIFIED BY CHANGE_PASSWORD
  2   DEFAULT TABLESPACE MONITOR_DAT TEMPORARY TABLESPACE TEMP
  3    QUOTA UNLIMITED ON MONITOR_DAT;

User created.

SQL>
SQL> GRANT DBA TO MONITOR;

Grant succeeded.

SQL>
SQL> CREATE TABLE MONITOR.DASHBOARD_DB_FILE_IO
  2     (    DB_NAME VARCHAR2(50 BYTE),
  3          METRIC VARCHAR2(20 BYTE),
  4          METRIC_VALUE NUMBER,
  5          SAMPLE_DATE DATE DEFAULT sysdate NOT NULL ENABLE
  6     ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  7   NOCOMPRESS LOGGING
  8    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  9    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
 10    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 11    TABLESPACE MONITOR_DAT ;

Table created.

SQL>
SQL> CREATE TABLE MONITOR.DASHBOARD_DB_FILE_IO_HISTORY
  2     (    DB_NAME VARCHAR2(50 BYTE),
  3          METRIC VARCHAR2(20 BYTE),
  4          METRIC_VALUE NUMBER,
  5          SAMPLE_DATE DATE DEFAULT sysdate NOT NULL ENABLE
  6     ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  7   NOCOMPRESS LOGGING
  8    STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  9    PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
 10    BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
 11    TABLESPACE MONITOR_DAT ;

Table created.

.
.
Etc
.
.
SQL> exit

So we now have tables, but no data.
The shell scripts assume this is the path to the output, so amend accordingly:

/home/oracle/mga/akh/dashboard/script/output

Copy all the shell scripts into a directory, and edit the 'get_all_sql.sh' script and uncomment all the commented scripts:

#!/bin/sh
#
# Get all data
#

export TNS_ADMIN=/home/oracle/mga/akh
export ORACLE_HOME=/opt/oracle/product/11.2.0.3/db1
export ORACLE_SID=MONITOR
export PATH=$ORACLE_HOME/bin:$PATH

for i in AWATST EAREPTST AWS1DEV AWAUAT AREGUAT AREGSUP AWASUP AWAPRD AREGPRD
do
/home/oracle/mga/akh/dashboard/script/get_backup.sh $i
/home/oracle/mga/akh/dashboard/script/get_cpu.sh $i
/home/oracle/mga/akh/dashboard/script/get_io.sh $i
/home/oracle/mga/akh/dashboard/script/get_latency.sh $i
/home/oracle/mga/akh/dashboard/script/get_locks.sh $i
/home/oracle/mga/akh/dashboard/script/get_sessions.sh $i
/home/oracle/mga/akh/dashboard/script/get_tablespace.sh $i
/home/oracle/mga/akh/dashboard/script/get_trans.sh $i
/home/oracle/mga/akh/dashboard/script/get_session_details.sh $i
/home/oracle/mga/akh/dashboard/script/get_longops.sh $i
/home/oracle/mga/akh/dashboard/script/get_uptime.sh $i
/home/oracle/mga/akh/dashboard/script/get_stale_stats.sh $i
/home/oracle/mga/akh/dashboard/script/get_alert_log.sh $i
/home/oracle/mga/akh/dashboard/script/get_invalid.sh $i
/home/oracle/mga/akh/dashboard/script/get_accounts.sh $i
/home/oracle/mga/akh/dashboard/script/get_sessions_hwm.sh $i
/home/oracle/mga/akh/dashboard/script/get_waits.sh $i
/home/oracle/mga/akh/dashboard/script/get_schema_diffs.sh $i
/home/oracle/mga/akh/dashboard/script/get_ddl_changes.sh $i
done

Note that you will obviously need to edit this to reflect the database names and the paths. You also need to change the TNS_ADMIN line or remove it, depending on whether you use the existing tnsnames.ora file or create your own.

You will also need to change all the 'get_' shell scripts to reflect the system password or change the user to one that has permission to view the data dictionary tables on the target databases. I didn't say this was easy.

You may want to limit the first run to one database to make sure all the scripts are OK, or only uncomment one 'get_' script at a time.

So, assuming all these run through OK, you should have files in the output directory and data in the tables:

-rw-r--r-- 1 oracle oinstall  5474 Sep 21 04:59 AWAPRD_session_details.out
-rw-r--r-- 1 oracle oinstall    81 Sep 21 05:00 AWAPRD_sessions_hwm.out
-rw-r--r-- 1 oracle oinstall   162 Sep 21 04:59 AWAPRD_sessions.out
-rw-r--r-- 1 oracle oinstall  5957 Sep 21 05:00 AWAPRD_stale_stats.out
-rw-r--r-- 1 oracle oinstall  1215 Sep 21 04:59 AWAPRD_tbsspace.out
-rw-r--r-- 1 oracle oinstall    81 Sep 21 04:59 AWAPRD_trans.out

AREGPRD       DFC_RO_USER   OPEN   21/SEP/15
AREGPRD       DWH_RO_USER   OPEN   21/SEP/15
AREGPRD       DWH_ETL_USER  OPEN   21/SEP/15
AREGPRD       REGISTRY      OPEN   21/SEP/15
AREGPRD       DFC_USER      OPEN   21/SEP/15
AWAPRD        DWH_ETL_USER  OPEN   21/SEP/15
AWAPRD        DWH_RO_USER   OPEN   21/SEP/15
AWAPRD        DWH_TEMPORAL  OPEN   21/SEP/15
AWAPRD        WDM           OPEN   21/SEP/15
AWAPRD        DWH_COMMON    OPEN   21/SEP/15
AWAPRD        DWH_STAGING   OPEN   21/SEP/15

Now you can import the APEX application. Sign in to APEX as the ADMIN user, and create a workspace and user (called monitor).

Download the dashboard.sql file and import it into APEX.

Run the app and sign in as the monitor user, and hopefully you should see something like this:



Note that there is 'no data found' in the 'Database Information' region. This is because this table is not loaded, but entered manually since it's static. So use SQL Developer or SQLPlus to enter the information, and then refresh the page:

So that's it really. Hopefully this is enough to get you started, and you can tailor it as you see fit.

This is where to get all the files : http://1drv.ms/1OtWys4

You'll have to wait a few days for some of the charts to populate properly since some are loaded overnight.

Adding databases should be easy – just add them to the tnsnames.ora and the list in the 'get_hourly.sh', 'get_daily.sh' etc scripts:

for i in AWATST EAREPTST AWS1DEV AWAUAT NEW_DB_NAME


You're also not restricted to Oracle – if you can get data into a flat file from other databases then you can load it using SQL Loader.

Although the scripts are for unix, you should be able to adapt them for Windows.