Sunday, September 27, 2015

Creating an APEX Database Status Monitor Based on Email alerts

As mentioned in my previous article, we have jobs set up at client sites that monitor the environments - the usual suspects like alert logs, disk space, tablespace space, backups etc. These jobs send us email and SMS alerts depending on the severity.

We all monitor the emails, but I wanted to create a 'Management Dashboard' that would show the current status of each client using the email subjects as input, since they are (fairly) standard.

The obvious candidate was an APEX application, and this meant reading the emails into an Oracle database that I could create pages against. As with all my APEX stuff, it starts out with an idea that gets built on as I think of things to add. There is no design involved so once you've read this you may want to put it together differently.

I did some research and it is possible using Gmail and PHP, so this is how I did it.
(Google are notorious for discontinuing services as they see fit, so one day this might stop working).

You'll need an Oracle database on a linux host with APEX installed.

This was the result (I've had to obscure the client names for obvious reasons):


It also has some pages with charts on how many warnings and failures each client has experienced over the previous month and what category (host space, alert log errors etc). This information allows you to see if there is a recurring problem or if (more likely) the thresholds need changing.




The reason these charts are large and there are no buttons for navigation is because this is designed to cycle around each page in turn. This is done with a Chrome or Firefox plug in that I'll describe later.

I have another app that has the same information but also has buttons for historical reporting and showing more details such as the host and database that the alert came from, which I'll detail later as well.






The process is this:

Clients have monitoring scripts installed that scan the alert logs, databases, report on backups, space etc.
The scripts email a central email address with any issues as well as status (backup reports).
The emails get forwarded to a newly created gmail account.
A php script running on a linux host reads the emails in the gmail account, loads the subject information into an oracle database and deletes the mail from gmail.
A SQL script formats the data (changes lower case to upper case for example) and shifts the information into different tables for reporting.

An APEX application acts as a GUI front end to the database to report on the client status.

Set up a new GMAIL account

It's a good idea to create a new gmail account dedicated to this purpose. So go and do that.
Gmail by default has security that prevents script access, so you need to go into the settings and change this. Note that these settings weaken the account security, so make sure that this email account will not contain sensitive mail.

Log into google as the user you created, then allow access:


Then go here and turn on ‘less secure apps’:


This allows a script to connect to the account and read the mail.



Make sure the email subjects are standard

We have a standard subject format, and if you don't, now is the time to make one. This is crucial to the process, it's how the scripts read the information into the database.

Our format is like this:

CLIENT:HOST:DATABASE:METRIC:STATUS

For example:

ACME:PRODHOST:PRODDB:RMAN BACKUP:SUCCESS

ACME:PRODHOST:PRODDB:ALERT LOG:WARNING

ACME:PRODHOST:PRODDB:HOST SPACE:FAILURE

Create email rules

Assuming each client sends a mail to the same address, you need to create a rule for each client that forwards the emails to the new GMAIL account you created. It could be as easy as creating a rule that says "Forward every mail sent from ACME to the_new_account@gmail.com". Or it may need some tailoring. The key thing is to make sure every alert is also forwarded to the new gmail account. Log into the account after you've received some emails to check they are getting forwarded.

Install php and create the script

Once that is set up, you need to log into your linux host and install some php packages:

yum install php
yum install php*-imap


Once installed, create a php script to connect to the Gmail account and read the email subjects into a flat file that can be loaded into the database. The parameter in use in this script is 'UNSEEN' - i.e only get the unread emails. 
There is a command at the end - expunge - which deletes the mail once it's been read. This makes sure that you don't get duplicates, but the UNSEEN should ignore old emails anyway. Feel free to experiment.

gmail_sub.sh:

#!/usr/bin/php
set_time_limit(4000);

// Connect to gmail
$imapPath = '{imap.gmail.com:993/imap/ssl}INBOX';
$username = 'the_new_account@gmail.com';
$password = 'the_account_password';

// try to connect
$inbox = imap_open($imapPath,$username,$password) or die('Cannot connect to Gmail: ' . imap_last_error());

$emails = imap_search($inbox,'UNSEEN');

$output = '';

foreach($emails as $mail) {

        $headerInfo = imap_headerinfo($inbox,$mail);

        $output .= $headerInfo->subject.'
';

        $emailStructure = imap_fetchstructure($inbox,$mail);

   $output = quoted_printable_decode($output);

   echo $output;
   echo "\r\n";
   $output = '';
}

// close the connection
imap_expunge($inbox);
imap_close($inbox);

Make the script executable:

chmod +x gmail_sub.sh


Test the script is reading the gmail inbox and generating the correct format:

./gmail_sub.sh > gmail_sub.txt

Hopefully the gmail_sub.txt should look something like this:



Create the Oracle schema

You'll need to create an oracle user, tablespace and tables in the database to hold the mail subjects:

SQL> create tablespace support_mail datafile '/u02/oradata/MGADB1/support_mail01.dbf' size 2000m;

Tablespace created.

SQL> create user support_mail identified by a_password default tablespace support_mail;

User created.

SQL> grant dba to support_mail;

Grant succeeded.

SQL> connect support_mail/a_password
Connected.

SQL> create table support_mail(client varchar2(50),hostname varchar2(100),database_name varchar2(20),metric varchar2(50),message varchar2(50),message_date date default sysdate) tablespace support_mail;

Table created.

SQL> alter user support_mail default tablespace support_mail;

User altered.

After some experimentation I changed it around a bit, and created tables to hold current information as well as one table to hold everything. They all look the same:

CREATE TABLE "SUPPORT_MAIL"."ALL_SUBJECTS"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;

CREATE TABLE "SUPPORT_MAIL"."CURRENT_FAILURES"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;

CREATE TABLE "SUPPORT_MAIL"."CURRENT_SUCCESS"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;

CREATE TABLE "SUPPORT_MAIL"."CURRENT_WARNINGS"
   (   "CLIENT" VARCHAR2(50 BYTE),
       "HOSTNAME" VARCHAR2(100 BYTE),
       "DATABASE_NAME" VARCHAR2(20 BYTE),
       "METRIC" VARCHAR2(50 BYTE),
       "MESSAGE" VARCHAR2(50 BYTE),
       "MESSAGE_DATE" DATE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "SUPPORT_MAIL" ;


Create some scripts to pull it all together.

There is the main script to call the php script and load the data into the database. This is scheduled to run every 15 minutes in cron:

read_gmail_sub.sh

#/bin/sh
#
# Script to read email subjects into a database table for reporting
# Andy Horne 18/04/2015
#

#
# Set the environment
#

export ORACLE_SID=MGADB2
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export PATH=$ORACLE_HOME/bin:$PATH

#
# Call the php script to read the emails and put them to a file for loading
#
/home/oracle/mga/akh/gmail_sub.php > /home/oracle/mga/akh/gmail_sub.txt

#
# Tidy the file to remove unwanted characters
#
sed 's\FW: \\g' /home/oracle/mga/akh/gmail_sub.txt > /home/oracle/mga/akh/xx
sed 's?
??g' /home/oracle/mga/akh/xx > /home/oracle/mga/akh/gmail_sub.dat

#
# Call sqlloader to load the data into the database
#
sqlldr support_mail/a_password control=/home/oracle/mga/akh/load_gmail_sub.ctl

#
# Remove the old files
#
export LOGDIR=/home/oracle/mga/akh
rm $LOGDIR/gmail_sub.txt $LOGDIR/gmail_sub.dat $LOGDIR/xx

#
# Tidy the data so all subjects are standardised
#
sqlplus support_mail/a_password <
@/home/oracle/mga/akh/tidy.sql
commit;
exit
EOF


This is the SQL Loader Control file:

--
-- SQL Loader control file to load gmail subjects into database
--
-- Andy Horne 16/04/2015
--
LOAD DATA
infile '/home/oracle/mga/akh/gmail_sub.dat'
append
into table support_mail
trailing nullcols
(client CHAR terminated by ':',
hostname CHAR terminated by ':',
database_name CHAR terminated by ':',
metric CHAR terminated by ':',
message terminated by whitespace)


This is the SQL that tidies the data - it changes everything to upper case, renames some clients to fit with the tables, deletes some rubbish and moves data into current tables. For example, the host space email has 'Over 85%' as a message, but the APEX application shows 'CRITICAL', 'WARNING' or 'FAILURE', so I change that. 'Standby is in sync' is changed to 'SUCCESS', 'Blocking Lock' is changed to 'WARNING' etc.


It should be easy to follow. 

set echo on
--
-- Change message field to upper case
--
update support_mail set message='SUCCESS' where message = 'Success';
update support_mail set message='FAILURE' where message = 'Failed';
--
-- Change message for non-standard reports
--
update support_mail set message='WARNING' where message like 'Ove%';
update support_mail set message='WARNING' where message ='Errors';
update support_mail set message='SUCCESS' where metric like 'Standby is in%';
update support_mail set message='WARNING' where metric like 'Block%';
update support_mail set metric='Host Space' where metric like 'Filesys%';
update support_mail set metric='RMAN Backup' where metric like 'DiskHot%';
update support_mail set metric='Standby Sync' where metric like 'Standby%';
update support_mail set metric='Tablespace Space' where metric like 'Tables%';
update support_mail set metric='Alert Log' where metric like 'Alert%';
update support_mail set metric='RMAN Backup' where metric like '%Backup Report';
update support_mail set metric='RMAN Backup' where metric = 'Rman Backup';
update support_mail set metric='RMAN Backup' where metric = 'RMAN-backup';
update support_mail set message='WARNING' where message='Blocked';
update support_mail set metric='Status Change' where metric like 'Statu%';
update support_mail set message='WARNING' where message = 'Change';
--
-- Change client names to upper case
--
update support_mail set client='NAME' where client like 'Name%';
--
-- Change non-standard host names to standard
--
update support_mail set hostname='oracle01' where hostname='oracle01.au.clientname.com';
update support_mail set hostname='clyora06' where hostname='clyora06.clientname.com.au';
update support_mail set hostname='hkrhel33' where hostname='hkrhel33.clientname.hk';
--
-- Delete non-standard data
--
delete from support_mail where client like 'Old_client%';
delete from support_mail where client = 'EM Incident';
delete from support_mail where client = 'RE';
delete from support_mail where client = 'Re';
delete from support_mail where client = 'Fwd';
delete from support_mail where client is null;
delete from support_mail where database_name like 'Space%';
delete from support_mail where message like 'DataQ%';
delete from support_mail where message = 'status';
delete from support_mail where metric like 'Weekly Ch%';
delete from support_mail where message is null;
commit;
--
-- Move data into current tables
--
insert into current_failures (select * from support_mail where message='FAILURE');
insert into current_warnings (select * from support_mail where message='WARNING');
insert into current_success (select * from support_mail where message='SUCCESS');
--
-- Move all data into history table and create a backup
--
insert into all_subjects select * from support_mail;
insert into all_subjects_backups as select * from support_mail;
commit;
--
-- Remove latest data ready for next load
--
truncate table support_mail;
--
-- delete duplicates
--
delete from all_subjects
         where rowid not in ( select min(rowid)
       from all_subjects
   group by client,hostname,database_name,metric,message,to_char(message_date,'dd-mm-yy'));

Job to delete old entries from current tables, run at 17:00 each day:

delete_old_mail.sh

#!/bin/sh
#
# Remove yesterday's entries from current tables
# Andy Horne 26/04/2015
#
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export ORACLE_SID=MGADB2
export PATH=$ORACLE_HOME/bin:$PATH
sqlplus support_mail/a_password <
@/home/oracle/mga/akh/del.sql
commit;
exit
EOF

del.sql

delete from current_warnings where message_date <= sysdate-1;
delete from current_failures where message_date <= sysdate-1;
delete from current_success where message_date <= sysdate-1;

So now you should have 3 tables that show the latest status of all the clients - current_success, current_warnings and current_failures. Data is deleted from these at 17:00 each day so that the issues for the current day are shown.

You also have a table that contains every email ever loaded - called all_subjects. This allows for historic reporting.

There is also a table called support_mail that is loaded directly from the emails but gets truncated at the end of the process so is empty most of the time.

Now you need the APEX application to show the information. It's a really basic page that has 3 report regions - 1 for success (or clients with no current issues), one for warnings and one for critical issues.

I won't go into details on creating the main APEX app page, you can go through some previous articles of mine or trawl the web, but this is the SQL I use to show the status with colours depending on the status (Red, Orange and Green):

select distinct client,metric,'' details,
case when client is not null
then 'red'
end as fcolor
 from current_failures
where to_char(message_date) > sysdate - 1

select distinct client,metric,'' Details,
case when client is not null
then 'DarkOrange'
end as fcolor
 from current_warnings
where to_char(message_date) >= sysdate - 1

select distinct client, 
case when client is not null
then 'Green'
end as fcolor
from all_subjects
where client not in (select client from current_failures)
and client not in (select client from current_warnings)
order by 1


Orange shows up better than yellow.

So that will show the current warnings, failures for clients, as well as the clients that are OK.

For the next page, you need to create an APEX blank page, then add a chart region and 2 report regions. This page shows a pie chart for both warnings and failures and 2 reports with the client names and how many warnings and failures they've had.

The chart SQL is :

select null link,
decode(message,'WARNING','WARNINGS','FAILURE','FAILURES') label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message in ('WARNING','FAILURE')
group by null,message

Change the time scale if you only want to see the past week.

The reports SQL is:

select client,count(message) from all_subjects
where message ='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,message

order by count(metric) desc

and

select client,count(message) from all_subjects
where message ='FAILURE'
and MESSAGE_DATE >= sysdate-31
group by client,message

order by count(metric) desc


To show this:




The next page shows a pie chart with how many warnings each client had, and also a break down of the type.

The chart SQL:

select null link,
client label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message ='WARNING'

group by client,message

The report SQL:

select client,metric,count(metric) from all_subjects
where message='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,metric

order by count(metric) desc

This is the result:





Create another page, and substitute 'WARNING' for 'FAILURE' in the SQL.

So now you have an APEX application with 4 pages:

A page listing the clients with issues or no issues.
A page with a chart showing all warnings and failures and 2 tables with the number of issues
A page with a chart showing just warnings by client and a table with the type of warning.
A page with a chart showing just failures by client and a table with the type of failure.

You can use a browser plug-in to cycle around these pages. I use the Tab Slideshow plug-in which is available for Chrome : https://chrome.google.com/webstore/detail/tab-slideshow/loepeenhjndiclafjgoackjblfhonogb?hl=en

and Firefox : https://addons.mozilla.org/en-us/firefox/addon/tab-slideshow/

Install these, then open each of the 4 APEX pages in a browser and change the settings of Tab Slideshow to cycle through the tabs as often as you like - 30 seconds seems to work OK.

If the APEX session times out and prompts for a login, you need to login as the admin user and change the default timeout setting to 0:

http://apex_host:port/apex/apex_admin

Click on 'Manage Instance', then 'Security', then under 'Session Timeout' set both to '0':



You can then either run this off a laptop / PC that is connected to a TV in the office, or use a Chromecast to send it to a TV from a PC.


This is useful as a display to show the current status, but I also wanted to be able to drill-down a bit and show some history for clients to be able to show trends or see which issues each client had. 

So using the same tables and basic APEX pages, I created some new pages and buttons for navigation.