We all monitor the emails, but I wanted to create a 'Management Dashboard' that would show the current status of each client using the email subjects as input, since they are (fairly) standard.
The obvious candidate was an APEX application, and this meant reading the emails into an Oracle database that I could create pages against. As with all my APEX stuff, it starts out with an idea that gets built on as I think of things to add. There is no design involved so once you've read this you may want to put it together differently.
I did some research and it is possible using Gmail and PHP, so this is how I did it.
(Google are notorious for discontinuing services as they see fit, so one day this might stop working).
You'll need an Oracle database on a linux host with APEX installed.
This was the result (I've had to obscure the client names for obvious reasons):
It also has some pages with charts on how many warnings and failures each client has experienced over the previous month and what category (host space, alert log errors etc). This information allows you to see if there is a recurring problem or if (more likely) the thresholds need changing.
The reason these charts are large and there are no buttons for navigation is because this is designed to cycle around each page in turn. This is done with a Chrome or Firefox plug in that I'll describe later.
I have another app that has the same information but also has buttons for historical reporting and showing more details such as the host and database that the alert came from, which I'll detail later as well.
Clients have monitoring scripts installed that scan the
alert logs, databases, report on backups, space etc.
The scripts email a central email address with any issues as
well as status (backup reports).
The emails get forwarded to a newly created gmail account.
A php script running on a linux host reads the emails in the
gmail account, loads the subject information into an oracle database and
deletes the mail from gmail.
A SQL script formats the data (changes lower case to upper
case for example) and shifts the information into different tables for reporting.
An APEX application acts as a GUI front end to the database
to report on the client status.
Set up a new GMAIL account
It's a good idea to create a new gmail account dedicated to this purpose. So go and do that.
Gmail by default has security that prevents script access, so you need to go into the settings and change this. Note that these settings weaken the account security, so make sure that this email account will not contain sensitive mail.
Log into google as the user you created, then allow access:
Then go here and turn on ‘less secure apps’:
This allows a script to connect to the account
and read the mail.
Make sure the email subjects are standard
We have a standard subject format, and if you don't, now is the time to make one. This is crucial to the process, it's how the scripts read the information into the database.
Our format is like this:
CLIENT:HOST:DATABASE:METRIC:STATUS
For example:
ACME:PRODHOST:PRODDB:RMAN BACKUP:SUCCESS
ACME:PRODHOST:PRODDB:ALERT LOG:WARNING
ACME:PRODHOST:PRODDB:HOST SPACE:FAILURE
Create email rules
Assuming each client sends a mail to the same address, you need to create a rule for each client that forwards the emails to the new GMAIL account you created. It could be as easy as creating a rule that says "Forward every mail sent from ACME to the_new_account@gmail.com". Or it may need some tailoring. The key thing is to make sure every alert is also forwarded to the new gmail account. Log into the account after you've received some emails to check they are getting forwarded.
Install php and create the script
Once that is set up, you need to log into your linux host and install some php packages:
yum install php
yum install php*-imap
Once installed, create a php script to connect to the Gmail account and read the email subjects into a flat file that can be loaded into the database. The parameter in use in this script is 'UNSEEN' - i.e only get the unread emails.
There is a command at the end - expunge - which deletes the mail once it's been read. This makes sure that you don't get duplicates, but the UNSEEN should ignore old emails anyway. Feel free to experiment.
gmail_sub.sh:
#!/usr/bin/php
set_time_limit(4000);
// Connect to gmail
$imapPath = '{imap.gmail.com:993/imap/ssl}INBOX';
$username = 'the_new_account@gmail.com';
$password = 'the_account_password';
// try to connect
$inbox = imap_open($imapPath,$username,$password)
or die('Cannot connect to Gmail: ' . imap_last_error());
$emails = imap_search($inbox,'UNSEEN');
$output = '';
foreach($emails as $mail) {
$headerInfo =
imap_headerinfo($inbox,$mail);
$output .=
$headerInfo->subject.'
';
';
$emailStructure =
imap_fetchstructure($inbox,$mail);
$output =
quoted_printable_decode($output);
echo $output;
echo "\r\n";
$output = '';
}
// close the connection
imap_expunge($inbox);
imap_close($inbox);
Make the script executable:
chmod +x gmail_sub.sh
Test the script is reading the gmail inbox and generating the correct format:
./gmail_sub.sh > gmail_sub.txt
Hopefully the gmail_sub.txt should look something like this:
Create the Oracle schema
You'll need to create an oracle user, tablespace and tables in the database to hold the mail subjects:
SQL>
create tablespace support_mail datafile
'/u02/oradata/MGADB1/support_mail01.dbf' size 2000m;
Tablespace
created.
SQL>
create user support_mail identified by a_password default tablespace
support_mail;
User
created.
SQL>
grant dba to support_mail;
Grant
succeeded.
SQL>
connect support_mail/a_password
Connected.
SQL>
create table support_mail(client varchar2(50),hostname varchar2(100),database_name
varchar2(20),metric varchar2(50),message varchar2(50),message_date date default
sysdate) tablespace support_mail;
Table
created.
SQL>
alter user support_mail default tablespace support_mail;
User
altered.
After some experimentation I changed it around a bit, and
created tables to hold current information as well as one table to hold
everything. They all look the same:
CREATE TABLE "SUPPORT_MAIL"."ALL_SUBJECTS"
( "CLIENT" VARCHAR2(50 BYTE),
"HOSTNAME"
VARCHAR2(100 BYTE),
"DATABASE_NAME"
VARCHAR2(20 BYTE),
"METRIC"
VARCHAR2(50 BYTE),
"MESSAGE"
VARCHAR2(50 BYTE),
"MESSAGE_DATE"
DATE
) SEGMENT CREATION
IMMEDIATE
PCTFREE 10 PCTUSED 40
INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1
FREELIST GROUPS 1
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE
"SUPPORT_MAIL" ;
CREATE TABLE "SUPPORT_MAIL"."CURRENT_FAILURES"
( "CLIENT" VARCHAR2(50 BYTE),
"HOSTNAME"
VARCHAR2(100 BYTE),
"DATABASE_NAME"
VARCHAR2(20 BYTE),
"METRIC"
VARCHAR2(50 BYTE),
"MESSAGE"
VARCHAR2(50 BYTE),
"MESSAGE_DATE"
DATE
) SEGMENT CREATION
IMMEDIATE
PCTFREE 10 PCTUSED 40
INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1
FREELIST GROUPS 1
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE
"SUPPORT_MAIL" ;
CREATE TABLE "SUPPORT_MAIL"."CURRENT_SUCCESS"
( "CLIENT" VARCHAR2(50 BYTE),
"HOSTNAME"
VARCHAR2(100 BYTE),
"DATABASE_NAME"
VARCHAR2(20 BYTE),
"METRIC"
VARCHAR2(50 BYTE),
"MESSAGE"
VARCHAR2(50 BYTE),
"MESSAGE_DATE"
DATE
) SEGMENT CREATION
IMMEDIATE
PCTFREE 10 PCTUSED 40
INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1
FREELIST GROUPS 1
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SUPPORT_MAIL" ;
CREATE TABLE "SUPPORT_MAIL"."CURRENT_WARNINGS"
( "CLIENT" VARCHAR2(50 BYTE),
"HOSTNAME"
VARCHAR2(100 BYTE),
"DATABASE_NAME"
VARCHAR2(20 BYTE),
"METRIC"
VARCHAR2(50 BYTE),
"MESSAGE"
VARCHAR2(50 BYTE),
"MESSAGE_DATE"
DATE
) SEGMENT CREATION
IMMEDIATE
PCTFREE 10 PCTUSED 40
INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT
1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1
FREELIST GROUPS 1
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE
"SUPPORT_MAIL" ;
Create some scripts to pull it all together.
There is the main script to call the php script and load the data into
the database. This is scheduled to run every 15 minutes in cron:
read_gmail_sub.sh
#/bin/sh
#
#
Script to read email subjects into a database table for reporting
#
Andy Horne 18/04/2015
#
#
#
Set the environment
#
export
ORACLE_SID=MGADB2
export
ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export
PATH=$ORACLE_HOME/bin:$PATH
#
#
Call the php script to read the emails and put them to a file for loading
#
/home/oracle/mga/akh/gmail_sub.php
> /home/oracle/mga/akh/gmail_sub.txt
#
#
Tidy the file to remove unwanted characters
#
sed
's\FW: \\g' /home/oracle/mga/akh/gmail_sub.txt > /home/oracle/mga/akh/xx
sed
's?
??g' /home/oracle/mga/akh/xx > /home/oracle/mga/akh/gmail_sub.dat
??g' /home/oracle/mga/akh/xx > /home/oracle/mga/akh/gmail_sub.dat
#
#
Call sqlloader to load the data into the database
#
sqlldr
support_mail/a_password control=/home/oracle/mga/akh/load_gmail_sub.ctl
#
#
Remove the old files
#
export
LOGDIR=/home/oracle/mga/akh
rm
$LOGDIR/gmail_sub.txt $LOGDIR/gmail_sub.dat $LOGDIR/xx
#
#
Tidy the data so all subjects are standardised
#
sqlplus
support_mail/a_password <
@/home/oracle/mga/akh/tidy.sql
commit;
exit
EOF
This is the SQL Loader Control file:
--
--
SQL Loader control file to load gmail subjects into database
--
--
Andy Horne 16/04/2015
--
LOAD
DATA
infile
'/home/oracle/mga/akh/gmail_sub.dat'
append
into
table support_mail
trailing
nullcols
(client
CHAR terminated by ':',
hostname
CHAR terminated by ':',
database_name
CHAR terminated by ':',
metric
CHAR terminated by ':',
message
terminated by whitespace)
This is the SQL that tidies the data - it changes everything to upper case, renames some clients to fit with the tables, deletes some rubbish and moves data into current tables. For example, the host space email has 'Over 85%' as a message, but the APEX application shows 'CRITICAL', 'WARNING' or 'FAILURE', so I change that. 'Standby is in sync' is changed to 'SUCCESS', 'Blocking Lock' is changed to 'WARNING' etc.
It should be easy to follow.
set echo on
--
-- Change message field to upper case
--
update support_mail set message='SUCCESS' where message =
'Success';
update support_mail set message='FAILURE' where message = 'Failed';
--
-- Change message for non-standard reports
--
update support_mail set message='WARNING' where message like
'Ove%';
update support_mail set message='WARNING' where message ='Errors';
update support_mail set message='SUCCESS' where metric like
'Standby is in%';
update support_mail set message='WARNING' where metric like 'Block%';
update support_mail set metric='Host Space' where metric like
'Filesys%';
update support_mail set metric='RMAN Backup' where metric like
'DiskHot%';
update support_mail set metric='Standby Sync' where metric like
'Standby%';
update support_mail set metric='Tablespace Space' where metric like
'Tables%';
update support_mail set metric='Alert Log' where metric like
'Alert%';
update support_mail set metric='RMAN Backup' where metric like
'%Backup Report';
update support_mail set metric='RMAN Backup' where metric = 'Rman
Backup';
update support_mail set metric='RMAN Backup' where metric =
'RMAN-backup';
update support_mail set message='WARNING' where message='Blocked';
update support_mail set metric='Status Change' where metric like
'Statu%';
update support_mail set message='WARNING' where message = 'Change';
--
-- Change client names to upper case
--
update support_mail set client='NAME' where client like
'Name%';
--
-- Change non-standard host names to standard
--
update support_mail set hostname='oracle01' where hostname='oracle01.au.clientname.com';
update support_mail set hostname='clyora06' where
hostname='clyora06.clientname.com.au';
update support_mail set hostname='hkrhel33' where
hostname='hkrhel33.clientname.hk';
--
-- Delete non-standard data
--
delete from support_mail where client like 'Old_client%';
delete from support_mail where client = 'EM Incident';
delete from support_mail where client = 'RE';
delete from support_mail where client = 'Re';
delete from support_mail where client = 'Fwd';
delete from support_mail where client is null;
delete from support_mail where database_name like 'Space%';
delete from support_mail where message like 'DataQ%';
delete from support_mail where message = 'status';
delete from support_mail where metric like 'Weekly Ch%';
delete from support_mail where message is null;
commit;
--
-- Move data into current tables
--
insert into current_failures (select * from support_mail where
message='FAILURE');
insert into current_warnings (select * from support_mail where
message='WARNING');
insert into current_success (select * from support_mail where
message='SUCCESS');
--
-- Move all data into history table and create a backup
--
insert into all_subjects select * from support_mail;
insert into all_subjects_backups as select * from support_mail;
commit;
--
-- Remove latest data ready for next load
--
truncate table support_mail;
--
-- delete duplicates
--
delete from all_subjects
where rowid not in (
select min(rowid)
from all_subjects
group by
client,hostname,database_name,metric,message,to_char(message_date,'dd-mm-yy'));
Job to delete old entries from current tables, run at 17:00
each day:
delete_old_mail.sh
#!/bin/sh
#
# Remove yesterday's entries from current tables
# Andy Horne 26/04/2015
#
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export ORACLE_SID=MGADB2
export PATH=$ORACLE_HOME/bin:$PATH
sqlplus support_mail/a_password <
@/home/oracle/mga/akh/del.sql
commit;
exit
EOF
del.sql
delete from current_warnings where message_date <= sysdate-1;
delete from current_failures where message_date <= sysdate-1;
delete from current_success where message_date <= sysdate-1;
So now you should have 3 tables that show the latest status of all the clients - current_success, current_warnings and current_failures. Data is deleted from these at 17:00 each day so that the issues for the current day are shown.
You also have a table that contains every email ever loaded - called all_subjects. This allows for historic reporting.
There is also a table called support_mail that is loaded directly from the emails but gets truncated at the end of the process so is empty most of the time.
Now you need the APEX application to show the information. It's a really basic page that has 3 report regions - 1 for success (or clients with no current issues), one for warnings and one for critical issues.
I won't go into details on creating the main APEX app page, you can go through some previous articles of mine or trawl the web, but this is the SQL I use to show the status with colours depending on the status (Red, Orange and Green):
select distinct client,metric,'' details,
case when client is not null
then 'red'
end as fcolor
from current_failures
where to_char(message_date) > sysdate - 1
select distinct client,metric,'' Details,
case when client is not null
then 'DarkOrange'
end as fcolor
from current_warnings
where to_char(message_date) >= sysdate - 1
select distinct client,
case when client is not null
then 'Green'
end as fcolor
from all_subjects
where client not in (select client from current_failures)
and client not in (select client from current_warnings)
order by 1
Orange shows up better than yellow.
So that will show the current warnings, failures for clients, as well as the clients that are OK.
For the next page, you need to create an APEX blank page, then add a chart region and 2 report regions. This page shows a pie chart for both warnings and failures and 2 reports with the client names and how many warnings and failures they've had.
The chart SQL is :
select null link,
decode(message,'WARNING','WARNINGS','FAILURE','FAILURES') label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message in ('WARNING','FAILURE')
group by null,message
Change the time scale if you only want to see the past week.
The chart SQL is :
select null link,
decode(message,'WARNING','WARNINGS','FAILURE','FAILURES') label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message in ('WARNING','FAILURE')
group by null,message
Change the time scale if you only want to see the past week.
The reports SQL is:
select client,count(message) from all_subjects
where message ='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,message
order by count(metric) desc
and
select client,count(message) from all_subjects
where message ='FAILURE'
and MESSAGE_DATE >= sysdate-31
group by client,message
order by count(metric) desc
select client,count(message) from all_subjects
where message ='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,message
order by count(metric) desc
and
select client,count(message) from all_subjects
where message ='FAILURE'
and MESSAGE_DATE >= sysdate-31
group by client,message
order by count(metric) desc
To show this:
The next page shows a pie chart with how many warnings each client had, and also a break down of the type.
The chart SQL:
select null link,
client label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message ='WARNING'
group by client,message
The report SQL:
select client,metric,count(metric) from all_subjects
where message='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,metric
order by count(metric) desc
This is the result:
Create another page, and substitute 'WARNING' for 'FAILURE' in the SQL.
So now you have an APEX application with 4 pages:
A page listing the clients with issues or no issues.
A page with a chart showing all warnings and failures and 2 tables with the number of issues
A page with a chart showing just warnings by client and a table with the type of warning.
A page with a chart showing just failures by client and a table with the type of failure.
You can use a browser plug-in to cycle around these pages. I use the Tab Slideshow plug-in which is available for Chrome : https://chrome.google.com/webstore/detail/tab-slideshow/loepeenhjndiclafjgoackjblfhonogb?hl=en
and Firefox : https://addons.mozilla.org/en-us/firefox/addon/tab-slideshow/
Install these, then open each of the 4 APEX pages in a browser and change the settings of Tab Slideshow to cycle through the tabs as often as you like - 30 seconds seems to work OK.
If the APEX session times out and prompts for a login, you need to login as the admin user and change the default timeout setting to 0:
http://apex_host:port/apex/apex_admin
Click on 'Manage Instance', then 'Security', then under 'Session Timeout' set both to '0':
You can then either run this off a laptop / PC that is connected to a TV in the office, or use a Chromecast to send it to a TV from a PC.
This is useful as a display to show the current status, but I also wanted to be able to drill-down a bit and show some history for clients to be able to show trends or see which issues each client had.
So using the same tables and basic APEX pages, I created some new pages and buttons for navigation.
The next page shows a pie chart with how many warnings each client had, and also a break down of the type.
The chart SQL:
select null link,
client label,
count(message) value
from all_subjects
where message_date >= sysdate-31
and message ='WARNING'
group by client,message
The report SQL:
select client,metric,count(metric) from all_subjects
where message='WARNING'
and MESSAGE_DATE >= sysdate-31
group by client,metric
order by count(metric) desc
This is the result:
Create another page, and substitute 'WARNING' for 'FAILURE' in the SQL.
So now you have an APEX application with 4 pages:
A page listing the clients with issues or no issues.
A page with a chart showing all warnings and failures and 2 tables with the number of issues
A page with a chart showing just warnings by client and a table with the type of warning.
A page with a chart showing just failures by client and a table with the type of failure.
You can use a browser plug-in to cycle around these pages. I use the Tab Slideshow plug-in which is available for Chrome : https://chrome.google.com/webstore/detail/tab-slideshow/loepeenhjndiclafjgoackjblfhonogb?hl=en
and Firefox : https://addons.mozilla.org/en-us/firefox/addon/tab-slideshow/
Install these, then open each of the 4 APEX pages in a browser and change the settings of Tab Slideshow to cycle through the tabs as often as you like - 30 seconds seems to work OK.
If the APEX session times out and prompts for a login, you need to login as the admin user and change the default timeout setting to 0:
http://apex_host:port/apex/apex_admin
Click on 'Manage Instance', then 'Security', then under 'Session Timeout' set both to '0':
You can then either run this off a laptop / PC that is connected to a TV in the office, or use a Chromecast to send it to a TV from a PC.
This is useful as a display to show the current status, but I also wanted to be able to drill-down a bit and show some history for clients to be able to show trends or see which issues each client had.
So using the same tables and basic APEX pages, I created some new pages and buttons for navigation.