I have recently been working at a site with an Exadata, which has several components you can connect to with PuTTY: compute nodes, storage nodes (cells) and a recovery appliance.
There is a command called 'dcli' that can be run from one host to run a command, such as 'cellcli', remotely on the other nodes. This makes it easy to run checks without having to connect to each host in turn.
To set it up, you will need to generate a public key on the source host if one doesn't already exist. There is another post here (http://yetanotheroracledbablog.blogspot.com.au/2010/05/using-scp-without-prompting-for.html) that explains how to do that, but it's basically:
Log into the source host as root, cd to the '.ssh' directory.
Run this:
ssh-keygen -t rsa
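Press Enter at every prompt. This accepts the default file location and an empty passphrase, which is what allows ssh to log in without prompting.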
It will create a text file called 'id_rsa.pub' that contains a long string of characters like this:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4QAwvIhn421tE51yx...NfqvWdBRUBIuNGUjhW1Rh05E2c/T7tW8pgphBX58EfceY255N4Q== root@source_host
Copy this text, log into each destination host as root and cd to the '.ssh' directory. There should be an 'authorized_keys' file there; if not, create it.
Paste the text at the end of the file as the last entry.
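If you are adding the key to a lot of hosts, it helps to guard against pasting it twice or gluing it onto the end of the previous line. Below is a minimal sketch of that step; 'append_key_once' is a hypothetical helper name, and it assumes you run it on the destination host with the key text from 'id_rsa.pub'. The 700/600 permissions are the ones sshd expects by default.

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys file only if it is
# not already present. append_key_once is a hypothetical helper name.
append_key_once() {
    key_line=$1     # full "ssh-rsa AAAA... root@source_host" line
    auth_file=$2    # e.g. /root/.ssh/authorized_keys
    auth_dir=$(dirname "$auth_file")

    # Create the .ssh directory and the file if they do not exist yet
    mkdir -p "$auth_dir" && chmod 700 "$auth_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # If the last entry has no trailing newline, add one so the new key
    # starts on its own line ($(...) strips a trailing newline, so a
    # non-empty result means the file does not end in one)
    if [ -s "$auth_file" ] && [ -n "$(tail -c 1 "$auth_file")" ]; then
        printf '\n' >> "$auth_file"
    fi

    # -x: match the whole line, -F: fixed string (keys contain '+' and '/')
    if grep -qxF -- "$key_line" "$auth_file"; then
        echo "key already present in $auth_file"
    else
        printf '%s\n' "$key_line" >> "$auth_file"
        echo "key appended to $auth_file"
    fi
}

# Example, on the destination host (the /tmp path is hypothetical):
#   append_key_once "$(cat /tmp/id_rsa.pub)" /root/.ssh/authorized_keys
```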
You can now go back to the source host and try to 'ssh' to the destination host as root. The first time, it may respond with a "The authenticity of host..." message; enter 'yes'. It should not prompt for a password.
Repeat for all the cells on the Exadata.
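To confirm that every cell now accepts the key without typing each ssh by hand, a loop along these lines can help. This is a sketch; 'check_passwordless' is a hypothetical helper name, not part of dcli. With BatchMode=yes, ssh fails instead of prompting for a password or host-key confirmation, so a host whose key you have not yet accepted with 'yes' will also show as FAILED.

```shell
#!/bin/sh
# Sketch: check passwordless root ssh to each host given as an argument.
# check_passwordless is a hypothetical helper, not part of dcli.
check_passwordless() {
    for host in "$@"; do
        # -n: do not read from stdin
        # BatchMode=yes: fail rather than prompt for a password or host key
        # ConnectTimeout=5: give up on unreachable hosts after 5 seconds
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
            echo "$host: OK"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example:
#   check_passwordless prodcell01 prodcell02 prodcell03 prodcell04
```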
Now that you can ssh to each host without a password, you can create a text file with the names of the hosts.
Navigate to the '/opt/oracle.SupportTools/onecommand' directory.
Create a text file called 'cell_group' and enter all the cell host names, one per line:
prodcell01
prodcell02
prodcell03
prodcell04
devcell01
devcell02
devcell03
devcell04
etc
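If you prefer to create the file from the command line, a heredoc does the job. This is a sketch: it assumes you are already in '/opt/oracle.SupportTools/onecommand', it overwrites any existing 'cell_group' file, and the host names are the examples above, so replace them and add the rest of your cells.

```shell
#!/bin/sh
# Sketch: write cell_group in the current directory, one host per line.
# Overwrites any existing cell_group; substitute your own cell names.
cat > cell_group <<'EOF'
prodcell01
prodcell02
prodcell03
prodcell04
devcell01
devcell02
devcell03
devcell04
EOF

# Print how many hosts dcli will target
wc -l < cell_group
```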
You can now run dcli against this file with '-g cell_group' (and '-l root' to connect as root), and the command in quotes will run on every cell:
dcli -g cell_group -l root "cellcli -e list CELL"
prodcell01: prodcell01 online
prodcell02: prodcell02 online
prodcell03: prodcell03 online
prodcell04: prodcell04 online
devcell01: devcell01 online
devcell02: devcell02 online
devcell03: devcell03 online
devcell04: devcell04 online
zdlracell01: zdlracell01 online
zdlracell02: zdlracell02 online
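With many cells, it is easier to print only the exceptions. The filter below is a sketch that assumes the "host: name status" layout shown above; 'offline_cells' is a hypothetical helper name. It prints any cell whose last field is not "online", so an unexpected or error line from a cell would also be flagged.

```shell
#!/bin/sh
# Sketch: read "dcli ... list CELL" output on stdin and print the name of
# every cell whose status is not "online". offline_cells is hypothetical.
offline_cells() {
    # Each line looks like "prodcell01: prodcell01 online"; skip blank
    # lines, strip the trailing ':' from the dcli prefix, print the host
    awk 'NF > 0 && $NF != "online" { sub(/:$/, "", $1); print $1 }'
}

# Example:
#   dcli -g cell_group -l root "cellcli -e list CELL" | offline_cells
```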
To see alerts that have not cleared yet (those with no end time):
dcli -g cell_group -l root "cellcli -e LIST ALERTHISTORY WHERE endtime=null"
zdlracell02: 3 2016-10-19T17:10:16+11:00 critical "Disk controller was hung. Cell was power cycled to stop the hang."
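Because dcli prefixes every output line with the cell name and a colon, the open-alert output can be summarised per cell. This is a sketch that assumes the one-line-per-alert format shown above (that is, without DETAIL, which spreads each alert over several lines); 'alerts_per_cell' is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count open-alert lines per cell from dcli output on stdin.
# alerts_per_cell is a hypothetical helper name.
alerts_per_cell() {
    # The text before the first ':' is the cell name added by dcli;
    # count non-empty lines per cell and print "cell count", sorted
    awk -F: 'NF > 0 { count[$1]++ } END { for (c in count) print c, count[c] }' | sort
}

# Example:
#   dcli -g cell_group -l root "cellcli -e LIST ALERTHISTORY WHERE endtime=null" | alerts_per_cell
```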
Because dcli makes it easy to run the same command across all hosts, I also created a script that runs several checks in one go:
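#!/bin/sh
#
# Run basic health checks on all Exadata cells listed in cell_group
#
# The -g path is relative, so change to the directory holding cell_group first
cd /opt/oracle.SupportTools/onecommand || exit 1

echo "-- Checking Cells are Online"
echo
dcli -g cell_group -l root "cellcli -e list CELL"
echo
echo "-- Checking Cell Disk Status"
echo
dcli -g cell_group -l root "cellcli -e list celldisk"
echo
echo "-- Checking for Alerts"
echo
# Critical alerts that have not been marked as examined
dcli -g cell_group -l root "cellcli -e list ALERTHISTORY WHERE severity = 'critical' AND examinedBy = '' DETAIL"
# Alerts that are still open (no end time)
dcli -g cell_group -l root "cellcli -e LIST ALERTHISTORY WHERE endtime=null"
echo
echo "-- No entries means there are no outstanding alerts"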