Yet Another Oracle DBA Blog: Checking Netbackup logs

A number of sites we look after use Symantec Veritas Netbackup as the Media Manager for RMAN backups. Many times I'll kick off a restore and have it sitting there without knowing what's going on.

Netbackup has logs which detail the process so you can see exactly what the restore is doing. The logs are stored here:

/usr/openv/netbackup/logs/user_ops/dbext/logs

The file name format is a bit cryptic, but you generally want the most recent:

2355 May 17 19:32 9613.0.1274086925
2355 May 17 21:30 9613.0.1274094025
2355 May 17 22:23 9613.0.1274097219
2355 May 18 07:40 28094.0.1274130593

The latest one is a log from a restore I was running that was hanging on a controlfile restore. Our backups are put to disk, then written to tape after 2 weeks, so I couldn't understand why this was taking so long (the file was from a week before). This is the log:

Restore started Tue May 18 07:09:53 2010

07:09:54 (4538920.xxx) Restore job id 4538920 will require 1 image.
07:09:54 (4538920.xxx) Media id BD0078 is needed for the restore.

07:09:58 (4538920.001) Restoring from image created Mon May 10 11:42:13 2010
07:10:00 (4538920.001) INF - Data socket = nbserver.sysdomain.local.IPC:/tmp/vnet-28452274130599746506000000000-c9aGK3;d86e0e6cce7b3e
71ed1e7b93ed529939;4;3600
07:10:00 (4538920.001) INF - Name socket = nbserver.sysdomain.local.IPC:/tmp/vnet-28453274130599795954000000000-Z9aOK3;7dd95d7b3700c6
50e74388642ac3522f;4;3600
07:10:00 (4538920.001) INF - If Media id BD0078 is not in a robotic library administrative interaction may be required to satisfy this
mount request.
07:10:00 (4538920.001) INF - Job id = 4538920
07:10:00 (4538920.001) INF - Backup id = dbserver_1273455733
07:10:00 (4538920.001) INF - Backup time = 1273455733
07:10:00 (4538920.001) INF - Policy name = CIS_oracle_policy
07:10:01 (4538920.001) INF - Snapshot = 0
07:10:01 (4538920.001) INF - Frozen image = 0
07:10:01 (4538920.001) INF - Backup copy = 0
07:10:01 (4538920.001) INF - No drives available or requesting resources
07:10:01 (4538920.001) INF - Master server = master
07:10:01 (4538920.001) INF - Media server = nbserver
07:10:01 (4538920.001) INF - New data socket = nbserver.sysdomain.local.IPC:/tmp/vnet-28451274130599694192000000000-o8ayK3;550ef0e07a
989118940acf8578bb2e6f;4;3600
07:10:01 (4538920.001) INF - Encrypt = 0
07:10:01 (4538920.001) INF - Use shared memory = 0
07:10:02 (4538920.001) INF - Restore id = 4538920.001
07:10:02 (4538920.001) INF - Encrypt = 0
07:10:02 (4538920.001) INF - Client read timeout = 3600
07:10:02 (4538920.001) INF - Media mount timeout = 1800
07:10:02 (4538920.001) INF - client = dbserver
07:10:02 (4538920.001) INF - requesting_client = dbserver
07:10:02 (4538920.001) INF - browse_client = dbserver
07:39:59 (4538920.001) ERR - Timed out waiting for the media to be mounted and positioned.
07:40:05 (4538920.001) The following files/folders were not restored:
07:40:05 (4538920.001) UTF - /s7ldas09_1_1
07:40:06 (4538920.001) Status of restore from image created Mon May 10 11:42:13 2010 = timed out waiting for media manager to mount vo
lume

07:40:07 INF - Server status = 5
07:40:07 (4538920.xxx) INF - Status = the restore failed to recover the requested files.

It's fairly obvious what happened - the tape isn't in the library so the restore timed out. This is a job for the team that looks after Netbackup - I have to ask why the file went to tape after just 1 week, and why the tape was not in the library.

Monday, May 17, 2010

Checking Netbackup logs