Sunday, October 3, 2010

Displaying an Image in APEX V4

I wanted to display an image in an APEX application back in V3 but gave up because it was too hard (it was only for fun, so I didn't need it).

Apex V4 now has a ‘display image’ item which makes it very easy.

First, you need to create a table in the database to hold the image. This in itself isn't straightforward, but this works:

Create the table:

create table pic(name varchar2(100),image blob);


Create the directory:

CREATE OR REPLACE DIRECTORY DOCUMENTS AS '/u01/app/oracle/dba/work/andy';


Make sure the picture file is in the directory you created:

pwd
/u01/app/oracle/dba/work/andy

ls rabbit.jpg
rabbit.jpg

Create the SQL to load the file into the database:

declare
l_blob blob;
l_bfile bfile;
begin
-- insert a row with an empty LOB and grab its locator
insert into pic values ( 'RABBIT', EMPTY_BLOB() )
returning image into l_blob;
-- point at the file in the DOCUMENTS directory and load it into the LOB
l_bfile := bfilename( 'DOCUMENTS', 'rabbit.jpg' );
dbms_lob.fileopen( l_bfile );
dbms_lob.loadfromfile( l_blob, l_bfile, dbms_lob.getlength( l_bfile ) );
dbms_lob.fileclose( l_bfile );
end;
/

Note – the 'RABBIT' in the insert line is just a name; 'DOCUMENTS' is the name of the directory you created, and 'rabbit.jpg' is the name of the picture file.

Run the SQL:

SQL> @load_pic

PL/SQL procedure successfully completed.


Check to see there is a row in the table:

SQL> select name from pic;

NAME
------
RABBIT

Commit, then exit from SQL*Plus.


You then need to go into APEX and edit the application. Create a new page, or a new region on a page, and create a new item. There is an item type called "Display Image", so select that. Give it a name, then use the drop-down selection and choose "BLOB column returned by SQL Statement".

This is the SQL Statement to use:

select image from pic

Click "Next", then "Create Item" and "Run" the page. The image should be displayed.

Wednesday, September 22, 2010

Script to create one user like another

This is from one of my colleagues.

We are often asked to create a new user based on an existing user, and we usually use Quest TOAD which has this feature. However, we look after a number of sites that don't have TOAD, and my colleague found this script to generate the SQL to create a new user:

set pages 0 feed off veri off lines 500
accept oldname prompt "Enter user to model new user to: "
accept newname prompt "Enter new user name: "
accept psw prompt "Enter new user's password: "
spool newuser.sql
-- Create user...
select 'create user &&newname identified by &&psw'||
' default tablespace '||default_tablespace||
' temporary tablespace '||temporary_tablespace||' profile '||
profile||';'
from sys.dba_users
where username = upper('&&oldname');
-- Grant Roles...
select 'grant '||granted_role||' to &&newname'||
decode(ADMIN_OPTION, 'YES', ' WITH ADMIN OPTION')||';'
from sys.dba_role_privs
where grantee = upper('&&oldname');
-- Grant System Privs...
select 'grant '||privilege||' to &&newname'||
decode(ADMIN_OPTION, 'YES', ' WITH ADMIN OPTION')||';'
from sys.dba_sys_privs
where grantee = upper('&&oldname');
-- Grant Table Privs...
select 'grant '||privilege||' on '||owner||'.'||table_name||
' to &&newname;'
from sys.dba_tab_privs
where grantee = upper('&&oldname');
-- Grant Column Privs...
select 'grant '||privilege||' on '||owner||'.'||table_name||
'('||column_name||') to &&newname;'
from sys.dba_col_privs
where grantee = upper('&&oldname');
-- Set Default Role...
select 'alter user &&newname default role '|| granted_role ||';'
from sys.dba_role_privs
where grantee = upper('&&oldname')
and default_role = 'YES';
spool off
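For illustration, if the model user were SCOTT and the new user BOB (names and password entirely hypothetical), the spooled newuser.sql would contain something like:

create user BOB identified by secret default tablespace USERS temporary tablespace TEMP profile DEFAULT;
grant CONNECT to BOB;
grant RESOURCE to BOB WITH ADMIN OPTION;
grant CREATE SESSION to BOB;
grant SELECT on HR.EMPLOYEES to BOB;
alter user BOB default role CONNECT;

Review it, then run it to create the user.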

Friday, August 27, 2010

APEX Connection Frustration

I've just spent a few hours going round in circles trying to get a connection to an APEX application. I developed an application at home and wanted to implement it at work. I downloaded and installed Oracle XE on my work PC, then installed APEX 4.0.1. This went fine; it installed without a hitch. I exported the APEX application and the data from my home PC, then copied the files over and imported the APEX app. The data didn't import - it failed with an invalid header saying it wasn't an export file, I think because I exported from 11g and XE is 10g. Since there were only 3000 rows of data in one table, I simply spooled it out to a csv file and used SQLLDR to load it after re-creating the table in the XE database.

I opened the browser, pointed it at the App and it worked fine. After running through a few tests I decided it was OK for others to use. This was where the fun began.

No-one else could connect to it.

I went through the doco, found that I could change the http port using

exec dbms_xdb.sethttpport(8090);

so tried that, thinking that perhaps I was using 8080 for something else.

That didn't work.

Going through the doco and some forums, I found that XE doesn't allow remote connections by default, so I needed another command:

EXEC DBMS_XDB.SETLISTENERLOCALACCESS(FALSE);

Tried again, still didn't work.

I tried resetting the port again with the same result - I could connect to the APEX web page from my PC, but not from any others.

I could ping my IP address and could ping the hostname (after updating /etc/hosts on a server), but could not open the web page; it kept timing out.

I decided to ignore the APEX side and tried to tnsping: I got a "TNS-12535 TNS: Operation Timed Out".

Aha, now I was getting somewhere. I then tried a TELNET xx.xx.xx.xx 1521 - timed out.

So it wasn't APEX, it was the network.

The solution was to open the ports 1521 and 8080 in the firewall on my PC. Once that was done all the other PCs could connect.
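For what it's worth, before blaming the network you can confirm from SQL*Plus which port APEX is actually listening on:

SQL> SELECT DBMS_XDB.GETHTTPPORT FROM DUAL;

If that returns the port you expect and local browsing works, the problem is almost certainly between the other PCs and yours - as it was here.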

Wednesday, June 30, 2010

Upgrading APEX to V4

Not posted for a while, been a bit busy.

Oracle Application Express (APEX) V4 was released recently, and I have a couple of Apps that I've written so decided to upgrade.

Needless to say, I took a backup of the repository database (and an export for good measure). The APEX file is only 89MB, downloadable from the OTN web site (http://www.oracle.com/technology/products/database/application_express/index.html).

The upgrade is fairly easy, although there are a couple of 'gotchas'. The instructions work well until you get to the point of upgrading the images directory. This is a well-known 'documentation bug', but I'll repeat it here. The instructions imply that you need to run this:

@apxldimg.sql /u01/app/oracle/product/11.1.0/db_1/apex/v4.0/apex

where the path is the directory you extracted the download into. However, if you do that you'll get this error:

declare
*
ERROR at line 1:
ORA-22288: file or LOB operation FILEOPEN failed
No such file or directory
ORA-06512: at "SYS.XMLTYPE", line 296
ORA-06512: at line 15

What you should run is this:

@apxldimg.sql /u01/app/oracle/product/11.1.0/db_1/apex/v4.0

So drop off the trailing "apex".

That's gotcha 1.

After installation when I tried to log in to the Admin page I got this error:

"application=4050 workspace=10 Error ERR-1014 Application not found."

There are lots of Google results for this error, but they all seem to go round in circles. The actual fix is on the Metalink site (Note 814858.1).

This was my fault: instead of running the "apexins.sql" script, I copied and pasted the run-time one from the instructions and ran that (apxrtins.sql).

The fix is to simply run another script to upgrade the run-time version to the development one. The script is called 'apxdvins.sql'.

After running this, run the script to change the Admin password (apxchpwd.sql), then connect to the Admin page as you would normally (http://servername:port/apex/apex_admin) and it will allow you to continue. The first screen is a password change screen for the admin user, so you will need to go through that.

There is also an apxconf.sql script if you need to change the port.

I've not had a chance to play with V4 yet, but will try to over the next few weeks.

Thursday, May 27, 2010

Testing a DR with DataGuard

I recently performed a DR test using a DataGuard database, but because it was a test the users wanted to leave the Primary (PROD) database active. I explained that this wasn't a DR test in that case, and that the process and commands we would use for the test would not be relevant for a real DR. They accepted this, and I decided that I'd set up my own DataGuard environment so I could test a 'real' DR scenario. In the meantime, these are the steps I performed to allow the standby database to be opened for the users to be able to connect while leaving the primary production database available for normal business.

This DataGuard environment is using a DataGuard Broker.

For brevity, I'll just include the steps, and not all of the commands.

Set the Oracle environment and connect to the DataGuard Broker on the primary server:
# dgmgrl
DGMGRL> connect /
Connected

DGMGRL> show configuration

The last line of this command should show this:

Current status for "DGPRI_DATAGUARD":
SUCCESS

Disable the configuration (disable configuration;).

Once this is done, alert log messages will appear regarding loss of connectivity to the standby; this is expected and nothing to worry about:

ARC1: Attempting destination LOG_ARCHIVE_DEST_1 network reconnect (3113)
ARC1: Destination LOG_ARCHIVE_DEST_1 network reconnect abandoned
PING[ARC1]: Error 3113 when pinging standby (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DGSBY_XPT.SYSDOMAIN.LOCAL)(INSTANCE_NAME=DGSBY)(SERVER=dedicated))).

I then checked that the Primary database was OK and that users were still logged in.

On the standby database, you need to issue a command to finish off the managed recovery (the log apply):

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE;

To allow the database to open in READ WRITE mode, issue this command:

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

You then need to stop and restart the standby database.
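The bounce is nothing special; from SQL*Plus on the standby it is just:

SQL> shutdown immediate
SQL> startup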

I checked the primary database to make sure that this command hadn't done something mad like stop and restart the database in standby mode (always best to check); it was fine.

That was basically it - disable the configuration, issue a couple of commands on the standby and then stop and restart the standby. This allowed the testers to pretend the standby was now the Production database for DR purposes, but also allowed the real production database to remain active.

After the test was complete I rebuilt the standby (basically a hot backup and restore of the primary, then running the relevant commands and enabling the broker configuration).

My next test will be to stop my primary test database untidily and start my standby, pretending it is a real loss of Production. I'll update data, then switch it back. Hopefully it all hangs together and this is what would happen in a 'real' DR.

Wednesday, May 19, 2010

I Hate ORA-12514

Whenever a user calls me and tells me they're getting an ORA-12514 when trying to connect I get a little depressed. I find this one of the hardest errors to solve because there are so many places for it to go wrong. Here's a recent example:

I checked the database and it was set to archive log mode, but automatic archiving was not enabled. This usually causes a database to hang, but can also prevent connections (with a different error, though). I stopped and restarted the database in archive log mode, enabled automatic archiving, tried reconnecting and voila! It didn’t work.

After a recent migration we had issues with some of the ORACLE_HOMEs on some databases. Back in the dim and distant past the databases used /u01/app/oracle/product/8.1.7 as the ORACLE_HOME, then they were upgraded and moved to /u01/app/oracle/product/8.1.7.4. However, I found that the listener.ora had this database hard-coded in the SID list, but with the old ORACLE_HOME.
I changed the entry for the TST2 database to reflect the new home, reloaded the listener, restarted the database, tried reconnecting and voila! It didn’t work.
I then checked the listener.ora and tnsnames.ora files and saw that the service name had '.sysdomain.local' in it, so I removed that, reloaded the listener, tried to connect again and voila! It didn’t work.

I then checked the parameters in the database, and saw that there was a reference to TST1 in the SERVICE_NAMES parameter, so I changed that to TST2, stopped and restarted the database and voila! It worked.

It looks as if this database was created by copying TST1 and recreating the controlfile, but someone left the reference to TST1 in the init.ora file. So even though the listener.ora and tnsnames.ora files looked OK, this parameter was overriding them. If I hadn't spotted it I might still be trying to solve this one.
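The moral: when an ORA-12514 makes no sense, check what the instance is actually registering with the listener:

SQL> show parameter service_names
SQL> show parameter db_domain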

Monday, May 17, 2010

Checking Netbackup logs

A number of sites we look after use Symantec Veritas Netbackup as the Media Manager for RMAN backups. Many times I'll kick off a restore and have it sitting there without knowing what's going on.

Netbackup has logs which detail the process so you can see exactly what the restore is doing. The logs are stored here:

/usr/openv/netbackup/logs/user_ops/dbext/logs

The file name format is a bit cryptic, but you generally want the most recent:

2355 May 17 19:32 9613.0.1274086925
2355 May 17 21:30 9613.0.1274094025
2355 May 17 22:23 9613.0.1274097219
2355 May 18 07:40 28094.0.1274130593
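A lazy way to watch the newest one (plain ksh, from that directory):

cd /usr/openv/netbackup/logs/user_ops/dbext/logs
tail -f `ls -tr | tail -1`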

The latest one is a log from a restore I was running that was hanging on a controlfile restore. Our backups are put to disk, then written to tape after 2 weeks, so I couldn't understand why this was taking so long (the file was from a week before). This is the log:

Restore started Tue May 18 07:09:53 2010


07:09:54 (4538920.xxx) Restore job id 4538920 will require 1 image.
07:09:54 (4538920.xxx) Media id BD0078 is needed for the restore.

07:09:58 (4538920.001) Restoring from image created Mon May 10 11:42:13 2010
07:10:00 (4538920.001) INF - Data socket = nbserver.sysdomain.local.IPC:/tmp/vnet-28452274130599746506000000000-c9aGK3;d86e0e6cce7b3e
71ed1e7b93ed529939;4;3600
07:10:00 (4538920.001) INF - Name socket = nbserver.sysdomain.local.IPC:/tmp/vnet-28453274130599795954000000000-Z9aOK3;7dd95d7b3700c6
50e74388642ac3522f;4;3600
07:10:00 (4538920.001) INF - If Media id BD0078 is not in a robotic library administrative interaction may be required to satisfy this
mount request.
07:10:00 (4538920.001) INF - Job id = 4538920
07:10:00 (4538920.001) INF - Backup id = dbserver_1273455733
07:10:00 (4538920.001) INF - Backup time = 1273455733
07:10:00 (4538920.001) INF - Policy name = CIS_oracle_policy
07:10:01 (4538920.001) INF - Snapshot = 0
07:10:01 (4538920.001) INF - Frozen image = 0
07:10:01 (4538920.001) INF - Backup copy = 0
07:10:01 (4538920.001) INF - No drives available or requesting resources
07:10:01 (4538920.001) INF - Master server = master
07:10:01 (4538920.001) INF - Media server = nbserver
07:10:01 (4538920.001) INF - New data socket = nbserver.sysdomain.local.IPC:/tmp/vnet-28451274130599694192000000000-o8ayK3;550ef0e07a
989118940acf8578bb2e6f;4;3600
07:10:01 (4538920.001) INF - Encrypt = 0
07:10:01 (4538920.001) INF - Use shared memory = 0
07:10:02 (4538920.001) INF - Restore id = 4538920.001
07:10:02 (4538920.001) INF - Encrypt = 0
07:10:02 (4538920.001) INF - Client read timeout = 3600
07:10:02 (4538920.001) INF - Media mount timeout = 1800
07:10:02 (4538920.001) INF - client = dbserver
07:10:02 (4538920.001) INF - requesting_client = dbserver
07:10:02 (4538920.001) INF - browse_client = dbserver
07:39:59 (4538920.001) ERR - Timed out waiting for the media to be mounted and positioned.
07:40:05 (4538920.001) The following files/folders were not restored:
07:40:05 (4538920.001) UTF - /s7ldas09_1_1
07:40:06 (4538920.001) Status of restore from image created Mon May 10 11:42:13 2010 = timed out waiting for media manager to mount volume

07:40:07 INF - Server status = 5
07:40:07 (4538920.xxx) INF - Status = the restore failed to recover the requested files.

It's fairly obvious what happened - the tape isn't in the library so the restore timed out. This is a job for the team that looks after Netbackup - I have to ask why the file went to tape after just 1 week, and why the tape was not in the library.

Monday, May 10, 2010

Using SCP without prompting for a password

I often have to scp several files from one server to another, but if it's not set up for it, every file prompts for a password. This is fine if you're only doing 1 or 2, but not if you're copying dozens.

You need to generate a public key and copy it to the destination server.

For example, I have to copy several files from server a (call it source) to server b (call it target).

On source, go to the ~/.ssh directory. If it doesn't exist, create it. If it does exist and there are files in there, it's probably a good idea to take a copy first in case you break something.

Generate a public key like this:

ssh-keygen -t rsa

You should see this:

Generating public/private rsa key pair
Enter file in which to save the key ...

Press Enter, and you should see this:

Enter passphrase (empty for no passphrase):

Press Enter twice.

You should see this:

Your identification has been saved in id_rsa.
Your public key has been saved in id_rsa.pub.

Copy this file (you can scp it) to the target server, and put it in the /tmp folder there.

Log on to the target server and cd to the ~/.ssh directory. If there is already an id_rsa.pub file there, rename the one you copied to /tmp before moving it into ~/.ssh; if one doesn't already exist it's OK to just move it.

If there is an "authorized_keys" file, take a backup and edit it, appending the contents of the file you copied. If there is no "authorized_keys" file in the ~/.ssh directory, just rename your file to "authorized_keys".

Log back onto the source server and try to scp a file - it should copy without prompting for a password.
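Condensed, the whole dance looks something like this (server and directory names are illustrative):

# on source
ssh-keygen -t rsa
scp ~/.ssh/id_rsa.pub target:/tmp

# on target
cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys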

Thursday, May 6, 2010

Some Oracle Statspack tips

Although AWR is now available in 10g, and is (probably) better than statspack, it is an option that requires a license. I tend to install statspack into a database so that I can still get performance information. There are a couple of things I've picked up over the years about statspack that are worth looking at.

The first is the statspack level - by default it's level 5, but I prefer level 7, mainly because it lists the top 5 objects.

To see the statspack levels available:

SELECT * FROM stats$level_description ORDER BY snap_level;

Note that V8i has levels 0,5 and 10; V9 has levels 0,5,6,7 and 10.

To change the level you run this command:

exec statspack.snap(i_snap_level => 7, i_modify_parameter => 'true');
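With that set, the usual cycle is a snapshot either side of the workload you're interested in, then the report:

SQL> exec statspack.snap;
-- run or wait for the workload, then:
SQL> exec statspack.snap;
SQL> @?/rdbms/admin/spreport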



If there are a large number of snapshots you may see this when you run spreport.sql:

Instance DB Name Id Snap Started Level Comment
------------ ------------ ----- ----------------- ----- ----------------------
PRD PRD ##### 02 May 2010 22:00 10


In order to see the snap_ids you will need to change the column format in the spreport.sql (or sprepins.sql) . I tend to copy the $ORACLE_HOME/rdbms/admin/spreport.sql (or sprepins.sql) to a backup copy and make a couple of changes.

Change this line in spreport.sql

column snap_id heading 'Snap|Id' format 9990;

to

column snap_id heading 'Snap|Id' format 9999999990;

Running the spreport.sql again shows this:

Instance DB Name Id Snap Started Level
------------ ------------ ----------- ----------------- -----
Comment
----------------------

PRD PRD 33739 06 May 2010 10:00 10


It's wrapped the columns, but at least the snap_id is now visible.



Another change I like to make is the number of lines in the SQL Statements. By default this is only set to 5. This means that the top SQL sections only show the first 5 lines of the statement, which doesn't really show enough.

Change this line:

define top_n_sql = 5;

to

define top_n_sql = 65;

or however many lines you think you want. This will now display more, or all, of the SQL statement, which you can copy out to run an explain plan on.

Wednesday, May 5, 2010

How to clone an Oracle Home (10g)

If you need to install a V10 (or above) environment and already have the same version and patch level somewhere else (on the same OS and level), you don't have to sit through the installer and its associated patches each time: you can simply clone an existing ORACLE_HOME.


1. Take a "tar" of the existing Oracle Home. Using tar will ensure that file permissions and symbolic links (internal to the Oracle Home) are maintained.
tar -cvf oracle_home.tar $ORACLE_HOME
2. Extract oracle_home.tar to the required location (same or different server)
3. Run the clone utility
cd $ORACLE_HOME/clone/bin
perl clone.pl ORACLE_HOME="" ORACLE_HOME_NAME=""
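For example (path and home name are illustrative only):

perl clone.pl ORACLE_HOME="/u01/app/oracle/product/10.2.0/db_1" ORACLE_HOME_NAME="OraDb10g_home1"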

This also saves having to copy the install media onto a server if it's at a remote location.

Note - I haven't tested this; these are instructions from a colleague.

Oracle DataGuard Trouble Shooting

I have implemented a couple of DataGuard environments, and most of the time they tick along just fine. However they do sometimes break, and this is a brief guide to some of the errors I've encountered and how to fix them.


Dataguard Broker errors (I’ve removed the text to just show the errors):

DGMGRL> show configuration;
.
.
Warning: ORA-16607: one or more databases have failed

This is a general error showing there is a problem with either the primary or secondary database.

To determine which is causing the issue:

DGMGRL> show database verbose PROD_PRIM;
.
.
Error: ORA-16778: redo transport error for one or more databases

This indicates the logs are not getting to the secondary database.



DGMGRL> show database PROD_SBY;
.
.
Error: ORA-12541: TNS:no listener

This shows the listener on the secondary server is not started.



DGMGRL> show database PROD_SBY;
.
.
Error: ORA-16766: Redo Apply unexpectedly offline

This shows that the redo apply has not been enabled. You need to connect to the PROD_SBY database and issue the
“ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;” command.




DGMGRL> show database PROD_SBY;
.
.
Warning: ORA-16826: apply service state is inconsistent with the DelayMins property

This shows that the redo apply command was issued without the “NODELAY” parameter. You need to connect to the PROD_SBY database, stop the redo apply and then re-enable it with the “NODELAY” parameter:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;

Database altered.


Messages may appear in the alert log on the primary:

PING[ARC1]: Error 16146 when pinging standby EAM_SBY
ARC3: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16040)
ARC3: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file /u01/app/oracle/admin/PRD/bdump/prd_arc3_10159.trc:
ORA-16040: standby destination archive log file is locked.

ORA-16011: Archivelog Remote File Server process in Error state
FAL[server, ARC3]: Error 16011 creating remote archivelog file 'EAM_SBY'
FAL[server, ARC3]: FAL archive failed, see trace file.
Wed Feb 18 09:10:33 2009
ORA-16055: FAL request rejected
ARCH: FAL archive failed. Archiver continuing
Wed Feb 18 09:10:33 2009
ORACLE Instance PROD_PRIM - Archival Error. Archiver continuing.

And on the standby:

RFS[240]: Possible network disconnect with primary database
Aborting archivelog file creation:
/s01/oraarch/PROD_SBY/PROD_PRIM19019645361152.arc
If this a network disconnect, then this archivelog will be fetched again
by GAP resolution mechanism.

I'm not sure why this happens – maybe network activity causing time-outs, or some kind of timing conflict. It seems to resolve itself. Test it by running "alter system switch logfile;" on the primary and making sure the new log gets applied on the standby. If it does, all is OK. If not, you may need to rebuild the standby or do more investigation.
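To check the apply, a query like this on the standby does the job:

select sequence#, applied
from v$archived_log
order by sequence#;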

Using Oracle APEX in an 11g database with the embedded gateway feature

I've used APEX (Oracle Application Express) for a while, since it was called HTML DB, and I've written a couple of applications but nothing too involved. In database versions before 11g you had to set up OAS to act as the portal / web service (actually I think 10g had an internal method, but it wasn't easy to get going). 11g has an embedded PL/SQL gateway which needs much less setup. However, there are some 'gotchas'. Here's some information I've found and put together on how to get the 11g embedded gateway working with APEX:

Application Express 3.2 in 11g Using the embedded gateway


Download the apex32.zip file from Oracle

Extract to a temporary location.

Backup and remove the existing $ORACLE_HOME/apex directory

Copy the new extracted files to the 11g $ORACLE_HOME/apex directory

There are some html help pages in the extracted "install.32/e13366" directory

Run the @apexins script from the $ORACLE_HOME/apex directory:

@apexins tablespace_name tablespace_name TEMP /i/

The first two arguments are the tablespaces where the APEX objects and files will be stored (they can be the same one), the third is the temporary tablespace, and /i/ is the virtual directory for the images.

Run the @apxconf script from the $ORACLE_HOME/apex directory as sys

It will prompt you to create a password for the admin account. Note that you will need to change this when you first log in from the web page, so you might want to make it a temporary one.

Unlock the anonymous account: ALTER USER ANONYMOUS ACCOUNT UNLOCK;


Check the port : SELECT DBMS_XDB.GETHTTPPORT FROM DUAL;

If 0, no port is set. It needs to be changed using
   EXEC DBMS_XDB.SETHTTPPORT(nnnn);
   COMMIT;


Go to a web browser and enter "http://servername:port/apex/apex_admin".

It should show a page with a username/login field.

I had a problem with the login button not doing anything when clicked; this fixed it:

@apex_epg_config /u01/app/oracle/product/11.1.0/db_1


Other things to try if it still doesn't work; any or all of these may fix things:


--------
DECLARE
  ACL_PATH  VARCHAR2(4000);
  ACL_ID    RAW(16);
BEGIN
  -- Look for the ACL currently assigned to '*' and give APEX_030200
  -- the "connect" privilege if APEX_030200 does not have the privilege yet.

  SELECT ACL INTO ACL_PATH FROM DBA_NETWORK_ACLS
   WHERE HOST = '*' AND LOWER_PORT IS NULL AND UPPER_PORT IS NULL;

  -- Before checking the privilege, ensure that the ACL is valid
  -- (for example, does not contain stale references to dropped users).
  -- If it does, the following exception will be raised:
  --
  -- ORA-44416: Invalid ACL: Unresolved principal 'APEX_030200'
  -- ORA-06512: at "XDB.DBMS_XDBZ", line ...
  --
  SELECT SYS_OP_R2O(extractValue(P.RES, '/Resource/XMLRef')) INTO ACL_ID
    FROM XDB.XDB$ACL A, PATH_VIEW P
   WHERE extractValue(P.RES, '/Resource/XMLRef') = REF(A) AND
         EQUALS_PATH(P.RES, ACL_PATH) = 1;

  DBMS_XDBZ.ValidateACL(ACL_ID);
   IF DBMS_NETWORK_ACL_ADMIN.CHECK_PRIVILEGE(ACL_PATH, 'APEX_030200',
     'connect') IS NULL THEN
      DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(ACL_PATH,
     'APEX_030200', TRUE, 'connect');
  END IF;

EXCEPTION
  -- When no ACL has been assigned to '*'.
  WHEN NO_DATA_FOUND THEN
  DBMS_NETWORK_ACL_ADMIN.CREATE_ACL('power_users.xml',
    'ACL that lets power users to connect to everywhere',
    'APEX_030200', TRUE, 'connect');
  DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL('power_users.xml','*');
END;
/
COMMIT;

---------

ALTER SYSTEM SET SHARED_SERVERS = 5 SCOPE=BOTH;

---------

Changing the admin password:

SQL> PASSWORD APEX_PUBLIC_USER

or

SQL> @apxchpwd.sql

---------

UPDATE resource_view r
SET r.res=UPDATEXML(res, '/a:Resource/a:Contents/b:acl',
'<acl description="Anonymous access"
  xmlns="http://xmlns.oracle.com/xdb/acl.xsd"
  xmlns:dav="DAV:"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xmlns.oracle.com/xdb/acl.xsd http://xmlns.oracle.com/xdb/acl.xsd">
  <ace>
    <grant>true</grant>
    <principal>ANONYMOUS</principal>
    <privilege>
      <read-properties/>
      <read-contents/>
      <resolve/>
    </privilege>
  </ace>
</acl>',
'xmlns:a="http://xmlns.oracle.com/xdb/XDBResource.xsd" xmlns:b="http://xmlns.oracle.com/xdb/acl.xsd"')
WHERE r.any_path = '/sys/acls/ro_anonymous_acl.xml';

---------

lsnrctl reload

---------

alter user xdb account unlock;
alter user xdb identified by xdb;


Tuesday, May 4, 2010

Oracle Enterprise Manager Desktop Widgets

Oracle has recently made available some desktop widgets that connect to OEM and display the status of monitored instances: http://www.oracle.com/technology/products/oem/widgets/index.html

They are quite neat, but the drill-down only functions if you have the relevant Management Packs installed.

You need to install Adobe AIR, but it's really a one-click installation; there's not a lot involved. I generally find desktop widgets are only really useful if you have dual monitors, or a widescreen one; otherwise they get in the way.




Monday, May 3, 2010

Recovery using RMAN through a RESETLOGS

I had to recover a database after a RESETLOGS a while ago and recently found the doco. This is the syntax. I may edit it to include comments later, but I think the commands should be self-explanatory.

rman catalog rman/xxxxx@rmancat

connect target

RMAN> run {
2> allocate channel ch1 type 'SBT_TAPE';
3> set until time "to_date('13-JUL-2009 21:00:00','dd-MON-YYYY hh24:mi:ss')";
4> restore database;
5> sql 'alter database mount';
6> recover database;
7> alter database open resetlogs;
8> }

allocated channel: ch1
channel ch1: sid=1095 devtype=SBT_TAPE
channel ch1: Veritas NetBackup for Oracle - Release 6.5 (2007072323)

executing command: SET until clause
released channel: ch1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of set command at 07/24/2009 11:08:38
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20207: UNTIL TIME or RECOVERY WINDOW is before RESETLOGS time

The UNTIL TIME fell before the RESETLOGS of the current incarnation, so the database had to be reset back to the previous incarnation first:

RMAN> list incarnation of database;


List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
3794137 3794138 DEVDB 1020948257       PARENT  1          19-JUN-09
3794137 3869825 DEVDB 1020948257       CURRENT 8615505    23-JUL-09

RMAN> set dbid 1020948257;

executing command: SET DBID
database name is "DEVDB" and DBID is 1020948257

RMAN> reset database to incarnation 3794138;

database reset to incarnation 3794138

RMAN> run {
2> allocate channel ch1 type 'SBT_TAPE';
3> set until time "to_date('13-JUL-2009 21:00:00','dd-MON-YYYY hh24:mi:ss')";
4> restore database;
5> sql 'alter database mount';
6> recover database;
7> alter database open resetlogs;
8> }

allocated channel: ch1
channel ch1: sid=1095 devtype=SBT_TAPE
channel ch1: Veritas NetBackup for Oracle - Release 6.5 (2007072323)

executing command: SET until clause

Starting restore at 24-JUL-09

channel ch1: starting datafile backupset restore
channel ch1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /u08/oradata/DEVDB/system.dbf
restoring datafile 00002 to /u08/oradata/DEVDB/undotbs_1.dbf
restoring datafile 00003 to /u08/oradata/DEVDB/sysaux_01.dbf
restoring datafile 00004 to /u09/oradata/DEVDB/data_l_01.dbf
restoring datafile 00005 to /u09/oradata/DEVDB/data_m_01.dbf
restoring datafile 00006 to /u09/oradata/DEVDB/data_s_01.dbf
restoring datafile 00007 to /u10/oradata/DEVDB/indx_l_01.dbf
restoring datafile 00008 to /u10/oradata/DEVDB/indx_m_01.dbf
restoring datafile 00009 to /u10/oradata/DEVDB/indx_s_01.dbf
channel ch1: reading from backup piece DEVDB_db_13_07_2009_18_51_t692131889_s4_p1
channel ch1: restored backup piece 1
piece handle=DEVDB_db_13_07_2009_18_51_t692131889_s4_p1 tag=TAG20090713T185129
channel ch1: restore complete, elapsed time: 00:04:06
Finished restore at 24-JUL-09

sql statement: alter database mount
released channel: ch1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of sql command on default channel at 07/24/2009 11:16:05
RMAN-11003: failure during parse/execution of SQL statement: alter database mount
ORA-00205: error in identifying control file, check alert log for more info

RMAN> run {
2> allocate channel ch1 type 'SBT_TAPE';
3> set until time "to_date('13-JUL-2009 21:00:00','dd-MON-YYYY hh24:mi:ss')";
4> restore controlfile;
5> restore database;
6> sql 'alter database mount';
7> recover database;
8> alter database open resetlogs;
9> }

allocated channel: ch1
channel ch1: sid=1095 devtype=SBT_TAPE
channel ch1: Veritas NetBackup for Oracle - Release 6.5 (2007072323)

executing command: SET until clause

Starting restore at 24-JUL-09

channel ch1: starting datafile backupset restore
channel ch1: restoring control file
channel ch1: reading from backup piece DEVDB_db_13_07_2009_18_51_t692132525_s5_p1
channel ch1: restored backup piece 1
piece handle=DEVDB_db_13_07_2009_18_51_t692132525_s5_p1 tag=TAG20090713T185129
channel ch1: restore complete, elapsed time: 00:00:37
output filename=/u08/oradata/DEVDB/control01.ctl
output filename=/u09/oradata/DEVDB/control02.ctl
output filename=/u10/oradata/DEVDB/control03.ctl
Finished restore at 24-JUL-09

Starting restore at 24-JUL-09

skipping datafile 1; already restored to file /u08/oradata/DEVDB/system.dbf
skipping datafile 2; already restored to file /u08/oradata/DEVDB/undotbs_1.dbf
skipping datafile 3; already restored to file /u08/oradata/DEVDB/sysaux_01.dbf
skipping datafile 4; already restored to file /u09/oradata/DEVDB/data_l_01.dbf
skipping datafile 5; already restored to file /u09/oradata/DEVDB/data_m_01.dbf
skipping datafile 6; already restored to file /u09/oradata/DEVDB/data_s_01.dbf
skipping datafile 7; already restored to file /u10/oradata/DEVDB/indx_l_01.dbf
skipping datafile 8; already restored to file /u10/oradata/DEVDB/indx_m_01.dbf
skipping datafile 9; already restored to file /u10/oradata/DEVDB/indx_s_01.dbf
Finished restore at 24-JUL-09

sql statement: alter database mount

Starting recover at 24-JUL-09

starting media recovery

channel ch1: starting archive log restore to default destination
channel ch1: restoring archive log
archive log thread=1 sequence=712
channel ch1: restoring archive log
archive log thread=1 sequence=713
channel ch1: reading from backup piece DEVDB_arch__t692132627_s6_p1
channel ch1: restored backup piece 1
piece handle=DEVDB_arch__t692132627_s6_p1 tag=TAG20090713T190347
channel ch1: restore complete, elapsed time: 00:00:35
archive log filename=/u01/app/oracle/product/10.2.0/db_1/dbs/archDEVDB_arch712_1_689964513.arc thread=1 sequence=712
archive log filename=/u01/app/oracle/product/10.2.0/db_1/dbs/archDEVDB_arch713_1_689964513.arc thread=1 sequence=713
channel ch1: starting archive log restore to default destination
channel ch1: restoring archive log
archive log thread=1 sequence=714
channel ch1: reading from backup piece DEVDB_arch__t692139952_s9_p1
channel ch1: restored backup piece 1
piece handle=DEVDB_arch__t692139952_s9_p1 tag=TAG20090713T210552
channel ch1: restore complete, elapsed time: 00:00:36
archive log filename=/u01/app/oracle/product/10.2.0/db_1/dbs/archDEVDB_arch714_1_689964513.arc thread=1 sequence=714
media recovery complete, elapsed time: 00:00:00
Finished recover at 24-JUL-09

database opened
new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete
released channel: ch1

Diagnosing a hung Oracle database

I'm not sure where I got this - maybe Metalink, maybe a forum post or possibly another blog.

Spool out relevant V$ VIEWS

WHILE the database is hanging, run the following queries:
SPOOL v_views.log;

SELECT *
FROM v$parameter;

SELECT class, value, name
FROM v$sysstat;

SELECT sid, id1, id2, type, lmode, request
FROM v$lock;

SELECT l.latch#, n.name, h.pid, l.gets, l.misses,
l.immediate_gets, l.immediate_misses, l.sleeps
FROM v$latchname n, v$latchholder h, v$latch l
WHERE l.latch# = n.latch#
AND l.addr = h.laddr(+);

SELECT *
FROM v$session_wait
ORDER BY sid;

/* repeat last query 3 times - we want to see who's repeatedly waiting*/

SPOOL OFF;

If a specific session hangs do these queries:

SPOOL session.log;

- Get the sid from the pid or vice versa:
SELECT s.sid, p.spid
FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND p.spid = &spid;   -- or: AND s.sid = &sid

- What SQL statement is being executed?
SELECT s.sid, s.status, q.sql_text
FROM v$session s, v$sqltext q
WHERE s.sql_hash_value = q.hash_value
AND s.sql_address = q.address
AND s.sid = &sid
order by q.piece;

- Get output from V$SESSION_WAIT:
column sid format 990
column seq# format 99990
column wait_time heading 'WTime' format 99990
column event format a30
column p1 format 9999999990
column p2 format 9999999990
column p3 format 9990
select sid,event,seq#,p1,p2,p3,wait_time from V$session_wait
where sid = &sid
order by sid;
*** Run this statement a few times to see whether the wait events actually change

- What is LOCKWAIT column set to in V$SESSION?
SELECT lockwait
FROM v$session
WHERE sid = &sid;

- If not null, who's holding what lock on which object?
col Username format A15
col Sid format 9990 heading SID
col Type format A4
col Lmode format 990 heading 'HELD'
col Request format 990 heading 'REQ'
col Id1 format 9999990
col Id2 format 9999990

select SN.Username, M.Sid, M.Type,
DECODE(M.Lmode, 0, 'None', 1, 'Null', 2, 'Row Share',
3, 'Row Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(Lmode,'990'))) Lmode,
DECODE(M.Request, 0, 'None', 1, 'Null', 2, 'Row Share',
3, 'Row Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(M.Request, '990'))) Request,
M.Id1, M.Id2
from V$SESSION SN, V$LOCK M
WHERE (SN.Sid = M.Sid and M.Request != 0)
or (SN.Sid = M.Sid and M.Request = 0 and Lmode != 4 and (id1, id2)
in (select S.Id1, S.Id2 from V$LOCK S where Request != 0 and S.Id1
= M.Id1 and S.Id2 = M.Id2) ) order by Id1, Id2, M.Request;

- What is LATCHWAIT column set to in V$PROCESS?
SELECT latchwait
FROM v$process
WHERE spid = &spid;

- If not null, who's holding what latch?
column name format a32 heading 'LATCH NAME'
column pid heading 'HOLDER PID'
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%' order by a.latch#;
- Did the lock holder or latch holder shut off his/her terminal without logging off? This can leave a zombie shadow process holding the resource.
Kill it with "alter system kill session '<sid>,<serial#>'".
If on sqlnet 2.1 or above, try setting sqlnet.expire_time in the sqlnet.ora on the server side. (For more info, see page A-16 of "Understanding SQL*NET" v2.1.)

- If session is not frozen but slow, get session-specific statistics:
SELECT s.sid, s.value, t.name
FROM v$sesstat s, v$statname t
WHERE s.statistic# = t.statistic#
AND s.sid = &sid;

- If session is extremely slow or truly frozen, get session-specific wait info:
SELECT *
FROM v$session_wait
where sid = &sid;


- repeat above query 3 times.

SPOOL OFF;

- If this is distributed transaction run the following statements from EVERY NODE involved in the transaction:
SPOOL 2pc.log;
SELECT * FROM dba_2pc_pending;
SELECT * FROM pending_sessions$;
SELECT * FROM pending_sub_sessions$;
SELECT * FROM dba_2pc_neighbors;
SPOOL OFF;

The following spool will help determine the cause in MTS.
SPOOL mts.log
- How busy the dispatchers are:
select name, network, status,
(busy/(busy + idle)) * 100 "% of time busy"
from v$dispatcher;
- Query the shared servers view V$SHARED_SERVER:
select name, status, requests,
(busy/(busy + idle)) * 100 "% of time busy"
from v$shared_server;
spool off
4. Gather OS-Level Info

***RDA output gives the most comprehensive data on OS and database configuration

For information about how to download, install and run the RDA scripts refer to Note 314422.1 "Remote Diagnostic Agent (RDA) 4.0 - Overview"

a. Briefly describe your architecture. Include number of CPUs / disks. Are you using raw devices, NFS mounted
files, shared disks, RAM disks...Are you mirroring? etc.

b. Measure unusual OS level activity: excessive CPU or i/o, paging, swapping, etc. There are other monitor tools such
as TOP. Sun has utilities on their webpage that can be downloaded for monitoring the system.
If you're not comfortable with the utilities listed below, be sure to do step 3 and, if you have isolated a troubled process, the v$sesstat output in step.
- On unix: use SAR, VMSTAT and NETSTAT (man sar)
- You can use truss, trace (man truss) or oradbx (call support) for tracing specific processes.
- You can also use unix debuggers such as dbx, xdb, adb or truss to dump process activity. Example:
dbx -a <pid>    /* attach to hanging process */
where           /* dump Oracle stack */
detach          /* detach from process */

truss -o /tmp/mytruss -fae -p <pid>
- On vms: use MONITOR (help monitor)
- you can use analyze/process (help analyze) or orambx (call support) for tracing specific processes.

- On Novell: use LOAD MONITOR (gui - no online documentation)

- On Windows NT: use Performance Monitor, Event Monitor, Dr. Watson (see Resource kit for documentation).
- use pview for monitoring specific processes (resource kit).

c. Check the system log for additional information. On many platforms it is in /var/adm; the files are the message files.
5. Get SYSTEMSTATE and HANGANALYZE Dumps
This creates a large trace file in the user_dump_dest (30M or more is not unusual).
Note: the init.ora parameter MAX_DUMP_FILE_SIZE controls the maximum trace file size. Using Oradebug and setting unlimit will allow a complete dump which we will need.
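For reference, the oradebug route mentioned above looks like this from SQL*Plus:

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 10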

Do this step for sure if the entire database is frozen or nearly frozen and if this condition came on suddenly and there are no archive errors in the alert log. Please note: As systemstate dumps are instance specific, they tend to be inconclusive with hanging problems involving Oracle Parallel Server (OPS) unless you get them from each node. You will need 3 system state dumps from each node for OPS.

NOTE: Do systemstate dump 3 times in a row, so we can determine whether
the processes are hung or active.

For Oracle 8.0.5.x to 8.1.7.x
$ svrmgrl
svrmgr> connect internal

svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME HANGANALYZE LEVEL 3';

EXIT ... then reconnect

svrmgr>ALTER SESSION SET MAX_DUMP_FILE_SIZE=UNLIMITED;
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';
wait 90 seconds
svrmgr>ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 10';

Thursday, April 29, 2010

How much faster is DataPump compared to Export?

I've done a test to find out. I took a full datapump export of a 9.5GB database followed by an old-style export.

The results are pretty conclusive - DataPump took roughly half the elapsed time - but obviously for a better test you should try more databases, different sizes and different options:

DataPump - started at 15:12:00, finished at 15:41:34, so elapsed time was 29 mins and 34 secs
Export - started at 09:32:56 and finished at 10:27:26, so elapsed time was 54 mins and 30 secs
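For the record, the commands were of this general form (directory, file names and credentials here are illustrative, not the actual ones I used):

expdp system/password full=y directory=dp_dir dumpfile=full.dmp logfile=full_expdp.log

exp system/password full=y file=full.dmp log=full_exp.log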

Later on I'll do an Import and post the results.

Monday, April 19, 2010

Tracing listener connections

We've hit an issue where clients are being disconnected from the database, but there are no error messages apart from ORA-03113 and ORA-03114 (which basically mean "you have lost your connection to the database" - duh).
There are several odd things about these disconnects:

Idle connections aren't affected, so it's not a time-out issue.
Multiple PCs get disconnected at the same time, so it's not as if a session hits a profile limit and that gets killed.
Only one application is affected; no other users have reported issues.
It's not limited to one geographical location.
It happens at (seemingly) random times.
It doesn't matter what process or screen the user is accessing.


I've set tracing on at the listener level to see if anything gets written there - there is nothing in the listener.log.
I originally set the level to ADMIN, but that generated 20Mb of trace in 20 minutes, so I dropped it to USER.

The line goes into the listener.ora file (find it by doing lsnrctl stat):

trace_level_listener=USER

The listener then needs to be reloaded to put it into effect:

lsnrctl reload

lsnrctl stat

will show the trace file location:

Listener Trace File       /u01/app/oracle/product/9.2.0/network/trace/listener.trc

Since this is going to get large, I have put in a cron job to cycle it every hour. The drop outs tend to occur overnight.

This is the script that compresses and cycles the log:

#!/bin/ksh
export tstamp=`date +%d_%m_%Y_%H%M`
     cd /u01/app/oracle/product/9.2.0/network/trace
     if [[ -f listener.trc ]] ; then
         # copy current file to time-stamped one
         cp listener.trc listener_${tstamp}.trc
         # Zero the trace file
         cat /dev/null > listener.trc
         # compress any  dated logs
         find . -name "listener_*.trc" -exec compress -f {} \;
     fi
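The hourly cron entry is just this (script path illustrative):

0 * * * * /u01/app/oracle/dba/scripts/cycle_listener_trace.ksh >/dev/null 2>&1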

Sunday, April 11, 2010

My favourite tablespace space script

I got this from the asktom.com site, and it's the best one I've found for showing tablespace space usage:

column dummy noprint
column pct_used format 999.9 heading "%|Used"
column name format a16 heading "Tablespace Name"
column Kbytes format 999,999,999 heading "KBytes"
column used format 999,999,999 heading "Used"
column free format 999,999,999 heading "Free"
column largest format 999,999,999 heading "Largest"
break on report
compute sum of kbytes on report
compute sum of free on report
compute sum of used on report
select nvl(b.tablespace_name,
nvl(a.tablespace_name,'UNKNOWN')) name,
kbytes_alloc kbytes,
kbytes_alloc-nvl(kbytes_free,0) used,
nvl(kbytes_free,0) free,
((kbytes_alloc-nvl(kbytes_free,0))/
kbytes_alloc)*100 pct_used,
nvl(largest,0) largest
from ( select sum(bytes)/1024 Kbytes_free,
max(bytes)/1024 largest,
tablespace_name
from sys.dba_free_space
group by tablespace_name ) a,
( select sum(bytes)/1024 Kbytes_alloc,
tablespace_name
from sys.dba_data_files
group by tablespace_name )b
where a.tablespace_name (+) = b.tablespace_name;


The only problem I've found is that it's very slow in V10, but fine in all other versions.

How to run an OS command from within Oracle

This is something that comes up occasionally, and I always end up searching Google to find the answer, so I documented it the last time so I had the full instructions. Here they are.


The working directory is important:

cd /u01/app/oracle/dba/work/andy

Create the program in this directory, vi a file and call it shell.c :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* run the OS command via system() - this is what stdlib.h is needed for */
void sh(char *command) {
int num;
num = system(command);
}


Compile the program:

/usr/local/bin/gcc -G -c shell.c
ld -r -o shell.so shell.o
chmod 775 shell.so


Create the directory and the library in the database:

SQL> create directory shell_lib as '/u01/app/oracle/dba/work/andy';
SQL> create library shell_lib as '/u01/app/oracle/dba/work/andy/shell.so';
SQL> /


Create the procedure:

SQL> CREATE OR REPLACE PROCEDURE shell(command IN char)
> AS EXTERNAL
> NAME "sh"
> LIBRARY shell_lib
> LANGUAGE C
> PARAMETERS (command string);
> /


Set up the listener.ora and tnsnames.ora:

listener.ora:
TSTLSNR =
(ADDRESS_LIST =
(ADDRESS= (PROTOCOL= IPC)(KEY=EXTPROC))
(ADDRESS= (PROTOCOL= TCP)(HOST=10.208.10.1)(PORT=1525))
)
SID_LIST_TSTLSNR =
(SID_LIST =
(SID_DESC =
(SID_NAME = andy)
(ORACLE_HOME = /u01/app/oracle/product/8.1.7)
)
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/app/oracle/product/8.1.7)
(PROGRAM = extproc)
)
)

tnsnames.ora:
EXTPROC_CONNECTION_DATA =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
(CONNECT_DATA = (SID = PLSExtProc)(SERVER=DEDICATED))
)
andy =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(Host = 10.208.10.1)(Port = 1525))
(CONNECT_DATA = (SID = andy))
)


Run the procedure:

SQL> exec shell('ls');

andyalterTablespace.sh cr_db.sh logs new_temp.sql shell.c
andyrun.sh cr_user.sql mga_performance.dmp performance shell.o
andyrun1.sh data mga_performance.log replace_temp.log shell.so
andyrun2.sh listener.andy1 new scripts
PL/SQL procedure successfully completed.

So this should work; just make sure all the elements are present. One thing to double-check is the names - a typo, or the wrong directory or script name, will stop this working and you could spend ages trying to figure out where it's gone wrong.

Quick Tip - log on to sqlplus as sysdba in Windows

Most of the sites I work with (99% of them) are Unix (mainly Solaris, some AIX, HP-UX, Red Hat Linux, Oracle Enterprise Linux), but there are a couple that are running Oracle on Windows (ugh!).

I always forget how to log on to sqlplus as sysdba, so this is a reminder for me:

c:\> sqlplus /nolog

sql> connect sys/password@SID as sysdba




Flash Recovery Area full - ORA-19815

I got alerted over the weekend from a client site with the message "ORA-19815: WARNING: db_recovery_file_dest_size.."

It turns out that there was a new application release going in over the weekend so they had decided to suspend the database backups (without telling me).

This is a fairly easy fix, but it took a while due to the number of archived logs.

In case you're not familiar with the Flash Recovery Area (FRA), the location and size are defined in the init.ora file. Oracle will keep archived logs and disk backups there for as long as it can while maintaining the space. For example, if you give it 200Gb, Oracle will keep old archived logs in the FRA (in case a recovery is needed - it doesn't have to go to tape) unless new ones need the space, in which case it will delete them if they have already been backed up. You will see messages in the alert log to the effect "Deleted Oracle managed file....".

What happened this weekend is that because the backups hadn't run, the FRA filled up and there were no already-backed-up logs for Oracle to delete, so the messages were issued and the database hung.

To allow the database to continue, the first thing to do is to issue the command "ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=400G scope=both;" - making sure the 400G is larger than the current setting (show parameter recovery).
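You can see how full the FRA is, and how much of it Oracle could reclaim, with:

SQL> select file_type, percent_space_used, percent_space_reclaimable from v$flash_recovery_area_usage;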

This will start the database writing new archived logs again, but the disk will fill up and you'll be back to where you started unless you backup the existing archived logs.

Connect to rman, and run an archived log backup. If the site uses a media manager like Netbackup you'll need to find the parameters, they should be documented (yeah, right!), or find an existing backup script which should have them. If you can't find them you'll need to ask the Netbackup admin for them, but it's something you should know if you support a site.

rman catalog rman/xxxxx@rcat10.world

RMAN> connect target

connected to target database: PROD (DBID=99999999)


RMAN> run {
2> sql 'alter system archive log current';
3> ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE' ;
4> SEND 'NB_ORA_POLICY=Oracle_PROD, NB_ORA_SERV=nbu_server01';
5> BACKUP
6> filesperset 200
7> FORMAT 'al_%s_%p_%t'
8> ARCHIVELOG ALL;
9> RELEASE CHANNEL ch00;
10> }

sql statement: alter system archive log current

allocated channel: ch00
channel ch00: sid=352 devtype=SBT_TAPE
channel ch00: VERITAS NetBackup for Oracle - Release 6.0 (2008012304)

sent command to channel: ch00

Starting backup at 2010-04-11 17:48:12
current log archived
channel ch00: starting archive log backupset
channel ch00: specifying archive log(s) in backup set
input archive log thread=1 sequence=130197 recid=212695 stamp=715864266
input archive log thread=1 sequence=130198 recid=212696 stamp=715864428
input archive log thread=1 sequence=130199 recid=212697 stamp=715864599

etc.

Once the backup has been completed, these logs are flagged as having been backed up by RMAN so Oracle will consider them for deletion. However, this won't happen until you reset the DB_RECOVERY_FILE_DEST_SIZE back to what it was, otherwise Oracle thinks there's still plenty of space.

SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE=300G scope=both;

Once this is done, the delete messages will show up in the alert log:

Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130013_5vwfhln1_.arc
Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130014_5vwflfy1_.arc
Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130015_5vwfo4fy_.arc
Deleted Oracle managed file /FRA/PROD/archivelog/2010_04_09/o1_mf_1_130016_5vwfr3v9_.arc

It's a good idea to tell the site that it's better practice to reschedule rather than suspend backups, or at least let the archived log backups run because they won't impact the database files.

Thursday, April 8, 2010

Performing a compressed pipe export and import

I always forget how to do a compressed pipe export and import, so here are the commands, taken from the orafaq website.

Make a pipe process

mknod exp.pipe p
Start the compress and pipe it in the background

gzip < exp.pipe > scott.exp.gz &

Wait a second or so before kicking off the export

Run the export command

nohup exp userid=scott/tiger file=exp.pipe ... &


That does the export. To import the file, do this:

Make a new pipe process, run the uncompress in the background and then run the import command


mknod imp.pipe p

gunzip -c scott.exp.gz > imp.pipe &

imp system/password file=imp.pipe full=Y log=imp.log resumable=y resumable_name=impjob resumable_timeout=7200 buffer=5000000


This should be all you need.

Find the SQL associated with a Unix process

If the server is slow and a "top" or "prstat" shows a process chewing up CPU, and it's an Oracle process, this is how to see what SQL it's running.

Note that I've truncated some output. Also, it looks a mess, but I can't see any better formatting options.

prstat

PID USERNAME TIME CPU PROCESS/NLWP
25475 oracle 5:52:23 24% oracle/1

ps -ef | grep 25475

oracle 25475 14487 23 Apr 06 ? 352:27 oracleDEV (LOCAL=NO)

sqlplus "/ as sysdba"

select s.username,
s.SID,
s.serial#,
q.SQL_TEXT
from v$session s,
v$process p,
v$sql q
where s.paddr=p.addr
and p.spid=25475
and s.sql_hash_value=q.hash_value;


USERNAME SID SERIAL# SQL_TEXT
MAXIMO 59 22149 select count(*) from trans where message is not null

Since you have the SID and SERIAL#, this session can be killed if required:

alter system kill session '59,22149';

Handy Unix command line script

I don't use this very often, but it's one I keep in my head that I find really useful.

One thing you can use it for is to quickly make backup copies of files in a directory:

for i in `ls *.dbf`
do
cp $i $i.bkp
done

note the backticks, not single quotes in the first line.

You can also use it to change some text in all the files (probably worth taking a backup first):

for i in `ls *.sql`
do
sed 's/PROD/DEV/g' $i > $i,
mv $i, $i
done

Note the file is output to "$i," (with a trailing comma), which is then moved back over the original.

It can also be used if you copy files from Windows to Unix and they have "^M" at the end of each line:

for i in `ls *`
do
dos2unix $i > $i,
mv $i, $i
done

Ignore the "cannot get terminal type" messages.

RMAN duplicate - RMAN-06025 errors

We do RMAN duplicates of some PROD databases everyday, and for the most part they are pretty reliable.

Recently we started to get these errors:

RMAN-06025: no backup of log thread 1 seq 51996 lowscn 5982564789686 found to restore

Looking it up, we found that what the Duplicate process does is restore the database files from the backup, connect to the live PROD database to find the latest archived log, then restore the differential logs and apply them. The problem is that the backup finishes around 02:00 but the Duplicates start around 05:00. I don't know why it's started happening - maybe there is more activity on the database in the early hours than there used to be, so it's generating logs where it didn't before - but obviously those logs aren't backed up yet, so the error message is issued.

I looked into fixing it up and had a few thoughts. I could put a "SET UNTIL TIME" clause into the Duplicate script, but since the backups could finish at any time I couldn't guarantee that I wouldn't hit the same problem. I could also use a "SET UNTIL SCN" clause, but I would need to amend the script to connect to the PROD database, run a query to determine the SCN of the last backed up archive log, feed that to a variable and pass it to the Duplicate script.
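In sketch form, the UNTIL SCN version would have looked something like this (the SCN is illustrative and would come from that query against PROD):

RMAN> run {
2> set until scn 5982564789686;
3> allocate auxiliary channel ch1 type 'SBT_TAPE';
4> duplicate target database to dup1;
5> }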

I was about to sit down and write it when I decided there was a better, and lazier, way: I changed the time of the Duplicate job to start soon after the backup completed. Although not an exact science, it's good enough that it hasn't thrown up the RMAN-06025 errors since I changed it 3 weeks ago.

The users don't need up-to-the-minute copies of the databases, so a few hours older makes no difference.

We were also hitting another error which we couldn't figure out. We do Duplicates of the one PROD database to a few different databases on different servers. In one of the logs we were seeing error messages with the names of different database files. For example, on server1 we duplicate PROD to dup1, and on server2 we duplicate PROD to dup2. On server2 we were seeing errors in the log to the effect

"'/u03/oradata/dup1/dup1_data01.dbf' directory doesn't exist"

Which meant that even though the Duplicate script on server2 specifically set AUXNAME to

"/u04/oradata/dup2/dup2_data01.dbf" etc

the job was trying to restore the dup1 file.

We could only surmise that there were 2 Duplicate jobs running around the same time, and RMAN was getting confused as to which files were supposed to be restored where. We rescheduled one of them an hour apart and it fixed it up.

RMAN obviously doesn't keep the script entirely in memory, and if there are 2 Duplicates going at the same time against the same source database it gets confused.

Problem with Database recovery - ORA-00314

We had a hardware failure on the archived log disk (on AIX), which resulted in massive alert logs containing repeating error messages:
"ARC0: Error 19504 creating archivelog file '/d09/oracle/arch/PROD/arch_PROD_93739.arc'
ARC0: Archiving not possible: error count exceeded
"
and

"ARC0: Archiving not possible: No primary destinations".

There were 3 databases writing logs here and all crashed. All databases are V8.1.7.4.

The Unix team advised they would need to rebuild the file system and we would probably lose all the files. See, this is why you put archived logs and data files on separate disks!

Luckily we have cron jobs that scp the archive logs to a separate server every 30 minutes, so these were available.
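Those cron entries are nothing fancy; something along these lines (paths and server name illustrative), relying on the passwordless scp set-up described elsewhere on this blog:

0,30 * * * * scp /d09/oracle/arch/PROD/*.arc oracle@drserver:/arch/PROD/ >/dev/null 2>&1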

We tried restarting the databases (after changing the init.ora file to redirect the archive logs to a new disk) - 2 performed crash recovery and restarted straight away.

One of them prompted for a media recovery, so we checked the backup logs and grabbed the previous night's backup and restored it.

Note that since this is an old server the backups are not managed through RMAN, cold backups are done each night.

Tried a restart and recovery and that's where we hit problems - we got a

"ORA-00314: log 1 of thread 1, expected sequence# 77569 doesn't match 77573"

error message.

We looked this up and none of the explanations seemed to match what we knew. The redo logs and control files were from the same backup and roughly the same time, as were all the datafiles.
One suggestion was that the database was up when the files were backed up.

We re-checked the backup logs - the database was shut, the files copied by an OS command, the database restarted.

We restored the files again, tried recovery, same thing.

It was only then that the Unix guy told us he'd looked through his logs and the timings showed the database was open during the backup. I showed him our log and he said "What's that?". It appeared that this server has 2 sets of backup processes (I did say it was old). The one he restored from backed up the files while the database was up, rendering them useless.

We managed to find the correct backup from 3 days before, re-ran the restore and recovery (after shipping the archived logs back to the new archive location from the DR server), and this time it worked.

Lessons learnt:

Talk to each other - we assumed the backup scripts and logs we always check are the 'proper' ones - not an unreasonable assumption. However, if other groups are involved in backups then it pays to double-check their processes and information.

Split data and log files onto different areas - should be standard practice, but you'd be surprised how many sites I see that don't do this. It's a no-brainer.

Protect your archived logs - because we scp these to a separate server every 30 mins we have an off-site copy. They also get backed up from the primary server, but it's always good to have multiple copies. Yes, this can be achieved with duplicate destinations, but this works too.

Don't panic - at one stage I thought all of our backups would be invalid, and we wouldn't be able to recover. This was a Production database so it would have been a bad situation to be in. However, our logs showed that the database was shutdown before the files were backed up and restarted when complete, and the Unix team were able to restore it - the first one they tried was from a different (and useless) process. We check the logs every day and had confidence in the procedure, which we've now proved.

Keep the users informed - I sent email status updates all the time, including the stuff that went wrong. I find that users are very forgiving if they have all the information and regular updates re-assure them that I am working the problem. All they see is their application / database is unavailable, and if they don't hear from the DBA every so often as far as they know he/she's gone off to do something else. Sometimes users can be a pain and want constant updates, but if you get in first and send out a couple they soon feel confident that you're on top of it and will let them know when something changes. If you don't update them they will hassle you.

You can never have too many backups - as well as scp'ing the archived logs to the DR server, they are also backed up to tape. I also take a full export of the databases each night and scp that to the DR server as well (they are small databases). Although an export is not a backup, it's better to tell the users "I can have your database back as of 03:00 yesterday" rather than "Our backups haven't been working, we've lost your database".

There's nothing worse than typing "recover database until cancel;" on a PROD database and seeing an error come back.

But there's nothing better than telling the users "Your database is back up and running again."

Objective of this Blog

I've decided to create this blog to record my DBA activities so that I can keep a record of things I come across, various Oracle DBA issues and solutions, tips and tricks etc. Most of my information is spread across documents, emails, text files etc and I decided I wanted a place to centralise everything.

Whenever I encounter an issue and find a solution, I'll try and put it up here. This is mainly for my benefit, but if it helps other DBAs, all good.