Wednesday, June 5, 2013

I made something run faster

I'm not an expert tuner, but know the basics and where to look. Yesterday someone came to me with a problem - 2 databases, same data, same query. One was taking 2 seconds (that's good), one was taking 15 seconds (that's bad).

I checked the obvious things first and compared the results between the 2 servers:

Hardware
Database Parameters
Amount of data
Explain Plan.
Did a "top"
Ran awrrpt

The hardware on the bad server (memory and CPU) is actually a lot better on than on the good server:

Bad = 48GB of RAM, and 24 x 2.93GHz CPUs
Good = 12GB of RAM, and 16 x 2.93GHz CPUs

(just goes to show that throwing hardware at a problem isn't always the solution).

The database parameters on the bad server were beefed up to take advantage of the hardware.

The "top" command didn't show any runaway processes, or anything untoward.

The explain plan was the same for both servers, and showed the query using an index.

I created a 5 GB file on both - we've had issues with disk I/O on some of these hosts before. The bad server created the file a lot faster, so I ruled out the earlier issues we experienced.

I ran

"analyze index owner.index_name validate structure;"

then a

"select * from index_stats;"

on both servers.

The result (sorry about the formatting):

Bad:

HEIGHT     BLOCKS NAME              LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS   PRE_ROWS PRE_ROWS_LEN OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
    3      19272  DBQ_PROCESS_TIME  190128     18768   6004458     5524          18767      103     601529      8028       184802      5765005         190128        1                 104501316   6605987      7        1            4                   0        0            2              32


Good:

HEIGHT     BLOCKS NAME              LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS   PRE_ROWS PRE_ROWS_LEN OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
    3      17208  DBQ_PROCESS_TIME  9426       16048   401044      6052          16047      87      544198      8028       9392        399684          9426          1                 97820932    945242       1        1            4

The figures from the bad server are a lot higher.

I rebuilt the index on the bad server:

alter index owner.index_name rebuild online parallel 3;

Didn't take long, then re-ran the query - it then ran in 0.01 seconds.

I rebuilt the index the same way on the good server, same result.

I was later told that the application owner did a purge of the table the night before and removed about 10M rows.

If I'd known that before it would have saved me a bit of investigation.