Tuesday, November 01, 2011

Is SQL*Plus Resumable?

I am using resumable space allocation for a lot of my operations. However, it appears that the SQL*Plus COPY command doesn't respect the resumable feature.

Note, I am issuing an "alter session enable resumable timeout 14400" through a login trigger. I know this works because if I do the same type of operation through an INSERT/SELECT through a dblink, my session waits for more space to be added.
user@whouse1.us> select sysdate from dual;

SYSDATE
-------------------
11/01/2011 15:48:45

Elapsed: 00:00:00.03
user@whouse1.us> COPY FROM user1/pw@proddb.us TO user/password@whouse1.us 
append dest_table (f1, f2, f3) using select f1, f2, f3 from source_table;

Array fetch/bind size is 1000. (arraysize is 1000)
Will commit after every 1000 array binds. (copycommit is 1000)
Maximum long size is 80. (long is 80)

ERROR:
ORA-01653: unable to extend table USER.DEST_TABLE by 4096 in tablespace JH_TEST

user@whouse1.us> select sysdate from dual;

SYSDATE
-------------------
11/01/2011 15:49:01

Elapsed: 00:00:00.03
Is RESUMABLE space not available with SQL*Plus copy?
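For reference, the login trigger mentioned above looks roughly like this (a sketch only; the trigger name is made up, and the trigger owner needs the appropriate ALTER SESSION/RESUMABLE privileges):

```sql
-- Hypothetical database-level logon trigger that enables resumable
-- space allocation with a 4-hour (14400 second) timeout.
CREATE OR REPLACE TRIGGER enable_resumable_trg
AFTER LOGON ON DATABASE
BEGIN
  EXECUTE IMMEDIATE 'ALTER SESSION ENABLE RESUMABLE TIMEOUT 14400';
END;
/
```

Sessions that suspend while resumable is enabled show up in DBA_RESUMABLE, which is an easy way to confirm the feature is actually kicking in for a given operation.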

Wednesday, September 28, 2011

Interesting Error

We just upgraded one of our 10.2.0.3 dbs to 11.2.0.2 plus some patches.  Today an interesting ORA-07445 was thrown to the alert.log:

ORA-07445: exception encountered: core dump [__intel_new_memcpy()+382] [SIGILL] [ADDR:0x3FE5CAE] [PC:0x3FE5CAE] [Illegal operand] []

Searching metalink didn't get us anywhere, so we opened a TAR (or iTar, or SR, or ServReq, or whatever the heck they are calling them these days).  While Oracle Support was looking at the issue, I looked at the query generating the error:

SELECT nvl(t.value, null) FROM table t WHERE t.id = 12345;

Huh?  If the value is null, substitute a null?  WTF?

Presto-changeo...

SELECT t.value FROM table t WHERE t.id = 12345;

...and the error goes away.

Is it a bug on Oracle's side? Sure.

Is it a stupid thing to do on our side? You betcha.

Thursday, September 15, 2011

Common Sense

Developer:
I am running this query:
DELETE FROM tab;
It's been running since 2PM yesterday.  Can you make it go faster?

DBA:
You shouldn't delete 80M rows in a single transaction.  You should have used TRUNCATE instead of DELETE.  I have killed your session and it will be a while before it completes.

Developer (5 minutes later):
I tried TRUNCATE but it says my table is locked.

DBA:
I know. The table will stay locked while the transaction rolls back.

Developer:
Can you make it go faster?

DBA:
I have turned on _transmogrify_delete_queries, so that is the best I can do.

Developer:
Thanks!! That should definitely help!!!
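For anyone following along, the distinction the DBA was making can be sketched like this (table name is hypothetical):

```sql
-- DELETE is DML: it generates undo for every row, so removing 80M rows
-- can run for hours and takes just as long to roll back if killed.
DELETE FROM big_table;

-- TRUNCATE is DDL: it deallocates the segments in seconds, generates
-- almost no undo, and cannot be rolled back.
TRUNCATE TABLE big_table;
```

(And _transmogrify_delete_queries is, of course, not a real parameter.)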

Wednesday, June 01, 2011

KEEP Pool

I have a persnickety problem with a particular table being aged out of the buffer cache.  I have one query that runs on a set schedule.  Sometimes when it runs, it does a lot of physical reads against this table; other times, it does none.

So I decided to play around with a different buffer pool and let this table age out of cache on its own terms rather than competing in the larger default pool.  So far, it seems to be working pretty well.

I'll admit it, it's been a long time since I used a KEEP pool.  Some helpful KEEP pool resources:
Edward Stoever at Database-expert.com
ORAFAQ

James Colestock
asktom.oracle.com

Standard Disclaimer: I understand that 99.9% of the time you don't need a KEEP pool.  In this situation it seems to be warranted.
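A minimal sketch of the setup, assuming a pool size and table name (both made up here) that fit your environment:

```sql
-- Carve out a KEEP pool from the SGA (size is illustrative):
ALTER SYSTEM SET db_keep_cache_size = 256M SCOPE=BOTH;

-- Direct the table's blocks to the KEEP pool (table name is made up):
ALTER TABLE hot_lookup STORAGE (BUFFER_POOL KEEP);
```

The point is simply that the table's blocks now compete only against other KEEP-pool objects instead of the whole default cache.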

Wednesday, May 25, 2011

Compression and SE

While researching a corrupt block on 11g SE, we came across a number of objects that were compressed according to the data dictionary.

How could that be?  Compression is not a feature of SE, or so we thought.

The objects in question were all indexes.  In fact, Oracle creates compressed indexes in ?/apex/core/tab.sql even though we are on SE.
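A query along these lines will surface them (straight off the standard dictionary view):

```sql
-- Indexes flagged as compressed in the data dictionary:
SELECT owner, index_name, compression, prefix_length
FROM   dba_indexes
WHERE  compression = 'ENABLED';
```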

Further investigation led me to Doc 1084132.1, which calls the feature "Basic table compression".

Just something to be aware of.

Thursday, April 28, 2011

Time Spent, Time Wasted

Sometimes tuning a database can be rewarding.  I get great satisfaction when helping a user take their query from 10 minutes down to 3 seconds.  But it doesn't always work that way.  Consider this recent exchange:

Developer to DBA Manager: The database is slow.
DBA Manager to DBA: Trace his process.

The DBA traces the developer's process and comes up with about 12 queries that can benefit from tuning.  The DBA informs both the developer and the DBA Manager that he thinks these 12 queries could benefit from some attention.

Developer to DBA Manager: We don't have time to tune our own queries, we need one of your DBAs to do it for us.
DBA Manager to Developer: But you guys know the model, my guys will end up asking you 100 questions.
Developer to DBA Manager: That's fine, we don't have time to work on it.
DBA Manager to DBA: Start working on tuning the queries.


DBA spends 6 days tuning the 12 queries.  Some of the queries require the SQL to change, some require statistics to be gathered at a different point, and some are just changing how the indexes work.  The DBA informs both the DBA Manager and the developer what needs to change and how exactly to implement the changes from the database perspective.

Almost two weeks pass and the developer is back in the DBA Manager's office:

Developer to DBA Manager: The database is slow.
DBA Manager to Developer: Did you make the changes we recommended?
Developer to DBA Manager: We don't have time to make the changes, can't you just make the database go faster.
DBA Manager to Developer: Please get your management to approve us spending a boatload of money on new hardware and Oracle licenses.

Wednesday, April 13, 2011

It's always X, except when it's not

I drill into my DBAs' heads that the first step in tuning any process is to get a trace.  Don't show me the query plan until you show me what the database actually executed.  The trace is the fundamental clue in any performance investigation.  The wait events tell you exactly what's going on, and there is no question where the bulk of the time is being spent.

When we see a large number of waits on "db file sequential read" in my environment, it usually means that the wrong index is being used.  Not usually in the 5-out-of-10 sense, but usually on the order of 99 times out of 100.  Most of the time we find that the developer has "discovered" hints and is trying to outsmart the optimizer, or the optimizer just picked the wrong index.  Very rarely, a poor response time for "db file sequential read" may mean we have a bad disk somewhere in the volume.

We had an issue the other day where a particular query had a high number of "db file sequential read" waits.  None of the normal things applied and we needed to dig a little deeper.  After looking at the raw trace file, we saw that the query was waiting on a "db file sequential read" from the UNDO datafiles.  Oh yeah, the other .05% of the time it's a consistent read problem!
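The tell in the raw trace is the file# on the wait line.  One way to map it back to a tablespace (the file number below is illustrative):

```sql
-- Map the file#=N from a "db file sequential read" wait line
-- back to its tablespace and datafile:
SELECT file_id, tablespace_name, file_name
FROM   dba_data_files
WHERE  file_id = 3;
```

When the file turns out to belong to an UNDO tablespace, you're looking at consistent-read work, not index access.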

Understand the causes of a particular event.  Sometimes there is more than one reason for Oracle to throw it, and the event you see may depend on what else is going on in the database at the time.  Keep an open mind: even though it's X 99% of the time, there is a slight possibility it could be Y.

Monday, April 11, 2011

11g Gotcha

Something I discovered recently is that the DEFAULT profile in Oracle 11g sets PASSWORD_LIFE_TIME to 180 days instead of UNLIMITED.  Applications will encounter an "ORA-28002: the password will expire within X days" error message if you keep the default value.

To change the PASSWORD_LIFE_TIME:
ALTER PROFILE default LIMIT password_life_time UNLIMITED;

One thing to keep in mind is that once you encounter the ORA-28002 error, dba_users.account_status for that account goes into an "EXPIRED(GRACE)" state.  You can "change" that user's password to the same password it is now to clear it.
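A couple of queries to see where you stand, straight off the standard dictionary views:

```sql
-- Accounts already in the grace period:
SELECT username, account_status, expiry_date
FROM   dba_users
WHERE  account_status LIKE 'EXPIRED%';

-- The current limit on the DEFAULT profile:
SELECT limit
FROM   dba_profiles
WHERE  profile = 'DEFAULT'
AND    resource_name = 'PASSWORD_LIFE_TIME';
```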

Thursday, April 07, 2011

What's old is new again

We had a fair number of problems related to bind variable peeking in our 10.2.0.3 and 10.2.0.4 dbs.  In fact, the plans became so erratic that we decided to just turn bind variable peeking off by setting _optim_peek_user_binds=false in all the init.ora files.

Flash forward to Oracle 11g (11.2.0.2) and the optimizer has been enhanced to use bind variable peeking much better.  After converting to 11g, one of my queries was running slower than before.  I ran the slow query with hard-coded values and it finished in less than a second.  I then tried the same thing with bind variables and it ran in 2400 seconds, with a different execution plan.

These were the exact symptoms that originally brought me to set _optim_peek_user_binds in the first place.  My first thought was that I left the underscore parameters commented out after the upgrade.  Nope, it was set "correctly", or at least how I thought it should be set.

I then set optimizer_features_enable back to 10.2.0.4 for my session and re-executed the query.  Sure enough, it was back to sub-second response time.  I switched the optimizer_features_enable back to 11.2.0.2 and it was slow again.  Aha, something changed!
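The per-session test is simple enough to repeat (the values must be valid release strings):

```sql
-- Pretend to be the 10.2.0.4 optimizer for this session only:
ALTER SESSION SET optimizer_features_enable = '10.2.0.4';
-- ...run the query, note the plan and elapsed time...

-- Back to native 11.2.0.2 behavior:
ALTER SESSION SET optimizer_features_enable = '11.2.0.2';
```

Because the parameter is session-scoped, you can A/B the two optimizer versions without touching anyone else's workload.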

More research yielded some documentation to digest:
http://www.oracle.com/technetwork/articles/sql/11g-sqlplanmanagement-101938.html
http://www.dba-oracle.com/t_11g_adaptive_cursor_sharing.htm
http://www.pythian.com/news/820/bind-peeking-ad-hoc-queries-stable-performance-on-10g-you-can-only-pick-any-two/

After much reading, thought, and testing, I decided that _optim_peek_user_binds=false was not needed in 11g for my environment.

Sunday, March 27, 2011

The Cost of Mediocre

Over the years I've written tons of scripts to monitor all sorts of things on the database.  Some of those scripts turned out to be helpful on an ongoing basis and some ended up dying on the vine when new features of Oracle came out.

One of my scripts looks at the top processes in a database from the CPU's perspective.  In other words, how much time did the process use on the CPU over its entire lifetime.  About a month ago I identified a process that was using a disproportionate amount of time as compared to everything else.  It was a new process, so I was pretty sure something could be tuned.  I assigned another DBA to trace it and make tuning suggestions.  One particular query that was doing a Full Table Scan was identified as the major time consumer and suggestions were made on how to fix the query.
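A query along these lines is one way to rank sessions by lifetime CPU (a sketch, not the actual script):

```sql
-- Rank connected sessions by CPU consumed over their lifetime.
-- "CPU used by this session" is reported in centiseconds.
SELECT s.sid, s.username, st.value AS cpu_centiseconds
FROM   v$sesstat  st
JOIN   v$statname sn ON sn.statistic# = st.statistic#
JOIN   v$session  s  ON s.sid = st.sid
WHERE  sn.name = 'CPU used by this session'
ORDER  BY st.value DESC;
```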

Over the last few weeks, the developer had been busy.  When I enquired as to the status of the slow query the response was that the job runs in an acceptable time and nothing would be done on it until later in the month.

Flash forward to a couple days ago when the process showed up on my report again.  We had burned about three hours of CPU time just doing Full Table Scans all day long.  I created an index on some fields that I thought were appropriate and a single execution of the query went from 44+ seconds to .04 seconds.  My report the next day showed the process down to about 10 minutes of CPU time.

The kicker: The developer said "Oh, I thought of that a month ago but never implemented it."  Meanwhile, I burned about 84 hours of CPU time.

Friday, March 25, 2011

Upgrading to Oracle 11gR2

I am about half way through upgrading all my dbs to Oracle 11gR2 from various versions of 10gR2.  Out of all the upgrades I've performed from 7.0 up to 10gR2, the 10g to 11g upgrade is by far the most hassle free of all.

I have abandoned the GUI upgrade tool this time.  The GUI tool is fine, but internal issues prevented me from running the GUI remotely.  This meant that somebody had to be on-site to push buttons for each and every 9i to 10g upgrade.  I have moved to a command-line upgrade that I have automated through a series of ksh scripts.  I feed the right parameters to the script and two hours later it's done.  No fuss, no muss.

The most important thing I have learned so far is that utlu112i.sql is your friend.  If utlu112i.sql even hints that something might go wrong during your upgrade, you better fix it.  For example, it's a heck of a lot easier to fix JVM issues before the upgrade rather than during the upgrade.  You have to trust the utlu112i.sql script.  If it thinks you don't have enough space in your SYSTEM tablespace, you probably don't.  utlu112i.sql is a little cautious on some stuff like the DBMS_LDAP warnings, but it's best to understand what all the warnings mean.
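Running the tool is just a matter of executing it in the database to be upgraded (the script ships with the new 11.2 home; the spool file name is arbitrary):

```sql
-- As SYSDBA in the source database, with utlu112i.sql copied
-- from the 11.2 ORACLE_HOME's rdbms/admin directory:
SPOOL preupgrade_check.log
@utlu112i.sql
SPOOL OFF
```

Then read the spool file carefully and fix every warning you can before the real upgrade starts.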

My general approach has been to upgrade what I have and make sure everything works on 11g before I start to exploit the 11g features.

Full Speed Ahead!  To infinity and beyond!

Wednesday, March 23, 2011

Seriously?

I haven't done any research on this yet, so I don't know if it is fixed in a subsequent patch or a later version.  But seriously, a space?
SQL> select * from v$version;

BANNER                                                                          
----------------------------------------------------------------                
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi                
PL/SQL Release 10.2.0.3.0 - Production                                          
CORE 10.2.0.3.0 Production                                                      
TNS for Linux: Version 10.2.0.3.0 - Production                                  
NLSRTL Version 10.2.0.3.0 - Production                                          

SQL> select trigger_name, triggering_event from user_triggers;

TRIGGER_NAME                   TRIGGERING_EVENT                                 
------------------------------ ------------------------------                   
DDL1                           DDL                                              
LT1                            LOGON                                            
LT2                            LOGON                                            

SQL> select trigger_name, triggering_event from user_triggers
  2  where triggering_event = 'LOGON';

no rows selected

SQL> select trigger_name, '^' || triggering_event || '^' triggering_event
  2  from user_triggers;

TRIGGER_NAME                   TRIGGERING_EVENT                                 
------------------------------ ------------------------------                   
DDL1                           ^DDL ^                                           
LT1                            ^LOGON ^                                         
LT2                            ^LOGON ^                                         
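Until the trailing space goes away, a defensive predicate works (TRIM shown here; a LIKE 'LOGON%' would do as well):

```sql
-- Work around the trailing space in triggering_event:
SELECT trigger_name
FROM   user_triggers
WHERE  TRIM(triggering_event) = 'LOGON';
```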

Tuesday, January 11, 2011

My first 11g show stopper: Purging the Recyclebin

I've run into my first issue with 11.2.0.2.  After one of my dbs was upgraded, every time I try to purge the recyclebin, I get an ORA-00600 error:

SQL> purge dba_recyclebin;
purge dba_recyclebin
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [ktcdso-1], [], [], [], [], [], [], [], [], [], [], []

It doesn't matter if the recyclebin is on or off.

I have done about six upgrades and this is the only db with the problem.  I can back up this db using RMAN, restore it on another host, and the error still gets generated.

I have about 43 objects owned by SYS and named WRH$* left in the recyclebin after the upgrade, but those appear to be normal; at least, I see the same objects after every 10.2.0.5 -> 11.2.0.2 upgrade I've done so far.

I can purge the objects individually to get rid of the immediate problem, but I am most concerned that this will show up again in another db.
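The one-at-a-time workaround looks like this (the BIN$ name below is a placeholder, not a real value):

```sql
-- See what's left behind:
SELECT owner, object_name, original_name, type
FROM   dba_recyclebin;

-- Purge a single object by its recyclebin name:
PURGE TABLE "BIN$placeholder==$0";
```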

I couldn't find anything on metalink.  I've had a TAR open with Oracle Support for well over a month and it's going nowhere.

Monday, January 10, 2011

Listener Logging in 11g

By default, listener logging is not turned on in 11gR2.  You have to turn it on by specifying:

LOGGING_listener_name=ON


in your listener.ora file.
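For the stock listener name, the fragment would be (listener name is the default; yours may differ):

```
# listener.ora fragment
LOGGING_LISTENER = ON
```

A "lsnrctl reload" picks up the change without bouncing the listener.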