Wednesday, August 22, 2007

Deadlocks

Good old ORA-00060: deadlock detected while waiting for resource. There's really two main causes for an ORA-00060 error; bad application logic or unindexed foreign keys.

The classic example of a deadlock is userA updating table X and then table Y while userB updates table Y and then table X. Oracle kills one of the users, rolls back his transaction, throws an error to the alert.log and everybody else moves on. As DBAs, we look at these errors, forward them to the appropriate developer and forget about them.

The second most prevalent cause of ORA-00060 errors is unindexed foreign keys. You can check out Oracle's explanation and there are a ton of scripts that will help you diagnose this situation. I personally use a variation of Tom Kyte's script but there are others such as this one.

I know, you're asking yourself "So What? Everybody knows about deadlocks."

Be patient grasshopper, there's more to the story.

A couple weeks ago we started getting ORA-00060 error messages every now and then on a 10.2.0.3 database that rarely received this type of message before. And then some of our critical processes slowed down. And then we started getting about 12 ORA-00060 errors a day.

At first, we attacked it as two separate problems; the ORA-00060 error and performance. One DBA started decoding the trace files from the deadlocks informing the developers of which objects were involved and the data involved. The weird part was the deadlock was occurring on an INSERT statement.

At the same time, I started tracing the slow processes and found that a great majority of time was being spent on "enq: TM - contention" waits. We then ran an AWR report and found that the top wait was "enq: TM - contention" and we knew we had a problem.

The wait event and the ORA-00060 together lead us to an unindexed FK. However, all the scripts we used indicated the fks were indexed. We started disabling the FKs one by one (for the two tables involved in the majority of the deadlocks). After one particular FK was disabled, all our "enq: TM - contention" waits went away and things were back to normal.

Oracle support directed us to bug 6053354 which isn't really a bug because of bug 6066587. Sure enough, the conditions in 6053354 existed in our database. We turned the tablespaces to read-write and re-enabled the FK with no adverse effects.

The moral of the story is follow your instincts up to a point, but every now and then something really freaky is going on.

Tuesday, August 21, 2007

Back when the internet was fun...

Do you remember way back to 2005 when the internet was fun?

Wanted to check the progress of the NCAA tournament during the day? Just go to http://www.espn.com and your browser refreshes automatically as the action happens.

Read some industry chit-chat during your lunch hour? Open up your favorite news aggregator and read away.

Two years ago I wouldn't have thought twice about forwarding a funny email to friends & family.

Try to go to http://www.espn.com today and you'll likely have to "justify the business need" to three levels of management to get your scores.

My RSS newsfeeds quit working a couple weeks ago. Doesn't matter that 90% of them are industry related, they're not business appropriate.

I stopped forwarding emails to some friends and family because I was tired of getting the bounce messages from their email filter. It's just not the same when you have to tell the joke over the phone and explain why 18 goes into 54 more times than 54 goes into 18.

Ah, the price of progress.