Wednesday, August 22, 2007

Deadlocks

Good old ORA-00060: deadlock detected while waiting for resource. There's really two main causes for an ORA-00060 error; bad application logic or unindexed foreign keys.

The classic example of a deadlock is userA updating table X and then table Y while userB updates table Y and then table X. Oracle kills one of the users, rolls back his transaction, throws an error to the alert.log and everybody else moves on. As DBAs, we look at these errors, forward them to the appropriate developer and forget about them.

The second most prevalent cause of ORA-00060 errors is unindexed foreign keys. You can check out Oracle's explanation and there are a ton of scripts that will help you diagnose this situation. I personally use a variation of Tom Kyte's script but there are others such as this one.

I know, you're asking yourself "So What? Everybody knows about deadlocks."

Be patient grasshopper, there's more to the story.

A couple weeks ago we started getting ORA-00060 error messages every now and then on a 10.2.0.3 database that rarely received this type of message before. And then some of our critical processes slowed down. And then we started getting about 12 ORA-00060 errors a day.

At first, we attacked it as two separate problems; the ORA-00060 error and performance. One DBA started decoding the trace files from the deadlocks informing the developers of which objects were involved and the data involved. The weird part was the deadlock was occurring on an INSERT statement.

At the same time, I started tracing the slow processes and found that a great majority of time was being spent on "enq: TM - contention" waits. We then ran an AWR report and found that the top wait was "enq: TM - contention" and we knew we had a problem.

The wait event and the ORA-00060 together lead us to an unindexed FK. However, all the scripts we used indicated the fks were indexed. We started disabling the FKs one by one (for the two tables involved in the majority of the deadlocks). After one particular FK was disabled, all our "enq: TM - contention" waits went away and things were back to normal.

Oracle support directed us to bug 6053354 which isn't really a bug because of bug 6066587. Sure enough, the conditions in 6053354 existed in our database. We turned the tablespaces to read-write and re-enabled the FK with no adverse effects.

The moral of the story is follow your instincts up to a point, but every now and then something really freaky is going on.

3 comments:

Doug Burns said...

I know how easy it is to misinterpret ORA-00060 errors. I'd add another main cause - insufficient ITL slots ...

Jeff Hunter said...

Yes, on 9i that's still true. 10g works a little differently so that's not such an issue anymore, but something to definitely be aware of.

Doug Burns said...

Isn't the same still true on 10g unless you're using ASSM? (Which I'm very wary of)

All 10g? I'm just jealous really - 9.2.0.8 patching this month :-(