Sometimes tuning a database can be rewarding. I get great satisfaction when helping a user take their query from 10 minutes down to 3 seconds. But it doesn't always work that way. Consider this recent exchange:
Developer to DBA Manager: The database is slow.
DBA Manager to DBA: Trace his process.
The DBA traces the developer's process and comes up with about 12 queries that can benefit from tuning. The DBA informs both the developer and the DBA Manager that he thinks these 12 queries could benefit from some attention.
Developer to DBA Manager: We don't have time to tune our own queries, we need one of your DBAs to do it for us.
DBA Manager to Developer: But you guys know the model, my guys will end up asking you 100 questions.
Developer to DBA Manager: That's fine, we don't have time to work on it.
DBA Manager to DBA: Start working on tuning the queries.
DBA spends 6 days tuning the 12 queries. Some of the queries require the SQL to change, some require statistics to be gathered at a different point, and some are just changing how the indexes work. The DBA informs both the DBA Manager and the developer what needs to change and how exactly to implement the changes from the database perspective.
Almost two weeks pass and the developer is back in the DBA Manager's office:
Developer to DBA Manager: The database is slow.
DBA Manager to Developer: Did you make the changes we recommended?
Developer to DBA Manager: We don't have time to make the changes, can't you just make the database go faster.
DBA Manager to Developer: Please get your management to approve us spending a boatload of money on new hardware and Oracle licenses.
Thursday, April 28, 2011
Wednesday, April 13, 2011
It's always X, except when it's not
I drill into my DBAs heads that the first step in tuning any process is to get a trace. Don't show me the query plan until you show me what the database actually executed. The trace is the fundamental clue in any performance investigation. The wait events tell you exactly what's going on and there is no question where the bulk of the time is being spent.
When we see a large number of waits on the "db file sequential read" in my environment, it usually means that the wrong index is being used. Not usually in the 5 out of 10 times sense, but usually in the degree of 99 times out of 100. Most of the time we find that the developer has "discovered" hints and is trying to outsmart the optimizer or the optimizer just picked the wrong index. Very rarely, a poor response time for "db file sequential read" may mean we have a bad disk somewhere in the volume.
We had an issue the other day where a particular query had a high number of "db file sequential read" waits. None of the normal things applied and we needed to dig a little deeper. After looking at the raw trace file, we saw that the query was waiting on a "db file sequential read" from the UNDO datafiles. Oh yeah, the other .05% of the time it's a consistent read problem!
Understand what the causes for a particular event are. Sometimes there is more than one cause for Oracle to throw a particular event. Also, the event thrown may be dependent on what else is going on in the database at the time. Also keep and open mind that even though it's X 99% of the time, there is a slight possibility it could be Y.
When we see a large number of waits on the "db file sequential read" in my environment, it usually means that the wrong index is being used. Not usually in the 5 out of 10 times sense, but usually in the degree of 99 times out of 100. Most of the time we find that the developer has "discovered" hints and is trying to outsmart the optimizer or the optimizer just picked the wrong index. Very rarely, a poor response time for "db file sequential read" may mean we have a bad disk somewhere in the volume.
We had an issue the other day where a particular query had a high number of "db file sequential read" waits. None of the normal things applied and we needed to dig a little deeper. After looking at the raw trace file, we saw that the query was waiting on a "db file sequential read" from the UNDO datafiles. Oh yeah, the other .05% of the time it's a consistent read problem!
Understand what the causes for a particular event are. Sometimes there is more than one cause for Oracle to throw a particular event. Also, the event thrown may be dependent on what else is going on in the database at the time. Also keep and open mind that even though it's X 99% of the time, there is a slight possibility it could be Y.
Monday, April 11, 2011
11g Gotcha
Something I discovered recently is that the DEFAULT profile for Oracle 11g sets the PASSWORD_LIFE_TIME to 180 instead of UNLIMTED by default. Applications will encounter an "ORA-28002: the password will expire within X days" error message if you keep the default value.
To change the PASSWORD_LIFE_TIME, you:
ALTER PROFILE default LIMIT password_life_time UNLIMITED;
One thing to keep in mind is that when you first encounter the ORA-28002 error, your dba_users.account_status will go into a "EXPIRED(GRACE)" state. You can "change" that users password to the same password it is now to clear it.
To change the PASSWORD_LIFE_TIME, you:
ALTER PROFILE default LIMIT password_life_time UNLIMITED;
One thing to keep in mind is that when you first encounter the ORA-28002 error, your dba_users.account_status will go into a "EXPIRED(GRACE)" state. You can "change" that users password to the same password it is now to clear it.
Thursday, April 07, 2011
What's old is new again
We had a fair number of problems related to bind variable peeking in our 10.2.0.3 and 10.2.0.4 dbs. In fact, the plans became so erratic that we decided to just turn bind variable peeking off by setting _optim_peek_user_binds=false in all the init.ora files.
Flash forward to Oracle 11g (11.2.0.2) and the optimizer has been enhanced to use bind variable peeking much better. After converting to 11g, one of my queries was running slower than before. I ran the slow query with hard-coded values and it finished in less than a second. I then tried the same thing with bind variables and it ran in 2400 seconds, with a different execution plan.
These were the exact symptoms that originally brought me to set _optim_peek_user_binds in the first place. My first thought was that I left the underscore parameters commented out after the upgrade. Nope, it was set "correctly", or at least how I thought it should be set.
I then set optimizer_features_enable back to 10.2.0.4 for my session and re-executed the query. Sure enough, it was back to sub-second response time. I switched the optimizer_features_enable back to 11.2.0.2 and it was slow again. Aha, something changed!
More research yeilded some documentation to digest:
http://www.oracle.com/technetwork/articles/sql/11g-sqlplanmanagement-101938.html
http://www.dba-oracle.com/t_11g_adaptive_cursor_sharing.htm
http://www.pythian.com/news/820/bind-peeking-ad-hoc-queries-stable-performance-on-10g-you-can-only-pick-any-two/
After much reading, thought, and testing, I decided that _optim_peek_user_binds=false was not needed in 11g for my environment.
Flash forward to Oracle 11g (11.2.0.2) and the optimizer has been enhanced to use bind variable peeking much better. After converting to 11g, one of my queries was running slower than before. I ran the slow query with hard-coded values and it finished in less than a second. I then tried the same thing with bind variables and it ran in 2400 seconds, with a different execution plan.
These were the exact symptoms that originally brought me to set _optim_peek_user_binds in the first place. My first thought was that I left the underscore parameters commented out after the upgrade. Nope, it was set "correctly", or at least how I thought it should be set.
I then set optimizer_features_enable back to 10.2.0.4 for my session and re-executed the query. Sure enough, it was back to sub-second response time. I switched the optimizer_features_enable back to 11.2.0.2 and it was slow again. Aha, something changed!
More research yeilded some documentation to digest:
http://www.oracle.com/technetwork/articles/sql/11g-sqlplanmanagement-101938.html
http://www.dba-oracle.com/t_11g_adaptive_cursor_sharing.htm
http://www.pythian.com/news/820/bind-peeking-ad-hoc-queries-stable-performance-on-10g-you-can-only-pick-any-two/
After much reading, thought, and testing, I decided that _optim_peek_user_binds=false was not needed in 11g for my environment.
Subscribe to:
Posts (Atom)