Prior to 8.1.5, calculating statistics was a trade-off of time spent calculating statistics vs. the amount of time saved when your queries ran faster. Oracle provided the ANALYZE command and the dbms_utility package for calculating statistics on your data. Automating the collection process meant either a script that generated a batch of ANALYZE commands or a call to dbms_utility.
The enhancements to the optimizer in 8.1.5 made calculating statistics a necessary evil. Oracle provided dbms_stats to assist with statistics gathering. My first attempt at using dbms_stats was simply replacing calls to dbms_utility with calls to dbms_stats. Needless to say, this method failed horribly, for a couple of reasons:
- dbms_stats didn't work exactly the same as dbms_utility.
- dbms_stats calculation took longer than dbms_utility.
Out of that failure came a list of requirements for the new method:
- The new method must be able to analyze more than one object at a time. The degree option of dbms_stats is great, but I found my CPUs were at most 50% busy during statistics gathering, even with degree set to twice the CPU count. I've got multiple processors and I want them 100% busy during statistics gathering.
- The new method must be able to give me feedback on where it is and how long it took to analyze each object. I don't want to issue a command and then wait 5 hours for it to tell me "done".
- The total run time must come down; 20 hours is totally unacceptable. I was analyzing every table every weekend when probably 90% of them never changed. I'll admit it, I was lazy when converting from another version.
- When the optimizer chose a radically different execution path, I had to be able to "restore" the previous stats.
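The last requirement can be met with dbms_stats itself, since it can export statistics to a user table and import them back later. A minimal sketch, assuming a SCOTT schema and a history table named STATS_HISTORY (both names hypothetical):

```sql
-- One-time setup: create a table to hold statistics history
BEGIN
  dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'STATS_HISTORY');
END;
/

-- Save the current statistics for a table before re-analyzing it
BEGIN
  dbms_stats.export_table_stats(ownname => 'SCOTT', tabname => 'EMP',
                                stattab => 'STATS_HISTORY', statid => 'PRE_CHANGE');
END;
/

-- If the optimizer misbehaves with the new stats, put the old ones back
BEGIN
  dbms_stats.import_table_stats(ownname => 'SCOTT', tabname => 'EMP',
                                stattab => 'STATS_HISTORY', statid => 'PRE_CHANGE');
END;
/
```

The statid parameter lets you keep several generations of statistics for the same object in one history table.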
I read more about dbms_stats. I talked to people. I read what others were doing for their statistics collection. I read just about every link on AskTom about dbms_stats. I experimented.
First, I turned on monitoring and am letting Oracle tell me what objects I need to gather statistics for. This will immediately speed up my process because I don't have to analyze tables that don't change. Less work = faster.
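Monitoring is enabled per table; once modifications are being tracked, dbms_stats can report just the objects whose statistics have gone stale. A sketch, assuming a SCOTT schema:

```sql
-- Track inserts/updates/deletes on the table
ALTER TABLE scott.emp MONITORING;

-- Ask dbms_stats which monitored objects have gone stale
DECLARE
  l_objects dbms_stats.objecttab;
BEGIN
  dbms_stats.gather_schema_stats(ownname => 'SCOTT',
                                 options => 'LIST STALE',
                                 objlist => l_objects);
  FOR i IN 1 .. l_objects.count LOOP
    dbms_output.put_line(l_objects(i).objtype || ' ' || l_objects(i).objname);
  END LOOP;
END;
/
```

The 'LIST STALE' option only reports the stale objects; it doesn't gather anything, which is exactly what's needed when the gathering itself is going to be farmed out to jobs.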
Next, I configured job queues so I could use dbms_job. The job queues will give me the ability to run multiple analyze jobs at the same time.
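Each job queue process runs one job at a time, so the job_queue_processes parameter effectively caps how many analyze jobs can run concurrently. Assuming ten concurrent jobs are wanted (it can also be set in the init.ora):

```sql
-- Allow up to 10 jobs to run at the same time
ALTER SYSTEM SET job_queue_processes = 10;
```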
Last, I had to write a little code. I basically broke the processing into two pieces: one step to figure out what needs to be analyzed and submit those jobs to the job queue, and another step to analyze each object. Below is the pseudo code I used to figure out which objects had to be analyzed:
Get a list of objects (o) that need analyzing;
FOREACH o LOOP
  Record attributes about the object you are going to analyze;
  Set up a DBMS_JOB to analyze the object;
END LOOP;
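That pseudo code translates to a loop over the stale-object list that records each object in a tracking table and submits one job per object. A sketch, where the tracking table analyze_log and the procedure analyze_object are hypothetical names:

```sql
DECLARE
  l_objects dbms_stats.objecttab;
  l_job     BINARY_INTEGER;
BEGIN
  -- Figure out what needs to be analyzed
  dbms_stats.gather_schema_stats(ownname => 'SCOTT',
                                 options => 'LIST STALE',
                                 objlist => l_objects);

  FOR i IN 1 .. l_objects.count LOOP
    -- Record attributes about the object we are going to analyze
    INSERT INTO analyze_log (owner, object_name, object_type, submitted)
    VALUES (l_objects(i).ownname, l_objects(i).objname,
            l_objects(i).objtype, SYSDATE);

    -- Queue a job that analyzes just this object
    dbms_job.submit(job  => l_job,
                    what => 'analyze_object(''' || l_objects(i).ownname ||
                            ''',''' || l_objects(i).objname || ''');');
  END LOOP;

  COMMIT;  -- jobs become visible to the queue processes only after commit
END;
/
```

With ten job queue processes and dozens of submitted jobs, ten objects get analyzed at once, which is how the CPUs finally get kept busy.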
Then, each dbms_job was a simple call to a piece of code that used this logic:
BEGIN
  Record the time this job started;
  Export the stats for this object to a "history" table;
  IF the table is partitioned THEN
    IF the object to analyze is the partitioned table/index THEN
      ...
  Record the time this job ended;
END;

I'm testing this in development right now to see how it compares to the old method. So far it looks promising.
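For concreteness, the per-object job might look something like this in PL/SQL. All names (analyze_log, STATS_HISTORY, analyze_object) are hypothetical, and the partitioned-table branching from the pseudo code is omitted here for brevity:

```sql
CREATE OR REPLACE PROCEDURE analyze_object (p_owner VARCHAR2,
                                            p_table VARCHAR2) AS
BEGIN
  -- Record the time this job started
  UPDATE analyze_log SET started = SYSDATE
   WHERE owner = p_owner AND object_name = p_table;

  -- Export the current stats to the "history" table first,
  -- so they can be restored if the optimizer misbehaves
  dbms_stats.export_table_stats(ownname => p_owner, tabname => p_table,
                                stattab => 'STATS_HISTORY');

  -- Gather fresh statistics; cascade picks up the indexes too
  dbms_stats.gather_table_stats(ownname => p_owner, tabname => p_table,
                                cascade => TRUE);

  -- Record the time this job ended
  UPDATE analyze_log SET ended = SYSDATE
   WHERE owner = p_owner AND object_name = p_table;

  COMMIT;
END;
/
```

Querying analyze_log while the jobs run gives the progress feedback the requirements called for: which objects are done, which are in flight, and how long each one took.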