Tuesday, August 15, 2006

Compute or Estimate

A question came by me the other day that I had to do a little head-scratching on.

"How can you tell if your table was analyzed with COMPUTE or ESTIMATE?"

Good question. In my enviornment, I know it was COMPUTE because that's how I do my stats. But how would I know otherwise? There's a column called sample_size in dba_tables that indicates how many rows were used for the last sample. Ah, just a simple calculation.

Lets see how it works out. First, create a decent sized table:

SQL> create table big_table as select * from dba_objects;

Table created.

SQL> insert into big_table select * from big_table;

42396 rows created.

SQL> /

84792 rows created.

SQL> /

169584 rows created.

SQL> /

339168 rows created.

SQL> /

678336 rows created.

SQL> commit;

Commit complete.

Of course, we don't have any stats yet, but lets just check it anyway.

SQL> select table_name, last_analyzed, global_stats, sample_size,
num_rows, decode(num_rows, 0, 1, (sample_size/num_rows)*100) est_pct
from dba_tables
where table_name = 'BIG_TABLE'
/

TABLE_NAME LAST_ANAL GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------- --- ----------- ---------- ----------
BIG_TABLE NO


Sounds good. Lets analyze our table using dbms_stats and see what we get:

SQL> exec dbms_stats.gather_table_stats('JH','BIG_TABLE');

PL/SQL procedure successfully completed.

SQL> select table_name, last_analyzed, global_stats,
sample_size, num_rows,
decode(num_rows, 0, 1, (sample_size/num_rows)*100) est_pct
from dba_tables
where table_name = 'BIG_TABLE'
/

TABLE_NAME LAST_ANAL GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------- --- ----------- ---------- ----------
BIG_TABLE 14-AUG-06 YES 1356672 1356672 100


Cool, that's what we expect. Now let's analyze and estimate with 1% of the rows:

SQL> exec dbms_stats.gather_table_stats('JH','BIG_TABLE', estimate_percent=>1);

PL/SQL procedure successfully completed.

SQL> select table_name, last_analyzed, global_stats, sample_size,
num_rows, decode(num_rows, 0, 1, (sample_size/num_rows)*100) est_pct
from dba_tables
where table_name = 'BIG_TABLE'
/

TABLE_NAME LAST_ANAL GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------- --- ----------- ---------- ----------
BIG_TABLE 14-AUG-06 YES 13371 1337100 1


That definitely is what I was expecting.

I think I remember that if you estimate more the 25%, Oracle will just COMPUTE. Lets see what happens:

SQL> exec dbms_stats.gather_table_stats('JH','BIG_TABLE', estimate_percent=>33);

PL/SQL procedure successfully completed.

SQL> select table_name, last_analyzed, global_stats, sample_size,
num_rows, decode(num_rows, 0, 1, (sample_size/num_rows)*100) est_pct
from dba_tables
where table_name = 'BIG_TABLE'
/

TABLE_NAME LAST_ANAL GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------- --- ----------- ---------- ----------
BIG_TABLE 14-AUG-06 YES 447748 1356812 33.0000029



Here I'm on 9.2.0.5. Previous to 9.2, this would have done 100% of the rows, but in 9.2, it will estimate all the way up to 99%. (Why you would estimate 99% vs. 100%, I have no clue, but it can be done).

Let's see what the block_sampling does:


SQL> exec dbms_stats.gather_table_stats('JH','BIG_TABLE',
estimate_percent=>10, block_sample=>true);

PL/SQL procedure successfully completed.

SQL> select table_name, last_analyzed, global_stats, sample_size,
num_rows, decode(num_rows, 0, 1, (sample_size/num_rows)*100) est_pct
from dba_tables
where table_name = 'BIG_TABLE'
/

TABLE_NAME LAST_ANAL GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------- --- ----------- ---------- ----------
BIG_TABLE 14-AUG-06 YES 132743 1327430 10



Eh, nothing new there.

Lastly, let's see what dbms_stats.auto_sample_size does.

SQL> exec dbms_stats.gather_table_stats('JH','BIG_TABLE',
estimate_percent=>dbms_stats.auto_sample_size);

PL/SQL procedure successfully completed.

SQL> select table_name, last_analyzed, global_stats, sample_size,
num_rows, decode(num_rows, 0, 1, (sample_size/num_rows)*100) est_pct
from dba_tables
where table_name = 'BIG_TABLE'
/

TABLE_NAME LAST_ANAL GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------- --- ----------- ---------- ----------
BIG_TABLE 14-AUG-06 YES 1356672 1356672 100



That's interesting, 9.2 thinks my best method is to compute. I try this on several 9.2 dbs and different sized tables and it picks 100% every time. So I go to the friendly Metalink and find a couple bugs in 9.2 that indicate auto_sample_size doesn't exactly work in 9.2.

Not sure where I'll use this in the future, but it was an interesting experiment anyway.

1 comment:

Anonymous said...

Hello,

i tested your tests on my system and i think it is a BUG from 9.2.0.5. because when i use dbms_stats.auto_sample_size , i get the right result, isn't it?

TABLE_NAME LAST_ANALYZED GLO SAMPLE_SIZE NUM_ROWS EST_PCT
------------------------------ --------------- --- ----------- ---------- ----------
BIG_TABLE 20060823-140847 YES 52098 844134 6.17176894

i use 9.2.0.7.

but, can i get how the table is analyzed ? COMPUTE or ESTIMATE ?

best regards

TOM