So What Co-operative: What is the optimizer doing?

Interesting problem here. I have a table in one of my Oracle 9.2.0.5 databases on Solaris called ACTIVITY_RTAB:

SQL> desc activity_rtab;
Name              Null?    Type
----------------- -------- --------------
ID                NOT NULL NUMBER(19)
FLAG1             NOT NULL CHAR(1)
MONEY_AMOUNT      NOT NULL MONEYAMOUNT_T
DATE1             NOT NULL DATE
DATE2             NOT NULL DATE
DATE3             NOT NULL DATE
DATE_MATCHED      NOT NULL DATE
FLAG2             NOT NULL CHAR(1)
STR1              NOT NULL VARCHAR2(100)
FLAG3             NOT NULL CHAR(1)
ID2               NOT NULL NUMBER(19)
STR2              NOT NULL VARCHAR2(10)
ID3               NOT NULL NUMBER(19)
FLAG4             NOT NULL CHAR(1)
DATE4             NOT NULL DATE
DATE5             NOT NULL DATE
ID4               NOT NULL NUMBER(38)
DATE6             NOT NULL DATE

This table has two indicies; a Primary Key on ID, and a NON-UNIQUE index on DATE_MATCHED. The data in DATE_MATCHED is pretty skewed. I have about 10K rows per day in this table, except for the date '31-DEC-4712', which I have about 355K rows.


select
  before_on,
  avg(rows_per_day) avg_rows_per_day
from (
  select
     case
        when date_matched < '31-dec-4712' then 'before'
        else 'on'
     end before_on,
     trunc(date_matched) dm,
     count(*) rows_per_day
  from activity_rtab
  group by
     case
        when date_matched < '31-dec-4712' then 'before'
        else 'on'
    end,
    trunc(date_matched)
 )
 group by before_on
 /

BEFORE  AVG_ROWS_PER_DAY                                                       
------ -----------------                                                       
before           9413.75                                                       
on             355606.00

Queries using the index on DATE_MATCHED index are somewhat quick for any day other than '31-DEC-4712'. This behaviour doesn't really surprise me since I'll be fetching 355K rows basically a single block at a time.


select
  count(*)
from activity_rtab
where date_matched = '31-DEC-4712';

 COUNT(*)                                                                     
----------                                                                     
   355606                                                                     

Elapsed: 00:00:03.48

Execution Plan
----------------------------------------------------------                     
  0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=8)            
  1    0   SORT (AGGREGATE)                                                   
  2    1     INDEX (RANGE SCAN) OF 'IX_ACTIVITY_DATE_MATCHED' (NON-UNIQUE) (Cost=29 Card=99 Bytes=792)
Statistics
----------------------------------------------------------                     
        56  recursive calls                                                   
         0  db block gets                                                     
      1031  consistent gets                                                   
         0  physical reads                                                    
         0  redo size                                                         
       381  bytes sent via SQL*Net to client                                  
       499  bytes received via SQL*Net from client                            
         2  SQL*Net roundtrips to/from client                                 
         0  sorts (memory)                                                    
         0  sorts (disk)                                                      
         1  rows processed

I know what you're thinking, "OK, so what's your problem?". In order to do some maintenance, I did a CTAS and created a new table in the same tablespace. I put both indicies on the new table and ran a quick query.


select count(*) from jh$activity_rtab where date_matched = '31-DEC-4712';

 COUNT(*)                                                                     
----------                                                                     
   355606                                                                     

Elapsed: 00:00:00.53

Execution Plan
----------------------------------------------------------                     
  0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=8)            
  1    0   SORT (AGGREGATE)                                                   
  2    1     INDEX (RANGE SCAN) OF 'JH$DATE_MATCHED' (NON-UNIQUE) (Cost=29 Card=9867 Bytes=78936)
                                                                              
Statistics
----------------------------------------------------------                     
        14  recursive calls                                                   
         0  db block gets                                                     
       949  consistent gets                                                   
         0  physical reads                                                    
         0  redo size                                                         
       381  bytes sent via SQL*Net to client                                  
       499  bytes received via SQL*Net from client                            
         2  SQL*Net roundtrips to/from client                                 
         0  sorts (memory)                                                    
         0  sorts (disk)                                                      
         1  rows processed

Huh? Why is this returning in less than a second where the query on the real table takes 3? At first, I thought something must be wrong with the storage so I moved the table and rebuilt the indexes. Same thing.

Then I looked closer at the plan and noticed the query on JH$ACTIVITY_RTAB had "Card=9867" and the query against ACTIVITY_RTAB had "Card=99". Aha! A statistics problem. I calculated statistics on both tables using the following commands:


exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'activity_rtab',cascade=>true);

No dice. The query on JH$ACTIVITY_RTAB was still significantly faster than ACTIVITY_RTAB. Then I decided to take my own advice and ran a 10046 trace on both queries. The TKPROF confirmed my observations, but still offered no clues as to what the real problem was.


select count(*)
from
jh$activity_rtab where date_matched = '31-DEC-4712'


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.01          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        6      0.92       0.91          0       2838          0           3
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       12      0.92       0.92          0       2838          0           3

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42 

Rows     Row Source Operation
-------  ---------------------------------------------------
     1  SORT AGGREGATE
355606   INDEX RANGE SCAN JH$DATE_MATCHED (object id 43980)


Elapsed times include waiting on following events:
 Event waited on                             Times   Max. Wait  Total Waited
 ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                       6        0.00          0.00
 SQL*Net message from client                     6        0.00          0.00
********************************************************************************

select count(*)
from
ACTIVITY_RTAB where date_matched = '31-DEC-4712'


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        4      6.05       6.22          0       2030          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        8      6.06       6.23          0       2030          0           2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42 

Rows     Row Source Operation
-------  ---------------------------------------------------
     1  SORT AGGREGATE
355606   INDEX RANGE SCAN IX_ACTIVITY_DATE_MATCHED (object id 37979)


Elapsed times include waiting on following events:
 Event waited on                             Times   Max. Wait  Total Waited
 ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                       4        0.00          0.00
 SQL*Net message from client                     4        0.00          0.00
********************************************************************************

At this point, I had a hunch that it was an optimizer problem but couldn't convince myself. I couldn't get around the fact that both tables had nearly exact characteristics but the queries performed radically different. Time for a 10053 trace on both tables:


Table stats    Table: JH$ACTIVITY_RTAB   Alias: JH$ACTIVITY_RTAB
 TOTAL ::  CDN: 7538301  NBLKS:  121877  AVG_ROW_LEN:  109
-- Index stats
 INDEX NAME: JH$DATE_MATCHED  COL#: 9
   TOTAL ::  LVLS: 2   #LB: 19996  #DK: 764  LB/K: 26  DB/K: 253  CLUF: 194025
 INDEX NAME: JH$PK  COL#: 1
   TOTAL ::  LVLS: 2   #LB: 16754  #DK: 7538301  LB/K: 1  DB/K: 1  CLUF: 908531
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: DATE_MATCH  Col#: 9      Table: JH$ACTIVITY_RTAB   Alias: JH$ACTIVITY_RTAB
   NDV: 764       NULLS: 0         DENS: 1.3089e-03 LO:  2453031  HI: 3442447
   NO HISTOGRAM: #BKT: 1 #VAL: 2
 TABLE: JH$ACTIVITY_RTAB     ORIG CDN: 7538301  ROUNDED CDN: 9867  CMPTD CDN: 9867
 Access path: tsc  Resc:  11723  Resp:  11723
 Access path: index (iff)
     Index: JH$DATE_MATCHED
 TABLE: JH$ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 1925
 IX_SEL:  0.0000e+00  TB_SEL:  1.0000e+00
 Access path: iff  Resc:  1925  Resp:  1925
 Access path: index (equal)
     Index: JH$DATE_MATCHED
 TABLE: JH$ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 29
 IX_SEL:  0.0000e+00  TB_SEL:  1.3089e-03
 BEST_CST: 8.00  PATH: 4  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]:  JH$ACTIVITY_RTAB[JH$ACTIVITY_RTAB]#0
Best so far: TABLE#: 0  CST:          8  CDN:       9867  BYTES:      78936
Final - All Rows Plan:
 JOIN ORDER: 1
 CST: 8  CDN: 9867  RSC: 7  RSP: 7  BYTES: 78936
 IO-RSC: 7  IO-RSP: 7  CPU-RSC: 1  CPU-RSP: 1
PLAN
Cost of plan:  8
Operation...........Object name.....Options.........Id...Pid..
SELECT STATEMENT                                        0
SORT                                AGGREGATE           1
INDEX               JH$DATE_MATCHED RANGE SCAN          2    1
BASE STATISTICAL INFORMATION
***********************
Table stats    Table: ACTIVITY_RTAB   Alias: ACTIVITY_RTAB
 TOTAL ::  CDN: 7538301  NBLKS:  121875  AVG_ROW_LEN:  109
-- Index stats
 INDEX NAME: ACTIVITY_PK  COL#: 1
   TOTAL ::  LVLS: 2   #LB: 16748  #DK: 7538301  LB/K: 1  DB/K: 1  CLUF: 908561
 INDEX NAME: IX_ACTIVITY_DATE_MATCHED  COL#: 9
   TOTAL ::  LVLS: 2   #LB: 20112  #DK: 764  LB/K: 26  DB/K: 253  CLUF: 194021
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: DATE_MATCH  Col#: 9      Table: ACTIVITY_RTAB   Alias: ACTIVITY_RTAB
   NDV: 764       NULLS: 0         DENS: 1.3089e-03 LO:  2453031  HI: 3442447
   NO HISTOGRAM: #BKT: 1 #VAL: 2
 TABLE: ACTIVITY_RTAB     ORIG CDN: 7538301  ROUNDED CDN: 99  CMPTD CDN: 99
 Access path: tsc  Resc:  11723  Resp:  11723
 Access path: index (iff)
     Index: IX_ACTIVITY_DATE_MATCHED
 TABLE: ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 1936
 IX_SEL:  0.0000e+00  TB_SEL:  1.0000e+00
 Access path: iff  Resc:  1936  Resp:  1936
 Access path: index (equal)
     Index: IX_ACTIVITY_DATE_MATCHED
 TABLE: ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 29
 IX_SEL:  0.0000e+00  TB_SEL:  1.3089e-03
 BEST_CST: 8.00  PATH: 4  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]:  ACTIVITY_RTAB[ACTIVITY_RTAB]#0
Best so far: TABLE#: 0  CST:          8  CDN:         99  BYTES:        792
Final - All Rows Plan:
 JOIN ORDER: 1
 CST: 8  CDN: 99  RSC: 7  RSP: 7  BYTES: 792
 IO-RSC: 7  IO-RSP: 7  CPU-RSC: 1  CPU-RSP: 1

The queries are basically the same except for that darn Cardinality (CDN). I pulled out Jonathan Lewis' latest book and went through the calculations but still don't see how the CDN of 99 is being computed. I played around with histograms and different bucket sizes, but always came back to the simple premise of "If performance is fine with JH$ACTIVITY_RTAB, why does it suck with ACTIVITY_RTAB"?

I just don't understand it. For kicks, I tried the query with a +RULE hint and it returned in .48 seconds.


SQL> select /*+ rule */ count(*) from ACTIVITY_RTAB where date_matched = '31-DEC-4712';

 COUNT(*)                                                                     
----------                                                                     
   355606                                                                     

Elapsed: 00:00:00.48

Execution Plan
----------------------------------------------------------                     
  0      SELECT STATEMENT Optimizer=HINT: RULE                                
  1    0   SORT (AGGREGATE)                                                   
  2    1     INDEX (RANGE SCAN) OF 'IX_ACTIVITY_DATE_MATCHED' (NON-UNIQUE)                                                                
                                                                              

Statistics
----------------------------------------------------------                     
         0  recursive calls                                                   
         0  db block gets                                                     
      1015  consistent gets                                                   
         0  physical reads                                                    
         0  redo size                                                         
       381  bytes sent via SQL*Net to client                                  
       499  bytes received via SQL*Net from client                            
         2  SQL*Net roundtrips to/from client                                 
         0  sorts (memory)                                                    
         0  sorts (disk)                                                      
         1  rows processed

I'm really perplexed at this point. Sure, I have two workarounds; use the +RULE hint or drop the table and recreate it. I'm not a big fan of hinting since hinted queries seem to be the first queries that "go bad" when you upgrade. Dropping the table seems reasonable, but since I don't know how I got into this situation, I'm not sure how to detect it. Sort of like choosing between a root canal and a colonoscopy.

An update is here.

3 comments:

Anonymous said...: Just some idle curiosity - did you run the num_distinct query on p. 42 of Jonathan's book for your two tables?; 12/16/2006 07:39:00 PM
Anonymous said...: Jeff,
I'll take a bet that your production system has a constraint on that column:
date_matched = trunc(date_matched).

This is extracted by the optimizer and used as a predicate in the query, and (incorrectly) introduces a 1% factor to the selectivity (hence 9876 -> 99).

BUT - the predicate gets checked for every row, and checking date = trunc(date) 355,000 times is probably going to account for 5 seconds of CPU.

Warning - always look at the dbms_xplan output (or at least the filter_predicates and access_predicates) when investigating execution plan problems.

The bit that really puzzles me is that the range scan line has a cost of 29, but the query has a cost of 8. I'm still thinking about that - but did you see anything in the trace file that says 'prefetch enabled'.; 12/21/2006 02:38:00 AM
Jeff Hunter said...: I'm glad I don't bet with you. Indeed, it was a constraint on date_matched that was causing this slowdown:
CONSTRAINT_NAME C SEARCH_CONDITION
------------------------------ - ------------------------------------------------------------
ACTIVITY_DATE_MATCHED C date_matched = TRUNC( date_matched )

SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:01.02
SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:01.01
SQL> alter table activity_rtab drop constraint ACTIVITY_DATE_MATCHED;

Table altered.

Elapsed: 00:00:00.37
SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:00.08
SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:00.07

Thanks for all your help!! You're a real life-saver.; 12/21/2006 10:30:00 AM

So What Co-operative

Friday, December 15, 2006

What is the optimizer doing?

3 comments:

Facebook Badge

Pageviews past week

Contributors

Best of (IMHO)

Blog Archive

Blogroll

Swag