So What Co-operative: December 2006

Thursday, December 21, 2006

The Optimizer, an update

My last post involving the optimizer was quite perplexing until Jonathan Lewis commented on it. Indeed, the check constraint was the problem in this particular case.

I was so frustrated, I went down the path of upgrading to the latest version since I knew Oracle wouldn't be submitting a bug on 9.2.0.5. I even changed platform just for kicks:


SQL> select * from v$version;

BANNER                                                                                      
----------------------------------------------------------------                            
Oracle Database 10g Release 10.2.0.2.0 - Production                                         
PL/SQL Release 10.2.0.2.0 - Production                                                      
CORE 10.2.0.2.0 Production                                                                  
TNS for Linux: Version 10.2.0.2.0 - Production                                              
NLSRTL Version 10.2.0.2.0 - Production

Alas, I still had the problem before Jonathan suggested the constraints.


SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)                                                                                   
----------                                                                                   
  370319                                                                                   

Elapsed: 00:00:00.27
SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)                                                                                   
----------                                                                                   
  370319                                                                                   

Elapsed: 00:00:00.27

If I had only chased down the filter predicate using the new dbms_xplan.display method, I might have had some more clues:


SQL> explain plan for select count(*) from activity_rtab where date_matched = '31-dec-4712';

Explained.

Elapsed: 00:00:00.00
SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT                                                                            
----------------------------------
Plan hash value: 3790338984                                                                  
                                                                                           
----------------------------------------------------------------------------------------------
| Id  | Operation         | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                          |     1 |     8 |   269   (3)| 00:00:04 |
|   1 |  SORT AGGREGATE   |                          |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| IX_ACTIVITY_DATE_MATCHED |  3801 | 30408 |   269   (3)| 00:00:04 |
----------------------------------------------------------------------------------------------
                                                                                           
Predicate Information (identified by operation id):                                          

PLAN_TABLE_OUTPUT                                                                            
-----------------------------------------------------------------------------------------------
 2 - access("DATE_MATCHED"=TO_DATE('4712-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))        
     filter(TRUNC(INTERNAL_FUNCTION("DATE_MATCHED"))=TO_DATE('4712-12-31 00:00:00',        
            'yyyy-mm-dd hh24:mi:ss'))                                                      

16 rows selected.

I dropped the constraint and sure enough, the performance problem went away:


SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)                                                                                   
----------                                                                                   
  370319                                                                                   

Elapsed: 00:00:00.06
SQL> explain plan for select count(*) from activity_rtab where date_matched = '31-dec-4712';

Explained.

Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT                                                                            
-----------------------------------------------------------------------------------------------
Plan hash value: 3790338984                                                                  
                                                                                           
----------------------------------------------------------------------------------------------
| Id  | Operation         | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                          |     1 |     8 |   264   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE   |                          |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| IX_ACTIVITY_DATE_MATCHED |   380K|  2969K|   264   (1)| 00:00:04 |
----------------------------------------------------------------------------------------------
                                                                                           
Predicate Information (identified by operation id):                                          

PLAN_TABLE_OUTPUT                                                                            
-----------------------------------------------------------------------------------------------
 2 - access("DATE_MATCHED"=TO_DATE('4712-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))        

14 rows selected.

Now we have to decide; use the constraint to ensure data integrity or suffer the performance penalty for the sake of the data. A classic trade-off.

Friday, December 15, 2006

What is the optimizer doing?

Interesting problem here. I have a table in one of my Oracle 9.2.0.5 databases on Solaris called ACTIVITY_RTAB:

SQL> desc activity_rtab;
Name              Null?    Type
----------------- -------- --------------
ID                NOT NULL NUMBER(19)
FLAG1             NOT NULL CHAR(1)
MONEY_AMOUNT      NOT NULL MONEYAMOUNT_T
DATE1             NOT NULL DATE
DATE2             NOT NULL DATE
DATE3             NOT NULL DATE
DATE_MATCHED      NOT NULL DATE
FLAG2             NOT NULL CHAR(1)
STR1              NOT NULL VARCHAR2(100)
FLAG3             NOT NULL CHAR(1)
ID2               NOT NULL NUMBER(19)
STR2              NOT NULL VARCHAR2(10)
ID3               NOT NULL NUMBER(19)
FLAG4             NOT NULL CHAR(1)
DATE4             NOT NULL DATE
DATE5             NOT NULL DATE
ID4               NOT NULL NUMBER(38)
DATE6             NOT NULL DATE

This table has two indicies; a Primary Key on ID, and a NON-UNIQUE index on DATE_MATCHED. The data in DATE_MATCHED is pretty skewed. I have about 10K rows per day in this table, except for the date '31-DEC-4712', which I have about 355K rows.


select
  before_on,
  avg(rows_per_day) avg_rows_per_day
from (
  select
     case
        when date_matched < '31-dec-4712' then 'before'
        else 'on'
     end before_on,
     trunc(date_matched) dm,
     count(*) rows_per_day
  from activity_rtab
  group by
     case
        when date_matched < '31-dec-4712' then 'before'
        else 'on'
    end,
    trunc(date_matched)
 )
 group by before_on
 /

BEFORE  AVG_ROWS_PER_DAY                                                       
------ -----------------                                                       
before           9413.75                                                       
on             355606.00

Queries using the index on DATE_MATCHED index are somewhat quick for any day other than '31-DEC-4712'. This behaviour doesn't really surprise me since I'll be fetching 355K rows basically a single block at a time.


select
  count(*)
from activity_rtab
where date_matched = '31-DEC-4712';

 COUNT(*)                                                                     
----------                                                                     
   355606                                                                     

Elapsed: 00:00:03.48

Execution Plan
----------------------------------------------------------                     
  0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=8)            
  1    0   SORT (AGGREGATE)                                                   
  2    1     INDEX (RANGE SCAN) OF 'IX_ACTIVITY_DATE_MATCHED' (NON-UNIQUE) (Cost=29 Card=99 Bytes=792)
Statistics
----------------------------------------------------------                     
        56  recursive calls                                                   
         0  db block gets                                                     
      1031  consistent gets                                                   
         0  physical reads                                                    
         0  redo size                                                         
       381  bytes sent via SQL*Net to client                                  
       499  bytes received via SQL*Net from client                            
         2  SQL*Net roundtrips to/from client                                 
         0  sorts (memory)                                                    
         0  sorts (disk)                                                      
         1  rows processed

I know what you're thinking, "OK, so what's your problem?". In order to do some maintenance, I did a CTAS and created a new table in the same tablespace. I put both indicies on the new table and ran a quick query.


select count(*) from jh$activity_rtab where date_matched = '31-DEC-4712';

 COUNT(*)                                                                     
----------                                                                     
   355606                                                                     

Elapsed: 00:00:00.53

Execution Plan
----------------------------------------------------------                     
  0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=8)            
  1    0   SORT (AGGREGATE)                                                   
  2    1     INDEX (RANGE SCAN) OF 'JH$DATE_MATCHED' (NON-UNIQUE) (Cost=29 Card=9867 Bytes=78936)
                                                                              
Statistics
----------------------------------------------------------                     
        14  recursive calls                                                   
         0  db block gets                                                     
       949  consistent gets                                                   
         0  physical reads                                                    
         0  redo size                                                         
       381  bytes sent via SQL*Net to client                                  
       499  bytes received via SQL*Net from client                            
         2  SQL*Net roundtrips to/from client                                 
         0  sorts (memory)                                                    
         0  sorts (disk)                                                      
         1  rows processed

Huh? Why is this returning in less than a second where the query on the real table takes 3? At first, I thought something must be wrong with the storage so I moved the table and rebuilt the indexes. Same thing.

Then I looked closer at the plan and noticed the query on JH$ACTIVITY_RTAB had "Card=9867" and the query against ACTIVITY_RTAB had "Card=99". Aha! A statistics problem. I calculated statistics on both tables using the following commands:


exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'activity_rtab',cascade=>true);

No dice. The query on JH$ACTIVITY_RTAB was still significantly faster than ACTIVITY_RTAB. Then I decided to take my own advice and ran a 10046 trace on both queries. The TKPROF confirmed my observations, but still offered no clues as to what the real problem was.


select count(*)
from
jh$activity_rtab where date_matched = '31-DEC-4712'


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.01          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        6      0.92       0.91          0       2838          0           3
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       12      0.92       0.92          0       2838          0           3

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42 

Rows     Row Source Operation
-------  ---------------------------------------------------
     1  SORT AGGREGATE
355606   INDEX RANGE SCAN JH$DATE_MATCHED (object id 43980)


Elapsed times include waiting on following events:
 Event waited on                             Times   Max. Wait  Total Waited
 ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                       6        0.00          0.00
 SQL*Net message from client                     6        0.00          0.00
********************************************************************************

select count(*)
from
ACTIVITY_RTAB where date_matched = '31-DEC-4712'


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        4      6.05       6.22          0       2030          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        8      6.06       6.23          0       2030          0           2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42 

Rows     Row Source Operation
-------  ---------------------------------------------------
     1  SORT AGGREGATE
355606   INDEX RANGE SCAN IX_ACTIVITY_DATE_MATCHED (object id 37979)


Elapsed times include waiting on following events:
 Event waited on                             Times   Max. Wait  Total Waited
 ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                       4        0.00          0.00
 SQL*Net message from client                     4        0.00          0.00
********************************************************************************

At this point, I had a hunch that it was an optimizer problem but couldn't convince myself. I couldn't get around the fact that both tables had nearly exact characteristics but the queries performed radically different. Time for a 10053 trace on both tables:


Table stats    Table: JH$ACTIVITY_RTAB   Alias: JH$ACTIVITY_RTAB
 TOTAL ::  CDN: 7538301  NBLKS:  121877  AVG_ROW_LEN:  109
-- Index stats
 INDEX NAME: JH$DATE_MATCHED  COL#: 9
   TOTAL ::  LVLS: 2   #LB: 19996  #DK: 764  LB/K: 26  DB/K: 253  CLUF: 194025
 INDEX NAME: JH$PK  COL#: 1
   TOTAL ::  LVLS: 2   #LB: 16754  #DK: 7538301  LB/K: 1  DB/K: 1  CLUF: 908531
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: DATE_MATCH  Col#: 9      Table: JH$ACTIVITY_RTAB   Alias: JH$ACTIVITY_RTAB
   NDV: 764       NULLS: 0         DENS: 1.3089e-03 LO:  2453031  HI: 3442447
   NO HISTOGRAM: #BKT: 1 #VAL: 2
 TABLE: JH$ACTIVITY_RTAB     ORIG CDN: 7538301  ROUNDED CDN: 9867  CMPTD CDN: 9867
 Access path: tsc  Resc:  11723  Resp:  11723
 Access path: index (iff)
     Index: JH$DATE_MATCHED
 TABLE: JH$ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 1925
 IX_SEL:  0.0000e+00  TB_SEL:  1.0000e+00
 Access path: iff  Resc:  1925  Resp:  1925
 Access path: index (equal)
     Index: JH$DATE_MATCHED
 TABLE: JH$ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 29
 IX_SEL:  0.0000e+00  TB_SEL:  1.3089e-03
 BEST_CST: 8.00  PATH: 4  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]:  JH$ACTIVITY_RTAB[JH$ACTIVITY_RTAB]#0
Best so far: TABLE#: 0  CST:          8  CDN:       9867  BYTES:      78936
Final - All Rows Plan:
 JOIN ORDER: 1
 CST: 8  CDN: 9867  RSC: 7  RSP: 7  BYTES: 78936
 IO-RSC: 7  IO-RSP: 7  CPU-RSC: 1  CPU-RSP: 1
PLAN
Cost of plan:  8
Operation...........Object name.....Options.........Id...Pid..
SELECT STATEMENT                                        0
SORT                                AGGREGATE           1
INDEX               JH$DATE_MATCHED RANGE SCAN          2    1
BASE STATISTICAL INFORMATION
***********************
Table stats    Table: ACTIVITY_RTAB   Alias: ACTIVITY_RTAB
 TOTAL ::  CDN: 7538301  NBLKS:  121875  AVG_ROW_LEN:  109
-- Index stats
 INDEX NAME: ACTIVITY_PK  COL#: 1
   TOTAL ::  LVLS: 2   #LB: 16748  #DK: 7538301  LB/K: 1  DB/K: 1  CLUF: 908561
 INDEX NAME: IX_ACTIVITY_DATE_MATCHED  COL#: 9
   TOTAL ::  LVLS: 2   #LB: 20112  #DK: 764  LB/K: 26  DB/K: 253  CLUF: 194021
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: DATE_MATCH  Col#: 9      Table: ACTIVITY_RTAB   Alias: ACTIVITY_RTAB
   NDV: 764       NULLS: 0         DENS: 1.3089e-03 LO:  2453031  HI: 3442447
   NO HISTOGRAM: #BKT: 1 #VAL: 2
 TABLE: ACTIVITY_RTAB     ORIG CDN: 7538301  ROUNDED CDN: 99  CMPTD CDN: 99
 Access path: tsc  Resc:  11723  Resp:  11723
 Access path: index (iff)
     Index: IX_ACTIVITY_DATE_MATCHED
 TABLE: ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 1936
 IX_SEL:  0.0000e+00  TB_SEL:  1.0000e+00
 Access path: iff  Resc:  1936  Resp:  1936
 Access path: index (equal)
     Index: IX_ACTIVITY_DATE_MATCHED
 TABLE: ACTIVITY_RTAB
     RSC_CPU: 0   RSC_IO: 29
 IX_SEL:  0.0000e+00  TB_SEL:  1.3089e-03
 BEST_CST: 8.00  PATH: 4  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]:  ACTIVITY_RTAB[ACTIVITY_RTAB]#0
Best so far: TABLE#: 0  CST:          8  CDN:         99  BYTES:        792
Final - All Rows Plan:
 JOIN ORDER: 1
 CST: 8  CDN: 99  RSC: 7  RSP: 7  BYTES: 792
 IO-RSC: 7  IO-RSP: 7  CPU-RSC: 1  CPU-RSP: 1

The queries are basically the same except for that darn Cardinality (CDN). I pulled out Jonathan Lewis' latest book and went through the calculations but still don't see how the CDN of 99 is being computed. I played around with histograms and different bucket sizes, but always came back to the simple premise of "If performance is fine with JH$ACTIVITY_RTAB, why does it suck with ACTIVITY_RTAB"?

I just don't understand it. For kicks, I tried the query with a +RULE hint and it returned in .48 seconds.


SQL> select /*+ rule */ count(*) from ACTIVITY_RTAB where date_matched = '31-DEC-4712';

 COUNT(*)                                                                     
----------                                                                     
   355606                                                                     

Elapsed: 00:00:00.48

Execution Plan
----------------------------------------------------------                     
  0      SELECT STATEMENT Optimizer=HINT: RULE                                
  1    0   SORT (AGGREGATE)                                                   
  2    1     INDEX (RANGE SCAN) OF 'IX_ACTIVITY_DATE_MATCHED' (NON-UNIQUE)                                                                
                                                                              

Statistics
----------------------------------------------------------                     
         0  recursive calls                                                   
         0  db block gets                                                     
      1015  consistent gets                                                   
         0  physical reads                                                    
         0  redo size                                                         
       381  bytes sent via SQL*Net to client                                  
       499  bytes received via SQL*Net from client                            
         2  SQL*Net roundtrips to/from client                                 
         0  sorts (memory)                                                    
         0  sorts (disk)                                                      
         1  rows processed

I'm really perplexed at this point. Sure, I have two workarounds; use the +RULE hint or drop the table and recreate it. I'm not a big fan of hinting since hinted queries seem to be the first queries that "go bad" when you upgrade. Dropping the table seems reasonable, but since I don't know how I got into this situation, I'm not sure how to detect it. Sort of like choosing between a root canal and a colonoscopy.

An update is here.

Thursday, December 14, 2006

NYOUG Winter

Just got back from the New York Oracle Users Group Winter meeting. Although I've only been involved with the NYOUG for about two years, I was saddened to see Dr. Paul Dorsey step down as President. He's been a driving force for getting top quality speakers to the quarterly meetings and really supports the user community. I'm sure Michael Olin will take the reigns from Dr. D and keep the top quality speakers coming.

World renowned Michael Abbey presented the keynote address. His presentation took a light hearted look on the soft skills needed to cope with the various personalities encountered in the IT arena. I had to chuckle about his characterization of a sysadmin; hair as messy as his office, the biggest screen in the whole shop, and an aire of secrecy surrounding this person. I kept picturing the sysadmin I had right out of college and his description was right on. Perhaps the best management advice I got out of this presentation was "delegate and oversee". Michael explained that one person can't do and know everything and you have to trust (but verify) the work your people do. I struggle daily with knowing when to hand things off to other people and when to handle them myself.

The second presentation of the day was for 11g PL/SQL features by Tom Kyte. There's a lot of new features coming for PL/SQL in 11g (expected really late next year) that will help optimize the code as well as make it easier to write code. It sounds like the most helpful feature in my environment will be that dependency tracking is now at the element level instead of the object level. For example, just adding a column to a table won't validate the packages that depend on it. If you drop a field, then sure, everything referencing it will need to be recompiled. There are also other optimizations like the result cache (just-in-time materialized views) and better native compilation, but it sounds like you shouldn't be expecting your applications to speed up more than 5-10%. One really nice feature is the ability to mark just a table as readonly instead of having to move it to a readonly tablespace. As always, Tom was the entertaining speaker keeping that kept the audience participation at a very high level.

All the presenters then got together just before lunch to have an "Ask the Experts" panel; or as I like to call it "Ask Tom Live". Tom is a top notch presenter and world-class information source, but I kind of felt bad for the other presenters who were basically left out.

After lunch, I stole some power to recharge the laptop and bumped into Dave Anderson from SkillBuilders who was also powering up. We swapped laptop stories and seems neither one of us are really happy with our current capacity. As Tim Allen says, "More Power!"

Come to find out, Dave was the next presenter with an introduction to Automatic Storage Management. ASM is a 10g specific piece of software that essentially takes the place of a volume manager if you have JBOD (just a bunch of disks). You present the raw devices for those disks to ASM as a diskgroup and Oracle stripes the data over the physical disks in the disk group. For those of you on servers with direct attached storage and limited budgets, I would think ASM would be a good option rather than chunking down money for VxVM and VxFS just to have some RAID. On the other hand, I'm not sure how practical it is if you're in a SAN environment where the striping and mirroring is handled at the OS level. Also, Dave mentioned that there was no performance penalty of ASM striping over a hardware stripe. However, since ASM is a piece of software, you're basically putting a software based RAID over a hardware based RAID robbing my host of CPU that may be neede for user processes. Even so, ASM is obviously here to stay and I'll be investigating the technology and it's use as I get closer to 10g.

All in all, a great education value for the membership you pay. IF you're in the Tri-State area, it makes sense to join even if you only get to the September conference. Lets just say there are some educational events coming in may where the COST will be BASED on your membership being OPTIMIZEd. (OK, so maybe not so subtle, Jonathan Lewis is coming in May).

Sunday, December 10, 2006

The Failure was a success

We periodically excersize various Disaster Recovery scenarios on our mission critical systems. The most recent test was scheduled for Saturday, and most things came off without a hitch. As always, we learned a couple things and have some work to do before the next test, but overall, it went pretty good. Checkout my setup for Saturday. Three screens on two computers logged into the Disaster Recovery site, and the laptop logged into my production environment fighting another fire. I love the thrill of a crisis.

Switched to Blogger Beta

Switched the blog to Blogger Beta. Let's see how much transfers over correctly.

So What Co-operative

Thursday, December 21, 2006

The Optimizer, an update

Friday, December 15, 2006

What is the optimizer doing?

Thursday, December 14, 2006

NYOUG Winter

Sunday, December 10, 2006

The Failure was a success

Switched to Blogger Beta

Facebook Badge

Pageviews past week

Contributors

Best of (IMHO)

Blog Archive

Blogroll

Swag