Thursday, January 11, 2007

Just set the 10195 event

Going back and forth with Oracle on what I think is an optimizer problem. Thanks to JL, we pretty much have figured out what the short-term solution is, yet the nagging question is Why?

Oracle Support came up with an 10195 event (Doc: 271999.1 on ML for those interested) , which essentially disables predicate generation for check constraints. OK, fine and dandy. But it still doesn't solve our performance problem. Perhaps it's more than an optimizer problem after all. (I'll post at the end of the article the exact scripts I used to setup this example. I'd actually be interested if other people on other platforms experience this as well.)

Take for example:

SQL> alter session set events '10195 trace name context forever, level 1';

Session altered.

Elapsed: 00:00:00.01
SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

Elapsed: 00:00:00.01
SQL> explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');

Explained.

Elapsed: 00:00:00.04
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2788119336

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | INDEX RANGE SCAN| T1_DM | 1493 | 11944 | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
2 - access("DATE_MATCHED"=TO_DATE('4712-12-31 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))

Note
-----
- 'PLAN_TABLE' is old version

19 rows selected.

Elapsed: 00:00:00.53

OK, my filter predicate is gone. But looking at the tkprof, I still find:

select count(*)
from
t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy')


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.28 0.27 0 241 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.28 0.27 0 241 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=241 pr=0 pw=0 time=277577 us)
90000 INDEX RANGE SCAN T1_DM (cr=241 pr=0 pw=0 time=270112 us)(object id 649118)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.00
**************************************************************************
select count(*)
from
t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy')


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.05 0.05 0 241 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.05 0.05 0 241 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=241 pr=0 pw=0 time=51866 us)
90000 INDEX RANGE SCAN T2_DM (cr=241 pr=0 pw=0 time=120 us)(object id 649119)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.00

Now what's the problem? I'm retrieving the same number of blocks (from cache, I might add). Yet, my elapsed time to retrieve data on the table with a constraint (even with the predicate constraint checking supposedly off) is .27 seconds vs. .05 for the table without the check constraint. To add insult to injury, if we pull out our old friend the Rule Based Optimizer:

select /*+ rule */ count(*)
from
t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy')


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.06 0.05 0 241 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.06 0.05 0 241 0 1

Misses in library cache during parse: 0
Optimizer mode: RULE
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=241 pr=0 pw=0 time=53206 us)
90000 INDEX RANGE SCAN T1_DM (cr=241 pr=0 pw=0 time=86 us)(object id 649118)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 0.00 0.00

Ouch.

An omnipotent inside Oracle brings up a good point. It's the fetch where all the time is being spent. True, but Why? Why would the access path for the same plan be different? Still chugging along, looking for answers.

I'd be interested in your findings. I've tried this on Solaris 32, Solaris 64, Linux x86, and Linux x86_64. All similar (except x86_64 rocks and it's hard to tell a difference). I actually think the slower the box, the better.

script to reproduce:

set timing on
set echo on;
drop table t1;
create table t1 (
id number primary key,
date_matched date,
other_stuff varchar2(222),
constraint ck1 check (date_matched = trunc(date_matched))
);
drop table t2;
create table t2 (
id number primary key,
date_matched date,
other_stuff varchar2(222)
);
declare
dm date;
dc integer := 0;
begin
for i in 1..100000 loop
if mod(i,10) = 0 then
dm := trunc(sysdate + mod(dc,66));
dc := dc+1;
else
dm := to_date('12/31/4712','mm/dd/yyyy');
end if;
insert into t1 values (i, dm, rpad('x',mod(i,222)));
insert into t2 values (i, dm, rpad('x',mod(i,222)));
end loop;
commit;
end;
/
create index t1_dm on t1(date_matched);
create index t2_dm on t2(date_matched);
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T1',cascade=>true,estimate_percent=>NULL);
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T2',cascade=>true,estimate_percent=>NULL);
select * from v$version;
select date_matched, count(*) from t1 group by date_matched order by date_matched;
select date_matched, count(*) from t2 group by date_matched order by date_matched;
-- heres the explain plan
explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
explain plan for select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
-- here's the explain plan
explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
explain plan for select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
-- lets get some blocks in cache
select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- lets time one with tkprof
alter session set events '10046 trace name context forever, level 12';
alter session set tracefile_identifier = 't1t2_10046';
select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- yet RBO yields basically same result as t2
select /*+ rule */ count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- lets get a plan with the event turned on
alter session set events '10195 trace name context forever, level 1';
alter session set events '10046 trace name context forever, level 12';
explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
explain plan for select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);

-- lets time one with tkprof
alter session set tracefile_identifier = 't1t2_10195';
select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- yet RBO yields basically same result as t2
select /*+ rule */ count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');

4 comments:

Anonymous said...

Jeff, I saw this from an incoming link on my blog, and tried to emulate it - without success.

Check this line in the first tkprof output.

Misses in library cache during parse: 0

No misses in library cache means you didn't re-optimize the statement. Are you sure you didn't pick up the execution of a plan generated before 10195 was set ? Setting this event wouldn't invalidate a pre-existing cursor.

Jeff Hunter said...

I usually do the query a couple times to make sure I am getting limited physical I/O. Let me go back and try to verify...

Anonymous said...

Jeff, I didn't run exactly the script you left which, I think, is why I got a different result.

Try inserting 'alter system flush shared_pool' just before you enable event 10195, and I think you will see the difference.

Jeff Hunter said...

Indeed, the plan was cached. Thanks again for all your help. Updated info here