Friday, January 12, 2007

Opportunity for a MySQL DBA

I am looking for a MySQL Database Administrator to join my team. The position will be located at our Greenwich, Connecticut headquarters. The ideal candidate will have experience with a mix of database technologies, predominantly MySQL, as well as a minimum of three years' experience as a Database Administrator on the Linux platform.

Responsibilities
  • Administer backup and recovery processes
  • Install and configure database software
  • Identify and resolve performance problems
  • Automate day-to-day tasks as needed
  • Participate in rotating on-duty coverage
  • Manage access to database resources

Required Skills
  • 3+ years' experience as a MySQL DBA in a production Linux environment.
  • 1+ years' experience with MySQL 5.0.
  • 2+ years' exposure to Oracle in a production or development environment.
  • Day-to-day Database Administration.
  • Backup and Recovery using InnoDB Backup, Zmanda, mysqldump.
  • MySQL Replication.
  • Ability to quickly diagnose and resolve performance problems using standard tools.
  • Some experience with both MySQL and Oracle partitioning.
  • Self-starter with initiative and desire to learn.
  • Ability to work quickly in a fast-paced environment.
  • Ability to communicate clearly, constructively and concisely to peers and end-users.
  • Ability to work independently with minimal supervision and drive issues to closure.

Desired Skills
  • Automation of DBA tasks (Perl, CGI, Ksh, PHP).
  • Some experience with Oracle 10g and MySQL 5.1.
  • DR strategies, including replication and standby databases.
  • Supporting mission-critical systems in a 24x7 environment.

For further consideration, candidates should send their resume, cover letter, and salary history to recruiting@interactivebrokers.com. Please indicate you are applying for the "Database Administrator" position.

Thursday, January 11, 2007

Just set the 10195 event

I've been going back and forth with Oracle on what I think is an optimizer problem. Thanks to JL, we have pretty much figured out the short-term solution, yet the nagging question remains: why?

Oracle Support came up with the 10195 event (Doc 271999.1 on MetaLink for those interested), which essentially disables predicate generation from check constraints. OK, fine and dandy. But it still doesn't solve our performance problem. Perhaps it's more than an optimizer problem after all. (I'll post the exact scripts I used to set up this example at the end of the article. I'd actually be interested to hear whether people on other platforms see this as well.)

Take for example:

SQL> alter session set events '10195 trace name context forever, level 1';

Session altered.

Elapsed: 00:00:00.01
SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

Elapsed: 00:00:00.01
SQL> explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');

Explained.

Elapsed: 00:00:00.04
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2788119336

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | INDEX RANGE SCAN| T1_DM | 1493 | 11944 | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
2 - access("DATE_MATCHED"=TO_DATE('4712-12-31 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))

Note
-----
- 'PLAN_TABLE' is old version

19 rows selected.

Elapsed: 00:00:00.53

OK, my filter predicate is gone. But looking at the tkprof output, I still find:

select count(*)
from
t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy')


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.28 0.27 0 241 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.28 0.27 0 241 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=241 pr=0 pw=0 time=277577 us)
90000 INDEX RANGE SCAN T1_DM (cr=241 pr=0 pw=0 time=270112 us)(object id 649118)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.00
**************************************************************************
select count(*)
from
t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy')


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.05 0.05 0 241 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.05 0.05 0 241 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=241 pr=0 pw=0 time=51866 us)
90000 INDEX RANGE SCAN T2_DM (cr=241 pr=0 pw=0 time=120 us)(object id 649119)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.00

Now what's the problem? I'm retrieving the same number of blocks (from cache, I might add). Yet my elapsed time to retrieve data from the table with the check constraint (even with constraint predicate generation supposedly turned off) is 0.27 seconds vs. 0.05 seconds for the table without it. To add insult to injury, if we pull out our old friend the Rule Based Optimizer:

select /*+ rule */ count(*)
from
t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy')


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.06 0.05 0 241 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.06 0.05 0 241 0 1

Misses in library cache during parse: 0
Optimizer mode: RULE
Parsing user id: 5

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=241 pr=0 pw=0 time=53206 us)
90000 INDEX RANGE SCAN T1_DM (cr=241 pr=0 pw=0 time=86 us)(object id 649118)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 0.00 0.00

Ouch.

A well-placed source inside Oracle brings up a good point: it's the fetch where all the time is being spent. True, but why? Why would the same access path in the same plan perform so differently? Still chugging along, looking for answers.

I'd be interested in your findings. I've tried this on 32-bit Solaris, 64-bit Solaris, Linux x86, and Linux x86_64. All show similar results (except that x86_64 is so fast it's hard to tell the difference). I actually think the slower the box, the better for reproducing the problem.

script to reproduce:

set timing on
set echo on;
drop table t1;
create table t1 (
id number primary key,
date_matched date,
other_stuff varchar2(222),
constraint ck1 check (date_matched = trunc(date_matched))
);
drop table t2;
create table t2 (
id number primary key,
date_matched date,
other_stuff varchar2(222)
);
declare
dm date;
dc integer := 0;
begin
for i in 1..100000 loop
if mod(i,10) = 0 then
dm := trunc(sysdate + mod(dc,66));
dc := dc+1;
else
dm := to_date('12/31/4712','mm/dd/yyyy');
end if;
insert into t1 values (i, dm, rpad('x',mod(i,222)));
insert into t2 values (i, dm, rpad('x',mod(i,222)));
end loop;
commit;
end;
/
create index t1_dm on t1(date_matched);
create index t2_dm on t2(date_matched);
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T1',cascade=>true,estimate_percent=>NULL);
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T2',cascade=>true,estimate_percent=>NULL);
select * from v$version;
select date_matched, count(*) from t1 group by date_matched order by date_matched;
select date_matched, count(*) from t2 group by date_matched order by date_matched;
-- here's the explain plan
explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
explain plan for select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
-- let's get some blocks in cache
select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- let's time one with tkprof
alter session set events '10046 trace name context forever, level 12';
alter session set tracefile_identifier = 't1t2_10046';
select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- yet RBO yields basically same result as t2
select /*+ rule */ count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- let's get a plan with the event turned on
alter session set events '10195 trace name context forever, level 1';
alter session set events '10046 trace name context forever, level 12';
explain plan for select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);
explain plan for select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select * from table(dbms_xplan.display);

-- let's time one with tkprof
alter session set tracefile_identifier = 't1t2_10195';
select count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
select count(*) from t2 where date_matched = to_date('12/31/4712','mm/dd/yyyy');
-- yet RBO yields basically same result as t2
select /*+ rule */ count(*) from t1 where date_matched = to_date('12/31/4712','mm/dd/yyyy');

Friday, January 05, 2007

Working with InnoDB Tablespaces


Is it me, or is working with InnoDB tablespaces akin to working with Oracle 6.0?


For example, today somebody filled up a filesystem by unintentionally loading a large amount of data. It was development, so this stuff happens. Since it was development, I had set up the InnoDB tablespace to start at 10M and autoextend to whatever free space was available. The one datafile kept autoextending until it completely filled the disk. The developer dropped the table and thought that would free up the space.


I knew that wouldn't be the case, so I had to dump all the data, recreate the tablespace, and reload all my data. Pretty standard DBA stuff so far.


"This time", I thought to myself, " I'm going to create a couple files each 10m and let them autoextend to 10240m so this doesn't happen again".


Speedbump #1: InnoDB doesn't let you autoextend more than one file.


OK, I know my data is about 30G and we'll be doing some development, so I'll create three 10240M files and let the last one autoextend. After 20 minutes of combing through the manuals trying to find a "DROP TABLESPACE" or similar command, I finally deduced that you can't drop a tablespace while the server is up.
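For reference, the sort of my.cnf setting I had in mind looks something like this (file names and the cap on the last file are illustrative; only the last file in the list can carry the autoextend attribute):

[mysqld]
innodb_data_file_path = ibdata1:10240M;ibdata2:10240M;ibdata3:10240M:autoextend:max:20480M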


Speedbump #2: The server (or instance for you Oracle people) has to be down to remove the tablespace.


So I shut down the server, removed my datafiles per the docs, set up my config file, and restarted the server. The logfile showed that it was pre-creating the files (as expected) but then threw an error message:



Error: All logfiles must be created at the same time.

Speedbump #3: You have to recreate the logfiles when you recreate the tablespace.


I didn't want to recreate 30G of datafiles, so I shut down the server, removed just the logfiles, and restarted. Except this time I got another error:



InnoDB: Cannot initialize created log files because
InnoDB: data files are corrupt, or new data files were
InnoDB: created when the database was started previous
InnoDB: time but the database was not shutdown
InnoDB: normally after that.

Speedbump #4: You have to recreate your logfiles and datafiles at the same time.


It's not out of the realm of possibility that I'm doing something totally wrong or that I just don't understand InnoDB very well. Maybe my version (5.0.27) doesn't support these features but a newer version does. Or maybe it's just a hassle. I'd be interested in others' practical experiences with managing the InnoDB tablespace.



Thursday, January 04, 2007

Questions, Questions

Well it’s that time of year again! Companies’ budgets are set, which means they’ll be looking to spend money, which means software purchases and upgrades, which means there will be a flurry of questions on the DBA forums about how to do things (and it’s already started).

I don’t know if this is going to turn into something of a vent for me, but if anyone else feels the same or can give some insight that would lessen my frustrations, please speak up.

Let me start off by saying that being a DBA, in my eyes, means that you have a great responsibility, one that should not be taken lightly. The company you work for depends on you for basically two things: ensuring that its data is accessible and keeping it safe. If those two things are not met, the company is losing money and could potentially go bankrupt. Agreed? Yes, we are also responsible for the daily flogging of developers, but I look at that as a fringe benefit, not a duty.

Hanging out on the DBA forums keeps me in tune with what’s going on in the Oracle world and also keeps me on my toes. However, *drum roll* it really boils my potatoes when people post questions that you know they didn’t do any type of research on first. I mean the kind of question where, if you were to plug it into Google, it would give you the answer right off the bat. When I see these things, I ask myself “why on earth would a company entrust their database to this individual?” and “what gives them the right to be called a DBA, because a true DBA would have done the research”. I honestly don’t know how Tom copes with it, because I know he gets pelted with these things (Tom, if you’re reading this, please, for the love of God, tell me how you keep these questions from getting to you). I guess I feel like we as DBAs are a whole, and when one of us displays a lack of motivation it makes us all look bad. OK, I’m done.

*************
Last week an anonymous reader posted a question about Oracle Spatial and ArcSDE on one of my previous blog posts. He asked: if he has Oracle Spatial, why would he need ArcSDE? The short answer: which would you rather do, load a large raster image via the SDO_GEOR.importFrom procedure or use ArcCatalog? If you have your dbtune.sde file set up right, it’s too easy :)
Here’s some light reading: GeoRaster and Oracle Spatial.

Wednesday, January 03, 2007

Does everybody know this?

I've only been using my Palm Tungsten C for about 18 months. In that time, I only connected it to my desktop when I needed to update software or move some documents to the Palm. I never did a hot-sync since I thought the data would always be on the Palm.

Did you know that when your Palm completely drains the power, it loses everything? Yup, everything. I know, backups, backups, backups....

Thursday, December 21, 2006

The Optimizer, an update

My last post involving the optimizer was quite perplexing until Jonathan Lewis commented on it. Indeed, the check constraint was the problem in this particular case.

I was so frustrated, I went down the path of upgrading to the latest version since I knew Oracle wouldn't be filing a bug against 9.2.0.5. I even changed platforms just for kicks:


SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Release 10.2.0.2.0 - Production
PL/SQL Release 10.2.0.2.0 - Production
CORE 10.2.0.2.0 Production
TNS for Linux: Version 10.2.0.2.0 - Production
NLSRTL Version 10.2.0.2.0 - Production

Alas, I still had the problem (this was before Jonathan suggested looking at the constraints).

SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:00.27
SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:00.27

If I had only chased down the filter predicate using the new dbms_xplan.display method, I might have had some more clues:

SQL> explain plan for select count(*) from activity_rtab where date_matched = '31-dec-4712';

Explained.

Elapsed: 00:00:00.00
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------
Plan hash value: 3790338984

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 269 (3)| 00:00:04 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | INDEX RANGE SCAN| IX_ACTIVITY_DATE_MATCHED | 3801 | 30408 | 269 (3)| 00:00:04 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
2 - access("DATE_MATCHED"=TO_DATE('4712-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
filter(TRUNC(INTERNAL_FUNCTION("DATE_MATCHED"))=TO_DATE('4712-12-31 00:00:00',
'yyyy-mm-dd hh24:mi:ss'))

16 rows selected.

I dropped the constraint and sure enough, the performance problem went away:

SQL> select count(*) from activity_rtab where date_matched = '31-dec-4712';

COUNT(*)
----------
370319

Elapsed: 00:00:00.06
SQL> explain plan for select count(*) from activity_rtab where date_matched = '31-dec-4712';

Explained.

Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3790338984

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 264 (1)| 00:00:04 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | INDEX RANGE SCAN| IX_ACTIVITY_DATE_MATCHED | 380K| 2969K| 264 (1)| 00:00:04 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
2 - access("DATE_MATCHED"=TO_DATE('4712-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

14 rows selected.

Now we have to decide: keep the check constraint and suffer the performance penalty for the sake of data integrity, or drop it and give up that protection. A classic trade-off.
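If performance wins for now, dropping the constraint and re-adding it later is straightforward enough; a sketch, with a made-up constraint name (re-adding it with the defaults revalidates the existing rows):

alter table activity_rtab drop constraint ck_dm_midnight;

-- later, to put the integrity check back:
alter table activity_rtab add constraint ck_dm_midnight
  check (date_matched = trunc(date_matched));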

Friday, December 15, 2006

What is the optimizer doing?

Interesting problem here. I have a table in one of my Oracle 9.2.0.5 databases on Solaris called ACTIVITY_RTAB:

SQL> desc activity_rtab;
Name Null? Type
----------------- -------- --------------
ID NOT NULL NUMBER(19)
FLAG1 NOT NULL CHAR(1)
MONEY_AMOUNT NOT NULL MONEYAMOUNT_T
DATE1 NOT NULL DATE
DATE2 NOT NULL DATE
DATE3 NOT NULL DATE
DATE_MATCHED NOT NULL DATE
FLAG2 NOT NULL CHAR(1)
STR1 NOT NULL VARCHAR2(100)
FLAG3 NOT NULL CHAR(1)
ID2 NOT NULL NUMBER(19)
STR2 NOT NULL VARCHAR2(10)
ID3 NOT NULL NUMBER(19)
FLAG4 NOT NULL CHAR(1)
DATE4 NOT NULL DATE
DATE5 NOT NULL DATE
ID4 NOT NULL NUMBER(38)
DATE6 NOT NULL DATE

This table has two indexes: a primary key on ID, and a non-unique index on DATE_MATCHED. The data in DATE_MATCHED is pretty skewed. I have about 10K rows per day in this table, except for the date '31-DEC-4712', for which I have about 355K rows.

select
before_on,
avg(rows_per_day) avg_rows_per_day
from (
select
case
when date_matched < '31-dec-4712' then 'before'
else 'on'
end before_on,
trunc(date_matched) dm,
count(*) rows_per_day
from activity_rtab
group by
case
when date_matched < '31-dec-4712' then 'before'
else 'on'
end,
trunc(date_matched)
)
group by before_on
/

BEFORE AVG_ROWS_PER_DAY
------ -----------------
before 9413.75
on 355606.00

Queries using the index on DATE_MATCHED are reasonably quick for any day other than '31-DEC-4712'. That behaviour doesn't really surprise me, since for '31-DEC-4712' I'll be fetching 355K rows basically a single block at a time.

select
count(*)
from activity_rtab
where date_matched = '31-DEC-4712';

COUNT(*)
----------
355606

Elapsed: 00:00:03.48

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'IX_ACTIVITY_DATE_MATCHED' (NON-UNIQUE) (Cost=29 Card=99 Bytes=792)
Statistics
----------------------------------------------------------
56 recursive calls
0 db block gets
1031 consistent gets
0 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

I know what you're thinking: "OK, so what's your problem?". In order to do some maintenance, I did a CTAS and created a new table in the same tablespace. I put both indexes on the new table and ran a quick query.

select count(*) from jh$activity_rtab where date_matched = '31-DEC-4712';

COUNT(*)
----------
355606

Elapsed: 00:00:00.53

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'JH$DATE_MATCHED' (NON-UNIQUE) (Cost=29 Card=9867 Bytes=78936)

Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
949 consistent gets
0 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Huh? Why is this returning in less than a second when the query on the real table takes three and a half? At first I thought something must be wrong with the storage, so I moved the table and rebuilt the indexes. Same thing.

Then I looked closer at the plan and noticed the query on JH$ACTIVITY_RTAB had "Card=9867" while the query against ACTIVITY_RTAB had "Card=99". Aha! A statistics problem. I gathered statistics on both tables using the following command (once for each table):

exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'activity_rtab',cascade=>true);


No dice. The query on JH$ACTIVITY_RTAB was still significantly faster than the one on ACTIVITY_RTAB. Then I decided to take my own advice and ran a 10046 trace on both queries. The TKPROF output confirmed my observations, but still offered no clues as to what the real problem was.

select count(*)
from
jh$activity_rtab where date_matched = '31-DEC-4712'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.01 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.92 0.91 0 2838 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.92 0.92 0 2838 0 3

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
355606 INDEX RANGE SCAN JH$DATE_MATCHED (object id 43980)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 6 0.00 0.00
SQL*Net message from client 6 0.00 0.00
********************************************************************************

select count(*)
from
ACTIVITY_RTAB where date_matched = '31-DEC-4712'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.01 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 4 6.05 6.22 0 2030 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 6.06 6.23 0 2030 0 2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
355606 INDEX RANGE SCAN IX_ACTIVITY_DATE_MATCHED (object id 37979)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 0.00 0.00
********************************************************************************

At this point, I had a hunch that it was an optimizer problem but couldn't convince myself. I couldn't get around the fact that both tables had nearly identical characteristics but the queries performed radically differently. Time for a 10053 trace on both queries:

Table stats Table: JH$ACTIVITY_RTAB Alias: JH$ACTIVITY_RTAB
TOTAL :: CDN: 7538301 NBLKS: 121877 AVG_ROW_LEN: 109
-- Index stats
INDEX NAME: JH$DATE_MATCHED COL#: 9
TOTAL :: LVLS: 2 #LB: 19996 #DK: 764 LB/K: 26 DB/K: 253 CLUF: 194025
INDEX NAME: JH$PK COL#: 1
TOTAL :: LVLS: 2 #LB: 16754 #DK: 7538301 LB/K: 1 DB/K: 1 CLUF: 908531
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: DATE_MATCH Col#: 9 Table: JH$ACTIVITY_RTAB Alias: JH$ACTIVITY_RTAB
NDV: 764 NULLS: 0 DENS: 1.3089e-03 LO: 2453031 HI: 3442447
NO HISTOGRAM: #BKT: 1 #VAL: 2
TABLE: JH$ACTIVITY_RTAB ORIG CDN: 7538301 ROUNDED CDN: 9867 CMPTD CDN: 9867
Access path: tsc Resc: 11723 Resp: 11723
Access path: index (iff)
Index: JH$DATE_MATCHED
TABLE: JH$ACTIVITY_RTAB
RSC_CPU: 0 RSC_IO: 1925
IX_SEL: 0.0000e+00 TB_SEL: 1.0000e+00
Access path: iff Resc: 1925 Resp: 1925
Access path: index (equal)
Index: JH$DATE_MATCHED
TABLE: JH$ACTIVITY_RTAB
RSC_CPU: 0 RSC_IO: 29
IX_SEL: 0.0000e+00 TB_SEL: 1.3089e-03
BEST_CST: 8.00 PATH: 4 Degree: 1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: JH$ACTIVITY_RTAB[JH$ACTIVITY_RTAB]#0
Best so far: TABLE#: 0 CST: 8 CDN: 9867 BYTES: 78936
Final - All Rows Plan:
JOIN ORDER: 1
CST: 8 CDN: 9867 RSC: 7 RSP: 7 BYTES: 78936
IO-RSC: 7 IO-RSP: 7 CPU-RSC: 1 CPU-RSP: 1
PLAN
Cost of plan: 8
Operation...........Object name.....Options.........Id...Pid..
SELECT STATEMENT 0
SORT AGGREGATE 1
INDEX JH$DATE_MATCHED RANGE SCAN 2 1
BASE STATISTICAL INFORMATION
***********************
Table stats Table: ACTIVITY_RTAB Alias: ACTIVITY_RTAB
TOTAL :: CDN: 7538301 NBLKS: 121875 AVG_ROW_LEN: 109
-- Index stats
INDEX NAME: ACTIVITY_PK COL#: 1
TOTAL :: LVLS: 2 #LB: 16748 #DK: 7538301 LB/K: 1 DB/K: 1 CLUF: 908561
INDEX NAME: IX_ACTIVITY_DATE_MATCHED COL#: 9
TOTAL :: LVLS: 2 #LB: 20112 #DK: 764 LB/K: 26 DB/K: 253 CLUF: 194021
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column: DATE_MATCH Col#: 9 Table: ACTIVITY_RTAB Alias: ACTIVITY_RTAB
NDV: 764 NULLS: 0 DENS: 1.3089e-03 LO: 2453031 HI: 3442447
NO HISTOGRAM: #BKT: 1 #VAL: 2
TABLE: ACTIVITY_RTAB ORIG CDN: 7538301 ROUNDED CDN: 99 CMPTD CDN: 99
Access path: tsc Resc: 11723 Resp: 11723
Access path: index (iff)
Index: IX_ACTIVITY_DATE_MATCHED
TABLE: ACTIVITY_RTAB
RSC_CPU: 0 RSC_IO: 1936
IX_SEL: 0.0000e+00 TB_SEL: 1.0000e+00
Access path: iff Resc: 1936 Resp: 1936
Access path: index (equal)
Index: IX_ACTIVITY_DATE_MATCHED
TABLE: ACTIVITY_RTAB
RSC_CPU: 0 RSC_IO: 29
IX_SEL: 0.0000e+00 TB_SEL: 1.3089e-03
BEST_CST: 8.00 PATH: 4 Degree: 1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: ACTIVITY_RTAB[ACTIVITY_RTAB]#0
Best so far: TABLE#: 0 CST: 8 CDN: 99 BYTES: 792
Final - All Rows Plan:
JOIN ORDER: 1
CST: 8 CDN: 99 RSC: 7 RSP: 7 BYTES: 792
IO-RSC: 7 IO-RSP: 7 CPU-RSC: 1 CPU-RSP: 1

The queries are basically the same except for that darn cardinality (CDN). I pulled out Jonathan Lewis' latest book and went through the calculations, but I still don't see how the CDN of 99 is being computed. I played around with histograms and different bucket sizes, but always came back to the simple premise of "If performance is fine with JH$ACTIVITY_RTAB, why does it suck with ACTIVITY_RTAB?"
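(By "playing around with histograms" I mean calls along these lines; the bucket count is just an example:)

exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'ACTIVITY_RTAB',method_opt=>'FOR COLUMNS DATE_MATCHED SIZE 254',cascade=>true);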

I just don't understand it. For kicks, I tried the query with a /*+ rule */ hint and it returned in 0.48 seconds.

SQL> select /*+ rule */ count(*) from ACTIVITY_RTAB where date_matched = '31-DEC-4712';

COUNT(*)
----------
355606

Elapsed: 00:00:00.48

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'IX_ACTIVITY_DATE_MATCHED' (NON-UNIQUE)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1015 consistent gets
0 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

I'm really perplexed at this point. Sure, I have two workarounds: use the RULE hint or drop the table and recreate it. I'm not a big fan of hinting since hinted queries seem to be the first queries that "go bad" when you upgrade. Dropping the table seems reasonable, but since I don't know how I got into this situation, I'm not sure how to detect it if it happens again. Sort of like choosing between a root canal and a colonoscopy.

An update is here.

Thursday, December 14, 2006

NYOUG Winter

Just got back from the New York Oracle Users Group Winter meeting. Although I've only been involved with the NYOUG for about two years, I was saddened to see Dr. Paul Dorsey step down as President. He's been a driving force for getting top-quality speakers to the quarterly meetings and really supports the user community. I'm sure Michael Olin will take the reins from Dr. D and keep the top-quality speakers coming.


World-renowned Michael Abbey presented the keynote address. His presentation took a light-hearted look at the soft skills needed to cope with the various personalities encountered in the IT arena. I had to chuckle at his characterization of a sysadmin: hair as messy as his office, the biggest screen in the whole shop, and an air of secrecy surrounding him. I kept picturing the sysadmin I had right out of college, and the description was right on. Perhaps the best management advice I got out of this presentation was "delegate and oversee". Michael explained that one person can't do and know everything, and you have to trust (but verify) the work your people do. I struggle daily with knowing when to hand things off to other people and when to handle them myself.


The second presentation of the day was on 11g PL/SQL features, by Tom Kyte. There are a lot of new features coming for PL/SQL in 11g (expected really late next year) that will help optimize code as well as make it easier to write. It sounds like the most helpful feature in my environment will be that dependency tracking is now at the element level instead of the object level. For example, just adding a column to a table won't invalidate the packages that depend on it. If you drop a column, then sure, everything referencing it will need to be recompiled. There are also other optimizations like the result cache (just-in-time materialized views) and better native compilation, but it sounds like you shouldn't expect your applications to speed up more than 5-10%. One really nice feature is the ability to mark just a table as read-only instead of having to move it to a read-only tablespace. As always, Tom was an entertaining speaker and kept audience participation at a very high level.


All the presenters then got together just before lunch for an "Ask the Experts" panel, or as I like to call it, "Ask Tom Live". Tom is a top-notch presenter and a world-class information source, but I kind of felt bad for the other presenters, who were basically left out.


After lunch, I stole some power to recharge the laptop and bumped into Dave Anderson from SkillBuilders, who was also powering up. We swapped laptop stories, and it seems neither one of us is really happy with our current capacity. As Tim Allen says, "More power!"


Come to find out, Dave was the next presenter, with an introduction to Automatic Storage Management. ASM is a 10g-specific piece of software that essentially takes the place of a volume manager if you have JBOD (just a bunch of disks). You present the raw devices for those disks to ASM as a diskgroup, and Oracle stripes the data over the physical disks in the disk group. For those of you on servers with direct-attached storage and limited budgets, I would think ASM would be a good option rather than plunking down money for VxVM and VxFS just to have some RAID. On the other hand, I'm not sure how practical it is if you're in a SAN environment where the striping and mirroring are already handled for you. Also, Dave mentioned that there was no performance penalty for ASM striping over a hardware stripe. However, since ASM is a piece of software, you're basically putting software-based RAID over hardware-based RAID, robbing the host of CPU that may be needed for user processes. Even so, ASM is obviously here to stay, and I'll be investigating the technology and its use as I get closer to 10g.


All in all, a great educational value for the membership fee you pay. If you're in the Tri-State area, it makes sense to join even if you only get to the September conference. Let's just say there are some educational events coming in May where the COST will be BASED on your membership being OPTIMIZEd. (OK, so maybe not so subtle: Jonathan Lewis is coming in May.)



Sunday, December 10, 2006

The Failure was a success



We periodically exercise various Disaster Recovery scenarios on our mission-critical systems. The most recent test was scheduled for Saturday, and most things came off without a hitch. As always, we learned a couple of things and have some work to do before the next test, but overall it went pretty well. Check out my setup for Saturday: three screens on two computers logged into the Disaster Recovery site, and the laptop logged into my production environment fighting another fire. I love the thrill of a crisis.

Switched to Blogger Beta

Switched the blog to Blogger Beta. Let's see how much transfers over correctly.

Wednesday, November 29, 2006

Where am I deploying MySQL, revisited, revisited...

Seems like I'm doing a lot of revisiting these days. Perhaps I'm just not making myself clear.

I have basically five types of systems in my organization: mission critical, 24x7, ancillary, QA, and development. Below are more detailed explanations of each type of system:

Mission Critical
A mission critical application is a piece of software that must be running in order for the business to continue to operate. Without the software, the business would cease to exist. An example of this is eBay's auction system or the software that executes orders for Ameritrade.

24x7
A 24x7 system is important to the business, but not essential for the business to run. If a 24x7 system goes down, customers get upset but business continues to operate. An example of this type of system would be Ameritrade's charts or eBay's feedback system.

Ancillary Systems
An ancillary system is a system that a limited number of users access. If an ancillary system is down, your bean counters may complain because they can't crunch their numbers, but overall there is no impact to the business. Usually an ancillary system is something used internally and not exposed to your customers. An example of this type of system might be eBay's customer data warehouse or Ameritrade's accounting system.

QA
A QA system is "almost production" in the sense that it carries a full load of production data, but the data may be "stale". Performance testing occurs on these types of systems, as does testing of new software changes. The only people affected by QA outages are those running the tests.

Development
A development system is a work in progress. A handful of developers will be ticked off if development is down, but nobody really cares.

Mission Critical, 24x7, and Ancillary systems are production systems and are treated as such. Any production system is important to the business, just some are more important than others. QA systems are sometimes treated like production, sometimes not. Development systems are almost never treated like production.

In "Where am I deploying MySQL", I point out what types of systems where I am comfortable deploying MySQL. I have deployed several systems in development, QA, and production. My production apps are mostly of ancillary type, but I have recently deployed a 24x7 web based application. As we build the infrastructure around MySQL, there will be a natural progression towards more 24x7 apps. Those ancillary and 24x7 apps eventually evolve into mission critical systems as our experience becomes deeper and our confidence becomes stronger.

I'm an Oracle guy, no question about it. But I'm implementing MySQL. In fact, the vast majority of my new systems are being developed on MySQL and are going into production on MySQL.

From time to time we have to step outside our comfort zones. We can either embrace that as an opportunity to learn and progress, or fear it and fail.

Tuesday, November 28, 2006

Why I still use pfiles, revisited...

Ivan Kartik has an informative guide on SPFILES and their use at http://ivan.kartik.sk/index.php?show_article=22. I hope he posts a follow-up explaining what advantages they have over PFILEs and some real-world scenarios. Perhaps he can persuade me to change my mind...

Monday, November 27, 2006

Where am I deploying MySQL, revisited...

The internet is a great thing. You can find almost anything and share your ideas with anybody who is willing to listen. Unfortunately, that means that anybody with a pugnacious personality and a keyboard also has a voice.

A couple weeks ago, somebody asked me where I deploy MySQL. I created this post about where I think MySQL's sweet spots are in my organization.

I'm no MySQL expert, I'll admit it. In fact, right at the top of the post I indicate as such:
In my opinion, I'd use MySQL for anything but the most mission critical applications. I'm not saying MySQL can't handle the most mission critical applications, but I'm not comfortable betting my business on MySQL at this point.
Sheeri Kritzer decides to put her own disparaging slant on my post and writes MySQL Is a Great Database, But Do Not Use It???? I posted a comment on the blog clarifying my position on a couple points, but wanted to expound on it a little more here.

First off, I didn't say, nor did I imply, not to use MySQL. I didn't say anything of the sort. I simply explained where I use MySQL and what kinds of applications I use it for. Nowhere in the post did I say MySQL is only good for XYZ type of application in every organization.

She also goes on to pick apart this paragraph:
Last, but not least, MySQL is suited for data marts ( less than 1TB). Stuffing lots of historical data into denormalized relational tables is what "LOAD DATA LOCAL" is all about. These types of applications aren't needed 24x7 but require snappy response times when queried.

with
In the third sweet spot, you say MySQL is good at stuffing lots of data into denormalized tables. I’ve been using MySQL for quite a long time, and I missed the part where stuffing lots of data into normalized tables is slow. It’s fast for stuffing lots of data into tables, period.
Um, I guess if I stood on my head with one eye closed my paragraph says "MySQL sucks at stuffing lots of data into normalized tables."

Next, we get into a common misconception:
And MySQL isn’t “just as fast as the next guy”. It’s FASTER, otherwise why would people use it, because it’s much less featured than Oracle? This I’ll admit — however, you can code around features. No amount of code can make Oracle faster.

Where, oh where, do I start? I know I use MySQL because it's cheaper. I wouldn't make a blanket statement saying it's FASTER. Sometimes it's faster, sometimes not. All I have to do is come up with one example of where MySQL is slower and the blanket statement is false.

People who have been in this business more than a couple of years know lots of examples of people who thought they could code around features. And no amount of code can make Oracle faster? If you treat it like a big black box, then you're probably right. But treat it like a sophisticated piece of software that has its own intricacies, and then yes, code matters.

Oh, and:
To Jeff Hunter’s comment I say: Incremental backups exist — that’s what binary logs are.
...which has nothing to do with MySQL; that comment was a reply to a previous comment about what's missing in Oracle SE vs. EE.

And I purposely didn't make this an Oracle/MySQL comparison, primarily because I can't say with authority how MySQL behaves. I could bring up the fact that you have to bring down the database just to add a datafile to the InnoDB tablespace, but I'm not 100% sure you can't do that online. I could also bring up the fact that you have to shut down the database to change the size of your logfiles, but again, I'm not 100% sure that's the only way to do it.

Apparently MySQL is her gig and she's a zealot for the software. That's cool, more power to ya. But don't belittle me because that's not the only thing I use. And if you're going to criticize me, at least have the courtesy to alert me to the fact that you've done so.

Tuesday, November 21, 2006

Selling out


Danasoft has a .jpg generator that echoes your IP address and does an IP lookup to figure out your ISP. You can also put up to ten sayings on your .jpg, and they will randomly rotate on the graphic. Up until a couple of days ago, I thought it was kinda cool, so I included it in my signature on several forums.

A couple of days ago, I noticed that my signature had an advertisement in it. I thought that maybe my signature had been hacked, and I was just going to create another one. But then I saw on the Danasoft website that you could create a .jpg without advertisements by clicking a certain link.

Granted, every time my .jpg is displayed, I'm using some of their bandwidth. But the part that really miffs me is that they dropped these advertisements in without notifying anybody. Needless to say, I ripped the graphic from all the forums in which I participate. That's 3,000 fewer views they'll be getting every day...

Tuesday, November 14, 2006

Why I still use pfiles

I'm always up for investigating features of Oracle and deciding when and how they should be introduced into my environment. I recently got a chance to go back and look at spfiles again.

Server Parameter Files (spfiles) are binary files that contain your initialization parameters. The two major advantages of using spfiles over standard text parameter files are that spfiles let changes to your initialization parameters persist across instance restarts and that they can be backed up by RMAN. How many people have changed a parameter on the fly, only for it to be reset by the init.ora upon restart? All your init.ora problems could now be solved just by using spfiles.
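For instance, with an spfile in place, the scope clause controls whether a change sticks (parameter and value purely illustrative):

alter system set db_cache_size=512m scope=memory; -- change now only; lost at the next restart
alter system set db_cache_size=512m scope=spfile; -- recorded for the next restart only
alter system set db_cache_size=512m scope=both;   -- change now and make it persistent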

But I don't use them.

The first reason is that maybe I don't want the instance to keep a value when it gets restarted. For example, maybe I changed my log_archive_dest_1 to a temporary location because my primary location started filling up. I don't want that to be the permanent location, just the location until I get a chance to back up my archived redo logs. Sure, I'll change it back once my backup completes, but if I forget, at least I haven't made an unintentional permanent change.

Also, I dig change control. Nothing goes to production until it's checked into my version control system. This includes init.ora files, tnsnames.ora files, listener.ora, scripts, and anything else. If you want to know what the db_cache_size was back in 2002, I can tell you. Sure, I could dump my spfile to a pfile when I make a change and put it into version control, but that goes against the process. Things go into version control before they go into production.

Along those same lines, version control allows me to back out a change that prevents the instance from starting. For example, say I set the shared_pool_size to 400G on a box with 4G of RAM. When my instance doesn't start, I can check the old version of the init.ora out of my version control, replace it, and start the instance. If I were using an spfile, I'd have to create a pfile that pointed at my spfile and then add an entry that reduced the shared_pool_size back down. And that's assuming I knew what the problem was. With version control, I just diff the two files and I know what the change was.

Another reason I like pfiles is that I can include a "standard" pfile so all my instances run with similar parameters. For example, maybe I want to make sure all my instances run 4 job queue processes. I just put the appropriate parameter in my initStandard.ora, include the initStandard.ora in my init.ora file using ifile=, and I'm good to go. I know that on the next restart my environments will be the same.
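A minimal sketch of what I mean, with made-up file names and parameters:

# initStandard.ora -- parameters shared by every instance
job_queue_processes=4

# initPROD.ora -- instance-specific file that pulls in the standard one
ifile=/u01/app/oracle/admin/standard/initStandard.ora
db_name=PROD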

The last advantage I think standard pfiles give me is the ability to distribute production changes to my DR site. I have two init.ora files at my DR site: one that holds the parameters for recovery and one that holds the parameters for running during a failover. When I make a change to my primary init.ora, I push that change to both the primary site and the standby site (using automated methods). When I have to fail over, I know the standby will operate just like my primary site did.

I also don't care about the backup. Whoa, I mean I don't care about backing up the initialization parameters because I already have them in my version control.

I know this goes against conventional wisdom in the Oracle world. I'm a firm believer in doing what makes sense for my particular environment, regardless of whether "everybody" thinks it's OK or not. When I start running RAC, maybe spfiles will make more sense for this environment, but until then I'm still using pfiles.

Thursday, November 09, 2006

Keeping you in the loop

The email subscription form we introduced at Wilton Diaries a couple of weeks ago has really taken off. In other words, we've got three subscribers, but the "cool" factor is way up there.

When you subscribe via email, you will get a message whenever the blog gets updated. To subscribe, simply enter your email address and click the "Subscribe" button. A form will then pop up asking you to verify that you are an actual person and not an aggregator. You simply type the letters shown in the box and click the buttons as instructed. You will then be sent an email message with a URL that you must click to finish your subscription. This last part is important, as you don't want to go through all that work and not get the feed!

If that all sounds too complicated, feel free to continue checking the So What Co-operative every day.

Wednesday, November 08, 2006

Playing in the Sand

Just got back from a week in Kuwait upgrading our database down there from 9i to 10g and upgrading our ESRI software from 9.0 to 9.1.

I must say that upgrading Oracle9i to 10g on Sun Solaris 10 is the easiest and most painless install I’ve ever done. I found out on our dev box that trying to put 10g on Solaris 8 was like trying to put a square peg in a round hole. Luckily, I got our UNIX admin on board with upgrading all of our servers to Solaris 10 before I started my production upgrade festivities.

When I got down there, we had one database serving up maps and vehicle tracking data. All the tracking data is OLTP-oriented and the maps are nothing but a bunch of BLOBs. The user has the ability to see maps by themselves, vehicle information (text) by itself, or maps and the vehicle data at the same time.

The server has plenty of horsepower and space, so I decided to break the database into two. I created another database, put the maps in it, and configured it for bulk operations. One other thing I did: we have a particular map that automatically loads when a user first accesses the webpage, so I threw that into the keep pool. Performance is very nice. It’s so refreshing when you have a database configured correctly for the environment it supports.
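For anyone curious, assigning a segment to the keep pool is just a storage attribute; a sketch with a made-up table name (the keep pool has to be sized first, and a BLOB column would need similar settings in its LOB storage clause):

alter system set db_keep_cache_size=200m;
alter table startup_map storage (buffer_pool keep);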

Tuesday, November 07, 2006

Where am I deploying MySQL?

If cost were no object, I'd always deploy Oracle. I'm comfortable with Oracle technology and I think I have a pretty good idea how to implement and administer it.

In the world of corporate IT, however, budgets are king. Projects are measured by their Return on Investment (ROI), and the lower I can get the investment, the better the return. I have a real hard time spending $160K on an application that will occupy 40G of space.

In my opinion, I'd use MySQL for anything but the most mission critical applications. I'm not saying MySQL can't handle the most mission critical applications, but I'm not comfortable betting my business on MySQL at this point.

I think there are about three sweet spots for MySQL. The first is small to medium-sized OLTP databases (<100 GB) that are fronted by something like a Java middle tier. These applications typically handle most of the business logic and authentication/authorization in the middle tier (right or wrong) and use the database as a big storage bucket. They rely on the backend serving data as fast as it can, and MySQL can serve data just as fast as the next guy.

Another area where MySQL excels is serving database-driven content directly on the webserver. This type of application typically cranks out high numbers of queries and has very few updates to worry about.

Last, but not least, MySQL is well suited to data marts (< 1TB). Stuffing lots of historical data into denormalized relational tables is what "LOAD DATA LOCAL" is all about. These types of applications aren't needed 24x7 but require snappy response times when queried.
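For the uninitiated, a load along these lines is what I'm talking about (file, table, and format are all made up for illustration):

load data local infile '/tmp/history_2006.csv'
into table sales_history
fields terminated by ','
lines terminated by '\n';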

No, MySQL doesn't have some of the features that the big-box databases have. And it's got plenty of limitations. But when you want an 80% solution, I think it's the right choice. My company is sold on MySQL, and as our confidence in the software grows, so will our installed base.

Monday, November 06, 2006

Quick and Dirty MySQL Backup

Until recently, the MySQL databases I work with contained data that could be retrieved from other sources. Most of the data is batched in from either flat files or another database. It would be inconvenient to reload a couple of months' worth of data, but since these databases are not mission critical, the business could operate without them for a couple of days. Lately, we've been implementing some semi-critical systems that rely on a reasonably expedient recovery.

The requirements for the project were that the database must remain up during the backup and that losing a day's worth of data was acceptable. All of my regular Oracle readers are cringing at the moment, but hey, those were the rules I was working with.

My first thought was to use mysqlhotcopy because it backs up the actual physical files. However, mysqlhotcopy only allows you to back up MyISAM tables, and we use InnoDB extensively.

My next choice was mysqldump. mysqldump basically takes the entire database and dumps a text file containing DDL and DML that will re-create your database. Coming from an Oracle background, I knew there were shortcomings to dumping the entire database, but hopefully I could mitigate them.

The first hurdle was security. I specifically turn off unauthenticated root access on my databases, but I needed to be able to read all the tables to do a backup. I don't want to hard-code my root password, or any password, in a script, as I don't have suicidal tendencies (diagnosed, anyway). So I created a user called backup that could only log in from the server machine, but could log in unauthenticated.
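A sketch of the kind of grant I mean (the privilege list is kept to roughly what a dump needs; leaving off IDENTIFIED BY leaves the password empty, and the @'localhost' part restricts logins to the server itself):

grant select, lock tables on *.* to 'backup'@'localhost';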

The next thing I had to figure out was how to get a consistent view of the data. I knew that my developers preferred InnoDB for its referential integrity features, and getting inconsistent data would be disastrous. Fortunately, one of the flags to mysqldump is --single-transaction, which essentially takes a snapshot in time.

So I wrote a script around mysqldump and --single-transaction and dumped my entire database to disk. Every now and again, I encountered an "Error 2013: Lost connection to MySQL server during query when dumping table `XYZ` at row: 12345". The row number changed each time, so I figured it had something to do with either activity in the database or memory. I could rerun the command and it usually finished the second or third time.

After the third straight day of my backup failing, I decided to research it a little more. mysqldump has a flag called --quick, which retrieves rows from the server one at a time instead of buffering each whole table in memory before writing it out. I put this flag in my backup script and the script started finishing more consistently.

The last hurdle was having enough space on disk to store my backups. Since the backup file is really just a text file, I decided to pipe the output through gzip to reduce its size.

Currently, my quick and dirty backup script is a wrapper around the following command:

mysqldump --all-databases --quick --single-transaction -u backup | gzip > mybackup.sql.gz

We're adopting MySQL at a blistering pace, so I'm sure I'll need to make changes in the future. For right now, though, it gets the job done.

Wednesday, November 01, 2006

Check this out

I usually employ a logon trigger for most of my Oracle databases so I can grab certain identifying information about the session. Then I save this information in another table for later analysis.
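For anyone who hasn't set one up, here is a bare-bones sketch of the kind of trigger and table I mean (the names and column list are made up for illustration):

create table session_audit (
  username    varchar2(100),
  osuser      varchar2(100),
  machine     varchar2(100),
  ip_address  varchar2(40),
  logon_time  date
);

create or replace trigger trg_logon_audit
after logon on database
begin
  -- capture a few identifying attributes of the new session
  insert into session_audit
  values (sys_context('userenv','session_user'),
          sys_context('userenv','os_user'),
          sys_context('userenv','host'),
          sys_context('userenv','ip_address'),
          sysdate);
end;
/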

I have started testing 9iR2 on a 64-bit Linux box and have come across a certain peculiarity. v$session is defined as:

SQL> desc v$session
Name Null? Type
----------------------------------------- -------- ----------------------------
SADDR RAW(4)
SID NUMBER
...

I then create a table using the same type and try to insert a value:

SQL> create table jh1 (saddr raw(4));

Table created.

SQL> desc jh1
Name Null? Type
----------------------------------------- -------- ----------------------------
SADDR RAW(4)

SQL> insert into jh1 select saddr from v$session;
insert into jh1 select saddr from v$session
*
ERROR at line 1:
ORA-01401: inserted value too large for column

Hmmmf. So I do a CTAS:

SQL> drop table jh1;

Table dropped.

SQL> create table jh1 as select saddr from v$session;

Table created.

SQL> desc jh1
Name Null? Type
----------------------------------------- -------- ----------------------------
SADDR RAW(8)

...and look what size the column is!

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
PL/SQL Release 9.2.0.7.0 - Production
CORE 9.2.0.7.0 Production
TNS for Linux: Version 9.2.0.7.0 - Production
NLSRTL Version 9.2.0.7.0 - Production

Update: 2006/11/01 16:09:
From Support:
I checked my Windows (32bit) database and v$session.saddr is a RAW(4).

OK, that explains it.