Sunday, April 29, 2007

Two Years of So What?



I started this blog about two years ago. Never in my wildest dreams would I have thought that I would have over 200 regular subscribers and about 300 page views a day. In celebration of the anniversary, I changed the blog template to one of the new standard Blogger templates.

If only my IRA portfolio looked like this...

Friday, April 27, 2007

MySQL Pseudo-Partitioning, Part II

On the advice of the MySQL She-BA, I wanted to see what performance I could get with MERGE tables. My particular application is going to need the referential integrity I get with InnoDB, so MERGE isn't a solution for me, but as long as I was investigating anyway...

Seems simple enough; let me create the MERGE table over my existing tables:

mysql> create table xyz_mt (
-> `id` bigint(10) NOT NULL default '0',
-> `report_dt` date default NULL,
-> `acct_id` varchar(8) default NULL,
-> `some_text` varchar(222) default NULL,
-> PRIMARY KEY (`id`)
-> ) ENGINE=MERGE UNION=(xyz_2007_01,xyz_2007_02,xyz_2007_03) ;
Query OK, 0 rows affected (0.08 sec)

mysql> select id, report_dt, acct_id
-> from xyz_mt
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
ERROR 1168 (HY000): Unable to open underlying table which is differently defined or of non-MyISAM type or doesn't exists

Doh! That's right, the underlying tables have to be MyISAM.

mysql> alter table xyz_2007_01 engine=MyISAM;
Query OK, 1000000 rows affected (25.42 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

mysql> alter table xyz_2007_02 engine=MyISAM;
Query OK, 1000000 rows affected (17.82 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

mysql> alter table xyz_2007_03 engine=MyISAM;
Query OK, 1000000 rows affected (18.84 sec)
Records: 1000000 Duplicates: 0 Warnings: 0

Let's see how it works now.

mysql> select id, report_dt, acct_id
-> from xyz_mt
-> where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id       | report_dt  | acct_id  |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (2.43 sec)

Whoa, that's pretty good; I could live with that. Let's compare the inline view over the now-MyISAM tables to the InnoDB results:

mysql> select * from (
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03) t
-> where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id       | report_dt  | acct_id  |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (15.17 sec)

Slightly faster than InnoDB, but not what I would call significant. The MERGE table itself, though, is definitely a viable option for partitioning in 5.0. Thanks to Sheeri for the tip.
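One follow-up worth trying (I haven't verified this on this data set): a MERGE table can only use indexes that are declared in the MERGE definition itself, and my definition above only carries the PRIMARY KEY. Declaring the same secondary key the underlying tables have might let the optimizer avoid scanning altogether. A hypothetical sketch:

```sql
-- Hypothetical: rebuild the MERGE table with the secondary key declared,
-- matching the (report_dt, acct_id) index on the underlying MyISAM tables.
DROP TABLE xyz_mt;
CREATE TABLE xyz_mt (
  `id` bigint(10) NOT NULL default '0',
  `report_dt` date default NULL,
  `acct_id` varchar(8) default NULL,
  `some_text` varchar(222) default NULL,
  PRIMARY KEY (`id`),
  KEY `xyz_rdt_acct` (`report_dt`,`acct_id`)
) ENGINE=MERGE UNION=(xyz_2007_01,xyz_2007_02,xyz_2007_03);

-- If it works, EXPLAIN should show the new key being used instead of a scan.
EXPLAIN SELECT id, report_dt, acct_id
FROM xyz_mt
WHERE report_dt = '2007-02-17' AND acct_id = 'X0000741';
```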

Thursday, April 26, 2007

MySQL Pseudo-partitioning

Got a MySQL project that needs a partitioned table. It's the ideal candidate, really: historical data that naturally partitions by date. I need to keep an active window of data and purge old data quickly, without resorting to expensive DELETE statements.

Except partitioning is in 5.1 (more on that later) and I'm on 5.0.

One way I would overcome this limitation in Oracle Standard Edition is to have multiple tables, each holding a month's worth of data, and put a view on top of them. My users might notice a slight performance degradation, but it would be worth it to be able to drop the old data quickly.

So I tried the same experiment in MySQL. I created three tables of this format:

CREATE TABLE `xyz_2007_01` (
`id` bigint(10) NOT NULL default '0',
`report_dt` date default NULL,
`acct_id` varchar(8) default NULL,
`some_text` varchar(222) default NULL,
PRIMARY KEY (`id`),
KEY `xyz_rdt_acct_2007_01` (`report_dt`,`acct_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


And then I stuffed each of them with a million rows. When I query for a particular account on a particular date, I expect the optimizer to use the "xyz_rdt_acct*" index.
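For anyone playing along at home, here is one hypothetical way to load test rows like these in 5.0 (not how the original data was generated; the seed values and the doubling trick are made up):

```sql
-- Seed one row, then repeatedly double the table until it holds ~1M rows.
-- The @offset variable keeps the PRIMARY KEY values unique on each pass.
INSERT INTO xyz_2007_01 VALUES (1, '2007-01-15', 'X0000001', 'some filler text');

-- Run this pair of statements ~20 times (2^20 is roughly 1M rows):
SELECT @offset := MAX(id) FROM xyz_2007_01;
INSERT INTO xyz_2007_01
  SELECT id + @offset, report_dt, acct_id, some_text
  FROM xyz_2007_01;
```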

mysql> select id, report_dt, acct_id 
-> from xyz_2007_01
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';

Empty set (0.00 sec)

mysql> select id, report_dt, acct_id
-> from xyz_2007_02
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';

+----------+------------+----------+
| id       | report_dt  | acct_id  |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (0.01 sec)

mysql> select id, report_dt, acct_id
-> from xyz_2007_03
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';

Empty set (0.00 sec)


OK, I'm looking for two rows in the 2007_02 "partition". Let's create a view on top of those tables to present one interface for queries.

mysql> create view xyz as
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03
-> ;
Query OK, 0 rows affected (0.00 sec)

And now we query from the view.

mysql> select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id       | report_dt  | acct_id  |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (44.36 sec)

Hmmm, that's not cool: 44 seconds to retrieve two rows. Let's try an inline view (derived table) just for kicks.

mysql> select id, report_dt, acct_id from (
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03) t
-> where t.report_dt ='2007-02-17'
-> and t.acct_id = 'X0000741';
+----------+------------+----------+
| id       | report_dt  | acct_id  |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (33.47 sec)

About the same, a little bit better. Let's see what's going on.


mysql> explain select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+------+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type  | table        | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
|    1 | PRIMARY      | <derived2>   | ALL  | NULL          | NULL | NULL    | NULL | 3000000 | Using where |
|    2 | DERIVED      | xyz_2007_01  | ALL  | NULL          | NULL | NULL    | NULL |  788670 |             |
|    3 | UNION        | xyz_2007_02  | ALL  | NULL          | NULL | NULL    | NULL | 1014751 |             |
|    4 | UNION        | xyz_2007_03  | ALL  | NULL          | NULL | NULL    | NULL | 1096175 |             |
| NULL | UNION RESULT | <union2,3,4> | ALL  | NULL          | NULL | NULL    | NULL |    NULL |             |
+------+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
5 rows in set (43.18 sec)

Ah, three full table scans. Hmm, I wonder why it's doing that. Let's do the same for the inline view.

mysql> explain select id, report_dt, acct_id from (
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03) t
-> where t.report_dt ='2007-02-17'
-> and t.acct_id = 'X0000741';
+------+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type  | table        | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
|    1 | PRIMARY      | <derived2>   | ALL  | NULL          | NULL | NULL    | NULL | 3000000 | Using where |
|    2 | DERIVED      | xyz_2007_01  | ALL  | NULL          | NULL | NULL    | NULL |  788670 |             |
|    3 | UNION        | xyz_2007_02  | ALL  | NULL          | NULL | NULL    | NULL | 1014751 |             |
|    4 | UNION        | xyz_2007_03  | ALL  | NULL          | NULL | NULL    | NULL | 1096175 |             |
| NULL | UNION RESULT | <union2,3,4> | ALL  | NULL          | NULL | NULL    | NULL |    NULL |             |
+------+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
5 rows in set (35.51 sec)

Basically the same thing. Let's see what happens if I select only the columns I need in the derived table.

mysql> explain select id, report_dt, acct_id from (
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03) t
-> where report_dt = '2007-02-17'
-> and acct_id = 'X0000741';
+------+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| id   | select_type  | table        | type  | possible_keys | key                  | key_len | ref  | rows    | Extra       |
+------+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
|    1 | PRIMARY      | <derived2>   | ALL   | NULL          | NULL                 | NULL    | NULL | 3000000 | Using where |
|    2 | DERIVED      | xyz_2007_01  | index | NULL          | xyz_rdt_acct_2007_01 | 15      | NULL |  788670 | Using index |
|    3 | UNION        | xyz_2007_02  | index | NULL          | xyz_rdt_acct_2007_02 | 15      | NULL | 1014751 | Using index |
|    4 | UNION        | xyz_2007_03  | index | NULL          | xyz_rdt_acct_2007_03 | 15      | NULL | 1096175 | Using index |
| NULL | UNION RESULT | <union2,3,4> | ALL   | NULL          | NULL                 | NULL    | NULL |    NULL |             |
+------+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
5 rows in set (16.09 sec)

Well, look at that. 16 seconds still isn't great, but at least the query is using the indexes. With only indexed columns in the select list, each branch can be answered by scanning the (much smaller) covering index instead of the full table, even though the WHERE clause still gets applied to the derived table afterward.

Let's create a view using the same logic.

mysql> drop view xyz;
Query OK, 0 rows affected (0.01 sec)

mysql> create view xyz as
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03;
Query OK, 0 rows affected (0.00 sec)

mysql> select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id       | report_dt  | acct_id  |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (16.70 sec)

Performance is consistent at least.

mysql> explain select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+------+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| id   | select_type  | table        | type  | possible_keys | key                  | key_len | ref  | rows    | Extra       |
+------+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
|    1 | PRIMARY      | <derived2>   | ALL   | NULL          | NULL                 | NULL    | NULL | 3000000 | Using where |
|    2 | DERIVED      | xyz_2007_01  | index | NULL          | xyz_rdt_acct_2007_01 | 15      | NULL |  788670 | Using index |
|    3 | UNION        | xyz_2007_02  | index | NULL          | xyz_rdt_acct_2007_02 | 15      | NULL | 1014751 | Using index |
|    4 | UNION        | xyz_2007_03  | index | NULL          | xyz_rdt_acct_2007_03 | 15      | NULL | 1096175 | Using index |
| NULL | UNION RESULT | <union2,3,4> | ALL   | NULL          | NULL                 | NULL    | NULL |    NULL |             |
+------+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
5 rows in set (15.99 sec)

The plan is consistent as well. Still, not a solution for me, but I thought you might be interested in the findings.
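For completeness, here's what the monthly maintenance would look like under this scheme (a hypothetical sketch; table names follow the xyz_YYYY_MM convention above):

```sql
-- Roll the window forward one month: add April, redefine the view,
-- then drop January. The DROP is near-instant compared to DELETEing
-- a month's worth of rows.
CREATE TABLE xyz_2007_04 LIKE xyz_2007_03;

CREATE OR REPLACE VIEW xyz AS
  SELECT id, report_dt, acct_id FROM xyz_2007_02
  UNION ALL
  SELECT id, report_dt, acct_id FROM xyz_2007_03
  UNION ALL
  SELECT id, report_dt, acct_id FROM xyz_2007_04;

DROP TABLE xyz_2007_01;
```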

The more I learn, the more questions I have. Tomorrow, I test partitioning on 5.1 beta.

Wednesday, April 11, 2007

Why aren't there more of us?

I started reading Frank Hayes' piece on H-1Bs and students last night with some doubt, because the two groups represent two different experience levels: students are entry-level workers, while workers on H-1B visas are typically experienced professionals. But those entry-level workers eventually want to move on to experienced positions, where they will compete on an uneven playing field with H-1B workers.

Now, it's no surprise that a piece in Computerworld is slanted toward importing more workers. It's a genuine concern in our industry and therefore an appropriate subject to report on. Yet they consistently fall on the side of the debate that advocates for more H-1B workers.

Frank examines the trends in Computer Science graduates and correlates them with the rise and fall of the dot-com boom and bust. Personally, I dismiss those cycles because they just enticed people who weren't passionate about the profession and were chasing the almighty dollar. I think the kids who would genuinely like this business are scared: scared they'll be in a six-figure job with a mortgage and two kids when some C-level executive decides to offshore the whole operation to China, scared that they'll end up being 40 and replaced by some guy willing to take half of what they make.

Where do these crazy kids get these ideas? The national media, as well as several trade publications, shout it from the rooftops daily. As someone who hires IT workers regularly, I can tell you that's not the case. I'll typically go through a couple dozen resumes to fill one spot, and I have a spot that went unfilled because I just couldn't find the right person. Maybe you won't spend 20 years at the same company, but the work is there if you want it.

I think there is a flaw in Mr. Hayes' numbers as well. While CS graduates are most likely destined for the IT industry, there are many other disciplines that turn out IT workers. Hiring managers need to look beyond the engineering programs and open their searches up to all majors. Some of my best people have come from respected science or business programs.

I don't think it's all gloom and doom for the IT industry in this country. The press may lead you to think that everything is being outsourced somewhere else, but the fact is there's a lot of work out there for the taking.

Wednesday, April 04, 2007

10gR2/9iR2 Distributed bugs

Our environment is highly distributed across a bunch of nodes partitioned by business area. When one business area needs information from another, we create a materialized view and pull the data across a database link. Most systems don't need up-to-the-minute data, and when they do, there are other ways to get it.

Now, we're in a somewhat unique position because we're migrating Solaris 9iR2 databases to Linux 10gR2. In fact, we just migrated one this weekend and saw anywhere from a 100% to 500% increase in performance, depending on how much database work was actually being done. However, we did encounter some issues with db links that you might want to be aware of when migrating.

The master in this case was a 9.2.0.5 database on Solaris. We upgraded it to 10.2.0.3 and moved it to Linux at the same time (good old export/import). There were several "client" databases that ran the gamut across Oracle versions and OS platforms.

The first problem was with a 9.2.0.8/Solaris database trying to get data from the 10.2.0.3/Linux database. We kept running into an error while trying to refresh the MViews:

ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-04052: error occurred when looking up remote object MYUSER.MVIEW_NAME@DBLINK_NAME
ORA-00604: error occurred at recursive SQL level 2
ORA-03106: fatal two-task communication protocol error
ORA-02063: preceding line from DBLINK_NAME
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 820
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 877
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 858

Here, the problem stems from bug 5671074: 9.2.0.8 has a problem using db links when the endianness doesn't match between sites. Also check out ML Note 415788.1. There is a one-off patch for the 9.2.0.8 software that needs to be applied, and it worked fine for us.

The second issue was a little more straightforward, but it's unfathomable how it didn't get caught in testing. Here, we have two 10.2.0.3 databases trying to use an unauthenticated database link (i.e., no username/password hard-coded into the definition). An OS-authenticated user can connect to both databases, but when he tries to access an object across the database link, he is presented with ORA-01017: invalid username/password. This functionality has worked since 8i, and it escapes me how it slipped through the test cycle. This too turned out to be a bug, but in 10.2. Check out bug 4624183 and ML Note 339238.1 for the current status. As of this post, the bug is fixed in 11.0 and has been backported to some platforms (not mine, of course). The only workaround that worked for us was to use authenticated db links with the username/password embedded in the link. Yuck.
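For reference, the difference between the two link styles looks roughly like this (hypothetical names; the TNS alias and credentials are placeholders):

```sql
-- Connected-user (unauthenticated) link: the style broken by bug 4624183
-- for OS-authenticated users on 10.2.
CREATE DATABASE LINK dblink_name
  USING 'remote_tns_alias';

-- Fixed-user (authenticated) link: the workaround that worked for us,
-- with the credentials embedded in the definition.
CREATE DATABASE LINK dblink_name
  CONNECT TO myuser IDENTIFIED BY mypassword
  USING 'remote_tns_alias';
```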

Sunday, April 01, 2007

My DST Nightmare

I had to be somewhere this morning at 9:00. It would take a little over an hour to get there, so I figured I'd better leave by 7:30 just to be safe. That meant setting the alarm clock for 6:45.

6:45 came and the alarm went off as expected. I jumped into the shower, got dressed, and as I was heading out the door my wife says "Where are you going so early?" I told her and she rolled back over and went back to sleep.

Then I jumped into a car I usually don't drive. "Hmm, the clock says 6:32," I thought to myself. "Must be because I haven't driven this car for three weeks."

I reset the clock to the correct time and was off.

I got to my destination at 8:50 and there were only a couple people there. I was expecting about 60 or 70!

I approached the guy in charge and said, "I must have read the flyer wrong..."

"Oh, that's today, but it starts at 9:00," he says.

I looked at my phone and it said 7:50. 7:50, WTF?

Ah, yes, Daylight Saving Time. Not the "new" Daylight Saving Time, but the "old" Daylight Saving Time.

I have one of those "fancy" clocks that automatically adjusts for Daylight Saving Time. Except I got it a couple years ago, and it still thinks the first Sunday in April is the right time to "spring forward".

Sure enough, I got home and my clock was an hour ahead of everything else. I promptly went back down to the car and reset its clock again. Interestingly enough, my W2K box that was supposed to be patched with all the latest updates was also an hour ahead.