Got a MySQL project that needs a partitioned table. It's the ideal candidate really. Historical data that naturally would be partitioned by date. I need to keep an active window of data and delete old data quickly without doing DML.
Except partitioning is in 5.1 (more on that later) and I'm on 5.0.
One way I would overcome this limitation in Oracle Standard Edition would be to have multiple tables that hold a month worth of data and put a view on top of them. My users might notice a slight performance degradation, but it would be worth it to drop the data quickly.
So, I tried the same experiment in MySQL. I created three tables of this format:
CREATE TABLE `xyz_2007_01` (
`id` bigint(10) NOT NULL default '0',
`report_dt` date default NULL,
`acct_id` varchar(8) default NULL,
`some_text` varchar(222) default NULL,
PRIMARY KEY (`id`),
KEY `xyz_rdt_acct_2007_01` (`report_dt`,`acct_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And then I stuffed them with a million rows each. If I query a particular piece of data on a particular date, I expect to use the "xyz_rdt_acct*" index.
mysql> select id, report_dt, acct_id
-> from xyz_2007_01
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
Empty set (0.00 sec)
mysql> select id, report_dt, acct_id
-> from xyz_2007_02
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (0.01 sec)
mysql> select id, report_dt, acct_id
-> from xyz_2007_03
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
Empty set (0.00 sec)
OK, I'm looking for two rows in the 2007_01 "partition". Lets create a view on top of those tables to present one interface for queries.
mysql> create view xyz as
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03
-> ;
Query OK, 0 rows affected (0.00 sec)
And now we query from the view.
mysql> select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (44.36 sec)
Hmmm, that's not cool. 44 seconds to retrieve two rows. Lets try a sub-query just for kicks.
mysql> select id, report_dt, acct_id from (
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03) t
-> where t.report_dt ='2007-02-17'
-> and t.acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (33.47 sec)
About the same, but a little bit better. Lets see what is going on.
mysql> explain select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | ALL | NULL | NULL | NULL | NULL | 788670 | |
| 3 | UNION | xyz_2007_02 | ALL | NULL | NULL | NULL | NULL | 1014751 | |
| 4 | UNION | xyz_2007_03 | ALL | NULL | NULL | NULL | NULL | 1096175 | |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
5 rows in set (43.18 sec)
Ah, three full table scans. Hmm, wonder why it's doing that. Lets do the same for the inline view.
mysql> explain select id, report_dt, acct_id from (
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03) t
-> where t.report_dt ='2007-02-17'
-> and t.acct_id = 'X0000741';
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | ALL | NULL | NULL | NULL | NULL | 788670 | |
| 3 | UNION | xyz_2007_02 | ALL | NULL | NULL | NULL | NULL | 1014751 | |
| 4 | UNION | xyz_2007_03 | ALL | NULL | NULL | NULL | NULL | 1096175 | |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
5 rows in set (35.51 sec)
Basically the same thing. Lets see what happens if I only get the fields I want in the sub-query.
mysql> explain select id, report_dt, acct_id from (
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03) t
-> where report_dt = '2007-02-17'
-> and acct_id = 'X0000741';
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | index | NULL | xyz_rdt_acct_2007_01 | 15 | NULL | 788670 | Using index |
| 3 | UNION | xyz_2007_02 | index | NULL | xyz_rdt_acct_2007_02 | 15 | NULL | 1014751 | Using index |
| 4 | UNION | xyz_2007_03 | index | NULL | xyz_rdt_acct_2007_03 | 15 | NULL | 1096175 | Using index |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
5 rows in set (16.09 sec)
Well, look at that. 16 seconds still isn't great, but at least the query is using the indexes.
Lets create a view using the same logic.
mysql> drop view xyz;
Query OK, 0 rows affected (0.01 sec)
mysql> create view xyz as
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03;
Query OK, 0 rows affected (0.00 sec)
mysql> select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (16.70 sec)
Performance is consistent at least.
mysql> explain select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | index | NULL | xyz_rdt_acct_2007_01 | 15 | NULL | 788670 | Using index |
| 3 | UNION | xyz_2007_02 | index | NULL | xyz_rdt_acct_2007_02 | 15 | NULL | 1014751 | Using index |
| 4 | UNION | xyz_2007_03 | index | NULL | xyz_rdt_acct_2007_03 | 15 | NULL | 1096175 | Using index |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
5 rows in set (15.99 sec)
Plan is consistent as well. Still, not a solution for me, but thought you might be interested in the findings.
The more I learn, the more questions I have. Tomorrow, I test partitioning on 5.1 beta.