Except partitioning is in 5.1 (more on that later) and I'm on 5.0.
One way I would overcome this limitation in Oracle Standard Edition would be to have multiple tables that hold a month worth of data and put a view on top of them. My users might notice a slight performance degradation, but it would be worth it to drop the data quickly.
So, I tried the same experiment in MySQL. I created three tables of this format:
CREATE TABLE `xyz_2007_01` (
`id` bigint(10) NOT NULL default '0',
`report_dt` date default NULL,
`acct_id` varchar(8) default NULL,
`some_text` varchar(222) default NULL,
PRIMARY KEY (`id`),
KEY `xyz_rdt_acct_2007_01` (`report_dt`,`acct_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And then I stuffed them with a million rows each. If I query a particular piece of data on a particular date, I expect to use the "xyz_rdt_acct*" index.
mysql> select id, report_dt, acct_id
-> from xyz_2007_01
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
Empty set (0.00 sec)
mysql> select id, report_dt, acct_id
-> from xyz_2007_02
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (0.01 sec)
mysql> select id, report_dt, acct_id
-> from xyz_2007_03
-> where report_dt ='2007-02-17'
-> and acct_id = 'X0000741';
Empty set (0.00 sec)
OK, I'm looking for two rows in the 2007_01 "partition". Lets create a view on top of those tables to present one interface for queries.
mysql> create view xyz as
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03
-> ;
Query OK, 0 rows affected (0.00 sec)
And now we query from the view.
mysql> select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (44.36 sec)
Hmmm, that's not cool. 44 seconds to retrieve two rows. Lets try a sub-query just for kicks.
mysql> select id, report_dt, acct_id from (
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03) t
-> where t.report_dt ='2007-02-17'
-> and t.acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (33.47 sec)
About the same, but a little bit better. Lets see what is going on.
mysql> explain select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | ALL | NULL | NULL | NULL | NULL | 788670 | |
| 3 | UNION | xyz_2007_02 | ALL | NULL | NULL | NULL | NULL | 1014751 | |
| 4 | UNION | xyz_2007_03 | ALL | NULL | NULL | NULL | NULL | 1096175 | |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
5 rows in set (43.18 sec)
Ah, three full table scans. Hmm, wonder why it's doing that. Lets do the same for the inline view.
mysql> explain select id, report_dt, acct_id from (
-> select * from xyz_2007_01
-> union all
-> select * from xyz_2007_02
-> union all
-> select * from xyz_2007_03) t
-> where t.report_dt ='2007-02-17'
-> and t.acct_id = 'X0000741';
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | ALL | NULL | NULL | NULL | NULL | 788670 | |
| 3 | UNION | xyz_2007_02 | ALL | NULL | NULL | NULL | NULL | 1014751 | |
| 4 | UNION | xyz_2007_03 | ALL | NULL | NULL | NULL | NULL | 1096175 | |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+---------------+------+---------+------+---------+-------------+
5 rows in set (35.51 sec)
Basically the same thing. Lets see what happens if I only get the fields I want in the sub-query.
mysql> explain select id, report_dt, acct_id from (
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03) t
-> where report_dt = '2007-02-17'
-> and acct_id = 'X0000741';
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | index | NULL | xyz_rdt_acct_2007_01 | 15 | NULL | 788670 | Using index |
| 3 | UNION | xyz_2007_02 | index | NULL | xyz_rdt_acct_2007_02 | 15 | NULL | 1014751 | Using index |
| 4 | UNION | xyz_2007_03 | index | NULL | xyz_rdt_acct_2007_03 | 15 | NULL | 1096175 | Using index |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
5 rows in set (16.09 sec)
Well, look at that. 16 seconds still isn't great, but at least the query is using the indexes.
Lets create a view using the same logic.
mysql> drop view xyz;
Query OK, 0 rows affected (0.01 sec)
mysql> create view xyz as
-> select id, report_dt, acct_id from xyz_2007_01
-> union all
-> select id, report_dt, acct_id from xyz_2007_02
-> union all
-> select id, report_dt, acct_id from xyz_2007_03;
Query OK, 0 rows affected (0.00 sec)
mysql> select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----------+------------+----------+
| id | report_dt | acct_id |
+----------+------------+----------+
| 20000741 | 2007-02-17 | X0000741 |
| 20967398 | 2007-02-17 | X0000741 |
+----------+------------+----------+
2 rows in set (16.70 sec)
Performance is consistent at least.
mysql> explain select id, report_dt, acct_id from xyz where report_dt ='2007-02-17' and acct_id = 'X0000741';
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 3000000 | Using where |
| 2 | DERIVED | xyz_2007_01 | index | NULL | xyz_rdt_acct_2007_01 | 15 | NULL | 788670 | Using index |
| 3 | UNION | xyz_2007_02 | index | NULL | xyz_rdt_acct_2007_02 | 15 | NULL | 1014751 | Using index |
| 4 | UNION | xyz_2007_03 | index | NULL | xyz_rdt_acct_2007_03 | 15 | NULL | 1096175 | Using index |
|NULL | UNION RESULT | union2,3,4 | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+---------------+----------------------+---------+------+---------+-------------+
5 rows in set (15.99 sec)
Plan is consistent as well. Still, not a solution for me, but thought you might be interested in the findings.
The more I learn, the more questions I have. Tomorrow, I test partitioning on 5.1 beta.
3 comments:
select id, report_dt, acc_id from xyz_2007_01 where report_dt ='2007-02-17' and account_id = 'X0000741'
-> union
select id, report_dt, acc_id from xyz_2007_02 where report_dt ='2007-02-17' and account_id = 'X0000741'
-> union
-> select id, report_dt, acc_id from xyz_2007_03 where report_dt ='2007-02-17' and account_id = 'X0000741';
would that work for you ?
Including the where clause criteria is each of the UNION statements increases the speed drastically
It does, but how do you put that into a views?
Whoa! No need to go through all that trouble. Change all the historical tables to MyISAM and use a MERGE table:
CREATE TABLE xyz_all (
`id` bigint(10) NOT NULL default '0',
`report_dt` date default NULL,
`acct_id` varchar(8) default NULL,
`some_text` varchar(222) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MERGE UNION=(xyz_2007_01,xyz_2007_02,xyz_2007_03,xyz_2007_04) ;
and then just query the merge table. Alternatively, make the merge table include all but the current month, make the current month table InnoDB and do a query with a UNION of the MERGE and InnoDB tables to get the best of both worlds.
Post a Comment