Monday, September 19, 2005

MySQL to an Oracle DBA, Part I

As you may remember, I am at MySQL training this week. The first day was introductory material which included installing MySQL and query basics. Because we didn’t breeze through the Query portion, it gave me plenty of time to experiment.


SQL> select count(*) from country;

COUNT(*)
----------
239


mysql> select count(*) from country;
+----------+
| count(*) |
+----------+
| 239 |
+----------+
1 row in set (0.00 sec)


OK, that’s cool, that’s what I expected. Let’s do a simple aggregate function:


SQL> select continent, sum(population) TotalPop
2 from country
3 group by continent
4 order by continent
5 ;

CONTINENT TOTALPOP
------------------------------ ----------
Africa 784475000
Antarctica 0
Asia 3705025700
Europe 730074600
North America 482993000
Oceania 30401150
South America 345780000

7 rows selected.


OK, that’s what we expected. Let’s try the same query in MySQL:


mysql> select continent, sum(population) TotalPop
-> from country
-> group by continent
-> order by continent
-> ;
+---------------+------------+
| continent | TotalPop |
+---------------+------------+
| Asia | 3705025700 |
| Europe | 730074600 |
| North America | 482993000 |
| Africa | 784475000 |
| Oceania | 30401150 |
| Antarctica | 0 |
| South America | 345780000 |
+---------------+------------+
7 rows in set (0.00 sec)


Hmm, that’s interesting. Maybe it has something to do with default sorting order or something. I’ll have to ask Tobias (my Instructor) about that in the morning. Let’s try sorting by a number field:


SQL> select continent, sum(population) totalpop
2 from country
3 group by continent
4 order by totalpop desc
5 ;

CONTINENT TOTALPOP
------------------------------ ----------
Asia 3705025700
Africa 784475000
Europe 730074600
North America 482993000
South America 345780000
Oceania 30401150
Antarctica 0

7 rows selected.

mysql> select continent, sum(population) totalpop
-> from country
-> group by continent
-> order by totalpop desc
-> ;
+---------------+------------+
| continent | totalpop |
+---------------+------------+
| Asia | 3705025700 |
| Africa | 784475000 |
| Europe | 730074600 |
| North America | 482993000 |
| South America | 345780000 |
| Oceania | 30401150 |
| Antarctica | 0 |
+---------------+------------+
7 rows in set (0.02 sec)


Yeah, that’s we expected. Let’s try another query:

mysql> select continent, population totalpop
-> from country
-> group by continent
-> order by totalpop desc
-> ;
+---------------+----------+
| continent | totalpop |
+---------------+----------+
| South America | 37032000 |
| Africa | 31471000 |
| Asia | 22720000 |
| Europe | 15864000 |
| North America | 217000 |
| Oceania | 68000 |
| Antarctica | 0 |
+---------------+----------+
7 rows in set (0.00 sec)


Hmm, what’s weird about this query (assuming the data is correct)? Let’s try it in Oracle:

SQL> select continent, population totalpop
2 from country
3 group by continent
4 order by totalpop desc
5 ;
select continent, population totalpop
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression


That’s right, our old friend didn’t even run the query for us.

Another interesting tidbit was with the DISTINCT Operator. In Oracle, the optimizer may determine DISTINCT values by sorting, whereas MySQL can use one of two algorithms. For example:

SQL> select distinct continent from country;

CONTINENT
------------------------------
Africa
Antarctica
Asia
Europe
North America
Oceania
South America

7 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=239 Bytes=7
648)

1 0 SORT (UNIQUE) (Cost=6 Card=239 Bytes=7648)
2 1 TABLE ACCESS (FULL) OF 'COUNTRY' (TABLE) (Cost=5 Card=23
9 Bytes=7648)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
16 consistent gets
0 physical reads
0 redo size
665 bytes sent via SQL*Net to client
507 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
7 rows processed

mysql> select distinct continent from country;
+---------------+
| continent |
+---------------+
| Asia |
| Europe |
| North America |
| Africa |
| Oceania |
| South America |
| Antarctica |
+---------------+
7 rows in set (0.00 sec)


BTW, another participant in the class suggested that Oracle always sends back the results of DISTINCT in sorted order. I didn’t think so, but wasn’t able to disprove it. Maybe I learned something new about Oracle today.

Edit: 09/19/2005 23:33 -
Tom correctly points out in the comments that Oracle doesn't always use a sort to determine duplicates. If I had that looked on metalink at doc 655027.999 or 249919.999 I would have confirmed my suspicians about always using ORDER BY. Bang head on desk, Bang head on desk, ...

4 comments:

Thomas Kyte said...

If I've said it once, I've said it - more than once..

You want data sorted, you BETTER order it.

Besides, the sort distinct does is a BINARY (not character set) sort.... when and if it sorts.

ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select distinct y from t;

Execution Plan
----------------------------------------------------------
Plan hash value: 1793979440

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 12M| | 26861 (1)| 00:05:23 |
| 1 | HASH UNIQUE | | 1000K| 12M| 38M| 26861 (1)| 00:05:23 |
| 2 | TABLE ACCESS FULL| T | 1000K| 12M| | 22001 (1)| 00:04:25 |
-----------------------------------------------------------------------------------

hash unique...... hmmm.

I sure hope they don't have any code that relies on distinct sorting - it never had to, and it didn't do a characterset sort.

and in 10gr2 it will do it even more (not even remotely sorted)

partitioning, parallel, iot's, indexes - anything like that will affect this.

group by does not sort
disctinct does not sort

order by, now that sorts...

ops$tkyte@ORA10GR2> create table t ( x int, y int );

Table created.

ops$tkyte@ORA10GR2> insert into t values ( 1, -2 );

1 row created.

ops$tkyte@ORA10GR2> insert into t values ( 2, 1 );

1 row created.

ops$tkyte@ORA10GR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> select distinct y from t;

Y
----------
1
-2

ops$tkyte@ORA10GR2> select distinct y from t order by y;

Y
----------
-2
1

Gary Myers said...

Here's a simple example for 9iR2 (okay, technically it is sorted, but by col_2, col_1 which is different from the col_1, col_2 specified in the select).

drop table test;

create table test (col_1 varchar2(40), col_2 varchar2(40), data_col char(2000) default 'X');

create unique index x1 on test (col_2, col_1);

insert into test (col_1, col_2)
select distinct owner, object_name
from dba_objects
where object_id is not null;

execute dbms_stats.gather_table_stats(user,'TEST',null,100,null,'FOR ALL COLUMNS')

select distinct col_1, col_2 from test t where col_2 like 'ALL%';

Peter K said...

The DISTINCT clause reminds me of DKB and his approach to undocumented parameters.

Because I have done a number of DISTINCT queries and the results all seems to come back in sorted order, thereby using DISTINCT will sort the results for you

Tom's approach would be:
Let's verify that DISTINCT do actually SORT the results in order. An EXPLAIN PLAN later shows that although it does sort but not in the normal sort but rather a binary sort which is different. Thereby to guarantee sorted order, always use the ORDER BY clause.

Arjen Lentz said...

Ordering the continents... in the "world" sample database, the continents column is of the ENUM type. And ENUM fields are ordered in the order they have been defined. So when you do SHOW CREATE TABLE or SHOW COLUMNS, you will see that your output corresponds with that.

When teaching a course using this sample db, inevitably people notice, which makes sure they never forget ;-). To order an enum field alphabetically, ORDER BY CONCAT(col) or use CAST().

I personally don't like ENUMs and hardly ever use them in apps.