Saturday, May 30, 2009

information_schema.tables.table_rows

Here's something to be aware of. The table_rows column in the information_schema.tables table does not accurately reflect the number of rows in the actual table.

This weekend I was moving a db from one host to another. As a good DBA, I took the row counts from each table I was moving so I could check it on the other side. When the copy was done, I queried information_schema.tables again and dumped the results to a file. Running the two files through diff, I found that almost every table had a different number of rows.

I knew that couldn't be possible because I used a method where nobody had access to these tables when I was doing the copy.

On serverA, the information_schema reported that table1 had 2337 rows. On serverB, the information_schema reported that table1 had 2341 rows after the move. Doing a "SELECT count(*) FROM table1" yeilded 2343 rows on both serverA and serverB.

5 comments:

Mohammad Lahlouh said...

do u talk about Myisam engine?

Mark Robson said...

This is a documented feature - the information_schema gives an estimate of the number of rows (only) in a InnoDB table.

It does a number of random index dives to estimate the cardinality of some unique index (presumably the primary key is a good candidate).

This gives an estimate which is good for the optimiser, but lousy if you want to know the exact number of rows.

The reason it doesn't count the rows is that that would involve determining exactly which ones were visible to the current transaction, which is expensive.

MyISAM has the luxury of not needing to worry about which rows are visible in which transaction (it doesn't support transactions)

Shlomi Noach said...

Moreover,

Try and read the TABLE_ROWS value for a given InnoDB table 10 times, and find such deviation as 5x times larger or smaller values (my own experience; numbers may vary)

rpbouman said...

Hi!

I believe the same metric is also used by the optimizer. Basically, the value for TABLE_ROWS is an estimate that is supplied by the storage engine. The estimate need not be exact (that's up to the storage engine!) but the general idea is that the engine can provide this metric really quick in comparison to performing an exact count.

I know for sure MyISAM does provide exact values. I know for sure that InnoDB does not provide exact values (if it would, it would take considerably longer to produce the data).

I am unsure what the other engines do, but the point is, if you need exact counts, this is not the way to do it - always use SELECT COUNT(*) queries to obtain that data.

shantanu said...

So many people find it accidentally, doesn't mysql has some sort of "read this before using MySQL" paper?