Thursday, March 09, 2006

Interesting ORA-00202

I have been getting an interesting ORA-00202 errors thrown to my alert.log during a backup on one of my databases. It's only happened a couple of times, but I am curious about the cause. The text of the ORA-00202 error is:

$ oerr ora 00202
00202, 00000, "controlfile: '%s'"
// *Cause: This message reports the name file involved in other messages.
// *Action: See associated error messages for a description of the problem.

While the rman level 0 backup is running, my monitoring software encounters the ORA-00202 error when trying to query v$log and dba_data_files. The message in the alert.log is:

Errors in file /oracle/admin/db1/udump/db1_ora_5953.trc:
ORA-00202: controlfile: '/u01/oradata/db1/control01.ctl'

When I look at the trace file, I see:

*** SESSION ID:(11.32) 2006-03-08 17:44:26.211
***
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Fractured block found during control file header read
Data in bad block -
type: 21 format: 2 rdba: 0x00000001
last change scn: 0xffff.0017641a seq: 0x1 flg: 0x04
consistency value in tail: 0x64191501
check value in block header: 0x24a8, computed block checksum: 0x3
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
*** 2006-03-08 17:44:27.245
ksedmp: internal or fatal error
ORA-00202: controlfile: '/u01/oradata/db1/control01.ctl'
Current SQL statement for this session:
SELECT file_name,
tablespace_name
FROM dba_data_files
WHERE autoextensible='YES'
AND maxblocks-blocks <= increment_by AND maxblocks-blocks <> 0

"Ah Ha", I say to myself, "must be a corrupt control file." I shutdown the database immediate and diffed the two control files. Although there were the same size, diff reported that they were different. Sure enough, a corrupt control file. So I copied my good control file over the bad one and restarted my database.

A couple days later, same problem. Same control file. The same control file seems unlikely unless there is an OS related error on that filesystem. I scour the messages file and there's no I/O related errors. I shutdown the database, diffed the control files again, and sure enough, they're different.

I recopied my good control file over my bad one again, started the database, switched logfiles, and shutdown immediate. Then I tested my theory that the control files should be the same and diff reported that they were the same. This time, however, I moved control01.ctl to another filesystem just for kicks to see what would happen.

My theory is that my monitoring software can't get a consistent view of the control file while rman is writing to it. At this point I'm unsure if the control file is corrupt or Oracle is just throwing the wrong error. We'll see what Oracle says...

Check back in 5 days (if I'm lucky)...

Update: I've done some testing and come to some conclusions.

1 comment:

Anonymous said...

Let me know if you need an interpreter :)