Tuesday, January 30, 2007

If it could all possibly go wrong...

If it could all possibly go wrong... it did.

I upgraded one of my databases from 9.2.0.7 to 10.2.0.2 this past weekend. What normally should have taken about 2 hours took 9. Everything that could have possibly gone wrong with an upgrade did, and I’m just glad that I was able to finally get it done and open for business without having to recover it and go back to the drawing board. For those of you who are still in the 9i world and are going to be upgrading here in the future please for God’s sake bookmark this blog entry or the Metalink notes provided. Maybe it will save you, maybe it wont….

It all started by using the DBUA. I was directed to use this wonderful and fantastic tool instead of manually doing it. Someone thinks watching a progress bar is much cooler than watching code go flying by.

According to the DBUA and associated logs after the upgrade everything was just dandy! I shut the database down poked around a bit giving myself a peace of mind then fired it backup.

BAM!
ORA-00604: error occurred at recursive SQL levelError 604 happened during db open, shutting down database

USER: terminating instance due to error 604Instance terminated by USER, pid = 13418
ORA-1092 signalled during: ALTER DATABASE OPEN

I spent the next 30 minutes trying everything I could think of as well as scouring Metalink for some kind of clue as to what was causing this. I came up with Nada, zip, nothing! I had to break down and open up a case with Oracle. After sending them all the logs and explanations, they came back with “go run the catupgrd.sql” script and see if it fixes it. Why, I have no clue but it fixed the problem and I was able to open the database.

Database is open all looks well, ok time to enable the database jobs (mainly my stats pack job).

BAM!
ORA-12899: value too large for column "PERFSTAT"."STATS$SGASTAT"."POOL"(actual: 12, maximum: 11)
ORA-06512: at "PERFSTAT.STATSPACK", line 2065ORA-06512: at PERFSTAT.STATSPACK", line 91ORA-06512: at line 1
WARNING: Detected too many memory locking problems.WARNING: Performance degradation may occur.

I figured the easy fix for this one was to simply go in and blow away perfstat and recreate him again. New database, new stats hmmm why not? Confirmed with Oracle since I still had the case open with them and they confirmed that that was the best course of action as well. $ORACLE_HOME/rdbms/admin/spdrop.sql $ORACLE_HOME/rdbms/admin/spcreate.sql easy enough.

Ok, Database is up Perfstat is fixed no errors in the alertlog, jobs are enabled, alright lets go ahead start the listener and let some feeds come in.

BAM!
ORA-04065: not executed, altered or dropped stored procedure"PUBLIC.DBMS_OUTPUT"ORA-06508: PL/SQL: could not find program unit being called:"PUBLIC.DBMS_OUTPUT"

Ohhh this is a good one. Apparently according to Metalink this error occurs because this package got its memory status broken at some point in time. When the package is recompiled, this status is cleared but (big but) the situation can still occur. The way to fix this issue is to run the utlirp.sql (not to be confused with the utlrp.sql) Check out Metalink note 370137.1 and it will explain it all.


Alright, utlirp.sql has been ran database got bounced (again) everything is looking good, feeds look good. Everything is satisfactory let’s back this puppy up.

BAM!

RMAN-03009: failure of backup command on ORA_DISK_3 channel
RMAN-10038: database session for channel ORA_DISK_3 terminated unexpectedly
channel ORA_DISK_3 disabled, job failed on it will be run on another channel

ORA-07445: exception encountered: core dump [memcpy()+40] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []

This is caused because the Controlfile autobackup parameter is off in the RMAN configuration. A nifty little bug, Check out note 399395.1 for details.

The parameter is in the “on” position… back the database up……. Good backup.! Let’s open her up for business!!

Next morning I come in check the alert log ok, ok, good, uhh huuu,…
BAM!
ORA-00054: resource busy and acquire with NOWAIT specified
ORA-06512: at "SYS.DBMS_STATS", line 13313
ORA-06512: at "SYS.DBMS_STATS", line 13659
ORA-06512: at "SYS.DBMS_STATS", line 13737
ORA-06512: at "SYS.DBMS_STATS", line 13696
ORA-06512: at "SYSTEM.DB_SCHEMA_STATS", line 7
ORA-06512: at line 1

Another bug only thing is I’m kind of confused because the note says 10.2.0.1.0 and we went to 10.2.0.2. Also it says the only way to analyze the object(s) in question is to do it during next scheduled maintenance or gather the stats before any refresh operation. Uhh sorry to tell you guys but I need to run stats at least twice a week with the amount of transactions I’m doing, and my refresh job runs like every 5 minutes. Got to think of an alternative. Note:377183.1


I’m done..think I’m going to go and take my aggressions out and play some World of Warcraft.

10 comments:

Jeff Hunter said...

Ouch. What happened when you upgraded in test?

Don said...

What a nightmare. Sorry to say that I used DBUA to go from 9.2.0.4 to 10.2.0.2 and it went just fine. Although we don't use statspack and we weren't using RMAN in 9i, perhaps that spared me.

OracleDoc said...

Yeah, Jeff I guess that would be important to add wouldn't? hee hee

Well, out of 5 upgrades before this one, all of them went extreamly smooth hince the scheduled 2 hour down time to get it done. Not one, I repeat NOT ONE of the other upgrades did I have any problems with.

Jeff Hunter said...

just curious, that's all... ;)

Noons said...

at least yours worked after a while!

here I just had to remove xdb off a 10g install - put there by dbua - because otherwise no one would be able to get anything out of ALL_OBJECTS...

APC said...

Out of interest Noons, what precise problem did the XDB cause you? Only its enhanced XML features came up in conversation yesterday as a potential reason for upgrading to 10g.

Cheers, APC

Noons said...

@apc

exactly what I said: if we install xdb, it stuffs up access to ALL_OBJECTS. Basically, any query involving that view directly or indirectly becomes incredibly slow.

So much that the query never returns - well, I gave up waiting after letting a simple select run for a couple of hours, so "never" may be too long a word.

It's a known problem in 10gr2 with a Metalink call open on it. No solution, a couple of workarounds only.

One is to remove ANY stats on schema SYS. Dind't work for us.

The other is to remove XDB. That one worked.

Anonymous said...

Is this the same procedure that you used to upgrade the test version of this system?

What changed between upgrading the test system and this version of the database?

Normally problems are noticed and found on the test system to help reduce your exposure to discovering problems in the production implementation.

Use the DBUA? Not exactly new advice. At least no one has ever been burned by following that path eh?

OracleDoc said...

Anonymous....
To answer your questions:

"Is this the same procedure that you used to upgrade the test version of this system?"

The exact same procedure was use all through out our dev, test, and other various databases. Out of all the upgrades we did, this particular database gave me all of the above grief.

"What changed between upgrading the test system and this version of the database?
Nothing.

"Normally problems are noticed and found on the test system to help reduce your exposure to discovering problems in the production implementation."

Well, yeah common sense would tell you that.

Sounds like you're trolling.. and I'm not going to take your bait.

Have that Oracle kind of day!

Anonymous said...

Hi, new to this forum...forgive me if I'm a bit late with some comments.

I run RMAN backups on a 10.2.0.2 database using Netbackup to Tape. I get the following error:
"ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: cannot connect on socket
channel t6 disabled, job failed on it will be run on another channel"

I know it has nothing to do with your upgrade error, I searched google for the RMAN error and came accross your errors.

The job fails, but the fact that it says it will use another channel to backup makes me wonder if it really failed or is it only a warning?

Is there a way I can verify the backup piece in RMAN?

Thanks,