Friday, March 30, 2007

The final nail for shared servers

I've been running shared servers and dedicated servers in a mixed mode for a number of years. In theory, there are a lot of advantages to shared servers; connection pooling, connection multiplexing, and pre-established connections.

I like to setup one database alias for dedicated connections (middle-tiers, batch jobs, etc.) and one for shared connections (CGI scripts, mod_plsql, and Client/Server sessions that have a lot of "think" time). Each alias then points to a specific listener that is either setup as a local listener or is a dispatcher. Granted, tracing is never easy on a session that is connected to a shared server. But, in general, it worked pretty good.

About a year ago I had a problem where my multiplexed connections periodically got disconnected. We tracked it down to an issue with multiplexing (implemented in shared servers) on 9.2 and Linux. For this particular application, shared servers weren't critical, so we just rerouted the alias to a dedicated listener.

Over the past 6 months or so, we've been experiencing in-memory block corruption in 9.2 dbs on 64-bit Linux. It was on multiple hosts, multiple dbs, and different patch levels of Oracle. Since the problem wasn't consistent and disguised itself as multiple ORA-00600/07445 errors we, nor Oracle Support, could ever track the root cause. Time would go by without error, the TAR would get closed, and we would experience a different ORA-00600 error.

After much prodding, we finally got an analyst at Oracle that understood what we were up against and looked at ALL our TARs, not just the current problem. They pieced together everything and pointed to bug 5324905; memory corruption from shared servers.

While they are great in theory, the disadvantages of shared servers outweigh the advantages at this point.

Thursday, March 22, 2007

The beauty of having your business logic in the database...

You may recall that I banged together a small application using Apex (nee HTMLDB) a few months ago. I got a call from a user the other day that something wasn't working quite right. Seems as one of the pages was giving them an incorrect result. I hadn't touched this particular application in about 2 months, but I knew it pulled information from a view I created just to make the Apex interface easy. The view called a piece of PL/SQL code that calculated a particular metric on-the-fly and due to some business rules, that calculation changed. It was a business-wide change, so I consulted the parties that use this package and we all agreed on the change to be made. I made the change and everybody was happy. I didn't have to get back into Apex or anything, just a simple PL/SQL change and I was done.

For those people that thought they had a better idea and rewrote the business logic in their Java middle-tier, good luck.

Monday, March 12, 2007

ORA-01009 During Gap Resolution

Encountered an interesting error with an Oracle 10.2.0.3 physical standby database.

When the standby database encounters a gap in the archivelog sequence, it tries to resolve the gap but encounters an ORA-01009: missing mandatory parameter. The interesting thing is that the standby eventually resolves the gap and gets caught up.

At first, my DB's are in sync. The standby db is applying logs as they come in without issue:

Completed: ALTER DATABASE RECOVER managed standby database disconnect
Fri Mar 9 15:27:59 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[1]: Assigned to RFS process 17076
RFS[1]: Identified database type as 'physical standby'
Fri Mar 9 15:27:59 2007
RFS LogMiner: Client disabled from further notification
RFS[1]: No standby redo logfiles created
RFS[1]: Archived Log: '/u01/oraarch/db1/1_2043_612798267.dbf'
Fri Mar 9 15:28:08 2007
Media Recovery Log /u01/oraarch/db1/1_2043_612798267.dbf
Media Recovery Waiting for thread 1 sequence 2044
Fri Mar 9 15:28:14 2007
RFS[1]: No standby redo logfiles created
RFS[1]: Archived Log: '/u01/oraarch/db1/1_2044_612798267.dbf'
Fri Mar 9 15:28:22 2007
Media Recovery Log /u01/oraarch/db1/1_2044_612798267.dbf
Media Recovery Waiting for thread 1 sequence 2045

Then, a gap is artificially generated by deferring log_archive_dest_2 log shipping at log 2045. When log shipping is enabled again, my standby recognizes the gap, but generates the ORA-01009 error.

Media Recovery Waiting for thread 1 sequence 2045
Fetching gap sequence in thread 1, gap sequence 2045-2045
FAL[client, MRP0]: Error 1009 fetching archived redo log from db1.us
Fri Mar 9 15:28:29 2007
Errors in file /oracle/app/oracle/admin/db1/bdump/db1_mrp0_16203.trc:
ORA-01009: missing mandatory parameter
Fri Mar 9 15:28:30 2007

Somehow, though, the gap is resolved and the logs are applied.

RFS[1]: No standby redo logfiles created
RFS[1]: Archived Log: '/u01/oraarch/db1/1_2045_612798267.dbf'
Fri Mar 9 15:28:43 2007

The mrp0.trc file doesn't give me very many clues either:

/oracle/app/oracle/admin/db1/bdump/db1_mrp0_16203.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning and Data Mining options
ORACLE_HOME = /oracle/app/oracle/product/10.2.0.3
System name: Linux
Node name: dev101
Release: 2.6.9-42.0.3.EL
Version: #1 Mon Sep 25 17:14:19 EDT 2006
Machine: i686
Instance name: db1
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 16203, image: oracle@dev101 (MRP0)

*** SERVICE NAME:() 2007-03-09 15:26:53.586
*** SESSION ID:(153.1) 2007-03-09 15:26:53.586
ARCH: Connecting to console port...
*** 2007-03-09 15:26:53.586 61283 kcrr.c
MRP0: Background Managed Standby Recovery process started
*** 2007-03-09 15:26:58.586 1102 krsm.c
Managed Recovery: Initialization posted.
*** 2007-03-09 15:26:58.586 61283 kcrr.c
Managed Standby Recovery not using Real Time Apply
Recovery target incarnation = 1, activation ID = 207497211
Influx buffer limit = 2245 (50% x 4491)
Successfully allocated 2 recovery slaves
Using 543 overflow buffers per recovery slave
Start recovery at thread 1 ckpt scn 39582826471 logseq 2043 block 2
*** 2007-03-09 15:26:58.663
Media Recovery add redo thread 1
*** 2007-03-09 15:26:58.663 1102 krsm.c
Managed Recovery: Active posted.
*** 2007-03-09 15:26:58.930 61283 kcrr.c
Media Recovery Waiting for thread 1 sequence 2043
*** 2007-03-09 15:28:08.945
Media Recovery Log /u01/oraarch/db1/1_2043_612798267.dbf
*** 2007-03-09 15:28:12.150 61283 kcrr.c
Media Recovery Waiting for thread 1 sequence 2044
*** 2007-03-09 15:28:22.151
Media Recovery Log /u01/oraarch/db1/1_2044_612798267.dbf
*** 2007-03-09 15:28:24.353 61283 kcrr.c
Media Recovery Waiting for thread 1 sequence 2045
*** 2007-03-09 15:28:29.353 61283 kcrr.c
Fetching gap sequence in thread 1, gap sequence 2045-2045
*** 2007-03-09 15:28:29.470 61283 kcrr.c
FAL[client, MRP0]: Error 1009 fetching archived redo log from db1.us
ORA-01009: missing mandatory parameter
*** 2007-03-09 15:28:59.471
Media Recovery Log /u01/oraarch/db1/1_2045_612798267.db

At first I thought this might be NLS related, but both primary and standby hosts are configured similarly. The oerr tends to indicate some OCI/Precompiler issue, but I would think Oracle knows how to use their own libraries. Still trying to figure this one out...

Tuesday, March 06, 2007

Targeted Marketing

You know those coupons you get at the end of your register receipt at the grocery store? You know the kind, you buy Charmin and they give you $.25 of Scotts. I went to CVS the other day to pickup a bag of M&Ms for an afternoon snack. I check the bottom of the coupon since occasionally they'll give you a $.50 off your next purchase. What was on the bottom but a coupon for $1.00 of my next purchase of Playtex Tampons. Methinks somebody needs to tune their Datawarehouse...

Monday, March 05, 2007

If it's good enough for the DOT...

Make no mistake about it, I'm not running out to "upgrade" to Windoz anything as long as my heart is pumping. However, if you're stuck in a Windoz environment, here's a unique take on Windows Vista upgrades; just don't do it.

On a side note, I can see six network administrators standing over a PC with one PC-Tech replacing the hard drive at the DOT. Or maybe that's just stereotypical....

Sunday, March 04, 2007

Oracle's January CPU update

Hi all!
Thought I'd share a potential problem solver for the latest Jan CPU patch for Oracle. If you're running Solaris 10, the patch could possibly fail because Solaris 10 adds "a few bytes"to the size of the objects and opatch bombs during the validation phase.

Two ways to solve this... upgrade your opatch version to 1.0.0.0.56 (or better) or Set the environment variable OPATCH_SKIP_VERIFY=TRUE before you start. Personally I'd opt to go get the latest version of OPatch Patch 4898608 (considering Oracle just updated it 14-feb-2007)

Check out note:397342.1 for further details.

Yeah, I know it's March already I don't want to hear it...