Tuesday, May 13, 2008

Something to be aware of (10.2.0.4)

My standard install of 10.2.0.3 included 23 patches at last count. When the 10.2.0.4 patchset came out a few weeks ago I checked it and decided that since my 23 patches were included in 10.2.0.4, it would be a good idea to upgrade to that version.

I first got underway in my development environment on x86_64. I have several small dbs with 2G SGAs on a single box. The patch/upgrade process went flawlessly and I encountered no issues at all during the upgrade.

I usually like to run the patch in development for a month or so before I apply it to my production environment. However, I was fighting a nasty bug on one of my 10.2.0.3 production dbs that was fixed in 10.2.0.4 but a backport wasn't available to 10.2.0.3 at the time. So after a week in development, I decided to upgrade that production db to 10.2.0.4. That db usually runs with a 16G SGA and when I tried to start the instance with the 10.2.0.4 software, I got the venerable:

SQL> startup
ORA-27102: out of memory
Linux-x86_64 Error: 28: No space left on device

OK, so I know that usually means that my kernel parameters are off somewhere. It was a Saturday and I didn't have a sysadmin available, so I bumped the SGA down to 12G and the instance started right away and upgraded fine. I figured there was some shared memory parameter that I would research on Monday and we'd be back in business the next week.

I looked at Note 301830.1 thinking maybe my shmall parameter was off, but sure enough I had enough configured. I submitted a TAR and basically let Oracle Support stew on it for a few days.
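For anyone who wants to repeat the sanity check, here is a minimal sketch of the arithmetic in Python. It assumes the usual Linux convention that shmmax is in bytes and shmall is in pages, and a 4096-byte page; the function names are made up for illustration and the numbers in the usage note are round figures, not my exact settings.

```python
def shm_limits_ok(sga_bytes, shmmax_bytes, shmall_pages, page_size=4096):
    """Return (fits_one_segment, fits_total) for a requested SGA size.

    Assumes shmmax is expressed in bytes and shmall in pages,
    with page_size defaulting to 4096 bytes (an assumption).
    """
    # Round the SGA up to whole pages (ceiling division).
    pages_needed = -(-sga_bytes // page_size)
    fits_one_segment = sga_bytes <= shmmax_bytes
    fits_total = pages_needed <= shmall_pages
    return fits_one_segment, fits_total


def read_kernel_value(name):
    """Read an integer kernel setting such as 'shmmax' or 'shmall' from /proc."""
    with open("/proc/sys/kernel/" + name) as f:
        return int(f.read().split()[0])
```

With a 16G SGA, a 25G shmmax and shmall set to shmmax / page size, shm_limits_ok(16 * 2**30, 25 * 2**30, 25 * 2**30 // 4096) says both limits are satisfied, which is why the kernel parameters alone did not explain the error.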

In the meantime, another DBA in my group got the task to set up a new x86_64 box with 10.2.0.4 so we can move a db. He got the software installed OK, but couldn't start an instance with an SGA of more than about 2G of memory. The kernel parameters were exactly the same as my other box, but he still couldn't start an instance with any decent amount of memory.

This was just too coincidental. So we played around with the 10.2.0.4 installation a little more and found that 10.2.0.4 was allocating multiple shared memory segments instead of just one big segment. I used ipcs to find my shmid for the SGA and then used pmap to find out that to allocate a 2G SGA, 10.2.0.4 used about 260 shared memory segments of between 5M and 15M each.
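Counting a couple of hundred segments by eye gets old fast. A rough sketch of how the ipcs listing could be tallied in Python follows; it assumes the column layout that ipcs -m prints on this kind of box (key, shmid, owner, perms, bytes, nattch, status) and that the segments are owned by a user named oracle. The function names are hypothetical.

```python
def parse_ipcs(text, owner="oracle"):
    """Return a list of (shmid, size_in_bytes) for segments owned by `owner`.

    Expects the text of `ipcs -m`; data rows are assumed to look like
    key shmid owner perms bytes nattch [status].
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip banners, headers and blank lines; data rows start with a hex key.
        if len(fields) >= 6 and fields[0].startswith("0x") and fields[2] == owner:
            segments.append((int(fields[1]), int(fields[4])))
    return segments


def summarize(segments):
    """Count the segments and report total, smallest and largest size in bytes."""
    sizes = [size for _, size in segments]
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "min": min(sizes, default=0),
        "max": max(sizes, default=0),
    }
```

On a live box the input would be something like subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout, and a count far above one for a single instance is the symptom described here.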

Then we installed 10.2.0.3 and tried to start with the SAME EXACT parameter file, and sure enough, one shared memory segment of 2G. In fact, we could allocate almost all the way up to our shmall in one segment.

I'm bug 7016155 and I know others have run into this problem as well. I'm sure bug 7019967 thinks he's alone in this, but he's not. As usual, my bug has been sitting out there for about 12 days and nobody has looked at it.

Part II

Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.

9 comments:

Joel Garry said...

Not knowing anything about it as usual isn't going to stop me from wondering out loud.

So I'm wondering if there are two ways of setting shmmax in linux, and if Oracle is looking at both, and screwing up the vulgar one.

So what is shmmax in /proc/sys/kernel/shmmax, as well as /usr/src/linux/include/asm/shmparam.h ?

Can you mess with the latter and rebuild the kernel?

Or maybe they did something real stupid and forgot about huge pages. You're using huge pages, right?

I'm sure you've seen Note:301830.1 (credit http://freekdhooge.wordpress.com/2007/11/11/linux-unix-kernel-parameters/ )

Jeff Hunter said...

Yeah, my shmmax is set at about 25G or about 80% of my physical RAM. The shmall is also correct at my shmmax / page size. I am very intimate with note 301830.1, but according to that note, I am doing everything correctly. I can almost buy the idea that my kernel params are off, but I can start a 10.2.0.3 instance just about to my shmmax. Good idea about the kernel source, though, I need to check that out.

Noons said...

Sounds nasty, this one.
Let's hope it's just an install option:

this is just the kind of thing that can de-rail a new release from being widely adopted!

No one likes this sort of stealth change to how db s/w interacts with the OS...

Anonymous said...

Just wondering if you see this problem with NUMA turned off. That's one area I would look into.

Jeff Hunter said...

Anon, I'll have to check that out.

Jeff Hunter said...

Oh boy, they've asked for my kernel version. Anybody see where this is going?

Anonymous said...

Dear

I recently installed oracle 10.2.0.4 - 64 bit on a redhat linux (x86_64 release 4 update 6 kernel 2.6.9-67.0.7.ELsmp).

The database I created is running with about 7G of memory:
SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer 7008M
SQL> show sga

Total System Global Area 7348420608 bytes
Fixed Size                  2095280 bytes
Variable Size            1207961424 bytes
Database Buffers         6123683840 bytes
Redo Buffers               14680064 bytes

When using ipcs I also see my SGA is divided into several segments (3 instead of 1), but not as many segments as yours:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 589825     oracle     640        16777216   21
0x00000000 622594     oracle     640        4294967296 21
0x1076b5cc 655363     oracle     640        3038773248 21

pmap shows me the link to the segments:

pmap 11784
0000000060000000 16384K rw-s- [ shmid=0x90001 ]
0000000080000000 4194304K rw-s- [ shmid=0x98002 ]
0000000180000000 2967552K rw-s- [ shmid=0xa0003 ]
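The two listings line up once the units are converted: pmap prints each shmid in hex and each size in K, while ipcs prints decimal ids and bytes. A small Python sketch of that conversion is below; the function name parse_pmap and the regular expression are made up for illustration and assume pmap lines shaped like the three above.

```python
import re

# Matches lines such as:
# 0000000060000000 16384K rw-s- [ shmid=0x90001 ]
PMAP_LINE = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+(\d+)K\s+\S+\s+\[\s*shmid=(0x[0-9a-fA-F]+)\s*\]"
)


def parse_pmap(text):
    """Map decimal shmid -> mapped size in bytes, from pmap output text."""
    mapped = {}
    for line in text.splitlines():
        m = PMAP_LINE.match(line)
        if m:
            shmid = int(m.group(3), 16)        # hex id -> decimal, as ipcs shows it
            size_bytes = int(m.group(2)) * 1024  # K -> bytes
            # Add up sizes in case one segment is mapped in several pieces.
            mapped[shmid] = mapped.get(shmid, 0) + size_bytes
    return mapped
```

Fed the three pmap lines above, this gives shmids 589825, 622594 and 655363 with sizes 16777216, 4294967296 and 3038773248 bytes, the same values ipcs reports.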

No additional patches have been installed for the oracle software.

I hope this information helps with the case you've opened at Oracle.

Regards

Jeff Hunter said...

Anon,

That's interesting. You don't by chance have the same info for a 10.2.0.3 instance on the same box, do you? I suspect you'll find only one shmseg in that case.

Anonymous said...

Dear

The results for the 10.2.0.3 with 6000M of SGA allocated:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x425a9b80 458754     oracle     640        6293553152 16

It's just one segment of 6G.

Regards