Monday, June 23, 2008

Something to be aware of, Part II

We've passed the one-month anniversary of Something to be aware of (10.2.0.4) and still have no meaningful results. I've been going back and forth with Oracle Support about switching NUMA off. In synopsis:

OS: Switch NUMA off and retest.
Me: But that never changed.
OS: Switch NUMA off and retest.
Me: We take kernel changes very seriously. It will take approximately 2 months to re-certify that configuration. Besides, the NUMA settings didn't change.
OS: Switch NUMA off and retest.
Me: Please direct me to the note that says 10.2.0.4 requires NUMA changes.
OS: Switch NUMA off and retest.
Me: I've already proven 10.2.0.3 works as expected in this configuration, and the NUMA settings never changed.
OS: Switch NUMA off and retest.


My next step is talking to a Duty Manager. I really don't want to do that.

Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.

7 comments:

Jason arneil said...

IMHO, you should escalate for all it's worth; this seems like a real screw-up by Oracle development.

I wonder if you can tell me if you are using Hugepages?

Just about to go to 10.2.0.4 myself from 10.2.0.3, it'll be a bad day if we encounter similar problems!

jason.

Unknown said...

I don't know if it's relevant, but in 11g memory can be allocated both from shared memory and from tmpfs. That's definitely a change. In 11g, tmpfs allocation of shared memory is needed for the memory_max_target parameter.

Anonymous said...

I would suggest escalating to the Duty Manager and expressing your concern. At the same time, if you have a test environment where this can be reproduced, that will help. Anyway, once you have set that system up, Oracle Development will ask for a test case including init.ora, etc. ;) So escalating makes sense.

Cheers

Anonymous said...

If this is a ProLiant box, you can also disable NUMA in the BIOS, which is ever-so-slightly different from booting Linux with numa=off.

What does numactl --hardware show directly before you try to start up the SGA, and what is in your init.ora?

Michael said...

Jeff,

Any word from Oracle development on a fix? Have they agreed it's something on their end and not Red Hat/Linux?

We will be increasing our SGA as our database grows, so this issue is a serious one. Please keep us posted on what Oracle comes up with.


Michael

Jeff Hunter said...

No word yet. However, Kevin Closson has been able to get a 16G SGA started on HP/AMD Opteron, so it's possible, just not obvious what is wrong. More updates as I get them...

Michael said...

Jeff,

Did you notice Mary's ipcs -m result?

$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes        nattch     status
0x00000000 0          root       644        72           2
0x00000000 32769      root       644        16384        2
0x00000000 65538      root       644        280          2
0x1420f290 393219     oracle     600        17702060032  12


It's a single shared memory segment for oracle. I have 4 segments used by oracle, and in your case you seem to have quite a few more.

I wonder how Mary got Oracle to use a single shared-memory segment, like 10.2.0.3 does?

Michael
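
For anyone who wants to compare segment counts across boxes, here is a rough sketch of how the ipcs -m listing above could be tallied per owner. It is only an illustration: the function name summarize_segments and the SAMPLE constant are made up for this example, and it assumes the column order shown above (key, shmid, owner, perms, bytes, nattch). On a live system you would feed it the text captured from ipcs -m instead of the pasted sample.

```python
from collections import defaultdict

# Sample text copied from the ipcs -m listing quoted above.
SAMPLE = """\
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 644 72 2
0x00000000 32769 root 644 16384 2
0x00000000 65538 root 644 280 2
0x1420f290 393219 oracle 600 17702060032 12
"""


def summarize_segments(ipcs_text):
    """Return {owner: (segment_count, total_bytes)} from `ipcs -m` text.

    Hypothetical helper; assumes columns are
    key, shmid, owner, perms, bytes, nattch[, status].
    """
    totals = defaultdict(lambda: [0, 0])
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; skip the banner, header and blanks.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        owner, size = fields[2], int(fields[4])
        totals[owner][0] += 1
        totals[owner][1] += size
    return {owner: tuple(pair) for owner, pair in totals.items()}


if __name__ == "__main__":
    for owner, (count, total) in summarize_segments(SAMPLE).items():
        print(f"{owner}: {count} segment(s), {total} bytes")
```

A count of 1 for oracle corresponds to the single-segment layout in Mary's listing; a higher count would match the multi-segment layout described above.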