Tuesday, June 24, 2008

Part III

I escalated my issue to a duty manager and explained that in my environment those types of changes are very difficult. He kicked it back to development for alternate solutions.

In the background, I managed to secure a test box for a couple of hours that I could play with. While researching how to set numa=off, I ran across this on Red Hat's site:
The Red Hat Enterprise Linux 4 Update 1 kernel automatically disables NUMA optimizations (numa=off) by default on systems using the AMD64 dual core processor. This ensures stable operation across multiple systems where each has a different system BIOS implementation for reporting dual core processors.

Hmm, that's fishy. Why would they ask me to turn it off if the default value is off? So I did it anyway.

Same thing, good old ORA-27102: out of memory.

As long as I had the box booted with numa=off, I might as well try _enable_numa_optimization=false, right? No dice, same error.
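For anyone repeating this test, here is a minimal sketch of one way to confirm that the kernel really was booted with numa=off before blaming anything else. It assumes a Linux box that exposes the boot parameters in /proc/cmdline; the function name cmdline_has_numa_off is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeeds if the given
# kernel command line contains numa=off as a standalone parameter.
cmdline_has_numa_off() {
    case " $1 " in
        *" numa=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    if cmdline_has_numa_off "$(cat /proc/cmdline)"; then
        echo "numa=off is on the kernel command line"
    else
        echo "numa=off is NOT on the kernel command line"
    fi
fi
```

This only shows what was passed at boot. It says nothing about what the kernel defaulted to when the parameter is absent.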

Last, but not least, I tried their suggestion of setting vm.nr_hugepages=0 in the sysctl.conf. But I found out that's already my value:
sysctl -a | grep vm.nr_huge
vm.nr_hugepages = 0
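The same check can be done against both the persistent setting and the live kernel value, since the two can differ if sysctl.conf was edited without a reload. This is a rough sketch, assuming the standard Linux locations /etc/sysctl.conf, /proc/sys/vm/nr_hugepages and /proc/meminfo; hugepages_value is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "key = value" lines on stdin and print the
# value assigned to vm.nr_hugepages (commented-out lines are ignored).
hugepages_value() {
    awk -F= '/^[ \t]*vm\.nr_hugepages[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Value the running kernel is using right now.
if [ -r /proc/sys/vm/nr_hugepages ]; then
    echo "live value:  $(cat /proc/sys/vm/nr_hugepages)"
fi

# Value that will be applied at the next boot or sysctl -p, if set.
if [ -r /etc/sysctl.conf ]; then
    echo "sysctl.conf: $(hugepages_value < /etc/sysctl.conf)"
fi

# Huge page counters as reported by the kernel, where available.
grep '^HugePages_' /proc/meminfo 2>/dev/null || true
```

An empty sysctl.conf line in the output just means the parameter is not set there and the kernel default applies.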

OK, so now what?

Maybe 11g is an option...

Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.

11 comments:

Jason Williams said...

I am very curious to see what the outcome of this is. I am very anxious to see what Oracle has to say. This is a real dandy.

Noons said...

I know it sounds silly, but: can you check the paging size? Linux in general requires paging partition to be larger than main memory, preferably twice the size.

I've hit the stupidest errors when it was not the case. MOF I'm hitting one now with aix and there, it isn't supposed to happen at all!...

Kevin Closson said...

when booted numa=off, what does numactl --hardware show just before you attempt to boot the instance?

Anonymous said...

I'm sure everyone knows more about this than me, and I know it's a different OS and patch, but see Note:429872.1

Jeff Hunter said...

I'll have to check my numactl settings when I can get the instance down. But if I can allocate a 16G SGA with 10.2.0.3, what is different in 10.2.0.4 that would point me to some setting being off? I'd almost believe that I did something wrong in the install process, but this happened on three separate boxes.

Kevin Closson said...

there is a lot different from 10.2.0.3 to 10.2.0.4 in this area so I'll hold speculation until I see the output of numactl --hardware just before you try to boot the instance.

Anonymous said...

Try setting huge pages as in Note: 361323.1?

Christo said...

Could it be the limitation parameters?

http://www.pythian.com/blogs/245/the-mysterious-world-of-shmmax-and-shmall