Wednesday, July 02, 2008

The moment you've all been waiting for...

No, no, not the Brangelina twins announcement, the numactl output.

$ ps -ef | grep oracle

oracle 13771 2418 0 Jun24 pts/0 00:00:00 -ksh
oracle 13772 13771 0 09:24 pts/0 00:00:00 ksh -i
oracle 13775 13772 0 09:24 pts/0 00:00:00 ps -ef
oracle 13776 13772 0 09:24 pts/0 00:00:00 grep oracle

$ numactl --hardware

available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7854 MB
node 1 size: 8191 MB
node 1 free: 5725 MB
node 2 size: 8191 MB
node 2 free: 6757 MB
node 3 size: 8191 MB
node 3 free: 6671 MB

OK, so maybe not everybody was waiting for that. Oracle Support requested an strace of the startup command, so I had to bring the db down anyway. The strace was a good idea; they'll be able to see the system calls being made. Maybe we'll make some progress yet...

If you're following along at home: Part I, Part II, Part III

More info from the comments:

I shut everything down again:
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7854 MB
node 1 size: 8191 MB
node 1 free: 4846 MB
node 2 size: 8191 MB
node 2 free: 6744 MB
node 3 size: 8191 MB
node 3 free: 6646 MB

No surprises there (except, of course, for the strangeness KC pointed out before). Then I decided to do a startup nomount with a 10.2.0.3 $OH and a 16g SGA:

available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7660 MB
node 1 size: 8191 MB
node 1 free: 4315 MB
node 2 size: 8191 MB
node 2 free: 6708 MB
node 3 size: 8191 MB
node 3 free: 6646 MB

Very unexpected. Free memory barely moved (node 0 dropped by 194 MB, node 1 by 531 MB), so whatever this is showing is hardly affected by a 16g SGA.

I changed my $OH back to 10.2.0.4 and started a 12g SGA:
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 25 MB
node 1 size: 8191 MB
node 1 free: 257 MB
node 2 size: 8191 MB
node 2 free: 6740 MB
node 3 size: 8191 MB
node 3 free: 6642 MB

This is more like what I expected. With a 12g SGA, nodes 0 and 1 are left with next to nothing free. If we believe the previous output, is that my 12g of RAM?
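As a rough check on that question, the free-memory figures from the three snapshots above can be subtracted node by node. This is only a sketch: it assumes the post-shutdown snapshot is a fair baseline for both startups, and the names (FREE_MB, consumed) are made up for illustration.

```python
# Free MB per node (nodes 0-3), copied from the numactl --hardware outputs above.
FREE_MB = {
    "shutdown": [7854, 4846, 6744, 6646],   # everything shut down
    "16g_10203": [7660, 4315, 6708, 6646],  # startup nomount, 10.2.0.3, 16g SGA
    "12g_10204": [25, 257, 6740, 6642],     # 10.2.0.4, 12g SGA
}


def consumed(baseline, snapshot):
    """Per-node drop in free memory (MB) between two snapshots."""
    return [b - s for b, s in zip(baseline, snapshot)]


# Assumption: the post-shutdown snapshot is the baseline for both startups.
base = FREE_MB["shutdown"]
delta_16g = consumed(base, FREE_MB["16g_10203"])
delta_12g = consumed(base, FREE_MB["12g_10204"])

for label, delta in (("16g SGA, 10.2.0.3", delta_16g),
                     ("12g SGA, 10.2.0.4", delta_12g)):
    total = sum(delta)
    print(f"{label}: per-node drop {delta} MB, "
          f"total {total} MB ({total / 1024:.1f} GB)")
```

On these numbers the 10.2.0.4 startup accounts for about 12.1 GB, nearly all of it taken from nodes 0 and 1, while the 10.2.0.3 nomount startup accounts for well under 1 GB.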

Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.

3 comments:

Anonymous said...

But, Jeff, I thought you were trying to get this thing to boot in non-NUMA mode. This is 4 node NUMA (Opteron style).

I'm curious about why node 0 has 10239MB and the rest have 8191MB. I've never had much happiness with lopsided Opteron configs, but that is just one observation. I find it very odd that before you boot any Oracle your node 1 is down to ~5700MB free.

Here is what I would do. Snapshot this numactl output, boot with a 12GB SGA, run numactl again, and look at the diff just to learn how 10.2.0.4 is plopping down its memory. There are NUMA APIs for memory placement, which makes it easy to intuit where the block buffers go, but where is it settling down the variable SGA component? If it mistakenly lopsides to node 1, in your case you are toast when trying to boot 16GB. Right now that would plop 4GB on node 1, and *if* it mistakenly allocates the variable SGA components on node 1 you will run out of memory there.
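The snapshot-and-diff idea could be scripted along these lines. This is a minimal sketch, not anything Oracle or numactl ships: the helper names parse_free and diff_free are hypothetical, and it assumes the two outputs were saved as text (for example with `numactl --hardware > before.txt`). The sample text is the post's own post-shutdown and 12g snapshots.

```python
import re

# Matches lines such as "node 1 free: 4846 MB" in numactl --hardware output.
NODE_FREE = re.compile(r"^node\s+(\d+)\s+free:\s+(\d+)\s+MB", re.MULTILINE)


def parse_free(text):
    """Return {node_number: free_MB} parsed from numactl --hardware output."""
    return {int(node): int(mb) for node, mb in NODE_FREE.findall(text)}


def diff_free(before, after):
    """Return {node_number: MB consumed} between two parsed snapshots."""
    return {n: before[n] - after[n] for n in sorted(before) if n in after}


# Sample data: the snapshots quoted in the post (after shutdown, then 12g SGA).
BEFORE = """\
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7854 MB
node 1 size: 8191 MB
node 1 free: 4846 MB
node 2 size: 8191 MB
node 2 free: 6744 MB
node 3 size: 8191 MB
node 3 free: 6646 MB
"""

AFTER = """\
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 25 MB
node 1 size: 8191 MB
node 1 free: 257 MB
node 2 size: 8191 MB
node 2 free: 6740 MB
node 3 size: 8191 MB
node 3 free: 6642 MB
"""

usage = diff_free(parse_free(BEFORE), parse_free(AFTER))
for node, mb in usage.items():
    print(f"node {node}: {mb} MB consumed")
```

Parsing saved text rather than calling numactl from the script keeps the two snapshots around for later comparison, which helps when the instance has to be bounced between runs.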

Please forward me the strace to my email cited in the contact section of my blog.

Sure wish I still had my 32GB 585...this would no longer be a mystery.

For others wishing to follow along, I'd recommend my series on Oracle on Opteron at:

http://kevinclosson.wordpress.com/kevin-closson-index/oracle-on-opteron-k8l-numa-etc/

Unknown said...
This comment has been removed by a blog administrator.
Anonymous said...

It is doable. RHEL 5.2

sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 7 10:03:35 2008
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup pfile=create1.ora
ORACLE instance started.
Total System Global Area 1.7700E+10 bytes
Fixed Size 2115104 bytes
Variable Size 503319008 bytes
Database Buffers 1.7180E+10 bytes
Redo Buffers 14659584 bytes
Database mounted.
Database opened.
SQL> quit

$ numactl --hardware
available: 4 nodes (0-3)
node 0 size: 7906 MB
node 0 free: 2025 MB
node 1 size: 8080 MB
node 1 free: 3920 MB
node 2 size: 8080 MB
node 2 free: 3969 MB
node 3 size: 8080 MB
node 3 free: 3926 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10