Maybe I'm old school.
Maybe I'm just behind the times.
Today's crop of DBA candidates sure knows how to point and click their way through Oracle. When I ask them how to tune a SQL statement, they'll give me exact instructions on how to click through some Grid Control advisor or how to get an "explain plan" through TOAD. I guess the new query plan is called "explain plan" now because it's easier to understand.
Recently I asked one candidate, "What if grid control goes away, how do you tune a query?"
"Well, first you start the grid control agent, and..."
"No, no, I mean, if there was no such thing as grid control, how would you tune a query?"
Crickets....
"Um, SQL*Plus?"
Yes, good old SQL*Plus.
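For the record, a bare-bones example of what I mean; the table and predicate here are just placeholders for illustration:
explain plan for
select * from emp where deptno = 10;

select * from table(dbms_xplan.display);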
I used to believe that GUIs were icing on the cake and that as long as you knew what was going on under the covers, it was perfectly acceptable to use one.
Maybe I'm just old school.
Thursday, February 12, 2009
Tuesday, January 27, 2009
Vista VPN Vacillation
I tried setting up the NCO Secure Entry Client on Vista x64, but just couldn't get it to work. Before I found that VPN software, I had contacted Dell and they sent me a Vista x32 installation CD that I could use to re-install my OS if it came to that. I messed around with x64 and the NCO Secure Entry Client until the install disk came, and then I punted. The Vista install took about an hour or so, and the VPN took another five minutes to set up and test. It came up fine the very first time.
Maybe Cisco will get with the program by the time I'm ready to get a new machine. I wonder if that extra 2G of RAM will fit in my W2K box...
Thursday, January 22, 2009
When the Latest and Greatest isn't the Greatest
I thought my Windows 2000 computer was starting to show its age. 512M of RAM and a 200G hard drive just don't go as far as they used to. I set up a Vista laptop a few months ago and it's working pretty well, so I figured I'd take the plunge, get 6GB of RAM, and get Vista x64 to take full advantage of it. Everything was going fine (with the normal Vista quirks, of course) until I tried to install the Cisco VPN client. Seems as though there is no support for Vista x64 in my particular VPN software.
Dandy.
I found NCO Secure Entry Client and will try to set it up, but we'll see...
Tuesday, January 13, 2009
Adobe Automation
I have a friend who needs some Adobe Acrobat automation. The project description is:
We need to find a programmer who is willing to do some high-end Adobe form work for us.
We need to embed a "send to" button in some of our company forms so we can email the form to clients who can input the data and submit the form back to us by clicking on a submit button on the form.
If anybody has this skill set and is interested, please email me at marist89 (at) excite %dot% com and I will forward to the appropriate party.
Wednesday, December 31, 2008
Did I get my money's worth?
At the end of November I submitted a Purchase Order for my 2009 Annual Oracle Support. I passed the six-figure mark several years ago and am well on my way to topping the next hundred next year. Here is my report from Metalink of the number of incidents and close times for the year:
2. Close Rates (closed calls only)
Time | Incidents | % | Cumulative Total |
4 h | 1 | 1.41 | 1.41 |
12 h | 1 | 1.41 | 2.82 |
24 h | 0 | 0 | 2.82 |
48 h | 3 | 4.23 | 7.05 |
72 h | 1 | 1.41 | 8.46 |
1 Week | 2 | 2.82 | 11.28 |
2 Weeks | 3 | 4.23 | 15.51 |
4 Weeks | 40 | 56.34 | 71.85 |
>4 Weeks | 20 | 28.17 | 100.02 |
Total Incidents | 71 |
That doesn't count the 5 TARs I still have open, one of them from May. 84% of my TARs took four weeks or more to close. Let me say that again: FOUR WEEKS OR MORE. Each TAR cost me approximately $2500.
So, Oracle Community, did I get adequate return on my investment?
(Oh, and BTW, I love how my cumulative total adds up to 100.02%)
Monday, December 29, 2008
The Bug, part II
AH HA! You guys took the bait!
Granted, this is a bug in the particular JDBC Driver I am using. The 10.2.0.2 driver straight out of the box has bug 4711863 that can possibly insert bad data when you use Java 1.5. In fact, I proved it with a scenario similar to:
drop table tablea
/
drop table tableb
/
create table tablea (
record_id number(38) primary key,
price number)
/
create table tableb (
record_id number(38),
price number)
/
-- fill it
begin
for i in 1..100 loop
insert into tablea values (i, 0+(i*.0000001));
commit;
end loop;
end;
/
quit
Then, my Java program sucks data from tableA and inserts it into tableB. Inside my Java program, I convert a Double to BigDecimal using code like:
// rs is a ResultSet over tableA; each row gets re-inserted into tableB
String insertString = "insert into tableB values (?, ?)";
PreparedStatement stmt2 = this.dbConnection.prepareCall(insertString);
stmt2.setInt(1, rs.getInt("RECORD_ID"));
stmt2.setBigDecimal(2, new BigDecimal(rs.getDouble("PRICE")));
stmt2.executeUpdate();
I query the data from tableB and boy am I surprised:
SQL> select * from tableb where price <> price*1;
RECORD_ID PRICE
-------------------- --------------------
2 .00000019RS
3 .00000029RS
4 .00000039RS
5 .00000049RS
6 .00000059RS
7 .00000069RS
8 .00000079RS
9 .00000089RS
10 .00000099RS
So maybe it's a SQL*Plus setting or something. I try:
SQL> set numwidth 20
SQL> /
RECORD_ID PRICE
-------------------- --------------------
2 .00000019RS
3 .00000029RS
4 .00000039RS
5 .00000049RS
6 .00000059RS
7 .00000069RS
8 .00000079RS
9 .00000089RS
10 .00000099RS
Same deal.
I go back and verify my source data from tableA, and sure enough it's correct:
SQL> select * from tablea where record_id = 2;
RECORD_ID PRICE
--------------- ---------------
2 .0000002
SQL> l
1* select record_id, price, dump(price) dmp from tableb where price <> price * 1
SQL> /
RECORD_ID PRICE DMP
--------- -------------------- ----------------------------------------
2 .00000019RS Typ=2 Len=21: 189,20,246,100,100,100,100
,100,100,100,10,50,63,24,66,18,73,52,74,
72,23
3 .00000029RS Typ=2 Len=21: 189,30,246,100,100,100,100
,100,100,99,65,25,44,36,48,77,59,78,61,5
7,85
4 .00000039RS Typ=2 Len=21: 189,40,246,100,100,100,100
,100,100,99,19,100,25,48,31,36,46,4,48,4
3,46
5 .00000049RS Typ=2 Len=21: 189,50,246,100,100,100,100
,100,100,98,74,75,6,60,13,95,32,30,35,29
,7
6 .00000059RS Typ=2 Len=21: 189,60,246,100,100,100,100
,100,100,98,29,49,87,71,96,54,18,56,22,1
4,69
7 .00000069RS Typ=2 Len=21: 189,70,246,100,100,100,100
,100,100,97,84,24,68,83,79,13,4,82,8,100
,30
8 .00000079RS Typ=2 Len=21: 189,80,246,100,100,100,100
,100,100,97,38,99,49,95,61,71,91,7,95,85
,92
9 .00000089RS Typ=2 Len=21: 189,90,246,100,100,100,100
,100,100,96,93,74,31,7,44,30,77,33,82,71
,53
10 .00000099RS Typ=2 Len=21: 189,100,246,100,100,100,10
0,100,100,96,48,49,12,19,26,89,63,59,69,
57,14
OK, so JDBC isn't 100% correct when you cast between BigDecimal and Double. Big whoop. And in fact, the aforementioned bug is fixed if you apply the appropriate patch.
My problem is that Oracle (the database software) let a bad piece of data in. Oracle says the client should type the data when binding. I agree. But I also think that the server shouldn't let non-numeric data into numeric columns, regardless. I contend that this is a bug on the server side as well as a bug on the client side. This is only one specific case of a Java program inserting bad data; I have tables with non-numeric characters in numeric columns that were never touched by Java.
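If you want to sweep a table for values that are already in this state, here is a minimal sketch of the kind of check you could run. It reuses the tableB example above, and the idea that a TO_CHAR/TO_NUMBER round trip trips over every flavor of corrupt NUMBER is an assumption, not anything Oracle documents:
set serveroutput on
declare
  l_check number;
begin
  for r in (select record_id, price from tableb) loop
    begin
      -- a corrupt NUMBER tends to pick up garbage characters when
      -- converted to text, so the to_number() round trip fails
      l_check := to_number(to_char(r.price));
    exception
      when others then
        dbms_output.put_line('suspect value in record_id ' || r.record_id);
    end;
  end loop;
end;
/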
Monday, December 22, 2008
Where is the bug?
So, if I have a table that has a column defined as NUMBER(10) and try to insert the value 'ABCDEFG' into it, I get an "ORA-01858: a non-numeric character was found where a numeric was expected" error. Fair enough; I expect the database to do type checking and not allow an invalid number into the field.
In addition, if I try to insert an 11 digit number, I get an "ORA-01438: value larger than specified precision allowed for this column" error.
So far so good.
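Just to make those two checks concrete, here's a quick illustration (the table name and values are made up for the example):
create table num_check (val number(10));

insert into num_check values ('ABCDEFG');   -- rejected: not a valid number
insert into num_check values (12345678901); -- rejected: 11 digits won't fit in NUMBER(10)
insert into num_check values (42);          -- accepted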
If an invalid number, such as "-0" or "17~:49" got into the database via JDBC, would it be a client problem, a server problem, or both?
Friday, December 19, 2008
11g
I was asked the other day when I was going to start writing about 11g.
Quite frankly, I literally just got everything off 9i last weekend. The final thorn in my side, Oracle Applications, was started at the close of business on Friday and lingered into the wee hours of Saturday morning. It took about 18 months, and my standard install of 10g has 23 patches applied, but finally I have a consistent environment on 10g.
And let's face it, 10g is a pretty darn good product once it's patched. The combination of 10g and the Linux 2.6 kernel is really a rock-solid platform with a laundry list of features that you'll probably never need.
I'm keeping abreast of what's new in 11g and I haven't really seen anything compelling that piques my curiosity enough to even install it. Maybe my opinion will change when the calendar flips over and we have some of that mythical "free" time, but for now, I'll be sticking to 10g topics.
Wednesday, December 17, 2008
Phone Interview tips for DBAs
You only get one chance to make a first impression. I would think the following tips would be common sense, yet I am surprised every day.
- When on a phone interview, don't put the interviewer on hold.
- I don't mind calling you on your cell, but if the call drops, you're done.
- I know you might be nervous, but don't cut me off in mid-sentence.
- When you take my call, don't be driving in a car.
- I don't expect you to pick up the phone on the first ring, but please pick it up. If I get your Voicemail, I'll try you again in five minutes. After that, you're on your own.
Thursday, December 04, 2008
Database Administrator
Job Description
Interactive Brokers Group, industry leader in direct access brokerage, is seeking a motivated Database Administrator to provide day-to-day production support for our highly automated mission critical systems. This position requires interacting with very technical users, big-picture analysis, and acute problem solving skills.
Required Skills
- 3+ years experience as Oracle DBA in a production environment.
- Linux and/or Solaris
- Day-to-day Database Administration.
- Backup and Recovery using Recovery Manager.
- Oracle 10g (at least 2 years)
- Ability to quickly diagnose and resolve performance problems using standard tools (explain plan, OWI, Profiler, Statspack).
- Some experience with partitioning
- Self-starter with initiative and desire to learn
- Ability to work quickly in a fast-paced environment.
- Rotating on-duty coverage is expected.
- Ability to communicate clearly, constructively and concisely to peers and end-users.
- Ability to work independently with minimal supervision and drive issues to closure.
Desired Skills
- Experience with Oracle Applications R12
- Automation of DBA tasks (Perl, CGI, Ksh)
- Some experience in 11g
- DR Strategies including Standby Database
- Financial Industry Background
- Oracle Connection Manager
Job Code: JH-DA-CT
Interactive Brokers Group LLC
Apply Now
Tuesday, October 14, 2008
v$session.audsid
I am working on a project that requires me to get the name of the program during the logon process to the database.
"What an ideal place for a logon trigger!", I said to myself.
I thought the task would be quite simple: just get SYS_CONTEXT('USERENV','SESSIONID') and use that to query v$session. I knew that v$session.audsid is not strictly unique, as the background processes are "0" while other processes get a number that is unique for that moment in time. Or so I thought.
I tested my logon trigger like crazy in the development environment. Then I put it in the QA environment for a few days to make sure it was OK. No problems whatsoever during development or QA.
The first day I put my logon trigger in production, sure enough it failed. It threw a normal old PL/SQL error that indicated two rows were returned where a single row was expected.
Hmmm, that didn't happen in development.
After about 10 days of working on it here and there, I found that the threads of an RMAN backup all had the same v$session.audsid.
While just some monitoring stuff was running:
system@jh01.dev> l
1 select sid, serial#, username, program, audsid
2* from v$session where machine = 'barney'
SID SERIAL# USERNAME PROGRAM AUDSID
---------- ---------- ------------ ---------------------------------------- ----------
170 1 oracle@barney (PMON) 0
159 1 oracle@barney (CJQ0) 0
169 1 oracle@barney (PSP0) 0
168 1 oracle@barney (MMAN) 0
167 1 oracle@barney (DBW0) 0
166 1 oracle@barney (DBW1) 0
165 1 oracle@barney (DBW2) 0
164 1 oracle@barney (DBW3) 0
152 1 oracle@barney (ARC1) 0
153 1 oracle@barney (ARC0) 0
157 1 oracle@barney (MMNL) 0
158 1 oracle@barney (MMON) 0
160 1 oracle@barney (RECO) 0
162 1 oracle@barney (CKPT) 0
163 1 oracle@barney (LGWR) 0
161 1 oracle@barney (SMON) 0
139 65 JEFFH perl@barney (TNS V1-V3) 1600567
145 139 JEFFH perl@barney (TNS V1-V3) 1600566
143 47 JEFFH perl@barney (TNS V1-V3) 1600572
146 2 oracle@barney (q000) 0
149 1 oracle@barney (QMNC) 0
156 6 oracle@barney (q001) 0
While RMAN was running with two threads:
system@jh01.dev> /
SID SERIAL# USERNAME PROGRAM AUDSID
---------- ---------- ------------ ---------------------------------------- ----------
95 3229 SYS rman@barney (TNS V1-V3) 4294967295
170 1 oracle@barney (PMON) 0
158 1 oracle@barney (MMON) 0
169 1 oracle@barney (PSP0) 0
168 1 oracle@barney (MMAN) 0
167 1 oracle@barney (DBW0) 0
166 1 oracle@barney (DBW1) 0
165 1 oracle@barney (DBW2) 0
164 1 oracle@barney (DBW3) 0
163 1 oracle@barney (LGWR) 0
152 1 oracle@barney (ARC1) 0
153 1 oracle@barney (ARC0) 0
157 1 oracle@barney (MMNL) 0
159 1 oracle@barney (CJQ0) 0
160 1 oracle@barney (RECO) 0
162 1 oracle@barney (CKPT) 0
161 1 oracle@barney (SMON) 0
118 30350 SYS rman@barney (TNS V1-V3) 4294967295
145 139 JEFFH perl@barney (TNS V1-V3) 1600566
143 47 JEFFH perl@barney (TNS V1-V3) 1600572
139 65 JEFFH perl@barney (TNS V1-V3) 1600567
146 2 oracle@barney (q000) 0
149 1 oracle@barney (QMNC) 0
156 6 oracle@barney (q001) 0
Interesting?
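For what it's worth, one way around this (a rough sketch, not necessarily what ended up in production) is to skip AUDSID altogether: on 10g, SYS_CONTEXT('USERENV','SID') returns the session's SID directly, and v$session.sid is unique at any point in time. The trigger name below is made up, and the trigger owner needs SELECT granted on V_$SESSION:
create or replace trigger trg_capture_program
after logon on database
declare
  l_program v$session.program%type;
begin
  select program
    into l_program
    from v$session
   where sid = sys_context('userenv', 'sid');
  -- do whatever you need with l_program here
end;
/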
"What an ideal place for a logon trigger!", I said to myself.
I thought the task would be quite simple, just get the SYS_CONTEXT('USERENV','SESSIONID') and use that to query v$session. I know that v$session.audsid is not unique as the background processes are "0" while other processes have a number that is unique for that moment in time. Or so I thought.
I tested my logon trigger like crazy in the development enviornment. Then I put it in the QA environment for a few days to make sure it was OK. No problems whatsoever during development or QA.
The first day I put my logon trigger in production, sure enough it failed. It threw a normal old PL/SQL error that indicated two rows were returned where a single row was expected.
Hmmm, that didn't happen in development.
After about 10 days of working on it here and there, I found that the threads of an RMAN backup all had the same v$session.audsid.
While just some monitoring stuff was running:
system@jh01.dev> l
1 select sid, serial#, username, program, audsid
2* from v$session where machine = 'barney'
SID SERIAL# USERNAME PROGRAM AUDSID
---------- ---------- ------------ ---------------------------------------- ----------
170 1 oracle@barney (PMON) 0
159 1 oracle@barney (CJQ0) 0
169 1 oracle@barney (PSP0) 0
168 1 oracle@barney (MMAN) 0
167 1 oracle@barney (DBW0) 0
166 1 oracle@barney (DBW1) 0
165 1 oracle@barney (DBW2) 0
164 1 oracle@barney (DBW3) 0
152 1 oracle@barney (ARC1) 0
153 1 oracle@barney (ARC0) 0
157 1 oracle@barney (MMNL) 0
158 1 oracle@barney (MMON) 0
160 1 oracle@barney (RECO) 0
162 1 oracle@barney (CKPT) 0
163 1 oracle@barney (LGWR) 0
161 1 oracle@barney (SMON) 0
139 65 JEFFH perl@barney (TNS V1-V3) 1600567
145 139 JEFFH perl@barney (TNS V1-V3) 1600566
143 47 JEFFH perl@barney (TNS V1-V3) 1600572
146 2 oracle@barney (q000) 0
149 1 oracle@barney (QMNC) 0
156 6 oracle@barney (q001) 0
While RMAN was running with two threads:
system@jh01.dev> /
SID SERIAL# USERNAME PROGRAM AUDSID
---------- ---------- ------------ ---------------------------------------- ----------
95 3229 SYS rman@barney (TNS V1-V3) 4294967295
170 1 oracle@barney (PMON) 0
158 1 oracle@barney (MMON) 0
169 1 oracle@barney (PSP0) 0
168 1 oracle@barney (MMAN) 0
167 1 oracle@barney (DBW0) 0
166 1 oracle@barney (DBW1) 0
165 1 oracle@barney (DBW2) 0
164 1 oracle@barney (DBW3) 0
163 1 oracle@barney (LGWR) 0
152 1 oracle@barney (ARC1) 0
153 1 oracle@barney (ARC0) 0
157 1 oracle@barney (MMNL) 0
159 1 oracle@barney (CJQ0) 0
160 1 oracle@barney (RECO) 0
162 1 oracle@barney (CKPT) 0
161 1 oracle@barney (SMON) 0
118 30350 SYS rman@barney (TNS V1-V3) 4294967295
145 139 JEFFH perl@barney (TNS V1-V3) 1600566
143 47 JEFFH perl@barney (TNS V1-V3) 1600572
139 65 JEFFH perl@barney (TNS V1-V3) 1600567
146 2 oracle@barney (q000) 0
149 1 oracle@barney (QMNC) 0
156 6 oracle@barney (q001) 0
Interesting?
Tuesday, July 29, 2008
10.2.0.4 Post Mortem
In the end, my 10.2.0.4 SGA problem came down to the _db_block_cache_protect parameter. Seems as though setting that parameter in 10.2.0.4 maps the SGA to /tmp/shm instead of real memory. The immediate cause of my:
ORA-27102: out of memory
Linux-x86_64 Error: 28: No space left on device
was that I didn't have enough space mounted at /tmp/shm. When I did allocate enough space, I got an ORA-07445 [skgmfixup_scaffolding()+129].
Anybody who deals with Oracle Support knows that when you get a different error message, that's essentially the kiss of death for your TAR. True to form, a new TAR was created with my new error message and the attempts to close the original TAR started. In the end, they would rather provide a workaround (don't use _db_block_cache_protect; 10.2 is much more stable) than a solution.
Unfortunately, the only help I can provide you is to not use _db_block_cache_protect in 10.2.0.4 with big SGAs.
Tuesday, July 22, 2008
The 10.2.0.4 16g Solution
If you've been following along, I've had quite a tough time with 16GB SGAs on 10.2.0.4. With help from Kevin Closson and Oracle Support, we found that the init.ora parameters "_db_block_cache_protect=true", "_db_block_check_for_debug=true", and "db_block_checking=true" cause 10.2.0.4 to not be able to allocate a 16G SGA. Not a problem in 10.2.0.3 mind you, but definitely a problem in 10.2.0.4.
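If you want to check whether a database is exposed before upgrading, the usual hidden-parameter query (run as SYS) will show whether any of the three are set. This is a generic sketch, not anything Oracle Support supplied:
select i.ksppinm name, v.ksppstvl value
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm in ('_db_block_cache_protect',
                     '_db_block_check_for_debug',
                     'db_block_checking');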
In the end, it wasn't NUMA after all.
Click the 10.2.0.4 label below to follow all threads on this issue.
Tuesday, July 15, 2008
The great NUMA debate, part V
There were some questions about whether or not I turned NUMA off in one of my previous tests. I repeated the test with similar results.
If I looked at my /proc/cmdline, I got:
ro root=LABEL=/ console=ttyS1 pci=nommconf elevator=deadline selinux=0 numa=off
Then, I looked at my boot messages and saw:
NUMA turned off
Lastly, my numactl --hardware showed:
available: 1 nodes (0-0)
node 0 size: 34815 MB
node 0 free: 31507 MB
I still couldn't start a 16GB SGA. Interestingly enough, I couldn't start a 4G SGA either! I had to go back to booting without numa=off. The saga continues...
Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.
Wednesday, July 02, 2008
The moment you've all been waiting for...
No, no, not the Brangelina twins announcement, the numactl output.
$ ps -ef | grep oracle
oracle 13771 2418 0 Jun24 pts/0 00:00:00 -ksh
oracle 13772 13771 0 09:24 pts/0 00:00:00 ksh -i
oracle 13775 13772 0 09:24 pts/0 00:00:00 ps -ef
oracle 13776 13772 0 09:24 pts/0 00:00:00 grep oracle
$ numactl --hardware
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7854 MB
node 1 size: 8191 MB
node 1 free: 5725 MB
node 2 size: 8191 MB
node 2 free: 6757 MB
node 3 size: 8191 MB
node 3 free: 6671 MB
OK, so maybe not everybody was waiting for that. Oracle Support requested an strace of the startup command, so I had to bring the db down anyway. The strace was a good idea, they'll be able to see the system calls being made and such. Maybe we'll get some progress yet...
If you're following along at home: Part I, Part II, Part III
More info from the comments:
I shutdown everything again:
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7854 MB
node 1 size: 8191 MB
node 1 free: 4846 MB
node 2 size: 8191 MB
node 2 free: 6744 MB
node 3 size: 8191 MB
node 3 free: 6646 MB
No surprises there (except, of course, for the strangeness KC pointed out before). Then I decided to startup nomount with a 10.2.0.3 $OH and a 16g SGA:
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 7660 MB
node 1 size: 8191 MB
node 1 free: 4315 MB
node 2 size: 8191 MB
node 2 free: 6708 MB
node 3 size: 8191 MB
node 3 free: 6646 MB
Very unexpected. It looks like whatever this is showing is not affected by a 16g SGA.
I changed my $OH back to 10.2.0.4 and started a 12g SGA:
available: 4 nodes (0-3)
node 0 size: 10239 MB
node 0 free: 25 MB
node 1 size: 8191 MB
node 1 free: 257 MB
node 2 size: 8191 MB
node 2 free: 6740 MB
node 3 size: 8191 MB
node 3 free: 6642 MB
This is more like what I expected. You see, a 12g SGA basically leaves nodes 0 and 1 with next to nothing free. If we believe the previous output, that's my 12g of RAM?
Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.
Tuesday, June 24, 2008
Part III
I escalated my issue to a duty manager and explained that in my environment those types of changes are very difficult. He kicked it back to development for alternate solutions.
In the background, I managed to secure a test box for a couple of hours that I could play with. While researching how to set numa=off, I ran across this on Red Hat's site:
The Red Hat Enterprise Linux 4 Update 1 kernel automatically disables NUMA optimizations (numa=off) by default on systems using the AMD64 dual core processor. This ensures stable operation across multiple systems where each has a different system BIOS implementation for reporting dual core processors.
Hmm, that's fishy. Why would they ask me to turn it off if the default value is off? So I did it anyway.
Same thing, good old ORA-27102: out of memory.
As long as I had the box booted with numa=off, I might as well try _enable_numa_optimization=false, right? No dice, same error.
Last, but not least, I tried their suggestion of setting vm.nr_hugepages=0 in the sysctl.conf. But I found out that's already my value:
sysctl -a | grep vm.nr_huge
vm.nr_hugepages = 0
OK, so now what?
Maybe 11g is an option...
Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.
Monday, June 23, 2008
Something to be aware of, Part II
We've passed the one-month anniversary of Something to be aware of (10.2.0.4) and still have no meaningful results. I've been going back and forth with Oracle Support about switching NUMA off. Here's a synopsis:
OS: Switch NUMA off and retest.
Me: But that never changed.
OS: Switch NUMA off and retest.
Me: We take kernel changes very seriously. It will take approximately 2 months to re-certify that configuration. Besides, the NUMA settings never changed.
OS: Switch NUMA off and retest.
Me: Please direct me to the note that says 10.2.0.4 requires NUMA changes.
OS: Switch NUMA off and retest.
Me: I've already proven 10.2.0.3 works as expected in this configuration, NUMA settings never changed.
OS: Switch NUMA off and retest.
My next step is talking to a Duty manager. I really don't want to do that.
Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.
Thursday, June 19, 2008
Backing up at Home
I'm pretty particular about my backups at work. In fact, we have a setup where if a nightly backup fails, we get a page in the middle of the night so we can address the problem.
At home, it's a different story. I've got a Windows 2000 PC, and a couple times a year I haul out the CD-Rs and back up what I think are the important files. At first it wasn't so bad: a couple of CDs and I was done. But pictures from a digital camera tend to eat up space at a pretty decent rate. The last backup I did was 18 CD-Rs.
About the same time, I got the opportunity to help someone with their own backup issues. They were getting ready to retire a computer and wanted a way to move about 60G of data from one machine to another. So I investigated some options and found a nice 320G SimpleTech USB drive at Costco. We set up the drive and the ArcSoft TotalMedia Backup software in about 20 minutes and completed a test backup in another hour or so.
I was so impressed, that I bought one myself. For $89, I just couldn't go wrong.
So far, my impression is TotalMedia can get the basics done. However, as a backup professional I need the ability to tweak things. I'm still doing a backup every day, but looking at other options. Stay tuned for updates.
Monday, June 16, 2008
Microsoft Vista Oy Vey!
I needed a new laptop for my home. OK, maybe "need" is not the correct word, but you get my picture.
So I did what I always do. I navigated to dell.com and purchased an Inspiron 1525 with a 15" screen, 4G of RAM, and Microsoft Windows Vista Home. I went through all the normal setup things and had it on my wireless network in about 15 minutes.
So far, so good. Then I tried to add a network printer [Accept or Deny] that was attached [Accept or Deny] to my W2K desktop. Since W2K doesn't serve the drivers like XP, I had to install the drivers [Accept or Deny] for my printer before I could print. Then I had to tweak my firewall settings [Accept or Deny] so that I could print to my network printer. I used to think that the Apple Commercial was exaggerating, but it's not.
Then, to add insult to injury, I tried to print a document. First I had to activate Microsoft Word, which was simple. I typed in some text and...
Wait a second, where's the "save" button...
And what happened to the "File" menu...
Arg, even "Help" isn't on the menu any more.
I finally figured out that you have to click that stupid looking button at the top before you can "Save", "Print", "Open" or do anything you are used to.
Great way to enhance my productivity. Now it takes me 5 minutes to open and print a document instead of two mouse clicks. Word 2007 is a step backwards.
Tuesday, May 13, 2008
Something to be aware of (10.2.0.4)
My standard install of 10.2.0.3 included 23 patches at last count. When the 10.2.0.4 patchset came out a few weeks ago I checked it and decided that since my 23 patches were included in 10.2.0.4, it would be a good idea to upgrade to that version.
I first got underway in my development environment on x86_64. I have several small dbs with 2G SGAs on a single box. The patch/upgrade process went flawlessly and I encountered no issues at all during the upgrade.
I usually like to run a patch in development for a month or so before I apply it to my production environment. However, I was fighting a nasty bug on one of my 10.2.0.3 production dbs that was fixed in 10.2.0.4, and a backport to 10.2.0.3 wasn't available at the time. So after a week in development, I decided to upgrade that production db to 10.2.0.4. That db usually runs with a 16G SGA, and when I tried to start the instance with the 10.2.0.4 software, I got the venerable:
SQL> startup
ORA-27102: out of memory
Linux-x86_64 Error: 28: No space left on device
OK, so I know that usually means that my kernel parameters are off somewhere. It was a Saturday and I didn't have a sysadmin available, so I bumped the SGA down to 12G and the instance started right away and upgraded fine. I figured there was some shared memory parameter that I would research on Monday and we'd be back in business the next week.
I looked at Note 301830.1 thinking maybe my shmall parameter was off, but sure enough I had enough configured. I submitted a TAR and basically let Oracle Support stew on it for a few days.
In the meantime, another DBA in my group got the task of setting up a new x86_64 box with 10.2.0.4 so we could move a db. He got the software installed OK, but couldn't start an instance with an SGA of more than about 2G of memory. The kernel parameters were exactly the same as on my other box, but he still couldn't start an instance with any decent amount of memory.
This was just too coincidental. So we played around with the 10.2.0.4 installation a little more and found that 10.2.0.4 was allocating multiple shared memory segments instead of just one big segment. I used ipcs to find the shmid for the SGA and then used pmap to find out that to allocate a 2G SGA, 10.2.0.4 used about 260 shared memory segments of between 5M and 15M each.
Then we installed 10.2.0.3 and tried to start with the SAME EXACT parameter file, and sure enough, one shared memory segment of 2G. In fact, we could allocate almost all the way up to our shmall in one segment.
I'm bug 7016155 and I know others have run into this problem as well. I'm sure bug 7019967 thinks he's alone in this, but he's not. As usual, my bug has been sitting out there for about 12 days and nobody has looked at it.
Part II
Click the 10.2.0.4 label below to follow all threads on this issue and the eventual solution.