So What Co-operative: Funky redo behaviour in 10g

Thursday, December 29, 2005

Funky redo behaviour in 10g

I'll admit it, I don't have that much exposure to 10g just yet. I'm excited about it, I just don't have much experience on it.

We put up a development database on 10.2 a couple months back just so we could start getting experience on 10g. As we add more and more development schemas to this database, we're getting a taste for the new features.

We've been getting a disproportionate number of "Checkpoint not complete" messages lately. Sure, there's pretty heavy write activity going on at the time, but not enough to fill all my redo logs. I have six groups of 128M logfiles with two members each. During the heavy write activity, my logs get switched a few times in a minute. However, I know my log_archive_dest can keep up with the write activity, so there should be no problem archiving them that fast. From the alert.log:

Thu Dec 29 19:05:37 2005
Thread 1 cannot allocate new log, sequence 9945
Private strand flush not complete
Current log# 4 seq# 9944 mem# 0: /u01/oradata/demo1/redo04_01.log
Current log# 4 seq# 9944 mem# 1: /u02/oradata/demo1/redo04_02.log
Thread 1 advanced to log sequence 9945
Current log# 5 seq# 9945 mem# 0: /u01/oradata/demo1/redo05_01.log
Current log# 5 seq# 9945 mem# 1: /u02/oradata/demo1/redo05_02.log
Thu Dec 29 19:06:28 2005
Thread 1 advanced to log sequence 9946
Current log# 1 seq# 9946 mem# 0: /u01/oradata/demo1/redo01_01.log
Current log# 1 seq# 9946 mem# 1: /u02/oradata/demo1/redo01_02.log
Thu Dec 29 19:06:42 2005
Thread 1 advanced to log sequence 9947
Current log# 2 seq# 9947 mem# 0: /u01/oradata/demo1/redo02_01.log
Current log# 2 seq# 9947 mem# 1: /u02/oradata/demo1/redo02_02.log
Thu Dec 29 19:07:00 2005
Thread 1 cannot allocate new log, sequence 9948
Checkpoint not complete
Current log# 2 seq# 9947 mem# 0: /u01/oradata/demo1/redo02_01.log
Current log# 2 seq# 9947 mem# 1: /u02/oradata/demo1/redo02_02.log
Thread 1 advanced to log sequence 9948

After the write activity is over, I look at v$logfile and v$log and see that several groups are still ACTIVE.

I then doubled the size of the logfiles and reran the big job that caused the "Checkpoint not complete" error. While the job didn't cause a "Checkpoint not complete" error, it did make three of the log groups ACTIVE, which I kind of expected. However, 20-30 minutes later, I looked at v$logfile and v$log to see almost all my log groups were either ACTIVE or CURRENT.


system@demo1.us> l
1  select l.group#, lf.member, l.bytes/1024/1024 mb,  l.status, l.archived
2  from v$logfile lf, v$log l
3* where l.group# = lf.group#

GROUP# MEMBER                             MB STATUS     ARC
------ ------------------------------- ----- ---------- ---
   1 /u01/oradata/demo1/redo01_01.log   256 CURRENT    NO
   3 /u01/oradata/demo1/redo03_01.log   256 ACTIVE     YES
   3 /u02/oradata/demo1/redo03_02.log   256 ACTIVE     YES
   4 /u01/oradata/demo1/redo04_01.log   256 ACTIVE     YES
   4 /u02/oradata/demo1/redo04_02.log   256 ACTIVE     YES
   5 /u01/oradata/demo1/redo05_01.log   256 ACTIVE     YES
   5 /u02/oradata/demo1/redo05_02.log   256 ACTIVE     YES
   2 /u01/oradata/demo1/redo02_01.log   256 INACTIVE   YES
   2 /u02/oradata/demo1/redo02_02.log   256 INACTIVE   YES
   1 /u02/oradata/demo1/redo01_02.log   256 CURRENT    NO
   6 /u01/oradata/demo1/redo06_01.log   256 ACTIVE     YES
   6 /u02/oradata/demo1/redo06_02.log   256 ACTIVE     YES
                                      -----
Total                                   3072

12 rows selected.

Elapsed: 00:00:00.05

During light load, I would expect one group CURRENT and maybe one group ACTIVE. The other funny thing is the logs show they are archived, which if I look at the log_archive_dest, they are. I have to do some more research, but I suspect there's an init.ora parameter or something that isn't set correctly...

7 comments:

Videos by Professor Howdy said...: You have a riveting web log
and undoubtedly must have
atypical & quiescent potential
for your intended readership.

Happy New Year,
Dr. Howdy
'Thought & Humor'; 12/29/2005 10:12:00 PM
Jeff Hunter said...: Statspack tells me it's not even one of my top wait events.; 12/30/2005 04:48:00 PM
Anonymous said...: There is probably some internal tuning parameter that controls flushing of the IMU buffer to the redo log buffer. It is probably set wrong or there is a bug involving the conversation between the dbwr and the strand control. Check _db_writer_flush_imu and _imu_pools. The former is hopefully true, and the latter is likely 3. Play with the latter.; 12/30/2005 05:12:00 PM
Aman.... said...: wish you a great and so much 2006 sir.Happy new year to you.
with best wishes and regards
Aman Sharma.; 1/01/2006 05:11:00 AM
Anonymous said...: Howard,

"Checkpoint incompletes and lots of ACTIVEs can be due to a sluggish DBWR as well as ARCH behaviour."

I understand how sluggish DBWR can cause ACTIVES in this case. Can you please explain how ARCH can affect things. Thanks.

Ram.; 1/01/2006 12:22:00 PM
Anonymous said...: because if arch is archiving a log and is taking a while then suddenly another log needs archiving then you will wait; 1/03/2006 09:38:00 AM
Anonymous said...: I have the same behaviour on two of my databases, did you solve this or is this a bug. Which oracle version are you using?; 4/20/2007 04:38:00 AM

So What Co-operative

Thursday, December 29, 2005

Funky redo behaviour in 10g

7 comments:

Facebook Badge

Pageviews past week

Contributors

Best of (IMHO)

Blog Archive

Blogroll

Swag