Wednesday, April 02, 2008

RMAN-08137: WARNING: archive log not deleted as it is still needed

A couple days ago my network link to my standby databases went dark in the middle of the night. My dbs took notice and started generating log shipping errors in the alert.log. About 10 minutes later, I got a page from my standby db that said his gap was more than the acceptable level and could I please do something. No problem, I just turned log_shipping off until the link came back up. The link came back up later in the day, I resumed log shipping, and most of my dbs recognized the gap and started resending logs.

The operative word being "most".

One of my dbs was shipping an archived redo log when the link went down. When I resumed log shipping, Oracle thought he was still shipping a log and thought the gap started at the next log sequence number. He shipped the current logs just fine, but just wouldn't resolve the gap.

"No worries", I thought, "I'll just rcp the missing file and register them with the standby control file and I'll be on my merry way."

Sure enough, the standby started recovering immediately and caught up in a short time. I thought everything was dandy until I got a page on Monday morning (2 AM, thank you very much) that my log_archive_dest was about to fill up on the primary.
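
For reference, the register step mentioned above looks roughly like this on a physical standby. This is a minimal sketch: the path and file name are made-up placeholders, not the actual files from this incident, and the statement is run once per copied log.

```sql
-- Run on the standby, once for each archived log that was copied over by hand.
-- The path and file name are hypothetical placeholders; use wherever the file landed.
ALTER DATABASE REGISTER LOGFILE '/u01/oradata/arch/1_4711_123456789.arc';
```

Once the file is registered, managed recovery can pick it up and carry on from there, which is what happened here.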

Our standard protocol for freeing up the log_archive_dest is to run an archivelog backup with DELETE INPUT specified. I kicked off the backup, killed the monitor that watched the log_archive_dest, and went back to bed. I set the alarm for 1 hour later just to make sure the backup was done.

When I got back up, the archivelog backup wasn't done and the log_archive_dest was at 99%. I then looked in the directory and saw logs that were 3 and 4 days old, which I knew was not correct. I killed the backup, deleted the 4-day-old logs, crosschecked, and restarted the backup.

When I got into the office, my main task was to figure out why this happened. When I looked at my RMAN message log, I noticed that just about every log being backed up was accompanied by a warning:
RMAN-08137: WARNING: archive log not deleted as it is still needed

Hmm, that's not cool. Maybe my standby was really out of sync and I just didn't know it. Sometimes the primary will ship logs and the standby will accept but not apply them because it is missing a log. But that wasn't my case; the standby was within one or two logs of the primary.

When I looked at my v$archived_log, the "applied" flag indicated that the primary thought a bunch of logs were not applied to the standby. I knew different, as the standby was somewhat current.
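
A query along these lines, run on both the primary and the standby, is one way to compare what each side believes has been applied. It is only a sketch, assuming the "applied" flag above is the APPLIED column of v$archived_log; grouping by dest_id is an addition here, meant to keep the local destination separate from the standby destination.

```sql
-- Highest sequence per destination and thread that this database
-- has recorded as applied. Run on the primary and on the standby, then compare.
SELECT dest_id,
       thread#,
       MAX(sequence#) AS last_applied
FROM   v$archived_log
WHERE  applied = 'YES'
GROUP  BY dest_id, thread#
ORDER  BY dest_id, thread#;
```

If the primary's numbers trail far behind the standby's, the primary's record of what was applied is stale, even though the standby itself is current.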

At that point, I needed assistance. I filed a TAR and within about 2 hours found out it is a bug (bug 4538727) that is fixed in the 10.2.0.4 patchset. Since the patch just came out, I haven't gotten a chance to apply it to anything so I asked for workarounds.

"None", they say.

Theoretically, I suppose I could recreate my control files and my standby control files. I'm not really excited about doing that.

Until I figure out what to do, I'll just have to use RMAN to delete my logs "backed up 1 times" and crosscheck to manage my log_archive_dest space.

My Resolution, sort of.

7 comments:

Anonymous said...

This is good to know. Thanks for posting !!

Anonymous said...

Good post. Do you know if the bug also causes old backupsets to stay in the fast recovery area? I had an automatic RMAN job to remove the old backupsets, but it won't.

Anonymous said...

Jeff, you're awesome! Thanks for posting this. Just came up against the same issue at our installation and it's been driving me nuts. By the by -- thanks, too, for your Report scripts! Outstanding tool!

JR

raj said...

My DB is 10.2.0.4, and I am still seeing the same problem.

On the primary:
select thread#, max (sequence#) from v$archived_log where APPLIED='YES' group by thread#;

   THREAD# MAX(SEQUENCE#)
---------- --------------
         1           1140
         2           1160
         4           1091
         3            853
And on the standby:
select thread#, max (sequence#) from v$archived_log where APPLIED='YES' group by thread#;

   THREAD# MAX(SEQUENCE#)
---------- --------------
         1           1183
         2           1199
         3            888
         4           1163
The control file on the primary is not being updated when logs are shipped to the standby, starting from the logs that I applied manually.

The logs I applied manually are:

   THREAD# SEQUENCE#
---------- ---------
         1 1141,1142
         2 1161,1162
         4 1092
         3 854,855

Any advice appreciated.

raj said...

I had the same issue with 10.2.0.4

On the standby:
select thread#, max (sequence#) from v$archived_log where APPLIED='YES' group by thread#;

   THREAD# MAX(SEQUENCE#)
---------- --------------
         1           1183
         2           1199
         3            888
         4           1163

On the primary:

SQL> select thread#, max (sequence#) from v$archived_log where APPLIED='YES' group by thread#;

   THREAD# MAX(SEQUENCE#)
---------- --------------
         1           1140
         2           1160
         4           1091
         3            853

I shipped the logs: on node 1, 1141 and 1142; on node 2, 1161 and 1162; on node 3, 1092; on node 4, 854 and 855. The control file on the primary is not being updated for the logs applied on the standby. On the primary it still gives:
select thread#, max (sequence#) from v$archived_log where APPLIED='YES' group by thread#;

   THREAD# MAX(SEQUENCE#)
---------- --------------
         1           1140
         2           1160
         4           1091
         3            853

That is 40 logs fewer on each node than on the standby.

It is supposed to be fixed in 10.2.0.4, but I am still seeing the issue. Any help is appreciated.

Kim Anthonisen said...

Check Metalink note 373066.1.
The issue in that note is Streams related, but the workaround seems to work nevertheless.

The problem seems to be with "backup... delete input", whereas "delete archivelog..." works.

So remove "delete input" and add a separate delete instead.

Br
Kim

Hamid Minoui said...

I ran into the same issue and was able to resolve it. This may be helpful.

1) The standby destination assigned to LOG_ARCHIVE_DEST_2 as SERVICE was not showing up in the query :
select dest_id, max(SEQUENCE#) seqno from v$archived_log where APPLIED='YES' group by dest_id;

2) I kept receiving the message:
"RMAN-08137: WARNING: archive log not deleted as it is still needed", when attempting to use RMAN to delete archivelog files that I knew were already applied.
I had to use 'delete force'.

I found out that the standby redo log files were not configured on the standby, and that caused this issue. The destination is configured as:

ARCH ASYNC VALID_FOR=(STANDBY_LOGFILES,STANDBY_ROLE)