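Bug when migrating RAC to a single instance with RMAN
Last week, at a customer's request, I tested migrating a RAC database to a single instance using incremental backups. The level 0 restore kept failing.
Running alter database open resetlogs; returned the following error: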
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 3216
Session ID: 96 Serial number: 1917
The alert log showed ERROR: SLAVE COMMUNICATION ERROR WITH ASM;
This turned out to be a bug hit when restoring from ASM to a single instance. For the fix, see the following note:
Applies to:
Oracle Server - Enterprise Edition - Version 10.2.0.1 to 11.2.0.1 [Release 10.2 to 11.2]
Information in this document applies to any platform.
Symptoms
RMAN restore from ASM to non-ASM fails:
RMAN restore log:
==============
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 10/15/2011 21:55:12
ORA-03113: end-of-file on communication channel
ORA-01403: no data found
ORA-01403: no data found
Related Alert.log:
============
Sat Oct 15 21:55:09 2011
ERROR: slave communication error with ASM; terminating process 6564
Errors in file /db/oracle/diag/rdbms/claimssb/claimssb/trace/claimssb_ora_6564.trc:
Sat Oct 15 21:55:11 2011
ERROR: slave communication error with ASM; terminating process 6567
Errors in file /db/oracle/diag/rdbms/claimssb/claimssb/trace/claimssb_ora_6567.trc:
Related Trace file:
============
*** 2011-10-18 18:25:54.956
*** SESSION ID:(1132.1) 2011-10-18 18:25:54.956
*** CLIENT ID:() 2011-10-18 18:25:54.956
*** SERVICE NAME:() 2011-10-18 18:25:54.956
*** MODULE NAME:(rman@eciscor-prod-p01 (TNS V1-V3)) 2011-10-18 18:25:54.956
*** ACTION NAME:(0000006 STARTED62) 2011-10-18 18:25:54.956
ERROR: slave communication error with ASM; terminating process 27956
*** 2011-10-18 18:25:54.958
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=2, mask=0x0)
----- Error Stack Dump -----
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object      line  object
handle    number  name
5edf43aa0   3670  package body SYS.X$DBMS_BACKUP_RESTORE
5edf43aa0   3650  package body SYS.X$DBMS_BACKUP_RESTORE
....
....
The call stack in the trace file contains the function kfTerminateMe. For example:
----- Call Stack Trace -----
kfTerminateMe <- kfncSlaveSubmit <- kfncLogical <- kfioLogical <- ksfdglbsz <- krbprbsz <- krbmdbp <- krbidbp
Cause
Look for the function kfTerminateMe in the call stack of the trace file.
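As a rough aid for this check, the following minimal Python sketch scans trace text for the two markers quoted in this note: the "slave communication error with ASM" message and the kfTerminateMe function. The function name matches_bug_signature and the rule that both markers must be present are assumptions made for illustration; they are not part of the Oracle note.

```python
import re

# Message quoted in the alert log and trace file above (matched case-insensitively).
ASM_SLAVE_ERROR = re.compile(r"slave communication error with ASM", re.IGNORECASE)


def matches_bug_signature(trace_text: str) -> bool:
    """Return True if the text has both the ASM slave error and kfTerminateMe.

    Requiring both markers is an assumption of this sketch, not an Oracle rule.
    """
    has_error = ASM_SLAVE_ERROR.search(trace_text) is not None
    has_function = "kfTerminateMe" in trace_text
    return has_error and has_function


# Sample text assembled from the trace excerpts quoted in this note.
sample = """ERROR: slave communication error with ASM; terminating process 27956
----- Call Stack Trace -----
kfTerminateMe <- kfncSlaveSubmit <- kfncLogical <- kfioLogical
"""

print(matches_bug_signature(sample))
```

To check a real trace file, read its contents into a string and pass that string to the function.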
There is a related bug that made no significant progress because the required information was not available:
Bug 9692233 ERROR: SLAVE COMMUNICATION ERROR WITH ASM; TERMINATING PROCESS 20518 >> Closed as "33-Required Info not available"
However, this problem is actually caused by the unpublished bugs below:
Unpublished Bug 9931472 ALTER DATABASE OPEN RESETLOGS FAILS WITH 3113 - ASM TO NFS STORAGE
Unpublished Bug 9931472 above is a duplicate of the following bug:
Unpublished Bug 9530594 TB:SH:ASM PROCESS HIT IPC SEND TIMEOUT AND TERMINATED, THIS INDUCE DB INST CRASH
Unpublished Bug 9530594 is fixed in 11.2.0.2 and later. One-off patches are available for some platforms.
Solution
Upgrade to 11.2.0.2 or later.
or
Apply one-off Patch 9530594 to the Oracle RDBMS home.
Workaround:
You can try installing ASM on the host where the restore is failing. However, this does not always work. Moreover, setting up an ASM instance is more cumbersome than applying Patch 9530594.
I advised the customer to upgrade the test environment to 11.2.0.3, the same version as production.
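The version range in the note (affected from 10.2.0.1 to 11.2.0.1, fixed in 11.2.0.2 and later) can be expressed as a simple comparison. The sketch below is only an illustration: the helper names parse_version and in_affected_range are hypothetical, it assumes dotted numeric version strings, and it does not account for a one-off patch already applied to an affected release.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '11.2.0.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Boundaries taken from the note: first affected release and first fixed release.
FIRST_AFFECTED = parse_version("10.2.0.1")
FIRST_FIXED = parse_version("11.2.0.2")


def in_affected_range(version: str) -> bool:
    """Return True if the version lies in [10.2.0.1, 11.2.0.2).

    Only the first four components are compared, so '11.2.0.1.0' is
    treated the same as '11.2.0.1'.
    """
    v = parse_version(version)[:4]
    return FIRST_AFFECTED <= v < FIRST_FIXED


for candidate in ("11.2.0.1", "11.2.0.3"):
    print(candidate, in_affected_range(candidate))
```

Under these assumptions, 11.2.0.1 falls inside the affected range while 11.2.0.3 does not, which is consistent with the advice to upgrade.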