Oracle 11g R2 Data Guard 容灾系统构建S
目录
- 异地或同城备机故障模拟
- 主数据库演练(switchover)模拟
- 主数据库故障切换(failover)模拟
异地或同城备机故障模拟
异地或同城备机出现故障不会影响主系统的继续正常运行,中断现象出现后,主机能很快识别,一旦网络恢复,主机也能在短时间内恢复的。
我们重启异地和同城备机系统。
- 重启前,我们在主数据库系统上模拟一些持续的插入业务。
[oracle@beijing ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jun 17 11:33:18 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> create table tab1(c1 int);
Table created.
SQL> @/home/oracle/batchinsert.sql
PL/SQL procedure successfully completed.
SQL> /
- 重启同城备机。
[oracle@beijing_2 ~]$ su -
Password:
Last login: Mon Jun 17 17:15:47 CST 2019 on tty1
[root@beijing_2 ~]# reboot
- 重启异地备机。
[oracle@chengdu dbs]$ su -
Password:
Last login: Sun Jun 16 20:10:40 CST 2019 on pts/1
[root@chengdu ~]# reboot
- 我们查看主数据库系统的日志传送情况。
SQL> select dest_id,dest_name,status,error,valid_now,fail_date,fail_sequence,valid_type from v$archive_dest where dest_id in (2,3);
DEST_ID DEST_NAME STATUS ERROR VALID_NOW FAIL_DATE FAIL_SEQUENCE VALID_TYPE
---------- -------------------- --------- ---------------------------------------- ---------------- --------- ------------- ---------------
2 LOG_ARCHIVE_DEST_2 ERROR ORA-03135: connection lost contact INACTIVE 17-JUN-19 74 ONLINE_LOGFILE
3 LOG_ARCHIVE_DEST_3 ERROR ORA-03113: end-of-file on communication INACTIVE 17-JUN-19 73 ONLINE_LOGFILE
channel
SQL>
所有的传输中断了。停留在不同的日志上,同城的停止在73号日志,异地的停止在74号日志。同样reboot命令重启系统,主数据库系统显示的错误提示还是一样的。
我们重新启用异地和同城备用系统。
- 我们启用成都的异地备用。
[oracle@chengdu ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jun 17 17:48:06 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 1135747072 bytes
Fixed Size 2227664 bytes
Variable Size 704643632 bytes
Database Buffers 419430400 bytes
Redo Buffers 9445376 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@chengdu ~]$ lsnrctl start
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 17-JUN-2019 17:48:30
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Starting /home/oracle/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /home/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /home/oracle/app/oracle/diag/tnslsnr/chengdu/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=chengdu)(PORT=1521)))
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 17-JUN-2019 17:48:30
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /home/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File /home/oracle/app/oracle/diag/tnslsnr/chengdu/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=chengdu)(PORT=1521)))
Services Summary...
Service "chengdu" has 1 instance(s).
Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@chengdu ~]$
- 我们启用北京的同城备用系统。
[oracle@beijing_2 ~]$ lsnrctl start
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 17-JUN-2019 17:49:46
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Starting /home/oracle/app/oracle/product/11.2.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /home/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Log messages written to /home/oracle/app/oracle/diag/tnslsnr/beijing_2/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=beijing_2)(PORT=1521)))
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 17-JUN-2019 17:49:47
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /home/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
Listener Log File /home/oracle/app/oracle/diag/tnslsnr/beijing_2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=beijing_2)(PORT=1521)))
Services Summary...
Service "beijing_2" has 1 instance(s).
Instance "orcl", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@beijing_2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jun 17 17:49:51 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 709836800 bytes
Fixed Size 2231752 bytes
Variable Size 440402488 bytes
Database Buffers 260046848 bytes
Redo Buffers 7155712 bytes
Database mounted.
Database opened.
SQL>
- 启用后,我们查看主数据库系统的日志传送状态。
SQL> /
DEST_ID DEST_NAME STATUS ERROR VALID_NOW FAIL_DATE FAIL_SEQUENCE VALID_TYPE
---------- -------------------- --------- ---------------------------------------- ---------------- --------- ------------- ---------------
2 LOG_ARCHIVE_DEST_2 VALID YES 0 ONLINE_LOGFILE
3 LOG_ARCHIVE_DEST_3 VALID YES 0 ONLINE_LOGFILE
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 76
Next log sequence to archive 78
Current log sequence 78
SQL>
我们模拟的故障大约持续了短短的5分钟时间,主数据库系统提示,日志传送已经全部自动恢复了。这个容灾系统又重新开始正常工作了。
主数据库系统故障模拟
data guard 集群中的数据库就两种角色,主数据库,备用数据库。将某角色改变为另一角色,我们叫角色转换。角色转换有两种方式,一种是switchover,另一种是故障切换。
我们这里用switchover方式,这种方式更类似于一种计划好的用于灾备演练,或维护数据库的方式。是我们人为手工进行的角色切换。
switchover能保证角色转换时数据没有丢失。故障模拟前,我们要对主备系统做一系列的配置和检查。
- 检查和调整主、备数据库系统的参数文件
前面我们配置主数据库参数文件时候,没有考虑切换情况的发生,这里要把standby的参数补充到主数据库参数文件中。
[oracle@beijing dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jun 17 22:34:52 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> alter system set standby_file_management=auto;
System altered.
SQL> alter system set fal_server=beijing_2;
System altered.
SQL> alter system set fal_client=beijing;
System altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
我们计划在同城主备数据库之间做switchover。所以我们在主数据库上将异地的日志传送停止下来。
SQL> alter system set log_archive_dest_state_2=defer;
System altered.
我们在主数据库也创建对应的standby redo log。切换后,可以用于接收beijing_2发送来的redo log。
SQL> host
[oracle@beijing ~]$ cd $ORACLE_HOME/dbs
[oracle@beijing dbs]$ mkdir srl
[oracle@beijing dbs]$ exit
exit
SQL> alter database add standby logfile group 10 '?/dbs/srl/srl10.dbf' size 52428800;
Database altered.
SQL> alter database add standby logfile group 11 '?/dbs/srl/srl11.dbf' size 52428800;
Database altered.
SQL>
SQL> alter database add standby logfile group 12 '?/dbs/srl/srl12.dbf' size 52428800;
Database altered.
SQL>
我们在同城备用数据上增加primary相关参数。
SQL> alter system set log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST db_unique_name=beijing_2';
System altered.
SQL> alter system set log_archive_dest_2='service=beijing valid_for=(ONLINE_LOGFILES,PRIMARY_ROLE) lgwr sync affirm db_unique_name=beijing';
System altered.
SQL> alter system set log_archive_dest_state_1=enable;
System altered.
SQL> alter system set log_archive_dest_state_2=defer;
System altered.
- 检查TNS。
主数据库检查。
[oracle@beijing dbs]$ tnsping beijing_2
TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 18-JUN-2019 00:43:40
Copyright (c) 1997, 2011, Oracle. All rights reserved.
Used parameter files:
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.35)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = beijing_2)))
OK (0 msec)
同城备机检查。
[oracle@beijing_2 dbs]$ tnsping beijing
TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 18-JUN-2019 00:43:31
Copyright (c) 1997, 2011, Oracle. All rights reserved.
Used parameter files:
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.30)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = beijing)))
OK (0 msec)
- 检查log gap
在主备数据库上都检查一遍。
SQL> select dest_id,dest_name,status,gap_status from v$archive_dest_status;
DEST_ID DEST_NAME STATUS GAP_STATUS
---------- -------------------- --------- ------------------------
1 LOG_ARCHIVE_DEST_1 VALID
2 LOG_ARCHIVE_DEST_2 DEFERRED
3 LOG_ARCHIVE_DEST_3 VALID NO GAP
- 是否还有日志等待应用。
在同城备用数据库上检查日志应用情况。
SQL> select dest_id,dest_name,delay_mins from v$archive_dest;
DEST_ID DEST_NAME DELAY_MINS
---------- -------------------- ----------
1 LOG_ARCHIVE_DEST_1 0
2 LOG_ARCHIVE_DEST_2 0
3 LOG_ARCHIVE_DEST_3 0
- 检查主数据库是否可以转换角色
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
TO STANDBY
具备转换条件。
- 转换主数据库
SQL> alter database commit to switchover to physical standby;
Database altered.
SQL>
- 关闭beijing主数据库,再启动,成为新的物理备用数据库
SQL> shutdown immediate;
ORA-01092: ORACLE instance terminated. Disconnection forced
SQL> startup mount;
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@beijing ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Jun 18 01:05:41 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1135747072 bytes
Fixed Size 2227664 bytes
Variable Size 738198064 bytes
Database Buffers 385875968 bytes
Redo Buffers 9445376 bytes
Database mounted.
SQL>
- 查看beijing_2备用数据库是否能转换
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
NOT ALLOWED
不允许转换。
没有收到主句库送来的标记。
我们看看成都异地的备用数据库情况。
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
data guard默认switchover的备用数据库是log_archive_dest_2的指向的备用数据库。将错就错,我们继续switchover,将成都的备用数据库转换成主数据库。
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
SQL> alter database commit to switchover to primary;
Database altered.
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.
Total System Global Area 1135747072 bytes
Fixed Size 2227664 bytes
Variable Size 738198064 bytes
Database Buffers 385875968 bytes
Redo Buffers 9445376 bytes
Database mounted.
Database opened.
SQL> select protection_mode,database_role,guard_status from v$database;
PROTECTION_MODE DATABASE_ROLE GUARD_S
-------------------- ---------------- -------
MAXIMUM PERFORMANCE PRIMARY NONE
成功将异地成都的数据库转换为主数据库了。
我们将原来的北京主数据库重启。
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
TO STANDBY
SQL> alter database commit to switchover to physical standby;
Database altered.
SQL> shutdown immediate;
ORA-01092: ORACLE instance terminated. Disconnection forced
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@chengdu ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Jun 18 12:52:42 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 1135747072 bytes
Fixed Size 2227664 bytes
Variable Size 704643632 bytes
Database Buffers 419430400 bytes
Redo Buffers 9445376 bytes
Database mounted.
Database opened.
SQL> select protection_mode,database_role,guard_status from v$database;
PROTECTION_MODE DATABASE_ROLE GUARD_S
-------------------- ---------------- -------
MAXIMUM PERFORMANCE PHYSICAL STANDBY NONE
SQL> alter database recover managed standby database disconnect;
Database altered.
我们看到原先的主数据库也已经完成转换,成为了一台物理备机。
做一个日志切换,我们看看能否正常工作。
新主数据库chengdu。
SQL> alter system switch logfile;
System altered.
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 108
Next log sequence to archive 110
Current log sequence 110
新物理备机beijing。
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 109
Next log sequence to archive 0
Current log sequence 110
日志传送没有问题了。
beijing_2在switchover中没有用上,完成转换后也没有在data guard集群中。因为我们在chengdu节点没有配置和beijing_2的连接。
至此,主数据库演练模拟基本完成。
主数据库故障切换(failover)模拟
上一节的演练,没有任何的数据丢失,一切在掌握之中,顺利的完成角色的switch。现实中,主数据库突然宕机,主数据库数据中心遭遇灾难等等突发事件,如何保证业务连续性,顺利启动备用节点呢?我们这节模拟故障的发生,并如何保证业务连续性。
灾难计划中,我们已经考虑到主业务节点故障的情况,这意味着不同的保护模式或多或少的会造成数据的丢失,故障切换意味着一些意想不到的事情已经发生了。
- 我们在主数据库系统模拟业务数据插入。
SQL> @/home/oracle/batchinsert.sql
- 我们做一个日志切换,然后关闭主数据库系统
[oracle@beijing ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Jun 18 18:05:31 2019
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select count(*) from hem.tab1;
COUNT(*)
----------
943
SQL> /
COUNT(*)
----------
969
SQL>
SQL> alter system switch logfile;
System altered.
SQL> shutdown abort;
ORACLE instance shut down.
-我们在同城和异地备份系统gap
同城是否有log gap
SQL> select thread#,low_sequence#,high_sequence# from v$archive_gap;
no rows selected
SQL> /
no rows selected
SQL> select unique thread# as thread,max(sequence#) over(partition by thread#) as last from v$archived_log;
THREAD LAST
---------- ----------
1 111
SQL>
异地是否有log gap
SQL> select thread#,low_sequence#,high_sequence# from v$archive_gap;
no rows selected
SQL> select unique thread# as thread,max(sequence#) over(partition by thread#) as last from v$archived_log;
THREAD LAST
---------- ----------
1 111
-停止日志的应用(redo apply)
分别停止同城和异地备库的MRP
SQL> alter database recover managed standby database cancel;
Database altered.
SQL>
-完成所有日志的应用
分别完成同城和异地备库的MRP
SQL> alter database recover managed standby database finish;
Database altered.
SQL>
这一步没出现提示错误,如果有错误提示,需要不同的处理方法。
-切换到主数据库
先看看状态是否正确
同城的。
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
SQL
异地的
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
SQL>
都可以切换成主数据库系统。我们都试试。
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
SQL> alter database commit to switchover to primary;
Database altered.
SQL>
两台备机都切换成主数据库系统了。
- 打开两台备机系统,切换成主数据库系统
SQL> alter database open;
Database altered.
SQL> select database_role from v$database;
DATABASE_ROLE
----------------
PRIMARY
SQL>
- 我们看看数据有没有丢失
beijing_2的数据库
SQL> select count(*) from hem.tab1;
COUNT(*)
----------
1395
SQL>
chengdu的数据库
SQL> select count(*) from hem.tab1;
COUNT(*)
----------
1352
SQL>
原来主数据库beijing
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1135747072 bytes
Fixed Size 2227664 bytes
Variable Size 738198064 bytes
Database Buffers 385875968 bytes
Redo Buffers 9445376 bytes
Database mounted.
Database opened.
SQL> select count(*) from hem.tab1;
COUNT(*)
----------
1395
SQL>
同城的beijing_2的数据得到了最大的保护,而异地chengdu的数据丢失了部分。
- 将beijing数据库转换成physical standby
我们看看beijing 数据库状态
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
FAILED DESTINATION
错误的归档传送目标。我们停止它。
SQL> alter system set log_archive_dest_state_2=defer;
System altered.
SQL> alter system set log_archive_dest_state_3=defer;
System altered.
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------
NOT ALLOWED
SQL>
对于beijing数据库来讲没有合适的physical standby 供它使用,所以这里无法将其自己转换为physical standby了。
可以参考上一篇文章的内容,重建beijing数据库,然后将其作为beijing_2的同城备机,这样在这两者之间有可以通过switchover 进行交涉交换了。
容灾系统的构建Step by Step 告一段落。
后续,我们利用Oracle 12c 来构建data guard集群。