18c & 19c Physical Standby Switchover Best Practices using SQL*Plus (Doc ID 2485237.1)

Posted 2021-03-23 Oracle&PostgreSQL



APPLIES TO:

Oracle Database - Enterprise Edition - Version 18.3.0.0.0 and later
Information in this document applies to any platform.

GOAL

This document explains the switchover steps for 18c and 19c.

SOLUTION

Prerequisites

Apply the latest PSU/bundle patches

Master Note for Database Proactive Patch Program (Doc ID 756671.1)

Setup/configuration verification

The primary and standby should be running the same RDBMS version.
Verify the alert log files and make sure there are no errors.
Query v$database_block_corruption and v$nonlogged_block on both the primary and the standby and make sure there is no corruption.
Make sure the primary and physical standby configuration is correct and there are no errors in redo transport or redo apply.
Verify the Physical Standby Database Is Performing Properly
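As a quick sketch of the version and configuration checks above, the queries below can be run on both the primary and the standby and the output compared side by side. They use only standard dynamic performance views; VERSION_FULL in V$INSTANCE is assumed to be available (it was introduced in 18c).

```sql
-- Run on BOTH primary and standby, then compare the output.
-- Full RDBMS version including the release update (18c and later).
select instance_name, version_full from v$instance;

-- Role and open mode of this database.
select db_unique_name, database_role, open_mode from v$database;

-- The 'compatible' value should be the same on both sides.
select name, value from v$parameter where name = 'compatible';

-- Corruption checks: both counts are expected to be 0.
select count(*) from v$database_block_corruption;
select count(*) from v$nonlogged_block;
```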

Optionally, you can also use the queries below to check the redo transport and apply status.

On the primary:
To check the remote redo transport status and any errors (V$ARCHIVE_DEST.ERROR shows the details):

SQL> col DEST_NAME for a20
SQL> col DESTINATION for a25
SQL> col ERROR for a15
SQL> col ALTERNATE for a20
SQL> set lines 1000
SQL> select DEST_NAME,DESTINATION,ERROR,ALTERNATE,TYPE,status,VALID_TYPE,VALID_ROLE from V$ARCHIVE_DEST where STATUS <> 'INACTIVE';

To check the last archived log created on the primary:

SQL> select thread#, max(sequence#) "Last Primary Seq Generated"  
from gv$archived_log val, gv$database vdb
where val.resetlogs_change# = vdb.resetlogs_change#
group by thread# order by 1;

On Standby:
Use the query below to check the last archived log received from the primary database (for a RAC database, one row is returned per thread).

Last archived log sequence received by the standby:

SQL> select thread#, max(sequence#) "Last Standby Seq Received"  
 from gv$archived_log val, gv$database vdb
 where val.resetlogs_change# = vdb.resetlogs_change#
 group by thread# order by 1;

Last archived log sequence applied by the standby:

SQL> select thread#, max(sequence#) "Last Standby Seq Applied"
 from gv$archived_log val, gv$database vdb
 where val.resetlogs_change# = vdb.resetlogs_change#
 and val.applied in ('YES','IN-MEMORY')
 group by thread# order by 1;
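The received and applied checks can also be combined into one per-thread query on the standby, so the difference between the two columns shows the apply backlog directly. This is a sketch that uses the single-instance v$archived_log and v$database views instead of the gv$ views above; the column aliases are illustrative.

```sql
-- Run on the standby: last received vs. last applied sequence per thread,
-- restricted to the current incarnation (same resetlogs_change#).
select al.thread#,
       max(al.sequence#) as last_received,
       max(case when al.applied in ('YES','IN-MEMORY')
                then al.sequence# end) as last_applied
from v$archived_log al
join v$database d
  on al.resetlogs_change# = d.resetlogs_change#
group by al.thread#
order by al.thread#;
```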

Verify Initialization Parameters

Mainly, the following parameters must be configured correctly:

log_archive_config : should include the primary and standby databases (if there are multiple standby databases, all of them should be included)
fal_server         : remote server from which archived logs can be fetched
db_unique_name     : unique name of the database within this configuration
log_archive_dest_n : destination used to ship archived logs/redo to the remote database

In a typical configuration with one primary and one standby, the primary should have a log_archive_dest_n destination that ships redo to the standby with the VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) clause, and the standby should have a similar destination pointing back at the primary. Because of the PRIMARY_ROLE setting, the standby's destination is not used while it is in the standby role; once the switchover completes, the new primary uses it to ship archived logs/redo to the new standby.
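The symmetric setup described above might look like the following sketch. The DB_UNIQUE_NAMEs `prim` and `stby`, the matching TNS aliases, and the choice of ASYNC transport are all hypothetical; adjust them to your environment before use.

```sql
-- Hypothetical names: prim = current primary, stby = current standby.
-- TNS aliases with the same names are assumed to exist on both hosts.

-- On the primary:
ALTER SYSTEM SET log_archive_config='DG_CONFIG=(prim,stby)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_2='SERVICE=stby ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby' SCOPE=BOTH;
ALTER SYSTEM SET fal_server='stby' SCOPE=BOTH;

-- On the standby (mirror image; only used after it becomes primary):
ALTER SYSTEM SET log_archive_config='DG_CONFIG=(prim,stby)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_2='SERVICE=prim ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prim' SCOPE=BOTH;
ALTER SYSTEM SET fal_server='prim' SCOPE=BOTH;
```

Keeping the two sides as mirror images is what lets redo shipping resume in the opposite direction without manual changes after the role transition.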

Ensure 'compatible' is set to the same value on the primary and the standby.
If the file locations differ between the primary and the standby, use db_file_name_convert for datafiles and log_file_name_convert for redo logfiles.
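To review all of the parameters listed above in one pass, a query like the sketch below can be run on each database. It assumes log_archive_dest_2 is the remote destination; add other destination numbers if your configuration uses them.

```sql
-- Run on primary and standby; compare values against the expectations above.
select name, value
from v$parameter
where name in ('log_archive_config',
               'fal_server',
               'db_unique_name',
               'log_archive_dest_2',        -- assumed remote destination
               'log_archive_dest_state_2',
               'compatible',
               'db_file_name_convert',
               'log_file_name_convert')
order by name;
```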

Refer: Set Primary Database Initialization Parameters

Understand and Test Fallback Options

Check: A.4 Problems Switching Over to a Physical Standby Database

Pre-Switchover

Ensure the prerequisites are fully verified and, in addition, follow the guidance below for a successful switchover.
These steps should be executed before the planned outage starts, to make sure there are no issues.

Verify that redo/archive log apply is working and there is no gap

Run the query below on the physical standby to check the last archived log sequence applied for every thread. It does not include the current sequence, because the query reads its details from v$archived_log.

SQL> select thread#, max(sequence#) "Last Standby Seq Applied"
 from gv$archived_log val, gv$database vdb
 where val.resetlogs_change# = vdb.resetlogs_change#
 and val.applied in ('YES','IN-MEMORY')
 group by thread# order by 1;

Check the MRP process status (it should be running and applying logs):

SQL> select * from gv$dataguard_process;
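The full gv$dataguard_process output can be long. As a sketch, it can be narrowed to the managed recovery process, assuming the process name starts with MRP (typically MRP0):

```sql
-- Run on the standby: show only the managed recovery process per instance.
select inst_id, name, role, action, thread#, sequence#
from gv$dataguard_process
where name like 'MRP%'
order by inst_id;
```

If no rows are returned, managed recovery is not running and should be started with the command below.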

Commands to stop and start the managed recovery process:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

If standby recovery (MRP) was started with a delay, or the standby is routinely kept lagging, the switchover will spend time applying logs to bring the standby in sync.
Before the switchover, keep the archive log apply lag to a minimum; this reduces the total switchover window.
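One way to measure the current lag on the standby is v$dataguard_stats, as sketched below. The view reports values as interval strings (for example `+00 00:00:05`); what counts as "minimal" is a site-specific judgment.

```sql
-- Run on the standby: current redo transport and apply lag.
select name, value, unit, time_computed
from v$dataguard_stats
where name in ('transport lag', 'apply lag');
```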

Verify the apply delay configuration
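A sketch of checking whether an apply delay is configured, run on the primary and assuming log_archive_dest_2 is the standby destination: DELAY_MINS in V$ARCHIVE_DEST shows the effective delay, and the parameter value shows whether a DELAY attribute was set.

```sql
-- Run on the primary; dest_id 2 is assumed to be the standby destination.
select dest_id, dest_name, delay_mins
from v$archive_dest
where dest_id = 2;

-- Look for a DELAY=... attribute in the destination definition.
select value from v$parameter where name = 'log_archive_dest_2';
```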

If the archive log gap is large:

1) Monitoring Redo Transport Services: make sure there are no redo transport errors or transport lag

2) The standby can also be recovered using an incremental backup taken from the primary

Restoring and Recovering Files Over the Network
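Before choosing between waiting for gap resolution and an incremental roll-forward, it can help to see which sequences are actually missing. A minimal sketch on the standby:

```sql
-- Run on the standby: missing archived log ranges per thread.
-- No rows means no gap is currently detected.
select thread#, low_sequence#, high_sequence#
from v$archive_gap;
```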

Check the datafile and tempfile status

All datafiles are expected to be online on both the primary and the standby. If any files are offline (or otherwise not in ONLINE status), restore and recover them so that the standby database files match the primary database files.
If files were taken offline and need to be online after the switchover, bring them online:

SQL> SELECT NAME FROM V$DATAFILE WHERE STATUS='OFFLINE';
SQL> ALTER DATABASE DATAFILE 'datafile-name' ONLINE;
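The query above only catches OFFLINE files. A slightly broader sketch lists every datafile whose status is neither ONLINE nor SYSTEM, which also surfaces files in RECOVER or SYSOFF state:

```sql
-- Run on primary and standby: any row returned needs attention.
select file#, name, status
from v$datafile
where status not in ('ONLINE', 'SYSTEM')
order by file#;
```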

For tempfiles:

SQL> select tf.name filename, bytes, ts.name tablespace from v$tempfile tf, v$tablespace ts where tf.ts#=ts.ts#;

If the listed tempfiles are sufficient for the application, no change is needed.

If more tempfiles need to be added, check the primary as well and add the additional files there too.
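Adding a tempfile might look like the sketch below. The tablespace name `temp`, the file path, and the size are hypothetical; with OMF configured, the file specification can be omitted. ALTER TABLESPACE requires the database to be open.

```sql
-- Hypothetical tablespace, path, and size; adjust to your environment.
-- Run on the primary (and on the standby if it is open read-only and needs it).
ALTER TABLESPACE temp ADD TEMPFILE '/u01/oradata/ORCL/temp02.dbf' SIZE 1G;

-- Re-check the tempfile list afterwards.
select tf.name filename, bytes, ts.name tablespace
from v$tempfile tf, v$tablespace ts
where tf.ts# = ts.ts#;
```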

Online and standby redo logfile configuration

Online redo logfiles (ORL):

set lines 150
col member for a50
select a.thread#,a.group#,a.bytes,a.blocksize,b.type,a.status,b.member from v$log a,v$logfile b where a.group#=b.group#;

When the above query is run on the primary, a.status may be INACTIVE, ACTIVE, or CURRENT.

On the standby, a.status is expected to be UNUSED, CLEARING, or CLEARING_CURRENT. If the output shows anything else, the online redo logfiles need to be cleared manually.

For Standby redo logfile(SRL):

select s.thread#,s.group#,s.status,s.bytes,l.type,l.member from v$logfile l,v$standby_log s where s.group#=l.group#;

Standby redo logfile status should be UNASSIGNED or ACTIVE.

Command to clear an ORL group:

SQL> ALTER DATABASE CLEAR LOGFILE GROUP <ORL GROUP# >;

If an ORL or SRL needs to be cleared on the standby, the managed recovery process must be stopped first.
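Putting the two rules together, clearing an ORL on the standby is a stop, clear, restart sequence. In this sketch, group 1 is a hypothetical group number; use the group# reported by the v$log query earlier in this section.

```sql
-- On the standby: stop managed recovery before clearing.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- Group 1 is hypothetical; repeat for each group that needs clearing.
ALTER DATABASE CLEAR LOGFILE GROUP 1;

-- Restart managed recovery in the background.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```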

If the ORLs have not been cleared by switchover time, the SWITCHOVER command clears them and then starts the database, but the switchover takes longer to complete.
If the wait is long (more than 15 minutes), the session may be killed by a timeout. If the switchover is terminated due to a timeout, retry it until it succeeds.

If the database uses OMF for redo logfiles, or log_file_name_convert is set, the online redo logfiles are cleared automatically when the managed recovery process is started.

Note: setting log_file_name_convert on both the primary and the standby is recommended even if the SRL and ORL locations are the same on both.
If the file locations are the same on the primary and the standby, configure log_file_name_convert with identical search and replacement strings.
Example: log_file_name_convert='dummy','dummy'
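Setting the dummy value might look like this sketch. LOG_FILE_NAME_CONVERT is not dynamically modifiable, so it is written to the SPFILE and takes effect after the next restart; SID='*' applies it to all RAC instances.

```sql
-- Run on both primary and standby; effective after restart.
ALTER SYSTEM SET log_file_name_convert='dummy','dummy' SCOPE=SPFILE SID='*';
```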

To manage standby redo logfiles, refer to: Managing Standby Redo Logs
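When reviewing SRLs against that guidance, a per-thread summary of ORL and SRL groups and sizes is a convenient starting point. This sketch only reports the numbers; compare them with the sizing rules in the referenced documentation.

```sql
-- Online redo log groups and largest size (MB) per thread.
select thread#, count(*) as orl_groups, max(bytes)/1024/1024 as max_mb
from v$log
group by thread#
order by thread#;

-- Standby redo log groups and largest size (MB) per thread.
select thread#, count(*) as srl_groups, max(bytes)/1024/1024 as max_mb
from v$standby_log
group by thread#
order by thread#;
```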

Checking the alert logfiles

1) From the primary alert logfile, check that:
   * No issues are reported for redo transport
   * There are no password file issues
   * There are no TNS or connection issues

2) From the standby database, make sure that:
   * There are no errors related to managed recovery
   * Recovery is moving forward by applying the archive logs / redo logs
   * There are no TNS or connection issues
   * There are no I/O or corruption issues
   select * from v$database_block_corruption; -- should return no rows
   select * from v$nonlogged_block; -- should return no rows
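As a complement to reading the alert logfile directly, recent ORA- messages can be pulled with SQL through V$DIAG_ALERT_EXT (available from 12.2 onward). This is a sketch: the one-day window is an arbitrary choice, and the view can be slow on large alert logs.

```sql
-- Run on primary and standby: ORA- messages from the last 24 hours.
select originating_timestamp, message_text
from v$diag_alert_ext
where originating_timestamp > systimestamp - interval '1' day
  and message_text like '%ORA-%'
order by originating_timestamp;
```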

Check Archive Log Gap and Redo Apply Delay

You must configure the LOG_ARCHIVE_DEST_n and LOG_ARCHIVE_DEST_STATE_n parameters for each standby database so that, when a switchover or failover occurs, all standby sites continue to receive redo data from the new primary database.

Execute the command below on the primary database, assuming log_archive_dest_2 is configured for redo shipping:

SQL> SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;

STATUS should be VALID
GAP_STATUS should be NO GAP

If any other result is reported, do NOT attempt the switchover.
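When more than one standby destination is configured, the same check can be run across all active destinations at once. A sketch, run on the primary:

```sql
-- Run on the primary: status and gap for every active destination.
-- Local destinations typically show no GAP_STATUS value.
select dest_id, dest_name, status, type, database_mode,
       recovery_mode, gap_status, error
from v$archive_dest_status
where status <> 'INACTIVE'
order by dest_id;
```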

If an apply delay is configured, stop the managed recovery process and restart it without the delay:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT FROM SESSION;

If the delay is not removed, the switchover will take longer.

Specifying a Time Delay for the Application of Archived Redo Log Files

Switchover:

During the switchover, if sessions connected to the standby need to be preserved without disconnecting, set the parameter STANDBY_DB_PRESERVE_STATES to SESSION or ALL.
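A sketch of setting this parameter on an Active Data Guard standby. The sketch assumes the parameter cannot be changed dynamically, so it is written to the SPFILE (SID='*' for all RAC instances) and takes effect after a restart; check the parameter reference for your release.

```sql
-- Run on the Active Data Guard standby; effective after restart.
ALTER SYSTEM SET standby_db_preserve_states=SESSION SCOPE=SPFILE SID='*';

-- Confirm the stored value.
select name, value from v$spparameter where name = 'standby_db_preserve_states';
```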

Verify the switchover

Run the SQL below on the primary. If the verification succeeds, a "Database altered" message is returned.

SQL> ALTER DATABASE SWITCHOVER TO <standby db_unique_name> VERIFY;

If an error is returned, fix the issue and rerun the switchover verify command.

Example: "ORA-16475: succeeded with warnings, check alert log for more details". In this case, check the alert logfile and resolve all the errors/warnings.
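Alongside VERIFY, the role and switchover readiness can be checked in v$database. This is a sketch; on the primary, TO STANDBY or SESSIONS ACTIVE generally indicate that a switchover can proceed, while other values are worth investigating first.

```sql
-- Run on the primary (and on the standby for comparison).
select db_unique_name, database_role, open_mode, switchover_status
from v$database;
```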

Switchover steps

If the switchover verification is successful, execute the command to switch over the database.

1) Execute on the current primary:

SQL> ALTER DATABASE SWITCHOVER TO <standby db_unique_name>;

If step 1 is successful, follow step 2 to open the new primary database.

2) Execute on the new primary database:

SQL> ALTER DATABASE OPEN;

3) Mount or open the old primary (now the new standby), depending on the case.

If the standby is an Oracle Active Data Guard physical standby:

SQL> STARTUP;

If the standby is NOT an Oracle Active Data Guard physical standby:

SQL> STARTUP MOUNT;

4) Start redo apply on the new standby:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
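After step 4, a quick way to confirm that the roles have actually swapped is to run the query below on both databases. As a sketch of what to expect: the new primary should report PRIMARY and READ WRITE, and the new standby PHYSICAL STANDBY with MOUNTED or, for Active Data Guard, READ ONLY WITH APPLY.

```sql
-- Run on both databases after the switchover.
select db_unique_name, database_role, open_mode
from v$database;
```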

Post Switchover

On the primary:

Check that archived logs are being transferred to the standby and applied (replace <n> with the number of the remote log_archive_dest_n):

SQL> alter system archive log current;
SQL> select dest_id,error,status from v$archive_dest where dest_id=<n>;
SQL> select max(sequence#),thread# from v$log_history group by thread#;

If the remote destination is log_archive_dest_2 (dest_id 2):

SQL> select max(sequence#) from v$archived_log where applied='YES' and dest_id=2;
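For a RAC primary, the last query returns a single maximum across all threads. A per-thread variant restricted to the current incarnation is sketched below, again assuming dest_id 2 is the remote destination.

```sql
-- Run on the new primary: last sequence applied at dest_id 2, per thread.
select thread#, max(sequence#) as last_applied
from v$archived_log
where dest_id = 2
  and applied = 'YES'
  and resetlogs_change# = (select resetlogs_change# from v$database)
group by thread#
order by thread#;
```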

On the standby:

Verify archived log availability and apply:

SQL> select max(sequence#),thread# from v$archived_log group by thread#;
SQL> select name,role,instance,thread#,sequence#,action from gv$dataguard_process;

Additionally, check the alert logfiles to confirm archived log transfer and apply on the standby.
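To see whether apply is making progress on the new standby, v$recovery_progress can be queried for the most recent recovery session. This is a sketch; the exact ITEM names reported (for example apply rates and the last applied redo) vary by release.

```sql
-- Run on the new standby: progress items for the latest recovery session.
select item, units, sofar
from v$recovery_progress
where start_time = (select max(start_time) from v$recovery_progress)
order by item;
```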
