Oracle 19c PDB-Level Failover: Testing the Failure Scenarios

Posted by dingdingfish

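Let me state the conclusion up front: client connections, meaning the connections your applications use, should not use the default services.

The source for this is the Real Application Clusters Administration and Deployment Guide: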

All of these services are used for internal management. You cannot stop or disable any of these internal services to do planned outages or to failover to Oracle Data Guard. Do not use these services for client connections.

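Some time ago, in a customer's RAC environment, a PDB was shut down and left in MOUNTED state, yet client connections did not fail over. The cause was that the clients used the default service that the database creates automatically for the PDB. At the time I found Alfred Zhao's article "测试12.2.0.1RAC PDB级别的Failover" (Testing PDB-Level Failover on 12.2.0.1 RAC). That article shows the successful case but does not demonstrate the failure scenario, which is the purpose of this post.

My environment is a 2-node RAC built on Oracle public cloud, database version 19.11, with one PDB named PDB1: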

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO

SQL> alter session set container = pdb1;

Session altered.

SQL> select name from v$active_services;

NAME
----------------------------------------------------------------
pdb1

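This service is created automatically by the database when the PDB is created. It is an internal service (one of the internal services mentioned in the quote above) and should be used only for administration.
You cannot stop this service, short of forcing the instance down or killing processes: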

SQL> exec DBMS_SERVICE.STOP_SERVICE('pdb1');
BEGIN DBMS_SERVICE.STOP_SERVICE('pdb1'); END;

*
ERROR at line 1:
ORA-44793: cannot stop internal services
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 95
ORA-06512: at "SYS.DBMS_SERVICE", line 519
ORA-06512: at line 1

On the Oracle public cloud service, the console provides ready-made connection strings for both the CDB and the PDB:

# CDB
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.74)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.229)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.213)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=racMJMWZ_icn189.pub.racdblab.oraclevcn.com)))

# PDB
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.74)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.229)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.213)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

Three addresses are listed; they are in fact the 3 SCAN VIPs:

$ srvctl config scan
SCAN name: lvracdb-s01-2022-01-14-123012-scan.pub.racdblab.oraclevcn.com, Network: 1
Subnet IPv4: 10.0.0.0/255.255.255.0/ens3, static
Subnet IPv6:
SCAN 1 IPv4 VIP: 10.0.0.213
SCAN VIP is enabled.
SCAN 2 IPv4 VIP: 10.0.0.229
SCAN VIP is enabled.
SCAN 3 IPv4 VIP: 10.0.0.74
SCAN VIP is enabled.
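
Because the three addresses are just the SCAN VIPs, the same descriptor can be written with a single ADDRESS that names the SCAN. The following is a minimal sketch that prints such a tnsnames.ora entry. The helper name scan_descriptor and the alias PDB1-SCAN are hypothetical; the timeout values are copied from the console string above, and the default PDB service appears only to mirror that string.

```shell
#!/bin/sh
# Print a tnsnames.ora entry that uses one SCAN name instead of
# listing every SCAN VIP. scan_descriptor is a hypothetical helper.
scan_descriptor() {
    alias_name=$1   # Net service name to define
    scan_name=$2    # SCAN name as reported by `srvctl config scan`
    service=$3      # database service to connect to
    printf '%s =\n' "$alias_name"
    printf '(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)'
    printf '(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=1521))' "$scan_name"
    printf '(CONNECT_DATA=(SERVICE_NAME=%s)))\n' "$service"
}

# PDB1-SCAN is an illustrative alias, not one used in the tests below.
scan_descriptor PDB1-SCAN \
    lvracdb-s01-2022-01-14-123012-scan.pub.racdblab.oraclevcn.com \
    pdb1.pub.racdblab.oraclevcn.com
```

The output can be appended to tnsnames.ora on the client. With the SCAN name, the entry does not need to change when SCAN VIP addresses change.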

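Test Preparation

No dedicated client was set up. RAC node 1 serves as the client, with SQL*Plus as the client test program.

Configure 3 Net service names on node 1:

# Default service, using the 3 SCAN VIPs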
PDB1-SCANVIP =
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.74)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.229)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.213)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

# Default service, using the 2 node VIPs, node 1's VIP first
PDB1-VIP=
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.116)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.192)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

# Default service, using the 2 node IPs, node 1's IP first
PDB1-NODEIP=
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.199)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.113)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

The test statement has the following format:

sqlplus sh/$USERPWD@$SERVICE_NAME
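
To repeat the same statement against all three Net service names, a small loop helps. This is a sketch, not the script used for the results in this post. The helper name build_test_cmd and the DRY_RUN switch are hypothetical, and USERPWD is assumed to be exported in the calling shell. The -L option makes SQL*Plus attempt the logon only once instead of prompting again.

```shell
#!/bin/sh
# Build one SQL*Plus test command per Net service name.
# $USERPWD is left unexpanded here so the password is not echoed in dry runs.
build_test_cmd() {
    printf 'sqlplus -L sh/"$USERPWD"@%s\n' "$1"
}

# Dry run by default: only print the commands. Set DRY_RUN=0 to execute them.
for net_name in PDB1-SCANVIP PDB1-VIP PDB1-NODEIP; do
    cmd=$(build_test_cmd "$net_name")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        # stdin is /dev/null so a successful session ends immediately
        sh -c "$cmd" </dev/null
        echo "$net_name: exit status $?"
    fi
done
```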

Instance Stop Test

Force-stop instance 1:

srvctl stop instance -d $DBNAME -i $INST1 -force

Confirm that it is stopped:

$ srvctl status database -d $DBNAME
Instance racMJMWZ1 is not running on node lvracdb-s01-2022-01-14-1230121
Instance racMJMWZ2 is running on node lvracdb-s01-2022-01-14-1230122

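The test results are as follows:

Test ID  Test statement             Result
1        sqlplus sh@PDB1-SCANVIP    Success
2        sqlplus sh@PDB1-VIP        Success
3        sqlplus sh@PDB1-NODEIP     Success

Surprisingly, all three succeeded. Test 1 succeeding is easy to understand, but why did the other two not fail either?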

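PDB Close Test

After discussing this with Alfred, we found that the database (that is, the PDB) was closed when the original error occurred, so I decided to simulate that scenario.
First restore the instance so that the whole environment is healthy again: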

$ srvctl start instance -d $DBNAME -i $INST1

$ srvctl status database -d $DBNAME
Instance racMJMWZ1 is running on node lvracdb-s01-2022-01-14-1230121
Instance racMJMWZ2 is running on node lvracdb-s01-2022-01-14-1230122

Log in to the database on node 1 and close the PDB PDB1:

connect / as sysdba
alter pluggable database pdb1 close immediate;

Confirm that the PDB is closed:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED

The test results are as follows:

Test ID  Test statement             Result
1        sqlplus sh@PDB1-SCANVIP    Success
2        sqlplus sh@PDB1-VIP        Failure
3        sqlplus sh@PDB1-NODEIP     Failure

Both failures report the same error:

$ sqlplus sh/$USERPWD@PDB1-VIP

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Jan 18 04:37:20 2022
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

ERROR:
ORA-01109: database not open


Enter user-name:

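Conclusions

  1. Do not use the default service for client connections.
  2. Use the SCAN name in the connect string. It is the simplest form to write, it provides high availability even for the default service, and it does not need to change when nodes are added or removed.
  3. Listing the 3 SCAN VIPs instead of the SCAN name also works, but the string becomes needlessly long.
  4. The default service offers no fine-grained control, such as configuring Application Continuity.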
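
Conclusion 1 implies creating a dedicated application service for the PDB. The sketch below only prints the srvctl commands so they can be reviewed first. The helper name app_service_cmds and the service name pdb1_app are hypothetical, DBNAME is assumed to hold the database unique name as in the srvctl commands earlier in this post, and the instance names are the ones from this test environment.

```shell
#!/bin/sh
# Print the srvctl commands that create, start, and check an
# application service for a PDB. Nothing is executed here.
# app_service_cmds is a hypothetical helper.
app_service_cmds() {
    db=$1          # database unique name
    svc=$2         # new service name (hypothetical: pdb1_app)
    pdb=$3         # PDB the service belongs to
    instances=$4   # comma-separated preferred instances
    echo "srvctl add service -db $db -service $svc -pdb $pdb -preferred $instances"
    echo "srvctl start service -db $db -service $svc"
    echo "srvctl status service -db $db -service $svc"
}

# Falls back to the literal text DBNAME when the variable is not set.
app_service_cmds "${DBNAME:-DBNAME}" pdb1_app pdb1 racMJMWZ1,racMJMWZ2
```

After reviewing the output, it can be piped to sh as the oracle user. Unlike the default service, a service created this way can be stopped or relocated with srvctl, and client Net service names would then reference it in SERVICE_NAME instead of pdb1.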

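Additional Notes

Node 1 itself was never shut down during the tests above, so its node VIP did not fail over to the other node.
This can be verified with the following commands: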

$ srvctl config nodeapps
Network 1 exists
Subnet IPv4: 10.0.0.0/255.255.255.0/ens3, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
VIP exists: network number 1, hosting node lvracdb-s01-2022-01-14-1230121
VIP Name: lvracdb-s01-2022-01-14-1230121-vip.pub.racdblab.oraclevcn.com
VIP IPv4 Address: 10.0.0.116
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
VIP exists: network number 1, hosting node lvracdb-s01-2022-01-14-1230122
VIP Name: lvracdb-s01-2022-01-14-1230122-vip.pub.racdblab.oraclevcn.com
VIP IPv4 Address: 10.0.0.192
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
ONS exists: Local port 6100, remote port 6200, EM port 2016, Uses SSL true
ONS is enabled
ONS is individually enabled on nodes:
ONS is individually disabled on nodes:

# Or
$ olsnodes
lvracdb-s01-2022-01-14-1230121
lvracdb-s01-2022-01-14-1230122

$ srvctl status vip -node $NODE1
VIP 10.0.0.116 is enabled
VIP 10.0.0.116 is running on node: lvracdb-s01-2022-01-14-1230121

$ srvctl status vip -node $NODE2
VIP 10.0.0.192 is enabled
VIP 10.0.0.192 is running on node: lvracdb-s01-2022-01-14-1230122

Under normal conditions, the default service is running on all nodes:

$ sudo su - grid

# No output
$ crsctl stat res -t|grep -i pdb1

$ ps -ef|grep tns
root        36     2  0 Jan14 ?        00:00:00 [netns]
oracle   17446 17161  0 07:03 pts/0    00:00:00 grep --color=auto tns
grid     58936     1  0 Jan14 ?        00:00:13 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid     59034     1  0 Jan14 ?        00:02:04 /u01/app/19.0.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid     62964     1  0 Jan16 ?        00:00:06 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
grid     62977     1  0 Jan16 ?        00:00:06 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit

$ lsnrctl services LISTENER_SCAN2
...
Service "pdb1.pub.racdblab.oraclevcn.com" has 2 instance(s).
  Instance "racMJMWZ1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:6 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.116)(PORT=1521))
  Instance "racMJMWZ2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:6 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.192)(PORT=1521))
...
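
When the listener serves many services, the per-service instance list can be pulled out of the lsnrctl output with a short filter. This is a sketch that assumes the output layout shown above; the helper name instances_for_service is hypothetical.

```shell
#!/bin/sh
# List the instances a listener reports for one service, with their status.
# Reads `lsnrctl services <listener>` output on stdin.
# instances_for_service is a hypothetical helper.
instances_for_service() {
    awk -v svc="$1" '
        # A new Service block starts: remember whether it is the one we want.
        /^Service "/ { in_svc = ($2 == ("\"" svc "\"")) }
        # Inside the wanted block, print instance name and status.
        in_svc && /^ *Instance "/ {
            gsub(/[",]/, "", $2)
            gsub(/,/, "", $4)
            print $2, $4
        }'
}

# Example (run as the grid user):
#   lsnrctl services LISTENER_SCAN2 | instances_for_service pdb1.pub.racdblab.oraclevcn.com
```

Running it before and after closing the PDB is one way to see which instances each listener still offers for the default service.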

To find the public IP of a node, you can do the following:

$ olsnodes
lvracdb-s01-2022-01-14-1230121
lvracdb-s01-2022-01-14-1230122

$ hostname -f
lvracdb-s01-2022-01-14-1230121.pub.racdblab.oraclevcn.com
[grid@lvracdb-s01-2022-01-14-1230121 ~]$ nslookup $(hostname -f)
Server:         169.254.169.254
Address:        169.254.169.254#53

Non-authoritative answer:
Name:   lvracdb-s01-2022-01-14-1230121.pub.racdblab.oraclevcn.com
Address: 10.0.0.199

$ hostname -i
10.0.0.199

References

  1. https://www.cnblogs.com/jyzhao/p/10458233.html#4
  2. https://mikedietrichde.com/2017/08/16/ora-44787-dont-mess-with-the-default-oracle-service/
  3. How To Configure Server Side Transparent Application Failover (Doc ID 460982.1)
  4. https://logic.edchen.org/how-oracle-rac-vip-failover-to-another-node/
  5. https://emersontech.com.br/en/checking-network-ips-rac/
  6. https://satya-racdba.blogspot.com/2012/07/virtual-ip-vip-address-rac-oracle.html
  7. http://ermanarslan.blogspot.com/p/oracle-rac.html
