Adding a Node Back to a RAC Cluster After Reinstalling the Host OS

Posted by easonhyj


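Preface

In production, a host sometimes fails to boot because of a disk or other hardware fault, or the OS itself is damaged beyond repair. At that point reinstalling the OS is the only option. But how do you add the reinstalled host back into the existing RAC cluster?

👇👇The experiment below walks through adding an OS-reinstalled node back into a RAC cluster, step by step👇👇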

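Lab environment setup

1. RAC deployment

There are plenty of RAC deployment guides on CSDN, so I skip that part here.
☀️If I find time to write up a RAC deployment later, I will add the link here. Stay tuned☀️

2. Environment parameters

Hostname  OS        Public IP     VIP           Private IP
rac1      RHEL 7.6  192.168.56.5  192.168.56.7  10.10.1.1
rac2      RHEL 7.6  192.168.56.6  192.168.56.8  10.10.1.1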

3. Simulating the OS failure

To save time, I simulate a freshly reinstalled OS by simply deleting the GI and DB software on node 2🌜

  • Cluster status
[grid@rac1:/home/grid]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCR.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                         
ora.cvu
      1        ONLINE  ONLINE       rac2                                         
ora.oc4j
      1        ONLINE  ONLINE       rac2                                         
ora.orcl.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  ONLINE       rac2                     Open                
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.scan1.vip
      1        ONLINE  ONLINE       rac2
  • Remove GI and DB from node 2
[root@rac2:/root]$ rm -rf /etc/oracle
[root@rac2:/root]$ rm -rf /etc/ora*
[root@rac2:/root]$ rm -rf /u01
[root@rac2:/root]$ rm -rf /tmp/CVU*
[root@rac2:/root]$ rm -rf /tmp/.oracle
[root@rac2:/root]$ rm -rf /var/tmp/.oracle
[root@rac2:/root]$ rm -f /etc/init.d/init.ohasd 
[root@rac2:/root]$ rm -f /etc/systemd/system/oracle-ohasd.service
[root@rac2:/root]$ rm -rf /etc/init.d/ohasd
  • Check the cluster status again
[grid@rac1:/home/grid]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
ora.OCR.dg
               ONLINE  ONLINE       rac1                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                         
ora.cvu
      1        ONLINE  ONLINE       rac1                                         
ora.oc4j
      1        ONLINE  ONLINE       rac1                                         
ora.orcl.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  OFFLINE                                                   
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  INTERMEDIATE rac1                     FAILED OVER         
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                              
  • Verify the node 2 environment
[root@rac2:/]$ ll
total 28
drwxr-xr-x.   2 oracle oinstall    6 Sep 25 19:32 backup
lrwxrwxrwx.   1 root   root        7 Sep 24 15:31 bin -> usr/bin
dr-xr-xr-x.   4 root   root     4096 Sep 25 20:47 boot
drwxr-xr-x   20 root   root     3640 Sep 26 15:20 dev
drwxr-xr-x. 144 root   root     8192 Jan 14  2022 etc
drwxr-xr-x.   5 root   root       46 Sep 25 19:32 home
lrwxrwxrwx.   1 root   root        7 Sep 24 15:31 lib -> usr/lib
lrwxrwxrwx.   1 root   root        9 Sep 24 15:31 lib64 -> usr/lib64
drwxr-xr-x.   2 root   root        6 Dec 15  2017 media
drwxr-xr-x.   2 root   root        6 Dec 15  2017 mnt
drwxr-xr-x.   4 root   root       32 Sep 25 21:26 opt
dr-xr-xr-x  178 root   root        0 Jan 14  2022 proc
dr-xr-x---.  15 root   root     4096 Sep 26 15:20 root
drwxr-xr-x   37 root   root     1140 Sep 26 15:20 run
lrwxrwxrwx.   1 root   root        8 Sep 24 15:31 sbin -> usr/sbin
drwxr-xr-x.   3 root   root     4096 Sep 25 22:28 soft
drwxr-xr-x.   2 root   root        6 Dec 15  2017 srv
dr-xr-xr-x   13 root   root        0 Sep 26 15:40 sys
drwxrwxrwt.  13 root   root     4096 Sep 26 15:40 tmp
drwxr----T    4 root   root       32 Sep 25 21:59 user_root
drwxr-xr-x.  13 root   root      155 Sep 24 15:31 usr
drwxr-xr-x.  20 root   root      282 Sep 24 15:48 var
[root@rac2:/]$ ps -ef | grep grid
root      5847  4091  0 15:41 pts/0    00:00:00 grep --color=auto grid
[root@rac2:/]$ ps -ef | grep asm
root      5852  4091  0 15:41 pts/0    00:00:00 grep --color=auto asm
[root@rac2:/]$ ps -ef | grep oracle
root      5856  4091  0 15:41 pts/0    00:00:00 grep --color=auto oracle
# /u01 no longer exists under the root directory, and no processes of the grid or oracle users are running
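In the listing above, each `ps -ef | grep ...` only matches the grep command itself. As a small sketch (the function name `leftover_procs` is my own, not part of any Oracle tooling), the three checks can be folded into one filter that drops the grep lines, so a clean node prints nothing at all:

```shell
#!/usr/bin/env bash
# Print the lines of a `ps -ef` listing (read from stdin) that mention
# grid, asm or oracle, skipping the grep commands themselves.
# leftover_procs is a hypothetical helper name used only for this sketch.
leftover_procs() {
  grep -E 'grid|asm|oracle' | grep -v 'grep' || true
}

# Usage on node 2 (no output means no leftover processes):
#   ps -ef | leftover_procs
```

Reading from stdin keeps the filter easy to try out on a saved listing before running it on the real host.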

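🔫The environment checks out, so on to the hands-on part

Hands-on walkthrough

Important: before a node is added to the cluster, it needs the same preparatory configuration as a fresh RAC installation. I skip that here; look it up if you are unsure.

💥💥Check every setting carefully. Eason forgot to create the required directories and ran into all kinds of errors in the later steps😭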
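To avoid the missing-directory mistake, a quick pre-flight check on the reinstalled node may help. This is only a sketch: the paths are the ones that appear in this walkthrough, whether each one really has to exist beforehand depends on your own layout, and `missing_dirs` is a made-up helper name. Ownership and permissions are not checked.

```shell
#!/usr/bin/env bash
# List the directories used in this walkthrough that are missing under ROOT.
# ROOT defaults to / ; pass another prefix to try the function out safely.
# The directory list is an assumption taken from the paths in this article.
missing_dirs() {
  local root="${1:-/}" d
  for d in u01/app/11.2.0/grid u01/app/oracle u01/app/oraInventory; do
    [ -d "${root%/}/$d" ] || printf '%s\n' "/$d"
  done
}

# Usage on node 2 (no output means every directory is present):
#   missing_dirs
```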

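1. Remove the reinstalled host's entries from the OCR

[grid@rac1:/home/grid]$ olsnodes
[root@rac1:/root]$ /u01/app/11.2.0/grid/bin/crsctl delete node -n rac2

✒️To check that this step succeeded, run olsnodes on a surviving node; the reinstalled host should no longer appear in the list.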
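The olsnodes check can be scripted. A minimal sketch, assuming olsnodes prints one node per line with the node name in the first column (`node_absent` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Succeed (exit 0) when NODE does not appear in the olsnodes listing on stdin.
# Only the first column is compared, so extra columns are ignored.
node_absent() {
  local node="$1"
  ! awk '{print $1}' | grep -qx -- "$node"
}

# Usage on a surviving node, as the grid user:
#   olsnodes | node_absent rac2 && echo "rac2 is gone from the cluster"
```

The exact-line match (`grep -x`) keeps a name like rac2 from matching rac20.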

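2. Remove the reinstalled host's VIP from the OCR

[root@rac1:/root]$ /u01/app/11.2.0/grid/bin/srvctl remove vip -i rac2 -v -f
Successfully removed VIP rac2.

⛔️After removing node 2's VIP, it is best to restart the network service on the node that was holding it (rac1 here, since the VIP had failed over); otherwise the IP address is not released at the operating system level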
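To see whether the address is still held at the OS level, the output of `ip -o -4 addr show` can be checked for the VIP. A small sketch, assuming the one-line output format of `ip -o`, where the fourth field is ADDRESS/PREFIX (`addr_present` is a made-up helper name):

```shell
#!/usr/bin/env bash
# Succeed when ADDR is still configured on some interface, given the output
# of `ip -o -4 addr show` on stdin (field 4 looks like 192.168.56.8/24).
addr_present() {
  local addr="$1"
  awk -v a="$addr" '
    { split($4, p, "/"); if (p[1] == a) found = 1 }
    END { exit !found }
  '
}

# Usage on rac1, with node 2's VIP from the environment table:
#   ip -o -4 addr show | addr_present 192.168.56.8 && echo "VIP still held"
```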

3. Remove the reinstalled host from the GI and DB home inventories

  • Update the GI inventory
[grid@rac1:/u01/app/11.2.0/grid/oui/bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=rac1" CRS=TRUE -silent -local
  • Update the DB inventory
[oracle@rac1:/u01/app/oracle/product/11.2.0/db/oui/bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME CLUSTER_NODES=rac1 -silent -local

📢Note: CLUSTER_NODES is the list of surviving nodes
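One way to confirm the inventory update is to look at the node lists in inventory.xml. The sketch below assumes the central inventory sits at /u01/app/oraInventory (the location used later in this walkthrough) and that each home is a `<HOME ...>` element containing `<NODE NAME="..."/>` entries; `homes_listing_node` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the <HOME ...> lines of an inventory.xml (read from stdin) whose
# node list still names NODE. No output means the node is gone everywhere.
homes_listing_node() {
  local node="$1"
  awk -v n="$node" '
    /<HOME /                          { home = $0; sub(/^[ \t]+/, "", home) }
    index($0, "<NODE NAME=\"" n "\"") { print home }
  '
}

# Usage on rac1 (inventory path is an assumption):
#   homes_listing_node rac2 < /u01/app/oraInventory/ContentsXML/inventory.xml
```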

4. CVU check

[grid@rac1:/home/grid]$ /u01/app/11.2.0/grid/bin/cluvfy stage -pre nodeadd -n rac2 -verbose

Review the verification output. Isolated failures, such as the resolv.conf name-resolution check, can be ignored
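The verbose report is long, so it can help to save it to a file and pull out only the lines that mention a failure. A rough sketch (the log path and the `cluvfy_failures` name are my own choices, and it simply matches the word "failed" rather than relying on any particular report layout):

```shell
#!/usr/bin/env bash
# Print, with line numbers, every line that mentions "failed" in the given
# file(s), or in stdin when no file is given. Matching is case-insensitive.
cluvfy_failures() {
  grep -i -n 'failed' "$@" || true
}

# Usage on rac1, as the grid user:
#   /u01/app/11.2.0/grid/bin/cluvfy stage -pre nodeadd -n rac2 -verbose > /tmp/cluvfy_rac2.log
#   cluvfy_failures /tmp/cluvfy_rac2.log
```

Each hit still needs a human decision on whether it is safe to ignore.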

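5. Run addNode.sh from the GI home on node 1

[grid@rac1:/u01/app/11.2.0/grid/oui/bin]$ export IGNORE_PREADDNODE_CHECKS=Y
[grid@rac1:/u01/app/11.2.0/grid/oui/bin]$ ./addNode.sh -silent "CLUSTER_NEW_NODES=rac2" "CLUSTER_NEW_VIRTUAL_HOSTNAMES=rac2-vip"

👑If the failed pre-checks are not ignored here, adding the node fails with an error, just like the verification during a RAC installation👑

💎With the failed checks ignored, the node is added successfully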

6. Run the root scripts on node 2 to start the CRS stack

[root@rac2:/root]$ /u01/app/oraInventory/orainstRoot.sh
[root@rac2:/root]$ /u01/app/11.2.0/grid/root.sh
  • Check the cluster status
[grid@rac2:/home/grid]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCR.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                         
ora.cvu
      1        ONLINE  ONLINE       rac1                                         
ora.oc4j
      1        ONLINE  ONLINE       rac1                                         
ora.orcl.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  OFFLINE                                                   
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.scan1.vip
      1        ONLINE  ONLINE       rac1   
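In the listing above everything on rac2 is back except instance 2 of ora.orcl.db, which is easy to miss by eye. Here is a small sketch that scans `crsctl stat res -t` output for resources whose target is ONLINE but whose state is OFFLINE. It assumes the 11.2 layout shown in this article (a resource name line starting with `ora.`, followed by state lines), and `offline_resources` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Read `crsctl stat res -t` output on stdin and print the name of every
# resource that has a line with TARGET=ONLINE and STATE=OFFLINE.
offline_resources() {
  awk '
    /^ora\./ { res = $1; next }          # remember the current resource name
    {
      # Cluster resource lines start with an instance number; local ones do not.
      if ($1 ~ /^[0-9]+$/) { target = $2; state = $3 }
      else                 { target = $1; state = $2 }
      if (target == "ONLINE" && state == "OFFLINE") print res
    }
  '
}

# Usage, as the grid user:
#   crsctl stat res -t | offline_resources
```

Resources such as ora.gsd, whose target is OFFLINE as well, are deliberately not reported.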

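7. Run addNode.sh from the DB home on node 1

[oracle@rac1:/home/oracle]$ /u01/app/oracle/product/11.2.0/db/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES=rac2"
  • Run the root.sh script
[root@rac2:/dev]$ /u01/app/oracle/product/11.2.0/db/root.sh

📛With 11g on RHEL 7, root.sh reports at this point that the nmhs file does not exist. You can copy it over manually from node 1 and then fix its permissions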
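For the permissions part, one cautious approach is to read the mode and numeric owner of the file on node 1 and apply exactly the same values on node 2, rather than guessing them. This is a sketch under a few assumptions: GNU `stat` is available, the oracle and oinstall IDs are identical on both nodes, and the file lives under bin/ in the DB home (check where root.sh reports it missing). `apply_perms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Apply a "MODE UID:GID" spec, as printed by `stat -c '%a %u:%g' FILE`,
# to a local file. chown runs first because it can clear setuid bits.
apply_perms() {
  local spec="$1" file="$2" mode owner
  mode="${spec%% *}"     # text before the first space, e.g. 640
  owner="${spec#* }"     # text after the first space, e.g. 54321:54321
  chown "$owner" "$file" && chmod "$mode" "$file"
}

# On node 1 (path is an assumption):
#   stat -c '%a %u:%g' /u01/app/oracle/product/11.2.0/db/bin/nmhs
# On node 2, as root, after copying the file over:
#   apply_perms "<output from node 1>" /u01/app/oracle/product/11.2.0/db/bin/nmhs
```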

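8. Start the instance on node 2

Remember what Eason did earlier to clear out the old node's information❓

Only the inventory information was cleaned up, right❓

🚀The cluster still has node 2's instance registered, and so does the database. So what now? Just start it🚀

❤️Not so fast: before starting, you need to fix the pfile❤️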

[oracle@rac2:/u01/app/oracle/product/11.2.0/db/dbs]$ mv initorcl1.ora initorcl2.ora 
[grid@rac1:/u01/app/oracle/product/11.2.0/db]$ srvctl start database -d orcl
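The rename is needed presumably because the DB home on node 2 was copied from node 1, so its dbs directory carries initorcl1.ora while the instance on node 2 looks for initorcl2.ora. A slightly more defensive version of that `mv`, as a sketch (`rename_pfile` is a made-up helper name), refuses to overwrite an existing target:

```shell
#!/usr/bin/env bash
# Rename init<DB><FROM>.ora to init<DB><TO>.ora inside DIR.
# Fails if the source is missing or the target already exists.
rename_pfile() {
  local dir="$1" db="$2" from="$3" to="$4"
  local src="$dir/init${db}${from}.ora" dst="$dir/init${db}${to}.ora"
  if [ ! -f "$src" ]; then
    echo "missing: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "already exists: $dst" >&2
    return 1
  fi
  mv -- "$src" "$dst"
}

# Usage on node 2, as the oracle user:
#   rename_pfile /u01/app/oracle/product/11.2.0/db/dbs orcl 1 2
```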
  • Cluster status
[grid@rac1:/u01/app/oracle/product/11.2.0/db]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCR.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                         
ora.cvu
      1        ONLINE  ONLINE       rac1                                         
ora.oc4j
      1        ONLINE  ONLINE       rac1                                         
ora.orcl.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  ONLINE       rac2                     Open                
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.scan1.vip
      1        ONLINE  ONLINE       rac1  

Done. The reinstalled node has successfully rejoined the cluster
