Handling an Oracle instance that crashes shortly after startup: ORA-600 [17182] (惜分飞)

Posted by 惜分飞


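A database that had been running normally suddenly began restarting in a loop on one node. Because it is a RAC database, the instance crashed shortly after starting, CRS started it again, it crashed again, and the cycle repeated. The alert log looked like this: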

Fri Mar 24 13:36:07 2023

QMNC started with pid=124, OS id=188397

ARC3: Archival started

ARC0: STARTING ARCH PROCESSES COMPLETE

Completed: ALTER DATABASE OPEN

Fri Mar 24 13:36:08 2023

minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:188028 status:0x7

minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000

Fri Mar 24 13:36:08 2023

Starting background process CJQ0

Fri Mar 24 13:36:08 2023

CJQ0 started with pid=144, OS id=188451

Fri Mar 24 13:36:09 2023

Redo thread 2 internally disabled at seq 44406 (CKPT)

Archived Log entry 135343 added for thread 2 sequence 44405 ID 0xcd7086e0 dest 1:

ARC0: Archiving disabled thread 2 sequence 44406

Archived Log entry 135344 added for thread 2 sequence 44406 ID 0xcd7086e0 dest 1:

Thread 1 advanced to log sequence 40030 (LGWR switch)

Current log# 2 seq# 40030 mem# 0: +DATA/xff/onlinelog/group_2.310.1087136761

Archived Log entry 135345 added for thread 1 sequence 40029 ID 0xcd7086e0 dest 1:

Fri Mar 24 13:36:30 2023

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_p200_188856.trc (incident=1082418):

ORA-00600: internal error code, arguments: [17182], [0x7F4D2A13DBF8], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1082418/xff1_p200_188856_i1082418.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Mar 24 13:36:30 2023

Dumping diagnostic data in directory=[cdmp_20230324133630], requested by (instance=1, osid=188856 (P200)), summary=[incident=1082418].

Fri Mar 24 13:36:54 2023

Decreasing number of real time LMS from 6 to 0

Fri Mar 24 13:36:54 2023

Block recovery from logseq 40030, block 259 to scn 17199959182

Recovery of Online Redo Log: Thread 1 Group 2 Seq 40030 Reading mem 0

Mem# 0: +DATA/xff/onlinelog/group_2.310.1087136761

Block recovery stopped at EOT rba 40030.317.16

Block recovery completed at rba 40030.317.16, scn 4.20089998

Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97E8579, kghrst()+1835] [flags: 0x0, count: 1]

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_p200_188856.trc (incident=1082419):

ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97E8579] [SI_KERNEL(general_protection)] []

ORA-00600: internal error code, arguments: [17182], [0x7F4D2A13DBF8], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1082419/xff1_p200_188856_i1082419.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_p200_188856.trc (incident=1082420):

ORA-00600: internal error code, arguments: [17147], [0x7F4D2A13DBD0], [], [], [], [], [], [], [], [], [], []

ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97E8579] [SI_KERNEL(general_protection)] []

ORA-00600: internal error code, arguments: [17182], [0x7F4D2A13DBF8], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1082420/xff1_p200_188856_i1082420.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_p200_188856.trc (incident=1082421):

ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x7F4D2A13DBE8], [], [], [], [], [], [], [], [], [], []

ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97E8579] [SI_KERNEL(general_protection)] []

ORA-00600: internal error code, arguments: [17182], [0x7F4D2A13DBF8], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1082421/xff1_p200_188856_i1082421.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Mar 24 13:36:56 2023

Dumping diagnostic data in directory=[cdmp_20230324133656], requested by (instance=1, osid=188856 (P200)), summary=[incident=1082420].

SMON: Restarting fast_start parallel rollback

Fri Mar 24 13:37:12 2023

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_p000_188229.trc (incident=1080530):

ORA-00600: internal error code, arguments: [17182], [0x7F3AB22ADBF8], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1080530/xff1_p000_188229_i1080530.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Mar 24 13:37:12 2023

Dumping diagnostic data in directory=[cdmp_20230324133712], requested by (instance=1, osid=188229 (P000)), summary=[incident=1080530].

Fri Mar 24 13:37:24 2023

Block recovery from logseq 40030, block 259 to scn 17199959182

Recovery of Online Redo Log: Thread 1 Group 2 Seq 40030 Reading mem 0

Mem# 0: +DATA/xff/onlinelog/group_2.310.1087136761

Block recovery completed at rba 40030.317.16, scn 4.20089999

Fri Mar 24 13:37:37 2023

Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97E8579, kghrst()+1835] [flags: 0x0, count: 1]

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_p000_188229.trc (incident=1080531):

ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97E8579] [SI_KERNEL(general_protection)] []

ORA-00600: internal error code, arguments: [17182], [0x7F3AB22ADBF8], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1080531/xff1_p000_188229_i1080531.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Mar 24 13:37:37 2023

Dumping diagnostic data in directory=[cdmp_20230324133737], requested by (instance=1, osid=188229 (P000)), summary=[incident=1080531].

Fri Mar 24 13:38:16 2023

SMON: slave died unexpectedly, downgrading to serial recovery

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_smon_188020.trc (incident=1080418):

ORA-00600: internal error code, arguments: [17182], [0x7F9184B445C0], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1080418/xff1_smon_188020_i1080418.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Block recovery from logseq 40030, block 259 to scn 17199959182

Recovery of Online Redo Log: Thread 1 Group 2 Seq 40030 Reading mem 0

Mem# 0: +DATA/xff/onlinelog/group_2.310.1087136761

Block recovery completed at rba 40030.317.16, scn 4.20089999

ORACLE Instance xff1 (pid = 56) - Error 600 encountered while recovering transaction (10, 26) on object 242112.

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_smon_188020.trc:

ORA-00600: internal error code, arguments: [17182], [0x7F9184B445C0], [], [], [], [], [], [], [], [], [], []

Fri Mar 24 13:38:17 2023

Dumping diagnostic data in directory=[cdmp_20230324133817], requested by (instance=1, osid=188020 (SMON)), summary=[incident=1080418].

Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97E8579, kghrst()+1835] [flags: 0x0, count: 1]

Errors in file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_smon_188020.trc (incident=1080419):

ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97E8579] [SI_KERNEL(general_protection)] []

ORA-00600: internal error code, arguments: [17182], [0x7F9184B445C0], [], [], [], [], [], [], [], [], [], []

Incident details in: /oracle/database/diag/rdbms/xff/xff1/incident/incdir_1080419/xff1_smon_188020_i1080419.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Fri Mar 24 13:38:20 2023

PMON (ospid: 187888): terminating the instance due to error 474

System state dump requested by (instance=1, osid=187888 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /oracle/database/diag/rdbms/xff/xff1/trace/xff1_diag_187902_20230324133820.trc

Fri Mar 24 13:38:21 2023

ORA-1092 : opitsk aborting process

Dumping diagnostic data in directory=[cdmp_20230324133820], requested by (instance=1, osid=187888 (PMON)), summary=[abnormal instance termination].

Instance terminated by PMON, pid = 187888

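I handled this kind of failure several times years ago:

ORA-600 [17182] causing Oracle to fail

Handling ORA-00600 [17182], ORA-00600 [25027], and ORA-00600 [kghfrempty:ds]

The cause of this failure is logical block corruption: the instance cannot complete rollback recovery and therefore crashes. Once the failing rollback is dealt with, the database no longer crashes shortly after startup.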
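To deal with the failing rollback, the first step is usually to identify which transaction and object SMON was recovering when it hit the error. The alert log above records this in the "Error 600 encountered while recovering transaction" line. The following is a minimal Python sketch, not the author's method, that pulls those values out of alert log text. The function name `find_failed_recoveries` is hypothetical, and reading the pair in parentheses as undo segment number and slot is an assumption based on how this message is commonly interpreted.

```python
import re

# Alert-log line written when transaction recovery fails.
# \s* tolerates copies of the log where the space after "while" was lost.
RECOVERY_ERROR = re.compile(
    r"Error (\d+) encountered while\s*recovering transaction "
    r"\((\d+), (\d+)\) on object (\d+)"
)


def find_failed_recoveries(alert_text):
    """Return (error, undo_segment, slot, object_id) tuples found in the text.

    Treating the two numbers in parentheses as undo segment number and
    slot is an assumption, not something the log line states itself.
    """
    return [
        tuple(int(group) for group in match.groups())
        for match in RECOVERY_ERROR.finditer(alert_text)
    ]


sample = (
    "ORACLE Instance xff1 (pid = 56) - Error 600 encountered while "
    "recovering transaction (10, 26) on object 242112."
)
failed = find_failed_recoveries(sample)
print(failed)  # [(600, 10, 26, 242112)]
```

The object number can then typically be looked up in the data dictionary (for example in `DBA_OBJECTS`) to see which segment holds the logically corrupt block.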

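Handling Oracle errors ORA-00210 and ORA-00202

I had just arrived at work in the morning and, as usual, set up my laptop stand and switched on the computer. It felt a bit like a war film: back at the position, machine gun mounted, ready to meet the enemy.

Everything was familiar. While I was starting the test environment, a developer's voice broke the quiet of the office: "Is the database for the company's XXX system down?!"

Hearing that, I broke into the usual cold sweat.

I quickly opened my tools and connected to the server to check: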

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Jul 13 09:11:29 2017
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select open_mode from v$database;  
select open_mode from v$database
*
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file:
'/home/oracle/u01/app/oracle/fast_recovery_area/orcl/control02.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

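Sure enough, the database was not healthy: it reported that it could not open the control file because the file path could not be found.

That is the life of a DBA: while the system runs normally, people tend to forget you exist, but as soon as something breaks you become the firefighter.

Back to the point. The normal procedure is to check the alert log next and find the error messages.

First, check the alert log.
Its location can usually be found with the following command: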

SQL> show parameter background_dump_dest

pwd
/home/oracle/u01/app/oracle/diag/rdbms/orcl/orcl/alert   --- Note: sensitive information is masked here; use the values from your own environment.
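On 11g, `background_dump_dest` points at the `trace` directory, which holds the plain-text alert log, while the XML version (`log.xml`) sits in the sibling `alert` directory shown above. As a minimal sketch of how both paths follow from the standard ADR layout, the helper below builds them from three inputs. The function name `adr_paths` is hypothetical, and the sample arguments are simply the values visible in this article's paths.

```python
from pathlib import PurePosixPath


def adr_paths(diag_dest, db_name, instance):
    """Build the usual ADR locations of the text and XML alert logs."""
    home = PurePosixPath(diag_dest) / "diag" / "rdbms" / db_name / instance
    return {
        # Plain-text alert log, the file most people tail.
        "text_alert_log": home / "trace" / f"alert_{instance}.log",
        # XML alert log, the format of the <msg> entries in this article.
        "xml_alert_log": home / "alert" / "log.xml",
    }


paths = adr_paths("/home/oracle/u01/app/oracle", "orcl", "orcl")
for name, path in paths.items():
    print(name, path)
```

The directory names under `rdbms` are the database name and instance name; on your own system, take them from the output of the `show parameter` command rather than from this example.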

<msg time='2017-07-12T18:06:52.299+08:00' org_id='XXXX' comp_id='XXXX'
 type='UNKNOWN' level='16' host_id='xxxxDb'
 host_addr='server IP address here'>
 <txt>    nt OS err code: 0
 </txt>
</msg>
<msg time='2017-07-12T18:06:52.299+08:00' org_id='XXXX' comp_id='XXXX'
 type='UNKNOWN' level='16' host_id='xxxxDb'
 host_addr='server IP address here'>
 <txt>  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=remote client IP address, e.g. 192.168.1.111)(PORT=58310))
 </txt>
</msg>
<msg time='2017-07-12T18:06:59.902+08:00' org_id='XXXX' comp_id='XXXX'
 client_id='' type='UNKNOWN' level='16'
 host_id='xxxxDb' host_addr='server IP address here' module='MMON_SLAVE'
 pid='26750'>
 <txt>Errors in file /home/oracle/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_26750.trc:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/home/oracle/u01/app/oracle/fast_recovery_area/orcl/control02.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
 </txt>
</msg>
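Finding the first occurrence of an error by eye in a long `log.xml` gets tedious. The following is a rough Python sketch of how the timestamp of the first matching entry could be extracted. It assumes the file is a plain sequence of `<msg>` elements with no single root element, as in the excerpt above, and wraps it in an artificial root before parsing. The function name `first_occurrence` and the shortened sample are hypothetical; a real file may contain content that a strict XML parser rejects.

```python
import xml.etree.ElementTree as ET


def first_occurrence(log_xml_text, needle):
    """Return (time, pid) of the first <msg> whose <txt> contains needle."""
    # Wrap the <msg> fragments in one root element so the parser accepts them.
    root = ET.fromstring("<log>" + log_xml_text + "</log>")
    for msg in root.iter("msg"):
        txt = msg.findtext("txt") or ""
        if needle in txt:
            return msg.get("time"), msg.get("pid")
    return None


# Shortened stand-in for the entries shown above (attributes trimmed).
sample = """
<msg time='2017-07-12T18:06:52.299+08:00' type='UNKNOWN' level='16'>
 <txt>    nt OS err code: 0
 </txt>
</msg>
<msg time='2017-07-12T18:06:59.902+08:00' type='UNKNOWN' level='16'
 module='MMON_SLAVE' pid='26750'>
 <txt>ORA-00210: cannot open the specified control file
 </txt>
</msg>
"""
hit = first_occurrence(sample, "ORA-00210")
print(hit)  # ('2017-07-12T18:06:59.902+08:00', '26750')
```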

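From the log above, we can see that the "control file not found" errors appeared at 18:06 on July 12.
Interestingly, the log also contains a client connection (Client address) with a visible IP address. For a moment I was excited: finally found you, is someone attacking us? But when I looked up the address, it turned out to be one of the company's own.
Awkward: it was one of our own people. So I asked the developers whether anyone had done anything to the database around 6 p.m. the day before. The answer was that nothing unusual had been done, only simple queries and updates.

Never mind. Better to get on with fixing the fault.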

Next, following the information in the log, check the trace file:

/home/oracle/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_22241.trc:

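ORA-00210: cannot open the specified control file
ORA-00202: control file: '/home/oracle/u01/app/oracle/fast_recovery_area/orcl/control02.ctl'
ORA-27041: unable to open file

With a rough picture of the symptoms, the first priority is to restore service as quickly as possible, and only then to look for the cause of the failure.
The errors in the log confirm that the problem lies with a control file, so let's first see which control files the database is configured to use: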

SQL> show parameter control_file

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time	     integer	 7
control_files			     string	 /home/oracle/u01/app/oracle/oradata/orcl/control01.ctl, 
                                     /home/oracle/u01/app/oracle/fast_recovery_area/orcl/control02.ctl

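This revealed a problem. If I remember correctly, both control files used to be stored in the same directory, ...oradata/orcl. Why is the second control file now under ...fast_recovery_area/orcl?
Strange: flashback was never enabled on this database, so why has the control file location changed?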
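A quick way to confirm which of the configured control files is actually missing is to test each path from the `control_files` value on the database host. The snippet below is a minimal sketch of that check. The function name `missing_control_files` is hypothetical, and the `exists` parameter is only there so the check can be tried without access to the real filesystem.

```python
import os


def missing_control_files(control_files_value, exists=os.path.exists):
    """Split the comma-separated control_files value and list absent paths."""
    paths = [p.strip() for p in control_files_value.split(",") if p.strip()]
    return [p for p in paths if not exists(p)]


# Value copied from the "show parameter control_file" output above.
value = (
    "/home/oracle/u01/app/oracle/oradata/orcl/control01.ctl, "
    "/home/oracle/u01/app/oracle/fast_recovery_area/orcl/control02.ctl"
)

# Run on the database server; the result depends on that host's filesystem.
print(missing_control_files(value))
```

In this incident the ORA-27041 "No such file or directory" error suggests that only the second path would be reported, but that is an inference from the error messages, not something the check itself proves on another machine.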

Moving on, check whether flashback is enabled in the current environment:

SQL> select flashback_on from v$database;
select flashback_on from v$database
                         *
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file:
'/home/oracle/u01/app/oracle/fast_recovery_area/orcl/control02.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

No luck: even that query fails.

How about looking at the flashback-related parameters instead?









