ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM)
Posted by chendian0
Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers the ALERT on setting RemoveIPC=yes on Redhat 7.2, and we hope it is of some reference value.
Database instances crash on their own, reporting ORA-27157, ORA-27300 and similar errors
1. The MOS document:
ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances
as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)
APPLIES TO:
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Standard Edition
Oracle Database - Enterprise Edition
Linux x86-64
Linux x86
DESCRIPTION
On Red Hat 7.2, the systemd-logind service introduced a new feature that removes all IPC objects when a user fully logs out. The feature is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file; see man logind.conf(5) for details. The default value of RemoveIPC in RHEL 7.2 is yes. As a result, when the last oracle or grid user session disconnects, the OS removes the shared memory segments and semaphores owned by those users. Because Oracle ASM and databases use shared memory segments for the SGA, removing those segments crashes the Oracle ASM and database instances. Please refer to Red Hat bug 1264533 - https://bugzilla.redhat.com/show_bug.cgi?id=1264533
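To see which behavior a host will actually get, the logind.conf text can be checked programmatically. The Python sketch below is a minimal illustration, not an Oracle or Red Hat tool: the function name `effective_remove_ipc` is hypothetical, and it assumes that only the [Login] section matters, that the last valid assignment wins, and that a missing setting means the RHEL 7.2 default of yes described above. Drop-in configuration directories are not examined, and the set of accepted boolean spellings is an assumption.

```python
from pathlib import Path

# Boolean spellings assumed to be accepted by systemd's parser (case-insensitive).
_TRUE = {"1", "yes", "y", "true", "t", "on"}
_FALSE = {"0", "no", "n", "false", "f", "off"}


def effective_remove_ipc(conf_text: str, default: bool = True) -> bool:
    """Return the RemoveIPC value that a logind.conf text would produce.

    Assumptions: only the [Login] section counts, the last valid assignment
    wins, and a missing key falls back to `default` (yes on RHEL 7.2).
    """
    value = default
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment, e.g. a commented-out "#RemoveIPC=yes"
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section != "Login" or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "RemoveIPC":
            word = val.strip().lower()
            if word in _TRUE:
                value = True
            elif word in _FALSE:
                value = False
            # Unrecognized values are skipped, leaving the previous value in place.
    return value


if __name__ == "__main__":
    conf = Path("/etc/systemd/logind.conf")
    if conf.exists():
        enabled = effective_remove_ipc(conf.read_text())
        print("RemoveIPC =", "yes (IPC objects removed at logout)" if enabled else "no")
```

A result of yes on a host running ASM or a database under the oracle or grid user is the configuration this alert warns about.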
OCCURRENCE
The problem affects all applications, including Oracle Databases, that use shared memory segments and semaphores; thus both Oracle ASM and database instances are affected. Oracle Linux 7.2 avoids this problem by explicitly setting RemoveIPC=no in the /etc/systemd/logind.conf configuration file. However, if /etc/systemd/logind.conf was touched or modified before the upgrade started, yum update writes the correct new configuration file (with RemoveIPC=no) as logind.conf.rpmnew instead of replacing the existing one. If the user keeps the original configuration file, the failures described in this note will most likely occur. To avoid this problem, be sure to edit logind.conf after the upgrade and set RemoveIPC=no.
This is documented in the Oracle Linux 7.2 release notes.
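The upgrade pitfall above (a locally modified logind.conf is kept while the packaged copy with RemoveIPC=no is parked as logind.conf.rpmnew) can be checked for after a yum update. This is a hedged sketch: `check_logind_after_upgrade` is a hypothetical helper, it ignores section headers, and it treats only the last active RemoveIPC line as the effective one.

```python
import re
import tempfile
from pathlib import Path

# Active (uncommented) RemoveIPC assignments; section headers are ignored here.
_ASSIGN = re.compile(r"^\s*RemoveIPC\s*=\s*(\S+)\s*$", re.MULTILINE)
_OFF = {"no", "n", "false", "f", "off", "0"}


def check_logind_after_upgrade(conf_dir="/etc/systemd"):
    """Return warnings about logind.conf after an RPM upgrade (hypothetical helper)."""
    directory = Path(conf_dir)
    active = directory / "logind.conf"
    rpmnew = directory / "logind.conf.rpmnew"
    warnings = []
    if rpmnew.exists():
        warnings.append(f"{rpmnew} exists; the packaged logind.conf was not installed")
    text = active.read_text() if active.exists() else ""
    values = _ASSIGN.findall(text)
    # The last active assignment is the one that takes effect.
    if not values or values[-1].lower() not in _OFF:
        warnings.append(f"{active} does not end up with RemoveIPC=no")
    return warnings


if __name__ == "__main__":
    # Demonstration on a throwaway copy of the scenario described above.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "logind.conf").write_text("[Login]\n#RemoveIPC=yes\n")
        Path(tmp, "logind.conf.rpmnew").write_text("[Login]\nRemoveIPC=no\n")
        for warning in check_logind_after_upgrade(tmp):
            print("WARNING:", warning)
```

Run against the real /etc/systemd, an empty result means the active file already disables RemoveIPC and no .rpmnew copy is waiting to be reviewed.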
SYMPTOMS
1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
2) Upgrading to 11.2 and 12c GI/CRS fails.
3) After Redhat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.
The removal of the IPC objects by systemd-logind may happen at any time, so the failure patterns can vary greatly. Here are some examples of what the failures may look like:
The most common error is the following, found in the ASM or database alert.log:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
The second observed error occurs during installation and upgrade, when asmca fails with the following error:
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile '[email protected]'
The third observed error occurred during installation and upgrade:
Creation of ASM password file failed. Following error occurred: Error in Process: $GRID_HOME/bin/orapwd
Enter password for SYS:
OPW-00009: Could not establish connection to Automatic Storage Management instance
2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM
The fourth observed error is the following message, found in the /var/log/messages file around the time the ASM or database instance crashed:
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]
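Because the failure pattern varies, it can help to scan alert logs and /var/log/messages for all of the signatures listed above at once. The sketch below assumes the logs are plain text; the function name and the signature table are illustrative choices, not part of any Oracle tooling. Status 43 in ORA-27300 corresponds to errno 43, EIDRM ("Identifier removed"), on Linux, which matches the ORA-27301 text.

```python
import re
from collections import Counter

# Signatures quoted in the MOS note. "status: 43" is errno 43, EIDRM
# ("Identifier removed") on Linux, i.e. the semaphore set vanished.
SIGNATURES = {
    "ORA-27157": re.compile(r"\bORA-27157\b"),
    "ORA-27300 semop status 43": re.compile(r"ORA-27300:.*semop failed with status: 43\b"),
    "KFOD-00313": re.compile(r"\bKFOD-00313\b"),
    "OPW-00009": re.compile(r"\bOPW-00009\b"),
    "CLSRSC-184": re.compile(r"\bCLSRSC-184\b"),
    "kernel trap divide error": re.compile(r"traps: oracle\[\d+\] trap divide error"),
}


def scan_for_removeipc_symptoms(text):
    """Count the lines of a log text that match each known signature."""
    hits = Counter()
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


if __name__ == "__main__":
    sample = """ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed"""
    print(dict(scan_for_removeipc_symptoms(sample)))
```

Feeding it the contents of an alert.log or /var/log/messages gives a per-signature count; any hit is a reason to check the RemoveIPC setting before digging further.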
WORKAROUND
1) Set RemoveIPC=no in /etc/systemd/logind.conf
2) Reboot the server, or restart systemd-logind as follows:
# systemctl daemon-reload
# systemctl restart systemd-logind
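Step 1 of the workaround can be scripted so that running it twice does no harm. The sketch below, with the hypothetical function name `set_remove_ipc_no`, rewrites logind.conf text so that exactly one active RemoveIPC=no line remains and comments are left untouched; it assumes the file contains only a [Login] section.

```python
import re

# An active (uncommented) RemoveIPC assignment; "#RemoveIPC=..." is left alone.
_ACTIVE = re.compile(r"^\s*RemoveIPC\s*=")


def set_remove_ipc_no(conf_text):
    """Return logind.conf text with exactly one active RemoveIPC=no line.

    Assumes the file holds only a [Login] section.
    """
    out = []
    placed = False
    for line in conf_text.splitlines():
        if _ACTIVE.match(line):
            if not placed:
                out.append("RemoveIPC=no")  # replace the first active setting in place
                placed = True
            continue  # drop any later active settings so none can override it
        out.append(line)
    if not placed:
        # No active setting: add one right after the [Login] header,
        # creating the header if the file has none.
        for i, line in enumerate(out):
            if line.strip() == "[Login]":
                out.insert(i + 1, "RemoveIPC=no")
                break
        else:
            out += ["[Login]", "RemoveIPC=no"]
    return "\n".join(out) + "\n"
```

To apply it, one would read /etc/systemd/logind.conf, keep a backup copy, write the returned text back as root, and then reboot or run the two systemctl commands from step 2.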
PATCHES
Migrating from Red Hat 7.2 to Oracle Linux 7.2 resolves this problem. If migrating to Oracle Linux 7.2 is not possible, use the workaround above by setting RemoveIPC=no in /etc/systemd/logind.conf.
HISTORY
23-Nov-2015 The alert is created
=======================================================================================================================
2.
(1) Incident record and review:
A customer came to me saying the database would not start, and even when it did start it shut itself down again. The alert log showed errors essentially like the following, which is the most common pattern described in the MOS note:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
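(2) Further restart attempts then failed with ORA-600. This was mainly caused by a conflict with an existing shared memory segment; see the document explanation below.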
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup
ORA-00600: internal error code, arguments: [SKGMHASH], [1], [18446744072635731584], [0], [0], [], [], [], [], [], [], []
I had the customer change the operating system setting:
Set RemoveIPC=no in /etc/systemd/logind.conf
and then reboot the operating system, which resolved the problem.
===============================================================================================================================
3. Explanation of the ORA-600 error above:
ORA-00600 [SKGMHASH] Starting up an ASM/RDBMS Instance (Doc ID 756713.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.1.0.6 and later
Linux x86
Generic Linux
Linux x86-64
***Checked for relevance on 25-Dec-2016***
SYMPTOMS
During the ASM/RDBMS instance startup, you receive the error:
Connected to an idle instance.
SQL> startup
ORA-00600: internal error code, arguments: [SKGMHASH], [1], [18446744072635731584], [0], [0], [], [], [], [], [], [], []
The call stack trace is similar to:
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- PGOSF52_ksfdmp <- dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE
<- 1774 <- dbkePostKGE_kgsf <- kgeadse <- kgerinv_internal <- kgerinv
<- kgesinv <- kgesinw <- skgmlocate <- skgmcrone <- skgmcrmany
<- skgmcreate <- ksmcrealm <- ksmcsg <- opistr_real <- opistr
<- opiodr <- ttcpip <- opitsk <- opiino <- opiodr
<- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main
<- libc_start_main <- start
From the strace of the server process which is doing the startup:
shmget(IPC_PRIVATE, 4194304, 0660) = 2686992
shmat(2686992, 0, 0x8000 /* SHM_??? */) = -1 EACCES (Permission denied)
shmctl(2686992, IPC_RMID, 0) = 0
shmget(0xbffece80, 4096, IPC_CREAT|IPC_EXCL|0660) = 2719760
shmat(2719760, 0, 0) = ?
open("/dev/shm/ora_+ASM1_2719760_0", O_RDWR|O_CREAT|O_SYNC, 0660) = -1 EACCES (Permission denied)
shmdt(0x2a96978000) = 0
shmctl(2719760, IPC_RMID, 0) = 0
shmget(0xbffece81, 4096, IPC_CREAT|IPC_EXCL|0660) = 2752528
shmat(2752528, 0, 0) = ?
open("/dev/shm/ora_+ASM1_2752528_0", O_RDWR|O_CREAT|O_SYNC, 0660) = -1 EACCES (Permission denied)
shmdt(0x2a96978000) = 0
shmctl(2752528, IPC_RMID, 0) = 0
shmget(0xbffece82, 4096, IPC_CREAT|IPC_EXCL|0660) = 2785296
shmat(2785296, 0, 0) = ?
open("/dev/shm/ora_+ASM1_2785296_0", O_RDWR|O_CREAT|O_SYNC, 0660) = -1 EACCES (Permission denied)
shmdt(0x2a96978000) = 0
shmctl(2785296, IPC_RMID, 0) = 0
shmget(0xbffece83, 4096, IPC_CREAT|IPC_EXCL|0660) = 2818064
shmat(2818064, 0, 0) = ?
open("/dev/shm/ora_+ASM1_2818064_0", O_RDWR|O_CREAT|O_SYNC, 0660) = -1 EACCES (Permission denied)
shmdt(0x2a96978000) = 0
(...)
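When the cause is /dev/shm permissions, the telltale lines in a long strace are the refused open() calls. A small filter like the hypothetical `denied_shm_files` below pulls them out of saved strace output; also matching the openat(AT_FDCWD, ...) form is an assumption about how newer strace and glibc versions report the same call.

```python
import re

# open()/openat() calls on /dev/shm that failed with EACCES, as in the
# strace excerpt above. The openat form is an assumption about newer straces.
_DENIED = re.compile(
    r'\bopen(?:at)?\((?:AT_FDCWD,\s*)?"(/dev/shm/[^"]+)".*=\s*-1 EACCES\b'
)


def denied_shm_files(strace_text):
    """Return the /dev/shm paths whose open was refused with EACCES."""
    paths = []
    for line in strace_text.splitlines():
        match = _DENIED.search(line)
        if match:
            paths.append(match.group(1))
    return paths
```

A non-empty result points at the first cause below (permissions on /dev/shm) rather than at a conflicting segment.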
CHANGES
This is a new installation and you are going to configure the ASM and/or the RDBMS instance with dbca.
CAUSE
- The Oracle RDBMS kernel does not have the right permissions to access /dev/shm.
If you run the mount command, you will see /dev/shm mounted as a tmpfs file system, that is, a file system that keeps all of its files in virtual memory.
- Another cause is a conflict with an existing shared memory segment.
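For the first cause, two quick checks are whether /dev/shm is mounted as tmpfs and whether the current user can create files there. This sketch parses /proc/mounts-style text and uses os.access, which checks against the real user ID, so it should be run as the Oracle software owner; the function names are hypothetical.

```python
import os


def shm_mount_type(mounts_text, mount_point="/dev/shm"):
    """Return the filesystem type mounted at mount_point, or None."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later mount on the same point hides earlier ones
    return fstype


def shm_writable(path="/dev/shm"):
    """True if the real user may create files in path (needs write + search)."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            print("/dev/shm type:", shm_mount_type(fh.read()))
    except OSError:
        print("/proc/mounts not readable on this system")
    print("/dev/shm writable:", shm_writable())
```

If the mount type is tmpfs but the write check fails for the Oracle owner, that matches the permission cause and is what the system administrator would need to fix.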
SOLUTION
- Please ask your system administrator to help you grant the Oracle software owner the right permissions on "/dev/shm".
- If the cause is a conflict with an existing shared memory segment, remove that segment.
For example:
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_ora_13099.trc (incident=1):
ORA-00600: internal error code, arguments: [SKGMHASH], [1], [1839154636], [0], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM2/incident/incdir_1/+ASM2_ora_13099_i1.trc
Sweep [inc][1]: completed
Looking at ipcs -m, we see:
# ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x74027e1b 98304      root       600        4          0
0x6d9f45cc 1212417    oracle     660        4096       0
# echo "obase=16;1839154636" | bc -q
6D9F45CC
Note that this is the 3rd argument of the ORA-600, and it matches the key of the existing shared memory segment. To resolve the situation, issue:
ipcrm -M 0x6d9f45cc
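The bc conversion above can be wrapped in a small helper that turns the third ORA-600 [SKGMHASH] argument into an ipcs key and looks it up. Masking the value to its low 32 bits is our assumption for handling arguments printed as 64-bit unsigned numbers: applied to 18446744072635731584 from the earlier error, it yields 0xbffece80, the same key the strace shows in the first shmget call. The function names are hypothetical, and the helper only locates the segment; removing it remains the manual ipcrm -M step, ideally after confirming that nattch is 0.

```python
def ora600_arg_to_shm_key(arg):
    """Map the 3rd ORA-600 [SKGMHASH] argument to an ipcs-style hex key.

    Assumption: the key is the low 32 bits of the printed value, which covers
    both plain decimals and sign-extended 64-bit values.
    """
    return "0x%08x" % (int(arg) & 0xFFFFFFFF)


def find_segment(ipcs_text, key):
    """Return (shmid, owner, nattch) for the segment with this key, or None."""
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows look like: key shmid owner perms bytes nattch [status]
        if len(fields) >= 6 and fields[0].lower() == key.lower():
            return fields[1], fields[2], fields[5]
    return None


if __name__ == "__main__":
    sample = """------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x74027e1b 98304      root       600        4          0
0x6d9f45cc 1212417    oracle     660        4096       0
"""
    key = ora600_arg_to_shm_key("1839154636")
    print(key, find_segment(sample, key))  # prints: 0x6d9f45cc ('1212417', 'oracle', '0')
```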
===============================================================================================================================
4. The following is another user's incident record:
https://www.cnblogs.com/abclife/p/5859005.html
After a 12c RAC database was installed on RHEL 7.2, one of its database instances frequently crashed on its own. The alert log showed the following errors:
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_j000_21047.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Fri Sep 09 16:50:53 2016
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_rmv0_20798.trc:
ORA-27157: OS post/wait facility removed
Fri Sep 09 16:50:53 2016
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_q005_21328.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
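Cause of the error:
In RHEL 7.2, the systemd-logind service introduced a new feature: once a user fully logs out of the OS, all of that user's IPC objects are removed.
This feature is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file. For details, see man logind.conf(5).
In RHEL 7.2, the default value of RemoveIPC is yes.
Therefore, when the last oracle or grid user logs out, the operating system removes that user's shared memory segments and semaphores.
Because the SGA of Oracle ASM and databases lives in shared memory segments, removing those segments crashes the Oracle ASM and database instances.
Please refer to Red Hat bug 1264533 - https://bugzilla.redhat.com/show_bug.cgi?id=1264533
This problem affects every application that uses shared memory segments and semaphores, so both Oracle ASM instances and Oracle Database instances are affected.
To avoid this problem, OEL 7.2 explicitly sets RemoveIPC to no in the /etc/systemd/logind.conf configuration file.
Symptoms caused by this problem: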
1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
2) Upgrading to 11.2 and 12c GI/CRS fails.
3) After Redhat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.
systemd-logind may remove the IPC objects at any time, so the log messages seen when errors occur also vary. For example:
The most common error is the following, found in the ASM or database alert.log:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
The second observed error occurs during installation and upgrade when asmca fails with the following error:
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile '[email protected]'
The third observed error occurred during installation and upgrade:
Creation of ASM password file failed. Following error occurred: Error in Process: /d12/app/12.1.0/grid/bin/orapwd
Enter password for SYS:
OPW-00009: Could not establish connection to Automatic Storage Management instance
2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM
The fourth observed error is the following message, found in the /var/log/messages file around the time the ASM or database instance crashed:
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]
Fix:
1) Set RemoveIPC=no in /etc/systemd/logind.conf
2) Reboot the server or restart systemd-logind
To restart systemd-logind:
# systemctl daemon-reload
# systemctl restart systemd-logind
MOS Doc:
ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances
as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)