NFS - "No locks available" error when running VCS on an LSF execution host

Posted by 王万林 Ben


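Problem description

When the VCS compile script is submitted to LSF, the job fails with the error No locks available and exits, as the screenshot shows.

The same script, however, works when it is run directly on the local machine.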

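Troubleshooting

The message points to a file-locking problem. Because the job's files live on NFS, we compare the NFS mount options on the two machines.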
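The comparison can also be done in text form. The following is a minimal sketch, assuming a Linux client where /proc/mounts lists the mount options; the function name nfs_lock_options and the SAMPLE line (server name, export path, address) are hypothetical and only illustrate the input format.

```python
def nfs_lock_options(mounts_text):
    """Return {mount_point: lock-related options} for NFS entries.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # skip malformed lines and non-NFS file systems
        opts = fields[3].split(",")
        # Options of the form key=value, e.g. vers=3 or local_lock=all.
        kv = dict(o.split("=", 1) for o in opts if "=" in o)
        result[fields[1]] = {
            "vers": kv.get("vers"),
            "nolock": "nolock" in opts,
            "local_lock": kv.get("local_lock"),
        }
    return result


# Hypothetical sample input, not taken from the machines in this post.
SAMPLE = """\
filer:/vol/proj /proj nfs rw,vers=3,nolock,local_lock=all,addr=192.0.2.10 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""
```

On a real host, pass the contents of /proc/mounts (for example open("/proc/mounts").read()) instead of SAMPLE, run it on both machines, and compare the two results.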

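Mount options on the local machine (works)

Mount options on the LSF execution host (fails)

Comparing the two, the difference in mount options is the likely cause. Let's take a close look at what the two options nolock and local_lock do.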

lock / nolock
                      Selects whether to use the NLM sideband protocol to lock files on the server. If neither option is specified (or if lock is specified), NLM locking is used for this mount point. When using the nolock option, applications can lock files, but such locks provide exclusion only against other applications running on the same client. Remote applications are not affected by these locks.
                      NLM locking must be disabled with the nolock option when using NFS to mount /var because /var contains files used by the NLM implementation on Linux. Using the nolock option is also required when mounting exports on NFS servers that do not support the NLM protocol.

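From this we can see that when mounting an NFS server that supports the NLM protocol (NFS v3, for example), specifying lock, or specifying neither option, means locking through NLM is the default; specifying nolock means locks are not maintained on the server, only on the local client. NFS v4, by contrast, does not support these two options: it does not use the NLM protocol, because it is stateful and has locking built into the protocol itself.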

local_lock=mechanism
                      Specifies whether to use local locking for any or both of the flock and the POSIX locking mechanisms. mechanism can be one of all, flock, posix, or none. This option is supported in kernels 2.6.37 and later.

                      The Linux NFS client provides a way to make locks local. This means, the applications can lock files, but such locks provide exclusion only against other applications running on the same client. Remote applications are not affected by these locks.

                      If this option is not specified, or if none is specified, the client assumes that the locks are not local.

                      If all is specified, the client assumes that both flock and POSIX locks are local.

                      If flock is specified, the client assumes that only flock locks are local and uses NLM sideband protocol to lock files when POSIX locks are used.

                      If posix is specified, the client assumes that POSIX locks are local and uses NLM sideband protocol to lock files when flock locks are used.

                      To support legacy flock behavior similar to that of NFS clients < 2.6.12, use 'local_lock=flock'. This option is required when exporting NFS mounts via Samba as Samba maps Windows share mode locks as flock. Since NFS clients > 2.6.12 implement flock by emulating POSIX locks, this will result in conflicting locks.

                      NOTE: When used together, the 'local_lock' mount option will be overridden by 'nolock'/'lock' mount option.

That makes for quite a few combinations, but they are easy enough to follow.
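The combinations can be written down as a small lookup. This is only a sketch of my reading of the man page excerpts quoted above for an NFS v3 mount, not a description of the kernel implementation; the function name lock_handling and its return format are made up for illustration.

```python
def lock_handling(lock_option=None, local_lock="none"):
    """Say where flock and POSIX locks are handled on an NFS v3 mount.

    lock_option: "lock", "nolock", or None when neither is given.
    local_lock:  "all", "flock", "posix", or "none".
    Returns a dict mapping "flock" and "posix" to "local" or "nlm".
    """
    local_sets = {
        "all": {"flock", "posix"},
        "flock": {"flock"},
        "posix": {"posix"},
        "none": set(),
    }
    if local_lock not in local_sets:
        raise ValueError(f"unknown local_lock value: {local_lock!r}")
    if lock_option not in (None, "lock", "nolock"):
        raise ValueError(f"unknown lock option: {lock_option!r}")

    # Per the NOTE in the excerpt, lock/nolock overrides local_lock.
    if lock_option == "nolock":
        return {"flock": "local", "posix": "local"}
    if lock_option == "lock":
        return {"flock": "nlm", "posix": "nlm"}

    # Neither given: local_lock decides, and the rest goes through NLM.
    local = local_sets[local_lock]
    return {kind: ("local" if kind in local else "nlm")
            for kind in ("flock", "posix")}
```

For example, lock_handling("nolock", "all") reports both lock types as local, while lock_handling("lock", "none") reports both as going through NLM.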

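Analysis

Both machines mount the share with NFS v3:

  • Local machine, nolock,local_lock=all: locks are not maintained through NLM, only on the local client;
  • LSF execution host, lock,local_lock=none: locks are maintained through NLM.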

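My reading of the error, then, is that the file was already locked by another process (on this machine or another one), so this process could not lock it and exited with an error. I take this to be how the tool is designed: if it cannot obtain the lock it refuses to continue, to avoid data inconsistency, and it proceeds only once it holds the lock.
With nolock,local_lock=all, locks are not maintained through NLM: the lock request is handled entirely on the local client and never reaches the server, so the process gets to use the file and the run goes through.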
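This reading can be checked directly on the execution host. On Linux, No locks available is the message text of errno ENOLCK, whereas a non-blocking lock request that loses to another holder normally fails with EAGAIN or EACCES, so it is worth seeing which errno actually comes back. The probe below is a minimal sketch; the name probe_posix_lock is hypothetical, it assumes the tool takes POSIX (fcntl) locks, which this post does not confirm, and it needs write access to the file.

```python
import errno
import fcntl
import os


def probe_posix_lock(path):
    """Try a non-blocking exclusive POSIX lock on path and report the result."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # Map the errno number to its symbolic name, e.g. ENOLCK or EAGAIN.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return f"failed: {name} ({os.strerror(exc.errno)})"
        # Lock obtained; release it again before returning.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        os.close(fd)
```

Run it against a scratch file in the same NFS directory from both machines. A result naming ENOLCK on the execution host would suggest the lock request itself could not be serviced, while EAGAIN or EACCES would point to a conflicting lock held elsewhere.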

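Resolution

You can ask the vendor's support staff whether the file needs a lock maintained on the server side, and whether the tool has an option to turn locking on or off. As a rule of thumb:

  • a static configuration file generally does not need one;
  • a file that is modified at run time, where multiple running instances may compete for it, does.

Work with the vendor's engineers to resolve the issue according to your actual situation.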

Summary

File locks and record locks are important file system features.

References

http://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-cmpr-970%2Fstatistics__nfs__show-nlm.html // viewing NLM statistics on NetApp
https://wiki.wireshark.org/Network_Lock_Manager // introduction to NLM
https://unix.stackexchange.com/a/430591/287317 // specifying nolock on an NFS v4 mount has no effect either, because NFS v4 does not use NLM
