NFS - "No locks available" when running VCS on an LSF execution host
Posted by 王万林 Ben
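Problem description
As the screenshot shows, submitting the VCS compile script to LSF fails with "No locks available" and the job exits.
The same script, however, works when it is run directly on the local machine, as shown in the screenshot.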
Troubleshooting
The message points to a file-locking error. Because the job files live on NFS, we compare the NFS mount options on the two machines.
Mount options on the local machine (works)
Mount options on the LSF execution host (fails)
The comparison suggests that the difference in mount options is the cause. Let us look closely at what the two options nolock and local_lock do.
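The comparison of mount options can also be scripted. The following is a minimal sketch, assuming a Linux host where the mount table is available in the /proc/mounts format (device, mount point, file system type, options). The function names and the sample server and path in the usage note are hypothetical and are not taken from the original environment.

```python
from typing import Dict, Optional, Tuple

# Mount options that matter for this investigation.
LOCK_KEYS = ("lock", "nolock", "local_lock", "vers", "nfsvers")


def nfs_lock_options(mounts_text: str) -> Dict[str, Dict[str, str]]:
    """Map each NFS mount point to its lock-related mount options.

    Flag options such as "nolock" are stored with an empty string as value.
    """
    result: Dict[str, Dict[str, str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts: Dict[str, str] = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            if key in LOCK_KEYS:
                opts[key] = value
        result[fields[1]] = opts
    return result


def diff_lock_options(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the options that differ, as {option: (value_in_a, value_in_b)}.

    None means the option is absent on that side.
    """
    diff: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for key in set(a) | set(b):
        if a.get(key) != b.get(key):
            diff[key] = (a.get(key), b.get(key))
    return diff
```

On a real host, calling `nfs_lock_options(open("/proc/mounts").read())` on each machine and passing the two entries for the same mount point to `diff_lock_options` should list only the options that differ, which makes a nolock versus local_lock=none mismatch easy to spot.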
lock / nolock
Selects whether to use the NLM sideband protocol to lock files on the server. If neither option is specified (or if lock is specified), NLM locking is used for this mount point. When using the nolock option, applications can lock files, but such locks provide exclusion only against other applications running on the same client. Remote applications are not affected by these locks.
NLM locking must be disabled with the nolock option when using NFS to mount /var because /var contains files used by the NLM implementation on Linux. Using the nolock option is also required when mounting exports on NFS servers that do not support the NLM protocol.
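In short: when mounting from an NFS server that supports the NLM protocol (NFS v3, for example), specifying lock, or specifying neither option, means locking is enabled by default. Specifying nolock means locks are not maintained on the server, only on the local client. These two options do not apply to NFS v4, because NFS v4 does not use the NLM protocol: it is stateful, and locking is built into the protocol itself.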
local_lock=mechanism
Specifies whether to use local locking for any or both of the flock and the POSIX locking mechanisms. mechanism can be one of all, flock, posix, or none. This option is supported in kernels 2.6.37 and later.
The Linux NFS client provides a way to make locks local. This means, the applications can lock files, but such locks provide exclusion only against other applications running on the same client. Remote applications are not affected by these locks.
If this option is not specified, or if none is specified, the client assumes that the locks are not local.
If all is specified, the client assumes that both flock and POSIX locks are local.
If flock is specified, the client assumes that only flock locks are local and uses NLM sideband protocol to lock files when POSIX locks are used.
If posix is specified, the client assumes that POSIX locks are local and uses NLM sideband protocol to lock files when flock locks are used.
To support legacy flock behavior similar to that of NFS clients < 2.6.12, use 'local_lock=flock'. This option is required when exporting NFS mounts via Samba as Samba maps Windows share mode locks as flock. Since NFS clients > 2.6.12 implement flock by emulating POSIX locks, this will result in conflicting locks.
NOTE: When used together, the 'local_lock' mount option will be overridden by 'nolock'/'lock' mount option.
There are quite a few combinations, but they are easy to follow.
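To make the combinations concrete, here is a small sketch that encodes the rules from the man page excerpts above for an NFS v3 mount. It is an illustration of the quoted text, not a description of the kernel's actual option parsing. The function name is hypothetical, and the override behaviour for an explicit lock or nolock follows the NOTE quoted above.

```python
from typing import Set

# Which lock types stay local for each local_lock value, per the quoted text.
_LOCAL_LOCK_MAP = {
    "none": set(),
    "flock": {"flock"},
    "posix": {"posix"},
    "all": {"flock", "posix"},
}


def local_lock_types(options: str) -> Set[str]:
    """Return the lock types ('flock', 'posix') that are handled locally.

    Any lock type not in the returned set goes to the server through NLM.
    `options` is a comma-separated NFS v3 mount option string.
    """
    explicit = set()
    local_lock = "none"  # default when the option is not specified
    for item in options.split(","):
        key, _, value = item.partition("=")
        if key in ("lock", "nolock"):
            explicit.add(key)
        elif key == "local_lock":
            local_lock = value
    if local_lock not in _LOCAL_LOCK_MAP:
        raise ValueError(f"unknown local_lock value: {local_lock!r}")
    # Per the quoted NOTE, an explicit lock/nolock overrides local_lock.
    if "nolock" in explicit:
        return {"flock", "posix"}
    if "lock" in explicit:
        return set()
    return set(_LOCAL_LOCK_MAP[local_lock])
```

For the two mounts in this article, `local_lock_types("nolock,local_lock=all")` gives both lock types as local, while `local_lock_types("lock,local_lock=none")` gives an empty set, meaning every lock request goes through NLM.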
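Problem analysis
Both machines mount the export with NFS v3:
- Local machine, nolock,local_lock=all: locks are not maintained through NLM, only on the local machine;
- LSF execution host, lock,local_lock=none: locks are maintained through NLM.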
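The error therefore comes from the locking step: the process asked for a lock on the file through NLM, did not get it, and exited. My reading is that the file was already locked by another process (on this machine or on another one). Note, however, that "No locks available" is the message for errno ENOLCK, whereas a lock held by another process is normally reported as EAGAIN or EACCES, so it is also worth checking whether the NLM lock request from the execution host can be serviced at all. I believe this is by design in the tool: if it cannot get the lock, it stops rather than risk inconsistent data, and it continues only once it holds the lock.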
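With the nolock,local_lock=all mount, locks are not maintained through NLM. The lock request is satisfied on the local client without contacting the server, so the process gets its lock and the run continues.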
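One way to reproduce the behaviour outside the tool is to try taking a lock on a scratch file in the same NFS directory from each machine. The sketch below assumes the tool uses POSIX (fcntl-style) locks, which the original does not confirm. The function name and the returned labels are hypothetical. It separates ENOLCK ("No locks available") from EAGAIN/EACCES, which is what a lock held by another process normally produces.

```python
import errno
import fcntl
import os


def probe_posix_lock(path: str) -> str:
    """Try a non-blocking exclusive POSIX lock on `path`.

    Returns "ok", "enolck", "busy", or "error:<errno name>".
    The file is created if it does not exist.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOLCK:
                # The lock request itself could not be serviced.
                return "enolck"
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                # Someone else currently holds a conflicting lock.
                return "busy"
            return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
        # Lock acquired; release it before closing.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        os.close(fd)
```

Calling `probe_posix_lock` on a scratch file under the NFS mount, once on the local machine and once inside an LSF job, should show whether the execution host fails with "enolck" for any file or only reports "busy" for a file that is actually held.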
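Solution
Ask the vendor's support staff whether this file needs its lock maintained on the server, and whether the tool has an option to turn locking on or off. In general:
- fixed configuration files usually do not need it;
- files that are modified during the run, where several running instances may compete for the same file, do need it.
Work with the vendor to resolve the issue according to your actual situation.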
Summary
File locks and record locks are important file system features.
References
http://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-cmpr-970%2Fstatistics__nfs__show-nlm.html // viewing NLM statistics on NetApp
https://wiki.wireshark.org/Network_Lock_Manager // introduction to NLM
https://unix.stackexchange.com/a/430591/287317 // specifying nolock on an NFS v4 mount has no effect, because NFS v4 does not use NLM