A split-brain problem caused by GlusterFS
Posted by 翟海飞
1 Problem description
A VM in the OpenStack environment stopped for no apparent reason and could not be started from either the web UI or the command line. The logs showed:
2018-04-02 03:22:53.463 14655 INFO nova.virt.libvirt.driver [-] [instance: 0498a9d1-3958-4972-943e-fe6a44490192] Instance destroyed successfully.
2018-04-02 03:22:54.285 INFO nova.virt.libvirt.imagebackend [req-694c00cb-56f5-47b2-bd0d-50a7d34edabf 1d94aa6ff73b471b8d39edb1823fca60 a20101a00a7c4943bc8eaae2c6dd792e] base: /var/lib/nova/instances/_base/1cd4a201979d6b4edc01d6920f6015b2d0e19627
2018-04-02 03:22:55.242 ERROR nova.openstack.common.rpc.amqp [req-694c00cb-56f5-47b2-bd0d-50a7d34edabf 1d94aa6ff73b471b8d39edb1823fca60 a20101a00a7c4943bc8eaae2c6dd792e] Exception during message handling
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 117, in wrapped
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 94, in wrapped
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 209, in decorated_function
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp pass
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 195, in decorated_function
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 260, in decorated_function
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 237, in decorated_function
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 224, in decorated_function
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1454, in start_instance
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp block_device_info)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1429, in power_on
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp self._hard_reboot(context, instance, network_info, block_device_info)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1390, in _hard_reboot
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp disk_info_json)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3319, in _create_images_and_backing
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp size=info['virt_disk_size'])
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 159, in cache
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp *args, **kwargs)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 261, in create_image
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp prepare_template(target=base, *args, **kwargs)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 221, in inner
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp with lock:
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 83, in __enter__
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp self.trylock()
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 118, in trylock
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp fcntl.lockf(self.lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
2018-04-02 03:22:55.242 14655 TRACE nova.openstack.common.rpc.amqp IOError: [Errno 2] No such file or directory
The log claims the file does not exist, but after checking the /var/lib/nova/instances directory I confirmed that the file was there, so the problem was likely in GlusterFS.
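A quick sanity check is to stat and read the base image directly on the mount (the path is taken from the log above; this assumes /var/lib/nova/instances sits on the GlusterFS FUSE mount, and reading a split-brained file there typically fails with an I/O error):
stat /var/lib/nova/instances/_base/1cd4a201979d6b4edc01d6920f6015b2d0e19627
dd if=/var/lib/nova/instances/_base/1cd4a201979d6b4edc01d6920f6015b2d0e19627 of=/dev/null bs=1M count=1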
Checking the GlusterFS logs:
[2018-04-02 01:48:49.746336] E [afr-transaction.c:820:afr_changelog_pre_op_cbk] 1-dv-grizzly-ins-replicate-0: xattrop failed on child dv-grizzly-ins-client-1: Success
[2018-04-02 01:48:49.747353] W [client-rpc-fops.c:1822:client3_3_fxattrop_cbk] 1-dv-grizzly-ins-client-1: remote operation failed: Success
[2018-04-02 01:48:49.747382] E [afr-transaction.c:820:afr_changelog_pre_op_cbk] 1-dv-grizzly-ins-replicate-0: xattrop failed on child dv-grizzly-ins-client-1: Success
[2018-04-02 01:48:50.011437] W [client-rpc-fops.c:1822:client3_3_fxattrop_cbk] 1-dv-grizzly-ins-client-1: remote operation failed: Success
[2018-04-02 01:48:50.011486] E [afr-transaction.c:820:afr_changelog_pre_op_cbk] 1-dv-grizzly-ins-replicate-0: xattrop failed on child dv-grizzly-ins-client-1: Success
[2018-04-02 01:48:50.011688] W [client-rpc-fops.c:1822:client3_3_fxattrop_cbk] 1-dv-grizzly-ins-client-1: remote operation failed: Success
[2018-04-02 01:48:50.011753] E [afr-transaction.c:820:afr_changelog_pre_op_cbk] 1-dv-grizzly-ins-replicate-0: xattrop failed on child dv-grizzly-ins-client-1: Success
[2018-04-02 01:48:50.012787] W [client-rpc-fops.c:1822:client3_3_fxattrop_cbk] 1-dv-grizzly-ins-client-1: remote operation failed: Success
[2018-04-02 01:48:50.012819] E [afr-transaction.c:820:afr_changelog_pre_op_cbk] 1-dv-grizzly-ins-replicate-0: xattrop failed on child dv-grizzly-ins-client-1: Success
[2018-04-02 01:48:50.147697] W [client-rpc-fops.c:1822:client3_3_fxattrop_cbk] 1-dv-grizzly-ins-client-1: remote operation failed: Success
[2018-04-02 01:48:50.147790] E [afr-transaction.c:820:afr_changelog_pre_op_cbk] 1-dv-grizzly-ins-replicate-0: xattrop failed on child dv-grizzly-ins-client-1: Success
Tens of gigabytes of these log messages had accumulated.
The GlusterFS cluster consists of two servers, 70 and 78. Running gluster volume heal gluster_vol info showed that the volume was already in split-brain.
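For reference, these are the commands used to inspect heal status; the info split-brain form lists only the split-brained entries and should be available on any reasonably recent GlusterFS release (verify on yours):
gluster volume heal gluster_vol info
gluster volume heal gluster_vol info split-brain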
2 Solution
Reference: http://blog.51cto.com/nosmoking/1718157
I. Symptoms
1. While upgrading the GlusterFS packages for an oVirt test setup, the ovirt and gluster services were deliberately left running; the packages were upgraded in place and the gluster service was restarted. Afterwards the data disk of one VM was found to be damaged and the VM would not start.
2. The GlusterFS logs suggested a split-brain.
3. The root cause: this test setup used only two physical machines running GlusterFS as the oVirt data domain, with the quorum-related options disabled. Such a configuration is prone to split-brain, and oVirt shows an explicit warning about it when the data domain is added.
II. Handling process
1. Check the gluster status
--------- split-brain report ---------
gluster> volume heal ovirt-data info
Brick n33.test:/data/ovirt/data/
/09cb8372-a68d-47dc-962e-70b5225be6bc/images/1d04adc8-9921-4e1c-b192-84fba64224db/bd3c9ecf-8100-42db-9db5-038dd95a4d54 - Is in split-brain
Number of entries: 1
Brick n34.test:/data/ovirt/data/
/09cb8372-a68d-47dc-962e-70b5225be6bc/images/1d04adc8-9921-4e1c-b192-84fba64224db/bd3c9ecf-8100-42db-9db5-038dd95a4d54 - Is in split-brain
Number of entries: 1
2. Compare the data
--------- state of the abnormal file on both bricks ---------
[root@n33 1d04adc8-9921-4e1c-b192-84fba64224db]# getfattr -d -m . -e hex bd3c9ecf-8100-42db-9db5-038dd95a4d54
# file: bd3c9ecf-8100-42db-9db5-038dd95a4d54
trusted.afr.ovirt-data-client-0=0x000000000000000000000000
trusted.afr.ovirt-data-client-1=0x0000000a0000000000000000
trusted.gfid=0xaa32a1b65bf14655aafbeb0f242e61e0
[root@n34 1d04adc8-9921-4e1c-b192-84fba64224db]# getfattr -d -m . -e hex bd3c9ecf-8100-42db-9db5-038dd95a4d54
# file: bd3c9ecf-8100-42db-9db5-038dd95a4d54
trusted.afr.ovirt-data-client-0=0x0000000f0000000000000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
trusted.gfid=0xaa32a1b65bf14655aafbeb0f242e61e0
--------- for comparison, the state of a randomly chosen healthy data file ---------
[root@n33 560a7b8c-fbfe-4495-b315-e46593b4523e]# getfattr -d -m . -e hex 52a0c2aa-778a-4964-98c6-03cdb591cfa8
# file: 52a0c2aa-778a-4964-98c6-03cdb591cfa8
trusted.afr.ovirt-data-client-0=0x000000000000000000000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
trusted.gfid=0xa36f8c0d2e014dc1ba19600aa62a20ec
[root@n34 560a7b8c-fbfe-4495-b315-e46593b4523e]# getfattr -d -m . -e hex 52a0c2aa-778a-4964-98c6-03cdb591cfa8
# file: 52a0c2aa-778a-4964-98c6-03cdb591cfa8
trusted.afr.ovirt-data-client-0=0x000000000000000000000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
trusted.gfid=0xa36f8c0d2e014dc1ba19600aa62a20ec
--------- the above is the normal case: the two nodes agree ---------
3. Explanation
trusted.afr.ovirt-data-client-0=0x000000000000000000000000
trusted.afr.ovirt-data-client-1=0x0000000a0000000000000000
versus
trusted.afr.ovirt-data-client-0=0x0000000f0000000000000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
Notes:
0x 0000000a 00000000 00000000
        |        |        |
        |        |        \_ changelog of directory entries
        |        \_ changelog of metadata
        \_ changelog of data
Conclusion: the copy on n33 records pending data changes against client-1 (n34), while the copy on n34 records pending changes against client-0 (n33). Each side blames the other, so neither can be picked automatically as the heal source; this is the split-brain condition.
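As a small sketch, such a hex value can be split into its three 32-bit counters with plain bash (the value below is the one read on n33; the variable names are only for illustration):
val=0x0000000a0000000000000000   # a trusted.afr.* value as printed by getfattr
hex=${val#0x}                    # strip the 0x prefix
echo "data changelog:     0x${hex:0:8}"   # pending data operations
echo "metadata changelog: 0x${hex:8:8}"   # pending metadata operations
echo "entry changelog:    0x${hex:16:8}"  # pending directory-entry operations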
4. Fix
[root@n33 1d04adc8-9921-4e1c-b192-84fba64224db]# stat bd3c9ecf-8100-42db-9db5-038dd95a4d54
File: `bd3c9ecf-8100-42db-9db5-038dd95a4d54'
Size: 85899345920 Blocks: 27881072 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 2147485390 Links: 2
Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm)
Access: 2015-11-27 19:02:02.947525264 +0800
Modify: 2015-11-27 19:02:14.020777092 +0800
Change: 2015-11-30 09:25:41.425588695 +0800
[root@n34 1d04adc8-9921-4e1c-b192-84fba64224db]# stat bd3c9ecf-8100-42db-9db5-038dd95a4d54
File: `bd3c9ecf-8100-42db-9db5-038dd95a4d54'
Size: 85899345920 Blocks: 27881896 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 1974 Links: 2
Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm)
Access: 2015-11-27 18:59:45.987653357 +0800
Modify: 2015-11-27 18:59:57.051164636 +0800
Change: 2015-11-30 09:22:22.977259770 +0800
Based on the state of the two copies, choose one reliable copy and discard the other.
Here I decided to keep the copy with the newer timestamps, i.e. the one on n33, which means discarding the copy on n34.
[root@n34 1d04adc8-9921-4e1c-b192-84fba64224db]# setfattr -n trusted.afr.ovirt-data-client-0 -v 0x000000000000000100000000 bd3c9ecf-8100-42db-9db5-038dd95a4d54
[root@n34 1d04adc8-9921-4e1c-b192-84fba64224db]# getfattr -d -m . -e hex bd3c9ecf-8100-42db-9db5-038dd95a4d54
# file: bd3c9ecf-8100-42db-9db5-038dd95a4d54
trusted.afr.ovirt-data-client-0=0x000000000000000100000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
trusted.gfid=0xaa32a1b65bf14655aafbeb0f242e61e0
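With the changelog on the to-be-discarded copy reset, the self-heal daemon should repair the file on its own; it can also be nudged manually, and newer GlusterFS releases (roughly 3.7 and later) can resolve split-brain directly from the CLI without editing xattrs. A sketch, to be verified against the CLI of your version:
gluster volume heal ovirt-data
gluster volume heal ovirt-data split-brain latest-mtime <path-as-listed-by-heal-info>
gluster volume heal ovirt-data split-brain source-brick n33.test:/data/ovirt/data <path-as-listed-by-heal-info>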
[root@n33 1d04adc8-9921-4e1c-b192-84fba64224db]# gluster
gluster> volume heal ovirt-data info
Brick n33.test:/data/ovirt/data/
/09cb8372-a68d-47dc-962e-70b5225be6bc/images/1d04adc8-9921-4e1c-b192-84fba64224db/bd3c9ecf-8100-42db-9db5-038dd95a4d54 - Possibly undergoing heal
Number of entries: 1
Brick n34.test:/data/ovirt/data/
/09cb8372-a68d-47dc-962e-70b5225be6bc/images/1d04adc8-9921-4e1c-b192-84fba64224db/bd3c9ecf-8100-42db-9db5-038dd95a4d54 - Possibly undergoing heal
Number of entries: 1
About 5 minutes later:
gluster> volume heal ovirt-data info
Brick n33.test:/data/ovirt/data/
/09cb8372-a68d-47dc-962e-70b5225be6bc/images/1d04adc8-9921-4e1c-b192-84fba64224db/bd3c9ecf-8100-42db-9db5-038dd95a4d54 - Possibly undergoing heal
Number of entries: 1
Brick n34.test:/data/ovirt/data/
Number of entries: 0
About half an hour later:
gluster> volume heal ovirt-data info
Brick n33.test:/data/ovirt/data/
Number of entries: 0
Brick n34.test:/data/ovirt/data/
Number of entries: 0
Comparing the extended attributes again:
[root@n33 1d04adc8-9921-4e1c-b192-84fba64224db]# getfattr -d -m . -e hex bd3c9ecf-8100-42db-9db5-038dd95a4d54
# file: bd3c9ecf-8100-42db-9db5-038dd95a4d54
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ovirt-data-client-0=0x000000000000000000000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
trusted.gfid=0xaa32a1b65bf14655aafbeb0f242e61e0
[root@n34 1d04adc8-9921-4e1c-b192-84fba64224db]# getfattr -d -m . -e hex bd3c9ecf-8100-42db-9db5-038dd95a4d54
# file: bd3c9ecf-8100-42db-9db5-038dd95a4d54
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ovirt-data-client-0=0x000000000000000000000000
trusted.afr.ovirt-data-client-1=0x000000000000000000000000
trusted.gfid=0xaa32a1b65bf14655aafbeb0f242e61e0
III. Verification
Start the corresponding VM in oVirt; the guest OS is Windows Server 2008 R2. It boots, reporting an unexpected shutdown (as after a sudden power loss), and the services running inside show no obvious problems.
This matches expectations.
IV. Summary
It is recommended to enable the server-side and client-side quorum options to guard against split-brain; see the Red Hat documentation linked in the references for details.
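A minimal sketch of those settings (option names as documented upstream; values should be chosen per the Red Hat guide, and on a two-node replica a third node or an arbiter brick is needed for quorum to be truly effective):
gluster volume set ovirt-data cluster.quorum-type auto            # client-side quorum
gluster volume set ovirt-data cluster.server-quorum-type server   # server-side quorum
gluster volume set all cluster.server-quorum-ratio 51%            # cluster-wide server quorum ratio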
References
1. Managing Split-brain: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html
2. https://blog.csdn.net/lblxyh/article/details/40894635
3. https://joejulian.name/post/what-is-this-new-glusterfs-directory-in-33/