Failed to bootstrap Galera Cluster
Posted: 2014-08-08 11:55:46
Question:
I am trying to set up multi-master replication between two servers, following this tutorial: http://tecadmin.net/setup-mariadb-galera-cluster-5-5-in-centos-rhel/
My /etc/my.cnf.d/server.cnf on the first server:
[mariadb]
query_cache_size=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://XXX.XXX.XXX.9
wsrep_cluster_name='cluster1'
wsrep_node_address='XXX.XXX.XXX.10'
wsrep_node_name='db10'
wsrep_sst_method=rsync
wsrep_sst_auth=wsrep_sst_user:wsrep_sst_pass
And similarly on the second server:
[mariadb]
query_cache_size=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://XXX.XXX.XXX.10
wsrep_cluster_name='cluster1'
wsrep_node_address='XXX.XXX.XXX.9'
wsrep_node_name='db9'
wsrep_sst_method=rsync
wsrep_sst_auth=wsrep_sst_user:wsrep_sst_pass
Both servers have a MySQL user wsrep_sst_user with "grant all" privileges.
After running the following as root on the first server:
# service mysql bootstrap
I get this log in /var/lib/mysql/HOST.err:
140618 10:53:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140618 10:53:23 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.qJO4Ec' --pid-file='/var/lib/mysql/HOST-recover.pid'
140618 10:53:25 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
140618 10:53:25 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
140618 10:53:25 [Note] WSREP: Read nil XID from storage engines, skipping position init
140618 10:53:25 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
140618 10:53:25 [Note] WSREP: wsrep_load(): Galera 25.3.5(r178) by Codership Oy <info@codership.com> loaded successfully.
140618 10:53:25 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
140618 10:53:25 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
140618 10:53:25 [Note] WSREP: Passing config to GCS: base_host = XXX.XXX.XXX.10; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; proton
140618 10:53:25 [Note] WSREP: Service thread queue flushed.
140618 10:53:25 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
140618 10:53:25 [Note] WSREP: wsrep_sst_grab()
140618 10:53:25 [Note] WSREP: Start replication
140618 10:53:25 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
140618 10:53:25 [Note] WSREP: protonet asio version 0
140618 10:53:25 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
140618 10:53:25 [Note] WSREP: backend: asio
140618 10:53:25 [Note] WSREP: GMCast version 0
140618 10:53:25 [Note] WSREP: (0245da72-f6c6-11e3-ab34-cae23d9ce0ea, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
140618 10:53:25 [Note] WSREP: (0245da72-f6c6-11e3-ab34-cae23d9ce0ea, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
140618 10:53:25 [Note] WSREP: EVS version 0
140618 10:53:25 [Note] WSREP: PC version 0
140618 10:53:25 [Note] WSREP: gcomm: bootstrapping new group 'cluster1'
140618 10:53:25 [ERROR] WSREP: Permission denied
140618 10:53:25 [ERROR] WSREP: failed to open gcomm backend connection: 13: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'Permission denied': 13 (Permission denied)
at gcomm/src/asio_tcp.cpp:listen():814
140618 10:53:25 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -13 (Permission denied)
140618 10:53:25 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'cluster1' at 'gcomm://XXX.XXX.XXX.9': -13 (Permission denied)
140618 10:53:25 [ERROR] WSREP: gcs connect failed: Permission denied
140618 10:53:25 [ERROR] WSREP: wsrep::connect() failed: 7
140618 10:53:25 [ERROR] Aborting
140618 10:53:25 [Note] WSREP: Service disconnected.
140618 10:53:26 [Note] WSREP: Some threads may fail to exit.
140618 10:53:26 [Note] /usr/sbin/mysqld: Shutdown complete
Server version:
# mysqld --version
mysqld Ver 5.5.37-MariaDB-wsrep for Linux on x86_64 (MariaDB Server, wsrep_25.10.r3980)
Answer 1:
OK, the problem was SELinux, as described here: http://galeracluster.com/documentation-webpages/selinux.html. I had to disable it.
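The log line that points at this cause is the errno 13 failure when mysqld tries to listen on tcp://0.0.0.0:4567. As a rough sketch (not the answerer's exact steps), the script below checks an error log for that line and reports the SELinux mode. The function names `listen_denied` and `selinux_mode` and the `LOG` variable are made up for illustration. The commented-out `semanage` line assumes the stock `mysqld_port_t` port type and is a possible alternative to disabling SELinux entirely; verify it against your own policy before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given error log contains the
# "Permission denied" listen failure shown in the question.
listen_denied() {
    grep -q "asio error 'Permission denied'" "$1"
}

# Hypothetical helper: print the SELinux mode if getenforce is installed
# (Enforcing, Permissive or Disabled), otherwise print "unknown".
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unknown"
    fi
}

# LOG is a placeholder; the question's log lives in /var/lib/mysql/HOST.err.
LOG=${LOG:-/var/lib/mysql/HOST.err}
if [ -r "$LOG" ] && listen_denied "$LOG"; then
    echo "listen on port 4567 was denied; SELinux mode: $(selinux_mode)"
fi

# Possible follow-ups, to be run as root (assumptions, not executed here):
#   setenforce 0                                   # permissive until reboot, to confirm the cause
#   semanage port -a -t mysqld_port_t -p tcp 4567  # label the Galera replication port instead
```

If switching to permissive mode makes the bootstrap succeed, SELinux was blocking the port, and labelling the port is a narrower fix than turning SELinux off.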
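Comments:
The information at that link seems to be gone. Do you remember what exactly was wrong?
It may be this page, which is separate from the main configuration docs: galeracluster.com/documentation-webpages/selinux.html
Both of these links now lead to a "not found" page.

Answer 2:
I found another way to solve this problem. I had upgraded from Ubuntu 14.04 LTS to Ubuntu 14.10, and this happened on all servers!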
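The final solution (after hours of searching) was to remove the " and ' characters from the cluster configuration file.
For example, before: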
wsrep_cluster_address="gcomm://10.0.0.4,10.0.0.5"
After:
wsrep_cluster_address=gcomm://10.0.0.4,10.0.0.5
The error went away!
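For a config file with several quoted wsrep_ settings, the same change can be applied with sed. This is a minimal sketch: the function name `strip_wsrep_quotes` is made up for illustration, and it only touches lines of the form `wsrep_name="value"` or `wsrep_name='value'`. Back up the file before applying it to a real configuration.

```shell
#!/bin/sh
# Hypothetical helper: remove surrounding " or ' from wsrep_* values.
# Reads config lines on stdin and writes the result to stdout;
# lines that do not match are passed through unchanged.
strip_wsrep_quotes() {
    sed -E "s/^(wsrep_[a-z_]+)=[\"'](.*)[\"']\$/\1=\2/"
}

# Example using the line from the answer above.
printf '%s\n' 'wsrep_cluster_address="gcomm://10.0.0.4,10.0.0.5"' | strip_wsrep_quotes
```

To process a whole file, redirect it into the function and write the output to a new file, then compare the two before replacing the original.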