Additional Nodes on Galera MySQL failing to add

【Posted】: 2014-04-07 17:21:23

【Question】:

OK, so I have a second node that I am trying to add as an additional node to a working Galera MySQL server. Here are the configs.

Node A (working)

[server]
[mysqld]  
[embedded]  
[mysqld-5.5]  
[mariadb]  
binlog_format=ROW  
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=172.16.1.20
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="controller_cluster"
wsrep_cluster_address="gcomm://"
wsrep_sst_receive_addres="172.16.1.20"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync
wsrep_sst_auth=wsrep_sst:password
[mariadb-5.5]

Node B (will not start)

[server]
[mysqld]
skip-name-resolve
log = /var/log/mysqld.log
log-error = /var/log/mysqld.error.log
[embedded]
[mysqld-5.5]
[mariadb]
log = /var/log/mysqld.log
log-error = /var/log/mysqld.error.log
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=172.16.1.21
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="controller_cluster"
wsrep_cluster_address="gcomm://172.16.1.20"
wsrep_sst_receive_addres="172.16.1.21"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync
wsrep_sst_auth=wsrep_sst:password
[mariadb-5.5] 

Node A permissions on /var/lib/mysql

-rw-rw----. 1 mysql mysql     16384 Mar  4 23:54 aria_log.00000001
-rw-rw----. 1 mysql mysql        52 Mar  4 23:54 aria_log_control
-rw-r-----. 1 mysql root     283162 Mar  5 17:49 db01.deg.pod1.err
-rw-rw----. 1 mysql mysql         5 Mar  4 23:54 db01.deg.pod1.pid
-rw-------. 1 mysql mysql 134219040 Mar  5 17:48 galera.cache
-rw-rw----. 1 mysql mysql       104 Mar  5 17:48 grastate.dat
-rw-rw----. 1 mysql mysql  12582912 Mar  4 23:54 ibdata1
-rw-rw----. 1 mysql mysql   5242880 Mar  4 23:54 ib_logfile0
-rw-rw----. 1 mysql mysql   5242880 Mar  4 22:30 ib_logfile1
drwx------. 2 mysql mysql      4096 Mar  4 22:59 mysql
srwxrwxrwx. 1 mysql mysql         0 Mar  4 23:54 mysql.sock
drwx------. 2 root  root       4096 Mar  4 22:59 performance_schema
-rw-r--r--. 1 mysql mysql       124 Mar  4 22:11 RPM_UPGRADE_HISTORY
-rw-r--r--. 1 mysql mysql       124 Mar  4 22:11 RPM_UPGRADE_MARKER-LAST
drwxr-xr-x. 2 mysql mysql      4096 Mar  4 22:11 test
drwx------. 2 mysql mysql      4096 Mar  5 17:35 tt

Node B permissions on /var/lib/mysql

-rw-rw----. 1 mysql mysql     16384 Mar  5 17:49 aria_log.00000001
-rw-rw----. 1 mysql mysql        52 Mar  5 17:49 aria_log_control
-rw-r-----. 1 mysql root          0 Mar  5 17:49 db02.deg.pod1.err
-rw-------. 1 mysql mysql 134219040 Mar  5 17:49 galera.cache
-rw-rw----. 1 mysql mysql       104 Mar  5 17:49 grastate.dat
-rw-rw----. 1 mysql mysql  12582912 Mar  5 17:49 ibdata1
-rw-rw----. 1 mysql mysql   5242880 Mar  5 17:49 ib_logfile0
-rw-rw----. 1 mysql mysql   5242880 Mar  5 17:49 ib_logfile1
drwx------. 2 mysql mysql      4096 Mar  4 23:10 mysql
srwxrwxrwx  1 mysql mysql         0 Mar  5 17:49 mysql.sock
-rw-------. 1 root  root        107 Mar  4 23:10 nohup.out
-rw-r--r--  1 root  root     269455 Mar  5 03:42 out.log
drwx------  2 root  root       4096 Mar  5 03:20 performance_schema
-rw-r--r--. 1 mysql mysql       124 Mar  4 22:14 RPM_UPGRADE_HISTORY
-rw-r--r--. 1 mysql mysql       124 Mar  4 22:14 RPM_UPGRADE_MARKER-LAST
drwxr-xr-x. 2 mysql mysql      4096 Mar  4 22:14 test
drwx------  2 mysql mysql      4096 Mar  5 17:36 tt
-rw-------  1 root  root          0 Mar  5 03:52 wsrep_recovery.hh4i9

The passwords for the mysql user and the root user are also the same on both nodes.

** Failing log on Node B **

140305 17:49:39 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140305 17:49:39 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.A8nOzH' --pid-file='/var/lib/mysql/db02.deg.pod1-recover.pid'
nohup: ignoring input
140305 17:49:39 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:39 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:41 mysqld_safe WSREP: Recovered position bce8f04b-a41a-11e3-b010-4ba4a408598c:0
140305 17:49:41 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:41 [Warning] The syntax '--log' is deprecated and will be removed in a future release. Please use '--general-log'/'--general-log-file' instead.
140305 17:49:41 [Note] WSREP: wsrep_start_position var submitted: 'bce8f04b-a41a-11e3-b010-4ba4a408598c:0'
140305 17:49:41 [Note] WSREP: Read nil XID from storage engines, skipping position init
140305 17:49:41 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
140305 17:49:41 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
140305 17:49:41 [Note] WSREP: CRC-32C: using hardware acceleration.
140305 17:49:41 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
140305 17:49:41 [Note] WSREP: Passing config to GCS: base_host = 172.16.1.21; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 2147483647; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
140305 17:49:41 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
140305 17:49:41 [Note] WSREP: wsrep_sst_grab()
140305 17:49:41 [Note] WSREP: Start replication
140305 17:49:41 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
140305 17:49:41 [Note] WSREP: protonet asio version 0
140305 17:49:41 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
140305 17:49:41 [Note] WSREP: backend: asio
140305 17:49:41 [Note] WSREP: GMCast version 0
140305 17:49:41 [Note] WSREP: (7036f7c8-a4b8-11e3-97c3-866382997e69, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
140305 17:49:41 [Note] WSREP: (7036f7c8-a4b8-11e3-97c3-866382997e69, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
140305 17:49:41 [Note] WSREP: EVS version 0
140305 17:49:41 [Note] WSREP: PC version 0
140305 17:49:41 [Note] WSREP: gcomm: connecting to group 'controller_cluster', peer '172.16.1.20:'
140305 17:49:41 [Note] WSREP: declaring 3f183cba-a422-11e3-b1c7-52f230abd39f stable
140305 17:49:41 [Note] WSREP: Node 3f183cba-a422-11e3-b1c7-52f230abd39f state prim
140305 17:49:41 [Note] WSREP: view(view_id(PRIM,3f183cba-a422-11e3-b1c7-52f230abd39f,48) memb 
        3f183cba-a422-11e3-b1c7-52f230abd39f,0
        7036f7c8-a4b8-11e3-97c3-866382997e69,0
 joined 
 left 
 partitioned 
)
140305 17:49:42 [Note] WSREP: gcomm: connected
140305 17:49:42 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
140305 17:49:42 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
140305 17:49:42 [Note] WSREP: Opened channel 'controller_cluster'
140305 17:49:42 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
140305 17:49:42 [Note] WSREP: Waiting for SST to complete.
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: sent state msg: 52478b99-a4b8-11e3-9283-8207651d087e
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: got state msg: 52478b99-a4b8-11e3-9283-8207651d087e from 0 (db01.deg.pod1)
140305 17:49:42 [Note] WSREP: STATE EXCHANGE: got state msg: 52478b99-a4b8-11e3-9283-8207651d087e from 1 (db02.deg.pod1)
140305 17:49:42 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 47,
        members    = 1/2 (joined/total),
        act_id     = 1,
        last_appl. = -1,
        protocols  = 0/5/2 (gcs/repl/appl),
        group UUID = bce8f04b-a41a-11e3-b010-4ba4a408598c
140305 17:49:42 [Note] WSREP: Flow-control interval: [23, 23]
140305 17:49:42 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1)
140305 17:49:42 [Note] WSREP: State transfer required: 
        Group state: bce8f04b-a41a-11e3-b010-4ba4a408598c:1
        Local state: 00000000-0000-0000-0000-000000000000:-1
140305 17:49:42 [Note] WSREP: New cluster view: global state: bce8f04b-a41a-11e3-b010-4ba4a408598c:1, view# 48: Primary, number of nodes: 2, my index: 1, protocol version 2
140305 17:49:42 [Warning] WSREP: Gap in state sequence. Need state transfer.
140305 17:49:44 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.1.21' --auth 'wsrep_sst:password' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '13639''
140305 17:49:44 [Note] WSREP: Prepared SST request: rsync|172.16.1.21:4444/rsync_sst
140305 17:49:44 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140305 17:49:44 [Note] WSREP: REPL Protocols: 5 (3, 1)
140305 17:49:44 [Note] WSREP: Assign initial position for certification: 1, protocol version: 3
140305 17:49:44 [Note] WSREP: Service thread queue flushed.
140305 17:49:44 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (bce8f04b-a41a-11e3-b010-4ba4a408598c): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():445. IST will be unavailable.
140305 17:49:44 [Note] WSREP: Node 1.0 (db02.deg.pod1) requested state transfer from '*any*'. Selected 0.0 (db01.deg.pod1)(SYNCED) as donor.
140305 17:49:44 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1)
140305 17:49:44 [Note] WSREP: Requesting state transfer: success, donor: 0
140305 17:49:45 [Warning] WSREP: 0.0 (db01.deg.pod1): State transfer to 1.0 (db02.deg.pod1) failed: -1 (Operation not permitted)
140305 17:49:45 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():723: Will never receive state. Need to abort.
140305 17:49:45 [Note] WSREP: gcomm: terminating thread
140305 17:49:45 [Note] WSREP: gcomm: joining thread
140305 17:49:45 [Note] WSREP: gcomm: closing backend
140305 17:49:46 [Note] WSREP: view(view_id(NON_PRIM,3f183cba-a422-11e3-b1c7-52f230abd39f,48) memb 
        7036f7c8-a4b8-11e3-97c3-866382997e69,0
 joined 
 left 
 partitioned 
        3f183cba-a422-11e3-b1c7-52f230abd39f,0
)
140305 17:49:46 [Note] WSREP: view((empty))
140305 17:49:46 [Note] WSREP: gcomm: closed
140305 17:49:46 [Note] WSREP: /usr/sbin/mysqld: Terminated.
140305 17:49:46 mysqld_safe mysqld from pid file /var/lib/mysql/db02.deg.pod1.pid ended
WSREP_SST: [ERROR] Parent mysqld process (PID:13639) terminated unexpectedly. (20140305 17:49:47.601)
WSREP_SST: [INFO] Joiner cleanup. (20140305 17:49:47.603)
WSREP_SST: [INFO] Joiner cleanup done. (20140305 17:49:48.110)

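【Comments】:

Turn on wsrep_debug and check the logs from that. There is a problem somewhere in the SST, and that log should tell you what it is.

Did you find a solution?

【Answer 1】: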

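One possible reason the nodes fail to add is that the existing nodes are not expecting them. In your config files, you may simply need to point each node at the rest of the nodes.

For example, try using wsrep_cluster_address="gcomm://172.16.1.20,172.16.1.21" in both galera.cnf files.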
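A minimal sketch of what that suggestion could look like, assuming the two node addresses from the question (172.16.1.20 and 172.16.1.21) and that the setting stays under the [mariadb] group as in the posted configs. Only the changed line and the cluster name are shown, and the rest of each file is assumed to stay as it is.

```ini
# Same fragment on Node A (172.16.1.20) and Node B (172.16.1.21).
# Assumption: the other wsrep_* settings from the question are unchanged.
[mariadb]
wsrep_cluster_name="controller_cluster"
# List every cluster member instead of an empty "gcomm://" (Node A)
# or a single peer (Node B).
wsrep_cluster_address="gcomm://172.16.1.20,172.16.1.21"
```

Keep in mind that an empty gcomm:// is what tells a node to bootstrap a new cluster. Once Node A lists its peers, it would try to join them on restart, so if every node is down, one node still has to be bootstrapped explicitly.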
