Timeout error restarting MySQL (MariaDB) on one node of a 3-node Galera cluster

Posted: 2018-12-02 07:34:02

Question:

I have a 3-node MariaDB cluster (Ubuntu 16.04, MariaDB 10.3.7). When I run systemctl restart mysql on one of the nodes, I get the following error message:

Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.

mariadb.service: Unit entered failed state.

The output of systemctl status mysql was posted as a screenshot, which is not available here.

/etc/mysql/my.cnf

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages     = en_US
skip-external-locking

bind-address            = 192.168.3.15

max_connections         = 100
connect_timeout         = 5
wait_timeout            = 600
max_allowed_packet      = 16M
thread_cache_size       = 128
sort_buffer_size        = 4M
bulk_insert_buffer_size = 16M
tmp_table_size          = 32M
max_heap_table_size     = 32M

myisam_recover_options = BACKUP
key_buffer_size         = 128M
#open-files-limit       = 2000
table_open_cache        = 400
myisam_sort_buffer_size = 512M
concurrent_insert       = 2
read_buffer_size        = 2M
read_rnd_buffer_size    = 1M

query_cache_limit               = 128K
query_cache_size                = 64M
log_warnings            = 2
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time = 10
log_slow_verbosity      = query_plan

server_id               = 1

log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
binlog_format           = ROW

expire_logs_days        = 10
max_binlog_size         = 100M
# slaves
#relay_log              = /var/log/mysql/relay-bin
#relay_log_index        = /var/log/mysql/relay-bin.index
#relay_log_info_file    = /var/log/mysql/relay-bin.info
#log_slave_updates       = 1
#replicate-do-db                = DriveOn
#read_only

default_storage_engine  = InnoDB
# you can't just change log file size, requires special procedure
#innodb_log_file_size   = 50M
innodb_buffer_pool_size = 256M
innodb_log_buffer_size  = 8M
innodb_file_per_table   = 1
innodb_open_files       = 400
innodb_io_capacity      = 400
innodb_flush_method     = O_DIRECT


[galera]
# Mandatory settings
#wsrep_on=ON
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completion

[isamchk]
key_buffer              = 16M

!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/

/etc/mysql/conf.d/galera.cnf

[galera]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="Test_Cluster"

#wsrep_cluster_address="gcomm://"
wsrep_cluster_address="gcomm://192.168.3.18,192.168.3.19,192.168.3.15"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="192.168.3.15"
wsrep_node_name="XXXXXXXX05"
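A quick consistency check on a galera.cnf like the one above is that wsrep_node_address appears among the hosts listed in wsrep_cluster_address. The following is a minimal shell sketch, not a MariaDB or Galera tool: the function name check_node_in_cluster is hypothetical, and it assumes one key=value per line, plain IP addresses without ports, and no spaces after the commas.

```shell
# Hypothetical helper (not part of MariaDB or Galera): check that the node's own
# address is listed in wsrep_cluster_address. Assumes "key=value" lines, plain
# IPs without ports, and no spaces after commas; commented-out lines are ignored.
check_node_in_cluster() {
  cnf=${1:?usage: check_node_in_cluster GALERA_CNF}
  # Last uncommented value of each key, with quotes and the gcomm:// scheme removed.
  cluster=$(sed -n 's/^wsrep_cluster_address *= *//p' "$cnf" | tail -n 1 | tr -d '"' | sed 's#^gcomm://##')
  node=$(sed -n 's/^wsrep_node_address *= *//p' "$cnf" | tail -n 1 | tr -d '"')
  [ -n "$node" ] || { echo "warning: no wsrep_node_address in $cnf" >&2; return 1; }
  # Wrap both sides in commas so 192.168.3.1 does not match 192.168.3.15.
  case ",$cluster," in
    *",$node,"*) echo "ok: $node is listed in $cluster" ;;
    *) echo "warning: '$node' not found in '$cluster'" >&2; return 1 ;;
  esac
}

# Example:
#   check_node_in_cluster /etc/mysql/conf.d/galera.cnf
```

For the galera.cnf posted above this prints "ok: 192.168.3.15 is listed in 192.168.3.18,192.168.3.19,192.168.3.15".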

Many thanks for any help with this.

Thanks

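Comments:

I know you can't get 7 days of uptime yet. One day is usually enough. How far apart are the Galera nodes? (Ping time and/or physical distance)

This is all a QA/test environment. There is no traffic on the cluster apart from a few testers running simple update and insert queries. All 3 nodes are on the same network.

Answer 1: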

systemctl start mariadb.service

jpadmin@AFHSCHSDRVT05:~$ systemctl start mariadb.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to start 'mariadb.service'.
Authenticating as: xxxxxxxxxxxxx,,, (xxxxxxxxxxxx)
Password:
==== AUTHENTICATION COMPLETE ===
Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.

jpadmin@AFHSHSDRVT05:~$ sudo journalctl -xe

Jun 25 12:17:01 AFHSCHSDRVT05 CRON[7554]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jun 25 12:17:01 AFHSCHSDRVT05 CRON[7553]: pam_unix(cron:session): session closed for user root
Jun 25 13:17:01 AFHSCHSDRVT05 CRON[7556]: pam_unix(cron:session): session opened for user root by (uid=0)
Jun 25 13:17:01 AFHSCHSDRVT05 CRON[7557]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jun 25 13:17:01 AFHSCHSDRVT05 CRON[7556]: pam_unix(cron:session): session closed for user root
Jun 25 14:17:01 AFHSCHSDRVT05 CRON[7559]: pam_unix(cron:session): session opened for user root by (uid=0)
Jun 25 14:17:01 AFHSCHSDRVT05 CRON[7560]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jun 25 14:17:01 AFHSCHSDRVT05 CRON[7559]: pam_unix(cron:session): session closed for user root
Jun 25 14:37:50 AFHSCHSDRVT05 sshd[7562]: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 25 14:37:50 AFHSCHSDRVT05 sshd[7562]: Accepted password for jpadmin from 192.168.171.136 port 50257 ssh2
Jun 25 14:37:50 AFHSCHSDRVT05 sshd[7562]: pam_unix(sshd:session): session opened for user jpadmin by (uid=0)
Jun 25 14:37:50 AFHSCHSDRVT05 systemd[1]: Started Session 94 of user jpadmin.
-- Subject: Unit session-94.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-94.scope has finished starting up.
--
-- The start-up result is done.
Jun 25 14:37:50 AFHSCHSDRVT05 systemd-logind[1113]: New session 94 of user jpadmin.
-- Subject: A new session 94 has been created for user jpadmin
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
-- A new session with the ID 94 has been created for the user jpadmin.
--
-- The leading process of the session is 7562.
Jun 25 14:38:17 AFHSCHSDRVT05 sudo[7640]: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 25 14:38:17 AFHSCHSDRVT05 sudo[7640]:  jpadmin : TTY=pts/0 ; PWD=/home/jpadmin ; USER=root ; COMMAND=/usr/sbin/mysqld
Jun 25 14:38:17 AFHSCHSDRVT05 sudo[7640]: pam_unix(sudo:session): session opened for user root by jpadmin(uid=0)
Jun 25 14:38:19 AFHSCHSDRVT05 sudo[7640]: pam_unix(sudo:session): session closed for user root
Jun 25 14:38:38 AFHSCHSDRVT05 sudo[7646]:  jpadmin : TTY=pts/0 ; PWD=/home/jpadmin ; USER=root ; COMMAND=/usr/sbin/mysqld
Jun 25 14:38:38 AFHSCHSDRVT05 sudo[7646]: pam_unix(sudo:session): session opened for user root by jpadmin(uid=0)
Jun 25 14:38:39 AFHSCHSDRVT05 sudo[7646]: pam_unix(sudo:session): session closed for user root
Jun 25 14:39:30 AFHSCHSDRVT05 polkitd(authority=local)[1243]: Registered Authentication Agent for unix-process:7650:27285992 (system bus name :1.168 [/usr/bin/pkttyagent --notify-
Jun 25 14:39:36 AFHSCHSDRVT05 polkit-agent-helper-1[7656]: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 25 14:39:36 AFHSCHSDRVT05 polkitd(authority=local)[1243]: Operator of unix-process:7650:27285992 successfully authenticated as unix-user:jpadmin to gain ONE-SHOT authorization
Jun 25 14:39:36 AFHSCHSDRVT05 systemd[1]: Starting MariaDB 10.3.7 database server...
-- Subject: Unit mariadb.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has begun starting up.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: Failed to start MariaDB 10.3.7 database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
--
-- The result is failed.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: mariadb.service: Unit entered failed state.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: mariadb.service: Failed with result 'timeout'.
Jun 25 14:41:06 AFHSCHSDRVT05 polkitd(authority=local)[1243]: Unregistered Authentication Agent for unix-process:7650:27285992 (system bus name :1.168, object path /org/freedeskto
Jun 25 14:41:20 AFHSCHSDRVT05 sudo[7883]:  jpadmin : TTY=pts/0 ; PWD=/home/jpadmin ; USER=root ; COMMAND=/bin/journalctl -xe
Jun 25 14:41:20 AFHSCHSDRVT05 sudo[7883]: pam_unix(sudo:session): session opened for user root by jpadmin(uid=0)
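Most of the journalctl -xe output above is unrelated cron, sshd and sudo noise. On a live systemd host, journalctl -u mariadb.service -b limits the journal to this unit for the current boot; for a log that has already been saved to a file (or pasted, as here), a simple filter does a similar job. This is a minimal sketch: mariadb_lines is a hypothetical name, and the case-insensitive "mysqld|mariadb" pattern is an assumption that will also keep sudo lines that merely mention /usr/sbin/mysqld.

```shell
# Hypothetical filter: keep only MariaDB-related lines from a saved log.
# Reads the file given as $1, or standard input when no argument is passed.
# Case-insensitive, so it matches "mysqld", "mysqld_safe", "mariadb.service"
# and "MariaDB 10.3.7" alike.
mariadb_lines() {
  grep -Ei 'mysqld|mariadb' -- "${1:--}"
}

# Examples:
#   mariadb_lines /var/log/syslog | tail -n 50
#   journalctl -xe --no-pager | mariadb_lines
```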

Answer 2:

I finally had time to work on this; here is what I have so far.

$ mysqld_safe --syslog

180626 15:26:45 mysqld_safe Logging to syslog.
180626 15:26:45 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

$ systemctl status mariadb.service

 mariadb.service - MariaDB 10.3.7 database server
 Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
 Drop-In: /etc/systemd/system/mariadb.service.d
       └─migrated-from-my.cnf-settings.conf
 Active: failed (Result: timeout) since Tue 2018-06-26 15:21:13 CDT; 6min ago
 Docs: man:mysqld(8)
       https://mariadb.com/kb/en/library/systemd/
Process: 1230 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_rec
Process: 1203 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=
Process: 1118 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0
Tasks: 0
Memory: 22.6M
CPU: 1min 26.212s

Jun 26 15:19:42 AFHSCHSDRVT05 systemd[1]: Starting MariaDB 10.3.7 database server...
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Failed to start MariaDB 10.3.7 database server.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Unit entered failed state.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Failed with result 'timeout'.
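In both logs the failure is "Start-pre operation timed out", and the gap between "Starting MariaDB 10.3.7 database server..." and the timeout is about 90 seconds (15:19:42 to 15:21:13), which matches systemd's default start timeout. The ExecStartPre step that runs /usr/bin/galera_recovery counts against that limit. You can check the value systemd is actually using with systemctl show mariadb.service -p TimeoutStartUSec. If recovery is merely slow, a drop-in next to the existing migrated-from-my.cnf-settings.conf can raise the limit. The sketch below assumes a hypothetical helper name, a hypothetical file name (timeout.conf) and an example value of 900 seconds. It only buys time: in this thread the recovery itself fails with permission and lock errors, so those still have to be fixed.

```shell
# Hypothetical helper: write a systemd drop-in that raises the unit's start
# timeout. TimeoutStartSec also covers the ExecStartPre commands.
write_timeout_dropin() {
  dir=${1:?usage: write_timeout_dropin DROPIN_DIR SECONDS}
  secs=${2:?usage: write_timeout_dropin DROPIN_DIR SECONDS}
  mkdir -p "$dir"
  printf '[Service]\nTimeoutStartSec=%s\n' "$secs" > "$dir/timeout.conf"
}

# Real use (as root), then make systemd re-read its unit files:
#   write_timeout_dropin /etc/systemd/system/mariadb.service.d 900
#   systemctl daemon-reload
#   systemctl show mariadb.service -p TimeoutStartUSec
```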

Looking at the syslog gives me more information:

$ tail -f /var/log/syslog

Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Unit entered failed state.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Failed with result 'timeout'.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Reached target Multi-User System.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Reached target Graphical Interface.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Started Update UTMP about System Runlevel Changes.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Startup finished in 5.222s (kernel) + 1min 37.251s (userspace) = 1min 42.473s.
Jun 26 15:26:45 AFHSCHSDRVT05 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jun 26 15:26:45 AFHSCHSDRVT05 mysqld_safe: WSREP: Running position recovery with --disable-log-error  --pid-file='/var/lib/mysql/AFHSCHSDRVT05-recover.pid'
Jun 26 15:26:45 AFHSCHSDRVT05 mysqld_safe: WSREP: Failed to recover position: '2018-06-26 15:26:45 0 [Note] /usr/sbin/mysqld (mysqld 10.3.7-MariaDB-1:10.3.7+maria~xenial-log) starting as process 2148 ...#0122018-06-26 15:26:45 0 [Warning] Can't create test file /var/lib/mysql/AFHSCHSDRVT05.lower-test#012/usr/sbin/mysqld: One can only use the --user switch if running as root#0122018-06-26 15:26:45 0 [ERROR] mysqld: File '/var/log/mysql/mariadb-bin.index' not found (Errcode: 13 "Permission denied")#0122018-06-26 15:26:45 0 [ERROR] Aborting'
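The "Failed to recover position" line is hard to read because rsyslog escapes control characters as # followed by three octal digits, so every newline in mysqld's output appears as #012. Turning those back into real line breaks makes the individual errors stand out (here: "Can't create test file", "One can only use the --user switch if running as root" and "Permission denied" on mariadb-bin.index). A minimal sketch; the function name is hypothetical and only #012 is decoded.

```shell
# Hypothetical helper: turn rsyslog's "#012" escapes (octal 012 = newline)
# back into real line breaks. Other #ooo escapes are left untouched.
# Reads the files given as arguments, or standard input.
split_syslog_newlines() {
  awk '{ gsub(/#012/, "\n"); print }' "$@"
}

# Example: show the last failed WSREP recovery, one message per line.
#   grep 'WSREP: Failed to recover position' /var/log/syslog | tail -n 1 | split_syslog_newlines
```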

Answer 3:

$ namei -l /var/log/mysql/mariadb-bin.index

f: /var/log/mysql/mariadb-bin.index
drwxr-xr-x root  root   /
drwxr-xr-x root  root   var
drwxrwxr-x root  syslog log
drwxr-s--- mysql mysql  mysql
-rw-rw---- mysql mysql  mariadb-bin.index
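The namei listing shows that /var/log/mysql is drwxr-s--- mysql mysql, so only root, the mysql user and members of the mysql group can enter it. Together with "One can only use the --user switch if running as root" in the earlier log, this suggests that mysqld_safe was started as an ordinary user, which would explain the "Permission denied" on mariadb-bin.index. The sketch below walks the ancestors of a path and prints the first directory the current user cannot search. The function name is hypothetical, and as root it always passes, because root can search any directory.

```shell
# Hypothetical helper: print the first ancestor directory of an absolute PATH
# that the current user cannot search (no usable x bit), and return 1.
# Returns 0 and prints nothing if every ancestor is searchable.
first_unsearchable_dir() {
  target=${1:?usage: first_unsearchable_dir ABSOLUTE_PATH}
  prefix=""
  rest=${target#/}
  # Peel off one leading component at a time; stop before the final name.
  while [ "${rest#*/}" != "$rest" ]; do
    prefix="$prefix/${rest%%/*}"
    rest=${rest#*/}
    if [ ! -x "$prefix" ]; then
      printf '%s\n' "$prefix"
      return 1
    fi
  done
  return 0
}

# Example, run as the account that starts mysqld_safe:
#   first_unsearchable_dir /var/log/mysql/mariadb-bin.index
# A direct one-off check as the mysql user:
#   sudo -u mysql test -r /var/log/mysql/mariadb-bin.index && echo readable
```

Run as jpadmin, the first call would be expected to stop at /var/log/mysql unless jpadmin is a member of the mysql group.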

Answer 4:

Hmm... what about making mysql the full owner of /var/log/mysql:

$ chown -R mysql:mysql /var/log/mysql

$ mysqld_safe --syslog

$ tail -f /var/log/syslog

Jun 26 16:12:46 AFHSCHSDRVT05 systemd[1]: Starting Daily apt download activities...
Jun 26 16:12:50 AFHSCHSDRVT05 systemd[1]: Started Daily apt download activities.
Jun 26 16:17:01 AFHSCHSDRVT05 CRON[2210]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jun 26 16:21:44 AFHSCHSDRVT05 sudo: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 26 16:34:22 AFHSCHSDRVT05 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jun 26 16:34:22 AFHSCHSDRVT05 mysqld_safe: WSREP: Running position recovery with --disable-log-error  --pid-file='/var/lib/mysql/AFHSCHSDRVT05-recover.pid'
Jun 26 16:39:42 AFHSCHSDRVT05 systemd[1]: Started Session 3 of user jpadmin.
Jun 26 16:46:16 AFHSCHSDRVT05 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jun 26 16:46:16 AFHSCHSDRVT05 mysqld_safe: WSREP: Running position recovery with --disable-log-error  --pid-file='/var/lib/mysql/AFHSCHSDRVT05-recover.pid'
Jun 26 16:48:28 AFHSCHSDRVT05 mysqld_safe: WSREP: Failed to recover position: '2018-06-26 16:46:16 0 [Note] /usr/sbin/mysqld (mysqld 10.3.7-MariaDB-1:10.3.7+maria~xenial-log) starting as process 2865 ...#0122018-06-26 16:46:17 0 [ERROR] mysqld: Can't lock aria control file '/var/lib/mysql/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds#0122018-06-26 16:46:48 0 [ERROR] mysqld: Got error 'Could not get an exclusive lock; file is probably in use by another process' when trying to use aria control file '/var/lib/mysql/aria_log_control'#0122018-06-26 16:46:48 0 [ERROR] Plugin 'Aria' init function returned error.#0122018-06-26 16:46:48 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.#0122018-06-26 16:46:48 0 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs#0122018-06-26 16:46:48 0 [Note] InnoDB: Using Linux native AIO#0122018-06-26 16:46:48 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins#0122018-06-26 16:46:48 0 [Note] InnoDB: Uses event mutexes#0122018-06-26 16:46:48 0 [Note] InnoDB: Compressed tables use zl
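Error 11 on Linux is EAGAIN, which is what a non-blocking lock attempt returns when another process already holds the lock, and the message itself says the file "is probably in use by another process". Since mysqld_safe had already been started at 16:34:22 and was started again at 16:46:16, a likely culprit is a mysqld left over from the first run. Below is a small diagnostic sketch with a hypothetical function name; it assumes pgrep from procps-ng (for -a) and lslocks from util-linux are installed.

```shell
# Diagnostic sketch: list any mysqld / mysqld_safe processes that are still
# running, plus any file locks held under the data directory.
show_mysqld_holders() {
  datadir=${1:-/var/lib/mysql}
  # pgrep -a prints the PID and full command line of each matching process.
  pgrep -a mysqld || echo "no mysqld processes found"
  # lslocks lists held file locks; keep only those under the data directory.
  lslocks 2>/dev/null | grep -F -- "$datadir" || true
}

# Example (as root, so lslocks can see every process):
#   show_mysqld_holders /var/lib/mysql
```

If an old mysqld shows up, shutting it down cleanly (for example with mysqladmin -u root -p shutdown) before running systemctl start mariadb again should release the Aria control file.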

Answer 5:

First, you should check your /var/log/mysql/error.log.

Add the following to my.cnf in your /etc/mysql directory:

[mysqld]
innodb_force_recovery=1

Start the server:

systemctl start mariadb

Once the server is running again, dump all databases:

mysqldump -h localhost --opt --lock-all-tables --triggers --routines --flush-logs -u root -p --all-databases --events > mysql_wordpress.dump
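Before restoring from the dump, it is worth confirming that mysqldump actually finished. With comments enabled (the default, and the case for the command above), mysqldump ends the file with a line starting "-- Dump completed"; a dump without that trailer was probably interrupted. A minimal check with a hypothetical function name; it gives a false negative for dumps made with --skip-comments or --compact.

```shell
# Hypothetical helper: succeed only if the dump file ends with the
# "-- Dump completed" trailer that mysqldump writes when comments are on.
dump_is_complete() {
  tail -n 1 "${1:?usage: dump_is_complete DUMPFILE}" | grep -q '^-- Dump completed'
}

# Example:
#   dump_is_complete mysql_wordpress.dump && echo "dump looks complete"
```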

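If it does not start, increase the value of innodb_force_recovery above from 1 to 2; if it still does not start, use 3, and so on. The maximum supported value is 6, but at the higher values you risk corrupting the database files.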
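Rather than editing my.cnf by hand for each level, the setting can live in its own file under /etc/mysql/conf.d/, which the posted my.cnf already reads via !includedir /etc/mysql/conf.d/. That turns stepping up through the levels, and removing the setting afterwards, into one command each. A sketch under these assumptions: the helper names and the file name zz-force-recovery.cnf are hypothetical, the helpers must run as root for the real directory, and only files ending in .cnf are read from an !includedir directory.

```shell
# Hypothetical helpers: keep innodb_force_recovery in a separate .cnf file in
# an !includedir directory so it can be raised step by step and removed cleanly.
set_force_recovery() {
  confdir=${1:?usage: set_force_recovery CONFDIR LEVEL}
  level=${2:?usage: set_force_recovery CONFDIR LEVEL}
  # Accept only a single digit from 1 to 6.
  case $level in
    [1-6]) ;;
    *) echo "innodb_force_recovery must be between 1 and 6" >&2; return 2 ;;
  esac
  printf '[mysqld]\ninnodb_force_recovery=%s\n' "$level" > "$confdir/zz-force-recovery.cnf"
}

clear_force_recovery() {
  rm -f "${1:?usage: clear_force_recovery CONFDIR}/zz-force-recovery.cnf"
}

# Example (as root):
#   set_force_recovery /etc/mysql/conf.d 1 && systemctl start mariadb
#   ...dump and restore...
#   clear_force_recovery /etc/mysql/conf.d && systemctl restart mariadb
```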

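Now restore your databases with mysql -u root -p < mysql_wordpress.dump (the file created by the mysqldump command above). This should fix all index/database errors.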

Be sure to remove or comment out innodb_force_recovery when you are done.
