Timeout error restarting mysql (mariadb) which is one of the nodes in a 3-node Galera cluster
Posted: 2018-12-02 07:34:02
Question: I have a 3-node MariaDB cluster (Ubuntu 16.04, MariaDB 10.3.7). When I run systemctl restart mysql on one of the nodes, I get the following error message:
Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.
mariadb.service: Unit entered failed state.
The output of systemctl status mysql was posted as a screenshot (image not preserved).
/etc/mysql/my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
bind-address = 192.168.3.15
max_connections = 100
connect_timeout = 5
wait_timeout = 600
max_allowed_packet = 16M
thread_cache_size = 128
sort_buffer_size = 4M
bulk_insert_buffer_size = 16M
tmp_table_size = 32M
max_heap_table_size = 32M
myisam_recover_options = BACKUP
key_buffer_size = 128M
#open-files-limit = 2000
table_open_cache = 400
myisam_sort_buffer_size = 512M
concurrent_insert = 2
read_buffer_size = 2M
read_rnd_buffer_size = 1M
query_cache_limit = 128K
query_cache_size = 64M
log_warnings = 2
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 10
log_slow_verbosity = query_plan
server_id = 1
log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
binlog_format = ROW
expire_logs_days = 10
max_binlog_size = 100M
# slaves
#relay_log = /var/log/mysql/relay-bin
#relay_log_index = /var/log/mysql/relay-bin.index
#relay_log_info_file = /var/log/mysql/relay-bin.info
#log_slave_updates = 1
#replicate-do-db = DriveOn
#read_only
default_storage_engine = InnoDB
# you can't just change log file size, requires special procedure
#innodb_log_file_size = 50M
innodb_buffer_pool_size = 256M
innodb_log_buffer_size = 8M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 400
innodb_flush_method = O_DIRECT
[galera]
# Mandatory settings
#wsrep_on=ON
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[mysql]
#no-auto-rehash # faster start of mysql but no tab completion
[isamchk]
key_buffer = 16M
!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/
/etc/mysql/conf.d/galera.cnf
[galera]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Galera Cluster Configuration
wsrep_cluster_name="Test_Cluster"
#wsrep_cluster_address="gcomm://"
wsrep_cluster_address="gcomm://192.168.3.18,192.168.3.19,192.168.3.15"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
# Galera Node Configuration
wsrep_node_address="192.168.3.15"
wsrep_node_name="XXXXXXXX05"
Any help with this would be greatly appreciated.
Thanks
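Comments:
I understand you can't get 7 days of uptime yet; one day is usually enough. How far apart are the Galera nodes (ping time and/or physical distance)?
This is all a QA/Test environment. There is no traffic on the cluster apart from a few testers running simple update and insert queries.
All 3 nodes are on the same network.
Answer 1: systemctl start mariadb.service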
jpadmin@AFHSCHSDRVT05:~$ systemctl start mariadb.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to start 'mariadb.service'.
Authenticating as: xxxxxxxxxxxxx,,, (xxxxxxxxxxxx)
Password:
==== AUTHENTICATION COMPLETE ===
Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.
jpadmin@AFHSHSDRVT05:~$ sudo journalctl -xe
Jun 25 12:17:01 AFHSCHSDRVT05 CRON[7554]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 25 12:17:01 AFHSCHSDRVT05 CRON[7553]: pam_unix(cron:session): session closed for user root
Jun 25 13:17:01 AFHSCHSDRVT05 CRON[7556]: pam_unix(cron:session): session opened for user root by (uid=0)
Jun 25 13:17:01 AFHSCHSDRVT05 CRON[7557]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 25 13:17:01 AFHSCHSDRVT05 CRON[7556]: pam_unix(cron:session): session closed for user root
Jun 25 14:17:01 AFHSCHSDRVT05 CRON[7559]: pam_unix(cron:session): session opened for user root by (uid=0)
Jun 25 14:17:01 AFHSCHSDRVT05 CRON[7560]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 25 14:17:01 AFHSCHSDRVT05 CRON[7559]: pam_unix(cron:session): session closed for user root
Jun 25 14:37:50 AFHSCHSDRVT05 sshd[7562]: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 25 14:37:50 AFHSCHSDRVT05 sshd[7562]: Accepted password for jpadmin from 192.168.171.136 port 50257 ssh2
Jun 25 14:37:50 AFHSCHSDRVT05 sshd[7562]: pam_unix(sshd:session): session opened for user jpadmin by (uid=0)
Jun 25 14:37:50 AFHSCHSDRVT05 systemd[1]: Started Session 94 of user jpadmin.
-- Subject: Unit session-94.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-94.scope has finished starting up.
--
-- The start-up result is done.
Jun 25 14:37:50 AFHSCHSDRVT05 systemd-logind[1113]: New session 94 of user jpadmin.
-- Subject: A new session 94 has been created for user jpadmin
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
-- A new session with the ID 94 has been created for the user jpadmin.
--
-- The leading process of the session is 7562.
Jun 25 14:38:17 AFHSCHSDRVT05 sudo[7640]: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 25 14:38:17 AFHSCHSDRVT05 sudo[7640]: jpadmin : TTY=pts/0 ; PWD=/home/jpadmin ; USER=root ; COMMAND=/usr/sbin/mysqld
Jun 25 14:38:17 AFHSCHSDRVT05 sudo[7640]: pam_unix(sudo:session): session opened for user root by jpadmin(uid=0)
Jun 25 14:38:19 AFHSCHSDRVT05 sudo[7640]: pam_unix(sudo:session): session closed for user root
Jun 25 14:38:38 AFHSCHSDRVT05 sudo[7646]: jpadmin : TTY=pts/0 ; PWD=/home/jpadmin ; USER=root ; COMMAND=/usr/sbin/mysqld
Jun 25 14:38:38 AFHSCHSDRVT05 sudo[7646]: pam_unix(sudo:session): session opened for user root by jpadmin(uid=0)
Jun 25 14:38:39 AFHSCHSDRVT05 sudo[7646]: pam_unix(sudo:session): session closed for user root
Jun 25 14:39:30 AFHSCHSDRVT05 polkitd(authority=local)[1243]: Registered Authentication Agent for unix-process:7650:27285992 (system bus name :1.168 [/usr/bin/pkttyagent --notify-
Jun 25 14:39:36 AFHSCHSDRVT05 polkit-agent-helper-1[7656]: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 25 14:39:36 AFHSCHSDRVT05 polkitd(authority=local)[1243]: Operator of unix-process:7650:27285992 successfully authenticated as unix-user:jpadmin to gain ONE-SHOT authorization
Jun 25 14:39:36 AFHSCHSDRVT05 systemd[1]: Starting MariaDB 10.3.7 database server...
-- Subject: Unit mariadb.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has begun starting up.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: Failed to start MariaDB 10.3.7 database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
--
-- The result is failed.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: mariadb.service: Unit entered failed state.
Jun 25 14:41:06 AFHSCHSDRVT05 systemd[1]: mariadb.service: Failed with result 'timeout'.
Jun 25 14:41:06 AFHSCHSDRVT05 polkitd(authority=local)[1243]: Unregistered Authentication Agent for unix-process:7650:27285992 (system bus name :1.168, object path /org/freedeskto
Jun 25 14:41:20 AFHSCHSDRVT05 sudo[7883]: jpadmin : TTY=pts/0 ; PWD=/home/jpadmin ; USER=root ; COMMAND=/bin/journalctl -xe
Jun 25 14:41:20 AFHSCHSDRVT05 sudo[7883]: pam_unix(sudo:session): session opened for user root by jpadmin(uid=0)
Answer 2: I finally had time to look into this; here is what I have so far.
$ mysqld_safe --syslog
180626 15:26:45 mysqld_safe Logging to syslog.
180626 15:26:45 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
$ systemctl status mariadb.service
mariadb.service - MariaDB 10.3.7 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: timeout) since Tue 2018-06-26 15:21:13 CDT; 6min ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 1230 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_rec
Process: 1203 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=
Process: 1118 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0
Tasks: 0
Memory: 22.6M
CPU: 1min 26.212s
Jun 26 15:19:42 AFHSCHSDRVT05 systemd[1]: Starting MariaDB 10.3.7 database server...
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Failed to start MariaDB 10.3.7 database server.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Unit entered failed state.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Failed with result 'timeout'.
Looking at the syslog gives me more information:
$ cat /var/log/syslog | tail -f
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Unit entered failed state.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: mariadb.service: Failed with result 'timeout'.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Reached target Multi-User System.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Reached target Graphical Interface.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Started Update UTMP about System Runlevel Changes.
Jun 26 15:21:13 AFHSCHSDRVT05 systemd[1]: Startup finished in 5.222s (kernel) + 1min 37.251s (userspace) = 1min 42.473s.
Jun 26 15:26:45 AFHSCHSDRVT05 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jun 26 15:26:45 AFHSCHSDRVT05 mysqld_safe: WSREP: Running position recovery with --disable-log-error --pid-file='/var/lib/mysql/AFHSCHSDRVT05-recover.pid'
Jun 26 15:26:45 AFHSCHSDRVT05 mysqld_safe: WSREP: Failed to recover position: '2018-06-26 15:26:45 0 [Note] /usr/sbin/mysqld (mysqld 10.3.7-MariaDB-1:10.3.7+maria~xenial-log) starting as process 2148 ...#0122018-06-26 15:26:45 0 [Warning] Can't create test file /var/lib/mysql/AFHSCHSDRVT05.lower-test#012/usr/sbin/mysqld: One can only use the --user switch if running as root#0122018-06-26 15:26:45 0 [ERROR] mysqld: File '/var/log/mysql/mariadb-bin.index' not found (Errcode: 13 "Permission denied")#0122018-06-26 15:26:45 0 [ERROR] Aborting'
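The "Failed to recover position" entry above is hard to read because syslog has escaped the embedded line breaks as #012 (octal 012 is the newline character). A minimal sketch for unfolding such an entry, assuming the default rsyslog escaping; unfold_syslog is a hypothetical helper name, not something from the thread:

```shell
#!/bin/sh
# Sketch: expand rsyslog's "#012" escapes back into real line breaks.
# unfold_syslog is a hypothetical helper; it reads stdin and writes stdout.
unfold_syslog() {
  # A backslash followed by a literal newline is the portable way to put
  # a newline into a sed replacement.
  sed 's/#012/\
/g'
}
```

For example, grep 'WSREP: Failed to recover position' /var/log/syslog | tail -n 1 | unfold_syslog prints each mysqld message on its own line. Unfolded, this entry shows "One can only use the --user switch if running as root" next to the Errcode 13 error, which suggests mysqld_safe was started without root privileges.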
Answer 3:
$ /var/log/mysql# namei -l /var/log/mysql/mariadb-bin.index
f: /var/log/mysql/mariadb-bin.index
drwxr-xr-x root root /
drwxr-xr-x root root var
drwxrwxr-x root syslog log
drwxr-s--- mysql mysql mysql
-rw-rw---- mysql mysql mariadb-bin.index
Answer 4: Hmm... how about giving mysql full ownership of /var/log/mysql
$ chown -R mysql:mysql /var/log/mysql
$ mysqld_safe --syslog
$ cat /var/log/syslog | tail -f
Jun 26 16:12:46 AFHSCHSDRVT05 systemd[1]: Starting Daily apt download activities...
Jun 26 16:12:50 AFHSCHSDRVT05 systemd[1]: Started Daily apt download activities.
Jun 26 16:17:01 AFHSCHSDRVT05 CRON[2210]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 26 16:21:44 AFHSCHSDRVT05 sudo: pam_ecryptfs: pam_sm_authenticate: /home/jpadmin is already mounted
Jun 26 16:34:22 AFHSCHSDRVT05 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jun 26 16:34:22 AFHSCHSDRVT05 mysqld_safe: WSREP: Running position recovery with --disable-log-error --pid-file='/var/lib/mysql/AFHSCHSDRVT05-recover.pid'
Jun 26 16:39:42 AFHSCHSDRVT05 systemd[1]: Started Session 3 of user jpadmin.
Jun 26 16:46:16 AFHSCHSDRVT05 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jun 26 16:46:16 AFHSCHSDRVT05 mysqld_safe: WSREP: Running position recovery with --disable-log-error --pid-file='/var/lib/mysql/AFHSCHSDRVT05-recover.pid'
Jun 26 16:48:28 AFHSCHSDRVT05 mysqld_safe: WSREP: Failed to recover position: '2018-06-26 16:46:16 0 [Note] /usr/sbin/mysqld (mysqld 10.3.7-MariaDB-1:10.3.7+maria~xenial-log) starting as process 2865 ...#0122018-06-26 16:46:17 0 [ERROR] mysqld: Can't lock aria control file '/var/lib/mysql/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds#0122018-06-26 16:46:48 0 [ERROR] mysqld: Got error 'Could not get an exclusive lock; file is probably in use by another process' when trying to use aria control file '/var/lib/mysql/aria_log_control'#0122018-06-26 16:46:48 0 [ERROR] Plugin 'Aria' init function returned error.#0122018-06-26 16:46:48 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.#0122018-06-26 16:46:48 0 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs#0122018-06-26 16:46:48 0 [Note] InnoDB: Using Linux native AIO#0122018-06-26 16:46:48 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins#0122018-06-26 16:46:48 0 [Note] InnoDB: Uses event mutexes#0122018-06-26 16:46:48 0 [Note] InnoDB: Compressed tables use zl
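The last entry says aria_log_control "is probably in use by another process". One possible cause, not confirmed in the thread, is a mysqld left running from an earlier mysqld_safe attempt. A rough sketch for listing the processes that hold the file open, assuming Linux with /proc mounted; holders_of is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print the PIDs that currently have a given file open (Linux only).
# holders_of is a hypothetical helper. Run it as root, otherwise the
# descriptors of other users' processes are not readable and are skipped.
holders_of() {
  target=$(readlink -f "$1") || return 1
  for fd in /proc/[0-9]*/fd/*; do
    # Every entry under /proc/PID/fd is a symlink to the open file.
    if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
      pid=${fd#/proc/}
      echo "${pid%%/*}"
    fi
  done | sort -un
}
```

Running holders_of /var/lib/mysql/aria_log_control as root should print the PID of any process still holding the file; ps -p with that PID then shows whether it is a stray mysqld. Where psmisc is installed, fuser -v on the same path gives similar information.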
Answer 5: First, check your /var/log/mysql/error.log.
Then add the following to my.cnf in your /etc/mysql directory:
[mysqld]
innodb_force_recovery=1
Start the server:
systemctl start mariadb
Once the server is back up, dump all databases:
mysqldump -h localhost --opt --lock-all-tables --triggers --routines --flush-logs -u root -p --all-databases --events > mysql_wordpress.dump
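If the server does not start, increase innodb_force_recovery from 1 to 2; if it still does not start, try 3, and so on. The maximum supported value is 6, but the higher values risk corrupting the database files.
Now restore your databases:
mysql -u root -p < mysql_wordpress.dump
This should fix any index/database errors.
When you are done, be sure to remove or comment out innodb_force_recovery.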
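Since the my.cnf in the question ends with !includedir /etc/mysql/conf.d/, one way to make the setting easy to remove afterwards is to keep it in its own drop-in file rather than editing my.cnf. A minimal sketch under that assumption; the file name zz-force-recovery.cnf and the helper name are hypothetical:

```shell
#!/bin/sh
# Sketch: write innodb_force_recovery into a separate drop-in file so that
# undoing the change is just deleting the file.
# The helper name and the default file name are hypothetical choices.
write_force_recovery_conf() {
  level=$1
  target=${2:-/etc/mysql/conf.d/zz-force-recovery.cnf}
  case $level in
    [1-6]) ;;  # only a single digit from 1 to 6 is accepted
    *) echo "level must be an integer from 1 to 6" >&2; return 1 ;;
  esac
  printf '[mysqld]\ninnodb_force_recovery=%s\n' "$level" > "$target"
}
```

Usage would be to run write_force_recovery_conf 1 as root, try systemctl start mariadb, and raise the level one step at a time only if the start fails. After the dump and restore, delete the drop-in file and restart the service.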