pt-heartbeat

Posted SlowTech

Preface: this article was compiled by the editors of 小常识网 (cha138.com) and introduces pt-heartbeat; we hope you find it useful.

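pt-heartbeat monitors replication lag between a master and its slaves. It is well known that the traditional approach, reading the Seconds_Behind_Master value from show slave status\G, is not reliable.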

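pt-heartbeat takes a rather clever approach. On the master, it writes a row containing the current time (MySQL's now() function) into a heartbeat table, and that row replicates to the slave. On the slave, it subtracts the value stored in the heartbeat table from the current system timestamp (Perl's time function) to determine the replication lag. For details, see the description of the --skew option below.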
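Below is a minimal Python simulation of this idea. It uses made-up clock values and is not the tool's code. The master stamps the row with its own clock, the row reaches the slave after some replication lag, and the slave subtracts the stamp from its own clock. The `clock_offset` parameter is an illustrative assumption. It shows that any difference between the two servers' clocks ends up in the measured lag one-for-one.

```python
def master_write(master_clock: float) -> float:
    # The master stores its own current time in the heartbeat row (NOW()).
    return master_clock


def slave_measure(replicated_ts: float, slave_clock: float) -> float:
    # The slave subtracts the replicated timestamp from its own current time.
    return slave_clock - replicated_ts


def simulate(true_lag: float, clock_offset: float = 0.0) -> float:
    """Measured lag for a heartbeat written at master time 1000.0 (made-up numbers).

    true_lag: seconds the row needs to reach the slave.
    clock_offset: how far the slave's clock runs ahead of the master's.
    """
    master_now = 1000.0
    ts = master_write(master_now)
    # When the row is applied, real time has advanced by true_lag, and the
    # slave's clock additionally reads clock_offset seconds ahead.
    slave_now = master_now + true_lag + clock_offset
    return slave_measure(ts, slave_now)


if __name__ == "__main__":
    print(simulate(3.0))                    # 3.0: synchronized clocks
    print(simulate(3.0, clock_offset=2.0))  # 5.0: the offset is added to the result
```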

 

Common usage:

On the master

Use the --update option:

# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test

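Here, --update updates the row in the heartbeat table once per second, and -D specifies the database that holds the heartbeat table.

-D is the short form of --database. The long form accepts its value after a space (--database test) or after an equals sign (--database=test), but the short form only works as -D test.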

# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 --database test 
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 --database=test 

Note: on the first run, add --create-table to create the heartbeat table and insert the first row. You can also add --daemonize to run the script as a background process.

 

On the slave

Use either --monitor or --check.

--monitor checks continuously and prints the results:

# pt-heartbeat -D test --monitor -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123

10061.00s [ 167.68s, 33.54s, 11.18s ]
10062.00s [ 335.38s, 67.08s, 22.36s ]
10063.01s [ 503.10s, 100.62s, 33.54s ]
...

--check checks once and then exits:

# pt-heartbeat -D test --check -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123

10039.00

Note: --update, --monitor and --check are mutually exclusive, and --daemonize applies only to --update.
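If you launch pt-heartbeat from your own scripts, you can check these two rules up front. The sketch below uses Python's argparse. The wrapper and its program name are hypothetical. It only mirrors the rules stated above and does not reflect how pt-heartbeat itself parses options.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="heartbeat-wrapper")  # hypothetical wrapper
    # Exactly one of the three modes must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--update", action="store_true")
    mode.add_argument("--monitor", action="store_true")
    mode.add_argument("--check", action="store_true")
    parser.add_argument("--daemonize", action="store_true")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --daemonize is only meaningful together with --update.
    if args.daemonize and not args.update:
        parser.error("--daemonize can only be used with --update")  # exits with status 2
    return args
```

For example, `parse(["--update", "--daemonize"])` succeeds, while `["--update", "--check"]` or `["--monitor", "--daemonize"]` make argparse print an error and exit.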

 

What each option means

--ask-pass

    Prompts for a password when connecting to the database.

Prompt for a password when connecting to MySQL.

--charset
    short form: -A

    Default character set (personally, I find this option of little practical use).

short form: -A; type: string
Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.

--check

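   Checks the slave's lag once and exits. In a cascading replication setup you can also specify --recurse, in which case the tool also checks the lag of the slave's own slaves.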

Check slave delay once and exit. If you also specify --recurse, the tool will try to discover slaves of the given slave and check and print their lag, too. The hostname or IP and port for each slave is printed before its delay. --recurse only works with MySQL.

--check-read-only

   Checks whether the server is read-only; if it is, the insert is skipped.

Check if the server has read_only enabled; If it does, the tool skips doing any inserts.

--config

   Read this comma-separated list of config files; if specified, this must be the first option on the command line.

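   Lets you put the options in a configuration file.

   A few points to note:

   1> Write # pt-heartbeat --config pt-heartbeat.conf; the form # pt-heartbeat --config=pt-heartbeat.conf is not accepted.

   2> The configuration file supports only these two forms: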

option
option=value

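    An option must not be prefixed with --, and it must be the full option name rather than a short form: database, for example, cannot be written as -D.

    For the full syntax, see: https://www.percona.com/doc/percona-toolkit/2.1/configuration_files.html

    Here is an example: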

# cat pt-heartbeat.conf 
host=192.168.244.20
user=monitor
password=monitor123
monitor
database=test
master-server-id=1 
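The two rules above can be captured in a small validator. This is a hypothetical Python helper for checking a file before passing it to --config, not the tool's own parser. Treating blank lines and lines starting with # as ignorable is an assumption. The real parser may also handle other edge cases differently.

```python
from __future__ import annotations


def parse_config(text: str) -> dict[str, str | bool]:
    """Parse 'option' / 'option=value' lines (hypothetical helper)."""
    options: dict[str, str | bool] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: blanks and comments are skipped
            continue
        if line.startswith("-"):
            # No leading dashes, and no short forms such as -D.
            raise ValueError(f"line {lineno}: use the full option name without dashes: {line!r}")
        name, sep, value = line.partition("=")
        # A bare name such as 'monitor' is a switch that is turned on.
        options[name.strip()] = value.strip() if sep else True
    return options


if __name__ == "__main__":
    sample = "host=192.168.244.20\nuser=monitor\npassword=monitor123\nmonitor\ndatabase=test\nmaster-server-id=1\n"
    print(parse_config(sample))
```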

--create-table

   Creates the heartbeat table if it does not exist; the table is identified by --database and --table.

   The table definition is as follows:

CREATE TABLE heartbeat (
ts varchar(26) NOT NULL,
server_id int unsigned NOT NULL PRIMARY KEY,
file varchar(255) DEFAULT NULL, -- SHOW MASTER STATUS
position bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS
relay_master_log_file varchar(255) DEFAULT NULL, -- SHOW SLAVE STATUS
exec_master_log_pos bigint unsigned DEFAULT NULL -- SHOW SLAVE STATUS
);

 

Create the heartbeat --table if it does not exist.
This option causes the table specified by --database and --table to be created with the MAGIC_create_heartbeat table definition shown above.
The heartbeat table requires at least one row. If you manually create the heartbeat table, then you must insert a row by doing:
INSERT INTO heartbeat (ts, server_id) VALUES (NOW(), N);
or if using --utc:
INSERT INTO heartbeat (ts, server_id) VALUES (UTC_TIMESTAMP(), N);
where N is the server’s ID; do not use @@server_id because it will replicate and slaves will insert their own server ID instead of the master’s server ID.
This is done automatically by --create-table.
A legacy version of the heartbeat table is still supported:
CREATE TABLE heartbeat (
id int NOT NULL PRIMARY KEY,
ts datetime NOT NULL
);
Legacy tables do not support --update instances on each slave of a multi-slave hierarchy like “master -> slave1 -> slave2”. To manually insert the one required row into a legacy table:
INSERT INTO heartbeat (id, ts) VALUES (1, NOW());
or if using --utc:
INSERT INTO heartbeat (id, ts) VALUES (1, UTC_TIMESTAMP());
The tool automatically detects if the heartbeat table is legacy.
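To see what --create-table leaves behind, you can try the table definition with SQLite from Python's standard library. SQLite is only a stand-in for MySQL here. It accepts these column types as written, but it has no NOW(). The sketch therefore generates the timestamp in Python, in the same 26-character form as the sample value "2016-09-25T13:04:06.003130" mentioned under --skew. The function name is hypothetical. The points it illustrates are the single required row and the literal server ID.

```python
import sqlite3
from datetime import datetime

DDL = """
CREATE TABLE IF NOT EXISTS heartbeat (
  ts varchar(26) NOT NULL,
  server_id int unsigned NOT NULL PRIMARY KEY,
  file varchar(255) DEFAULT NULL,                  -- SHOW MASTER STATUS
  position bigint unsigned DEFAULT NULL,           -- SHOW MASTER STATUS
  relay_master_log_file varchar(255) DEFAULT NULL, -- SHOW SLAVE STATUS
  exec_master_log_pos bigint unsigned DEFAULT NULL -- SHOW SLAVE STATUS
)
"""


def create_heartbeat_table(conn: sqlite3.Connection, master_server_id: int) -> None:
    conn.execute(DDL)
    # Always 26 characters, e.g. 2016-09-25T13:04:06.003130.
    ts = datetime.now().isoformat(timespec="microseconds")
    # Pass the master's ID as a literal value. Per the documentation above,
    # @@server_id would make each slave insert its own ID instead.
    # INSERT OR IGNORE (SQLite syntax) keeps the sketch safe to re-run.
    conn.execute(
        "INSERT OR IGNORE INTO heartbeat (ts, server_id) VALUES (?, ?)",
        (ts, master_server_id),
    )
    conn.commit()
```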

--create-table-engine

   Specifies the storage engine of the heartbeat table.

type: string
Sets the engine to be used for the heartbeat table. The default storage engine is InnoDB as of MySQL 5.5.5.

--daemonize

   Runs the script as a daemon, so it keeps running even if the terminal it was started from disconnects.

Fork to the background and detach from the shell. POSIX operating systems only.

--database

   Specifies the database that holds the heartbeat table.

short form: -D; type: string
The database to use for the connection.

--dbi-driver

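   pt-heartbeat can measure heartbeat lag not only between MySQL servers but also for PostgreSQL (PG).

   This option specifies the driver used for the connection: mysql by default, or Pg.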

default: mysql; type: string
Specify a driver for the connection; mysql and Pg are supported.

--defaults-file

   Specifies the location of the option file; it must be an absolute path.

short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.

--file

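   Writes the latest --monitor output to a file. Note the word "latest": each new result overwrites the previous one.

   Without this option, --monitor output goes straight to the terminal. This option is usually used together with --daemonize.

   For example,

   # pt-heartbeat -D test --monitor -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123 --file=result

   This command creates a file named result in the current directory that holds the most recent check result: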

# cat result 
1376.00s [ 1126.25s, 225.25s, 75.08s ]

type: string
Print latest --monitor output to this file.
When --monitor is given, prints output to the specified file instead of to STDOUT. The file is opened, truncated, and closed every interval, so it will only contain the most recent statistics. Useful when --daemonize is given.

--frames

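   The time windows for the statistics. The default is 1m,5m,15m, i.e. the average lag over the last 1, 5 and 15 minutes.

   Units can be s, m, h or d. Note: the larger the window, the more results must be kept in memory, and the more memory is used.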

type: string; default: 1m,5m,15m
Timeframes for averages.
Specifies the timeframes over which to calculate moving averages when --monitor is given. Specify as a comma-separated list of numbers with suffixes. The suffix can be s for seconds, m for minutes, h for hours, or d for days. The size of the largest frame determines the maximum memory usage, as up to the specified number of per-second samples are kept in memory to calculate the averages. You can specify as many timeframes as you like.
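The sample --monitor lines shown earlier, such as 10061.00s [ 167.68s, 33.54s, 11.18s ], fit a simple rule. Each moving average appears to be the sum of the samples in its window divided by the window length in seconds, even while the window is still filling: 10061 / 60 ≈ 167.68. The Python sketch below reproduces those lines under that inference. It keeps one sample per second in a bounded deque sized by the largest frame. The class name and the output format string are reconstructed from the sample output, not taken from the tool's source.

```python
from __future__ import annotations

from collections import deque

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_frames(spec: str) -> list[int]:
    """'1m,5m,15m' -> [60, 300, 900] (integer amounts with an s/m/h/d suffix)."""
    return [int(item[:-1]) * UNITS[item[-1]] for item in spec.split(",")]


class LagAverages:
    """Per-second delay samples plus moving averages, formatted like --monitor output."""

    def __init__(self, frames: str = "1m,5m,15m") -> None:
        self.frames = parse_frames(frames)
        # Memory is bounded by the largest frame: older samples fall off the left.
        self.samples: deque[float] = deque(maxlen=max(self.frames))

    def add(self, delay: float) -> str:
        self.samples.append(delay)
        recent = list(self.samples)
        # Inferred from the sample output: divide by the frame length in
        # seconds, not by the number of samples collected so far.
        averages = [sum(recent[-n:]) / n for n in self.frames]
        return "%.2fs [ %s ]" % (delay, ", ".join("%5.2fs" % a for a in averages))


if __name__ == "__main__":
    monitor = LagAverages()
    for delay in (10061.00, 10062.00, 10063.01):
        print(monitor.add(delay))
```

With the default frames, feeding 10061.00, 10062.00 and 10063.01 yields exactly the three --monitor lines shown at the start of this article. A single 1272.00 sample yields the line shown under --print-master-server-id.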

--help 

Show help and exit.

--host 

   The host to connect to; short form -h.

short form: -h; type: string
Connect to host.

--[no]insert-heartbeat-row

   The official description is as follows:

default: yes
Insert a heartbeat row in the --table if one doesn’t exist.
The heartbeat --table requires a heartbeat row, else there’s nothing to --update, --monitor, or --check! By default, the tool will insert a heartbeat row if one is not already present. You can disable this feature by specifying --no-insert-heartbeat-row in case the database user does not have INSERT privileges.

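    In practice, when the following command is run:

    # pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123

    a row is inserted automatically if the heartbeat table is empty.

    If --no-insert-heartbeat-row is specified, however, no row is created and the tool prints the following: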

# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123 --no-insert-heartbeat-row
No row found in heartbeat table for server_id 1.
At least one row must be inserted into the heartbeat table for server_id 1.
Please read the DESCRIPTION section of the pt-heartbeat POD.

    PS: During testing I found that this option name is not strictly validated. Passing --no-insert-heartbeat or --insert-heartbeat raises no error, whereas passing --123-insert-heartbeat-ro fails with "Unknown option: 123-insert-heartbeat-ro".
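Here is one plausible explanation. It is an assumption, not something verified in the tool's source. Percona Toolkit parses options with Perl's Getopt::Long, which by default accepts any unambiguous prefix of a long option name. Under that reading, --no-insert-heartbeat and --insert-heartbeat are simply abbreviations of --no-insert-heartbeat-row and --insert-heartbeat-row, while --123-insert-heartbeat-ro matches no option at all. Python's argparse abbreviates the same way by default, which makes the behavior easy to reproduce:

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    # allow_abbrev=True is argparse's default: unambiguous prefixes of long
    # options are accepted, comparable to Getopt::Long's auto-abbreviation.
    parser = argparse.ArgumentParser(prog="demo", allow_abbrev=True)
    # Two options sharing one destination model a --[no]insert-heartbeat-row switch.
    parser.add_argument("--insert-heartbeat-row", dest="insert_row",
                        action="store_true", default=True)
    parser.add_argument("--no-insert-heartbeat-row", dest="insert_row",
                        action="store_false")
    return parser


if __name__ == "__main__":
    p = make_parser()
    print(p.parse_args(["--no-insert-heartbeat"]).insert_row)  # False (prefix match)
    print(p.parse_args(["--insert-heartbeat"]).insert_row)     # True (prefix match)
    # Not a prefix of any option, so argparse reports an error and exits:
    # p.parse_args(["--123-insert-heartbeat-ro"])
```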

--interval

   How often the heartbeat table is updated and checked; the default is 1s.

type: float; default: 1.0
How often to update or check the heartbeat --table. Updates and checks begin on the first whole second then repeat every --interval seconds for --update and every --interval plus --skew seconds for --monitor.
For example, if at 00:00.4 an --update instance is started at 0.5 second intervals, the first update happens at 00:01.0, the next at 00:01.5, etc. If at 00:10.7 a --monitor instance is started at 0.05 second intervals with the default 0.5 second --skew, then the first check happens at 00:11.5 (00:11.0 + 0.5) which will be --skew seconds after the last update which, because the instances are checking at synchronized intervals, happened at 00:11.0.
The tool waits for and begins on the first whole second just to make the interval calculations simpler. Therefore, the tool could wait up to 1 second before updating or checking.
The minimum (fastest) interval is 0.01, and the maximum precision is two decimal places, so 0.015 will be rounded to 0.02.
If a legacy heartbeat table (see --create-table) is used, then the maximum precision is 1s because the ts column is type datetime.
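The timing rules above can be sketched in Python. The helpers are hypothetical. The sketch interprets "first whole second" as the next whole second after startup. It rounds with decimal half-up rounding so that 0.015 becomes 0.02, as the documentation states. Plain float rounding would give 0.01, because the float 0.015 is stored as slightly less than 0.015.

```python
from __future__ import annotations

import math
from decimal import ROUND_HALF_UP, Decimal


def normalize_interval(value: float) -> float:
    """Two-decimal precision (0.015 -> 0.02) with a 0.01 s minimum."""
    # Go through str() so 0.015 is rounded as the decimal 0.015, not as its
    # slightly smaller binary float approximation.
    rounded = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return float(max(rounded, Decimal("0.01")))


def schedule(start: float, interval: float, offset: float = 0.0,
             count: int = 3) -> list[float]:
    """First `count` tick times for an instance started at `start` seconds.

    Assumption: ticks begin on the next whole second after start; `offset`
    is the --skew delay applied to --monitor/--check instances.
    """
    first = math.floor(start) + 1
    return [round(first + offset + i * interval, 2) for i in range(count)]


if __name__ == "__main__":
    print(schedule(0.4, 0.5))                # [1.0, 1.5, 2.0]
    print(schedule(10.7, 0.05, offset=0.5))  # [11.5, 11.55, 11.6]
    print(normalize_interval(0.015))         # 0.02
```

The first two calls correspond to the two examples in the documentation: updates at 00:01.0 and 00:01.5, and a first check at 00:11.5.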

--log

   When the script runs as a daemon, writes all output to the file given by --log.

type: string
Print all output to this file when daemonized.

--master-server-id

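   Specifies the master's server_id. When checking slave lag, you must specify it if the tool cannot determine the master on its own; otherwise it fails with the following error: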

The --master-server-id option must be specified because the heartbeat table `test`.`heartbeat` uses the server_id column for --update or --check but the servers master could not be automatically determined.

 

type: string
Calculate delay from this master server ID for --monitor or --check. If not given, pt-heartbeat attempts to connect to the server’s master and determine its server id.

--monitor

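   Continuously checks and prints the slave's lag.

   The frequency of checks and output is determined by --interval, 1s by default.

   Note the differences from --check:

      1> --monitor prints continuously, whereas --check checks once and exits.

      2> --monitor can be combined with --file, whereas combining --check with --file has no effect.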

Monitor slave delay continuously.
Specifies that pt-heartbeat should check the slave’s delay every second and report to STDOUT (or if --file is given, to the file instead). The output is the current delay followed by moving averages over the timeframe given in --frames. For example,
5s [ 0.25s, 0.05s, 0.02s ]

--password

  The password to log in with; short form -p.

short form: -p; type: string
Password to use when connecting. If password contains commas they must be escaped with a backslash: “exam\,ple”

--pid

  Creates a PID file.

type: string
Create the given PID file. The tool won’t start if the PID file already exists and the PID it contains is different than the current PID. However, if the PID file exists and the PID it contains is no longer running, the tool will overwrite the PID file with the current PID. The PID file is removed automatically when the tool exits.
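The PID-file rules translate into a short check. Below is a POSIX-only Python sketch with hypothetical function names. It tests whether the recorded process is still alive by sending it signal 0 with os.kill. That is a common approach, but not necessarily what pt-heartbeat does internally.

```python
import os


def pid_is_running(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0: only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(path: str) -> None:
    """Write our PID to `path`, refusing if another live process owns it."""
    me = os.getpid()
    if os.path.exists(path):
        with open(path) as f:
            text = f.read().strip()
        old = int(text) if text.isdigit() else None
        if old is not None and old != me and pid_is_running(old):
            raise RuntimeError(f"PID file {path} exists and process {old} is running")
    # Missing or stale file: (over)write it with the current PID.
    with open(path, "w") as f:
        f.write(f"{me}\n")


def release_pid_file(path: str) -> None:
    # Mirrors the automatic removal on exit (e.g. register it with atexit).
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```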

--port

  The port to connect to; short form -P.

short form: -P; type: int
Port number to use for connection.

--print-master-server-id

  Also prints the master's server_id. In --monitor mode the default output is

1272.00s [ 21.20s,  4.24s,  1.41s ]

 With this option, the output becomes

1272.00s [ 21.20s,  4.24s,  1.41s ] 1

 

Print the auto-detected or given --master-server-id. If --check or --monitor is specified, specifying this option will print the auto-detected or given --master-server-id at the end of each line.

 

--recurse

 In --check mode, checks the lag of slaves in a cascading replication topology; the value of --recurse sets how many levels deep to look.

type: int
Check slaves recursively to this depth in --check mode.
Try to discover slave servers recursively, to the specified depth. After discovering servers, run the check on each one of them and print the hostname (if possible), followed by the slave delay.
This currently works only with MySQL. See --recursion-method.

--recursion-method

  The method used to find slaves in a cascading replication topology: SHOW PROCESSLIST or SHOW SLAVE HOSTS.

type: array; default: processlist,hosts
Preferred recursion method used to find slaves.
Possible methods are:
METHOD       USES
===========  ==================
processlist  SHOW PROCESSLIST
hosts        SHOW SLAVE HOSTS
none         Do not find slaves
The processlist method is preferred because SHOW SLAVE HOSTS is not reliable. However, the hosts method is required if the server uses a non-standard port (not 3306). Usually pt-heartbeat does the right thing and finds the slaves, but you may give a preferred method and it will be used first. If it doesn’t find any slaves, the other methods will be tried.
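The fallback order described above can be sketched as follows. The finder callables are hypothetical stand-ins for running SHOW PROCESSLIST or SHOW SLAVE HOSTS and parsing the result. Treating a lone "none" as "skip discovery" is a simplifying assumption.

```python
from __future__ import annotations

from typing import Callable, Dict, List

Finder = Callable[[], List[str]]


def find_slaves(preferred: List[str], finders: Dict[str, Finder]) -> List[str]:
    """Try the preferred methods first, then the remaining ones, and return
    the first non-empty list of slaves found."""
    if preferred == ["none"]:
        return []  # assumption: "none" on its own disables discovery
    order = [m for m in preferred if m in finders]
    order += [m for m in finders if m not in order]
    for method in order:
        slaves = finders[method]()
        if slaves:
            return slaves
    return []
```

For example, with the default order processlist,hosts, an empty processlist result falls through to the hosts method.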

--replace

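  In --update mode the row is updated with UPDATE by default. When you are not sure whether the heartbeat table contains any row, you can use REPLACE instead.

  Note: with UPDATE, if the heartbeat table is truncated while the script is running, the script does not exit with an error, but no new row is written to the heartbeat table either.

  With REPLACE, a new row is written even in that scenario. Personally, I find REPLACE the more reliable way to update the row.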

Use REPLACE instead of UPDATE for --update.
When running in --update mode, use REPLACE instead of UPDATE to set the heartbeat table’s timestamp. The REPLACE statement is a MySQL extension to SQL. This option is useful when you don’t know whether the table contains any rows or not. It must be used in conjunction with --update.
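You can reproduce the difference between the two statements after a truncate with SQLite from Python's standard library. SQLite is used here only as a stand-in for MySQL. It has no TRUNCATE, so DELETE empties the table instead, and it accepts REPLACE INTO as an alias for INSERT OR REPLACE. The statements are simplified illustrations, not the exact SQL that pt-heartbeat issues:

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Reduced heartbeat table: just the columns this sketch touches.
    conn.execute("CREATE TABLE heartbeat (ts varchar(26) NOT NULL, "
                 "server_id int NOT NULL PRIMARY KEY)")
    conn.execute("INSERT INTO heartbeat (ts, server_id) "
                 "VALUES ('2016-09-25T13:04:06.003130', 1)")
    return conn


def beat_with_update(conn: sqlite3.Connection, ts: str, server_id: int = 1) -> int:
    # UPDATE only changes an existing row; on an empty table it matches nothing.
    cur = conn.execute("UPDATE heartbeat SET ts = ? WHERE server_id = ?", (ts, server_id))
    return cur.rowcount


def beat_with_replace(conn: sqlite3.Connection, ts: str, server_id: int = 1) -> int:
    # REPLACE inserts the row, first deleting any row with the same primary key.
    cur = conn.execute("REPLACE INTO heartbeat (ts, server_id) VALUES (?, ?)", (ts, server_id))
    return cur.rowcount


if __name__ == "__main__":
    conn = make_table()
    conn.execute("DELETE FROM heartbeat")  # stands in for TRUNCATE TABLE
    print(beat_with_update(conn, "2016-09-25T13:04:07.000000"))   # 0 rows changed
    print(beat_with_replace(conn, "2016-09-25T13:04:08.000000"))  # 1 row written
```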

--run-time

  Specifies how long the script runs before exiting; applies to both --update and --monitor.

type: time
Time to run before exiting.

--sentinel

  A "sentinel": the tool exits if the given file exists. The default is /tmp/pt-heartbeat-sentinel.

type: string; default: /tmp/pt-heartbeat-sentinel
Exit if this file exists.

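  In my tests, even without --sentinel, the script exits immediately on startup if /tmp/pt-heartbeat-sentinel exists.

  The point of --sentinel is to choose your own file to watch.

  For example, when the following command is run, /root/123 does not exist, so the script keeps running. If the file is created while the script is running, the script exits right away.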

  # pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123  --sentinel=/root/123
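The sentinel behavior amounts to checking for a file before each round of work. Below is a minimal Python sketch with a hypothetical work() callback. It checks before the first round, which is why a pre-existing file stops the loop immediately, as observed above with /tmp/pt-heartbeat-sentinel. The max_iterations parameter is a simplified stand-in for --run-time.

```python
from __future__ import annotations

import os
import time
from typing import Callable, Optional


def run_until_sentinel(sentinel: str, work: Callable[[], None],
                       interval: float = 1.0,
                       max_iterations: Optional[int] = None) -> int:
    """Call work() every `interval` seconds until the sentinel file appears.

    Returns the number of completed rounds.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        # Checked before every round, so a file that already exists stops the loop at once.
        if os.path.exists(sentinel):
            break
        work()
        done += 1
        time.sleep(interval)
    return done
```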

--slave-user

   Sets the user used to connect to the slaves.

type: string
Sets the user to be used to connect to the slaves. This parameter allows you to have a different user with less privileges on the slaves but that user must exist on all slaves.

--slave-password

    Sets the password of the user used to connect to the slaves.

type: string
Sets the password to be used to connect to the slaves. It can be used with --slave-user and the password for the user must be the same on all slaves.

--set-vars

    Sets session variables for the script's connection to MySQL, although in practice it does not seem to be of much use.

type: Array
Set the MySQL variables in this comma-separated list of variable=value pairs.
By default, the tool sets:
wait_timeout=10000
Variables specified on the command line override these defaults. For example, specifying --set-vars wait_timeout=500 overrides the default value of 10000.
The tool prints a warning and continues if a variable cannot be set.
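The merge rule can be sketched like this: the tool's defaults come first, command-line pairs override them, and failures only produce a warning. Everything here is hypothetical. The execute callback stands in for a database call. The SET SESSION form is an assumption about how such variables would be applied, not taken from pt-heartbeat.

```python
from __future__ import annotations

from typing import Callable

DEFAULTS = {"wait_timeout": "10000"}  # the default documented above


def merge_vars(cli_value: str = "") -> dict[str, str]:
    """Merge 'a=1,b=2' pairs over the defaults; command-line values win."""
    merged = dict(DEFAULTS)
    for pair in (p.strip() for p in cli_value.split(",")):
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected variable=value, got {pair!r}")
        merged[name.strip()] = value.strip()
    return merged


def apply_vars(settings: dict[str, str], execute: Callable[[str], None]) -> list[str]:
    """Run one SET per variable; warn and continue on failure. Returns the failures."""
    failed = []
    for name, value in settings.items():
        stmt = f"SET SESSION {name}={value}"
        try:
            execute(stmt)
        except Exception as exc:
            print(f"Warning: could not run {stmt!r}: {exc}")
            failed.append(stmt)
    return failed
```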

--skew

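  Specifies how long a check is delayed relative to an update; the default is 0.5 seconds.

  That is, after --update writes a row, --check examines the replication lag for that update 0.5 seconds later.

  You may wonder how the script knows when the row was updated. In fact, every --update happens at the start of a whole second; one recorded value, for example, was "2016-09-25T13:04:06.003130". Then, 0.5 s later, the script reads the slave's system time and subtracts both the value stored in the heartbeat table and the --skew value to obtain the replication lag. This requires the master's and the slave's system clocks to be in sync; otherwise the result is meaningless.

  Below is the relevant source code, the core logic of the whole script: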

         my ($ts, $hostname, $server_id) = $sth->fetchrow_array();
         my $now = time;
         PTDEBUG && _d("Heartbeat from server", $server_id, "\n",
            " now:", ts($now, $utc), "\n",
            "  ts:", $ts, "\n",
            "skew:", $skew);
         my $delay = $now - unix_timestamp($ts, $utc) - $skew;
         PTDEBUG && _d('Delay', sprintf('%.6f', $delay), 'on', $hostname);

         # Because we adjust for skew, if the ts are less than skew seconds
         # apart (i.e. replication is very fast) then delay will be negative.
         # So it's effectively 0 seconds of lag.
         $delay = 0.00 if $delay < 0;

type: float; default: 0.5
How long to delay checks.
The default is to delay checks one half second. Since the update happens as soon as possible after the beginning of the second on the master, this allows one half second of replication delay before reporting that the slave lags the master by one second. If your clocks are not completely accurate or there is some other reason you’d like to delay the slave more or less, you can tweak this value. Try setting the PTDEBUG environment variable to see the effect this has.
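For readers more comfortable with Python, here is a port of the core of the Perl excerpt above. It is a sketch. Timestamps are parsed with datetime, and a naive timestamp is interpreted in the local time zone unless utc=True. The function names mirror the Perl helpers but make no claim to match their exact behavior.

```python
from __future__ import annotations

import time
from datetime import datetime, timezone


def unix_timestamp(ts: str, utc: bool = False) -> float:
    """Epoch seconds for a heartbeat ts such as '2016-09-25T13:04:06.003130'."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
    if utc:
        dt = dt.replace(tzinfo=timezone.utc)
    # For a naive datetime, .timestamp() assumes the local time zone.
    return dt.timestamp()


def compute_delay(ts: str, now: float | None = None, skew: float = 0.5,
                  utc: bool = False) -> float:
    """Replication delay as in the Perl code: now - ts - skew, floored at 0."""
    if now is None:
        now = time.time()  # current time with fractional seconds
    delay = now - unix_timestamp(ts, utc) - skew
    # Replication faster than `skew` would give a negative value; report 0.
    return max(delay, 0.0)
```

Suppose now equals the stored ts plus 3.5 seconds and the default skew of 0.5 applies: the result is 3.0. If the gap is smaller than the skew, the clamp returns 0.0.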

--socket

short form: -S; type: string
Socket file to use for connection.

 

--table

  Specifies the name of the heartbeat table; the default is heartbeat.

type: string; default: heartbeat
The table to use for the heartbeat.
Don’t specify database.table; use --database to specify the database.
See --create-table.

--update

  Updates the row in the master's heartbeat table.

Update a master’s heartbeat.

--user

  The user to connect as.

short form: -u; type: string
User for login if not current user.

--utc

  Ignores system time zones and uses only UTC. If you use this option, you must use it for --update, --monitor and --check alike.

Ignore system time zones and use only UTC. By default pt-heartbeat does not check or adjust for different system or MySQL time zones which can cause the tool to compute the lag incorrectly. Specifying this option is a good idea because it ensures that the tool works correctly regardless of time zones.
If used, this option must be used for all pt-heartbeat instances: --update, --monitor, --check, etc.
You should probably set the option in a --config file. Mixing this option with pt-heartbeat instances not using this option will cause false-positive lag readings due to different time zones (unless all your systems are set to use UTC, in which case this option isn’t required).
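Here is a small Python illustration of the false readings the documentation warns about. For illustration only, it assumes that the master and the checker share a UTC+8 local time zone. One side writes or reads with UTC while the other uses local time. The zone difference then shows up as eight hours of phantom lag in one direction. In the other direction it produces a negative value, which the clamp in the Perl code above would report as 0.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%S.%f"
UTC = timezone.utc
LOCAL = timezone(timedelta(hours=8))  # assumed local zone (UTC+8), illustration only


def stored_value(moment: datetime, writer_uses_utc: bool) -> str:
    """The naive string the master writes: UTC_TIMESTAMP() with --utc, NOW() without."""
    zone = UTC if writer_uses_utc else LOCAL
    return moment.astimezone(zone).strftime(FMT)


def lag_seen(value: str, now: datetime, reader_uses_utc: bool) -> float:
    """Lag computed by a checker that interprets the naive string in its own zone."""
    zone = UTC if reader_uses_utc else LOCAL
    written_at = datetime.strptime(value, FMT).replace(tzinfo=zone)
    return (now - written_at).total_seconds()


if __name__ == "__main__":
    t = datetime(2016, 9, 25, 13, 4, 6, tzinfo=UTC)
    now = t + timedelta(seconds=1)  # true lag: one second
    print(lag_seen(stored_value(t, True), now, reader_uses_utc=True))   # 1.0
    print(lag_seen(stored_value(t, True), now, reader_uses_utc=False))  # 28801.0
```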

--version

  Prints version information.

--[no]version-check

  Checks the versions of Percona Toolkit, the connected MySQL server, Perl and DBD::mysql, and reports known problems with specific versions of these programs.

Check for the latest version of Percona Toolkit, MySQL, and other programs.
This is a standard “check for updates automatically” feature, with two additional features. First, the tool checks the version of other programs on the local system in addition to its own version. For example, it checks the version of every MySQL server it connects to, Perl, and the Perl module DBD::mysql. Second, it checks for and warns about versions with known problems. For example, MySQL 5.5.25 had a critical bug and was re-released as 5.5.25a.
Any updates or known problems are printed to STDOUT before the tool’s normal output. This feature should never interfere with the normal operation of the tool.

Source: Percona Toolkit Documentation, Release 2.2.19

 

 

 

   

 
