myshard问题 - 数据不一致
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了myshard问题 - 数据不一致相关的知识,希望对你有一定的参考价值。
业务说,为什么10号机房缺少这条数据,其他机房却有?
mysql> select * from tbl_groupinfo where gid=xxxxxxx limit 10; +------------+--------------+-------------+---------------------+------------+--------------+------------+-------------+-----------------+--------+--------+----------+-------------+-------------------+---------------+--------------+-----------+----------+----------+---------------------+-----------+ | sid | tm_timestamp | tm_lasttime | gid | group_name | default_flag | group_attr | group_owner | group_extension | is_del | app_id | mic_seat | invite_perm | invite_media_perm | pub_id_search | apply_verify | public_id | introduc | topic_id | __version | __deleted | +------------+--------------+-------------+---------------------+------------+--------------+------------+-------------+-----------------+--------+--------+----------+-------------+-------------------+---------------+--------------+-----------+----------+----------+---------------------+-----------+ | xxxxxxxxxx | 1495773704 | 1495773704 | xxxxxxxxxxx | 处对象 | 0 | 5 | 3611732366 | vx:wtc2033 | 0 | 18 | 8 | 0 | 0 | 1 | 0 | 0 | | 0 | 6126694332813803019 | 0 | +------------+--------------+-------------+---------------------+-------
大概断定,10号机房的数据同步是有问题的,先看这条记录,是从哪个机房插入的,然后再看10号机房与该机房之间的同步是否有问题,使用8827登录,获取这条数据的版本号__version,由函数转换得到这条数据,来自14号机房插入的, 日期:2017-05-26 05:03:03 机房号:14 端口号:11
这相当于MySQL里的binlog,会记录每条SQL,来自于哪个server-id,目的是为了防止循环复制,myshard不仅在binlog记录server-id,每条记录都带有版本号,包含了从哪个机房,哪个端口写入的,什么时候写入的
到这里,知道14号机房写入的数据,无法同步到10号机房,可以去14号看一下同步命令
[[email protected] local]# echo stat | /scripts/nc_myshard 0 14505 |egrep "speed|behind|offset" shard_local Read_offset 48494420885 shard_local Read_speed 33373 shard_local Read_bytes_behind 0 sync_r12m0 Read_offset 48494420885 sync_r12m0 Read_speed 33373 sync_r12m0 Read_bytes_behind 0 sync_r13m0 Read_offset 48494420885 sync_r13m0 Read_speed 33373 sync_r13m0 Read_bytes_behind 0 sync_r1m0 Read_offset 48494420885 sync_r1m0 Read_speed 33373 sync_r1m0 Read_bytes_behind 0 sync_r3m0 Read_offset 48494420885 sync_r3m0 Read_speed 33373 sync_r3m0 Read_bytes_behind 0 shard_remote Read_offset 52080697507 shard_remote Read_speed 27290 shard_remote Read_bytes_behind 0
发现没有r10m0这个机房来拉取数据,那证明同步有问题了,去10号机房看同步的日志,看到不断去重连14号机房这个点
[[email protected] db_sync_HelloSrv_r10m0_d]# zcat db_sync_xxxxxxxx_r10m0_d.log.13.gz|grep xxx.xxx.xxx.144|more May 13 15:05:31 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:0 May 13 15:05:51 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:1 May 13 15:06:11 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:2 May 13 15:06:41 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:0 May 13 15:07:01 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:1 May 13 15:07:21 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:2 May 13 15:07:51 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:0 May 13 15:08:11 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:1 May 13 15:08:31 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:2 May 13 15:09:01 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:0 May 13 15:09:21 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:1 May 13 15:09:41 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:2 May 13 15:10:11 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:0 May 13 15:10:31 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:1 May 13 15:10:51 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:2 May 13 15:11:21 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:0 May 13 15:11:41 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:1 May 13 15:12:01 info db_sync_xxxxxxxx_r10m0_d[]: [tid:77428] [OpLogSynchronizer::run] connecting to sync_r14m0 via xxx.xxx.xxx.144:12505, retry:2
看到有很多日志,不断重试去连接14号机房,其中最早的重连发生在
db_sync_xxxxxxxx_r10m0_d.log.13.gz
这个文件,而这个文件在5月14日记录的
-rw-r--r--. 1 root adm 174K May 13 00:10 db_sync_xxxxxxxx_r10m0_d.log.14.gz -rw-r--r--. 1 root adm 300K May 14 00:10 db_sync_xxxxxxxx_r10m0_d.log.13.gz -rw-r--r--. 1 root adm 230K May 15 00:10 db_sync_xxxxxxxx_r10m0_d.log.12.gz -rw-r--r--. 1 root adm 234K May 16 00:10 db_sync_xxxxxxxx_r10m0_d.log.11.gz -rw-r--r--. 1 root adm 260K May 17 00:10 db_sync_xxxxxxxx_r10m0_d.log.10.gz -rw-r--r--. 1 root adm 261K May 18 00:10 db_sync_xxxxxxxx_r10m0_d.log.9.gz -rw-r--r--. 1 root adm 260K May 19 00:10 db_sync_xxxxxxxx_r10m0_d.log.8.gz -rw-r--r--. 1 root adm 258K May 20 00:10 db_sync_xxxxxxxx_r10m0_d.log.7.gz -rw-r--r--. 1 root adm 260K May 21 00:10 db_sync_xxxxxxxx_r10m0_d.log.6.gz -rw-r--r--. 1 root adm 268K May 22 00:10 db_sync_xxxxxxxx_r10m0_d.log.5.gz -rw-r--r--. 1 root adm 254K May 23 00:10 db_sync_xxxxxxxx_r10m0_d.log.4.gz -rw-r--r--. 1 root adm 259K May 24 00:10 db_sync_xxxxxxxx_r10m0_d.log.3.gz -rw-r--r--. 1 root adm 262K May 25 00:10 db_sync_xxxxxxxx_r10m0_d.log.2.gz -rw-r--r--. 1 root adm 262K May 26 00:10 db_sync_xxxxxxxx_r10m0_d.log.1.gz
一般重连只有2种可能,一个是14号机房没有开放白名单,不允许10号机房访问,
以上是关于myshard问题 - 数据不一致的主要内容,如果未能解决你的问题,请参考以下文章
myshard问题 -server1_nobinlog, Error: Too many connections