Mysql主从库 数据不同步 解决记录
背 景
周五快下班的时候,同事对我说线上的主从库不能同步了。线上服务器也就我一个人维护,难道是我做的好事?最后一次维护主从库,还是一年前在Linux系统上安装主从库。
上次主从库不同步发生在两年前的测试环境。而测试环境是windows系统,线上环境是Linux系统。这次线上环境主从库突然出现问题,很想迅速解决。
但是在打开连接服务器的 putty 时,连接主从库的命令愣是想不起来。好记性不如烂笔头,出现问题还是要多记录一下,不然下次敲个命令都要找半天,耽误时间。
一、先看 Master 主库的状态
# 登录mysql Master主库,输入登录密码
mysql -S /tmp/mysql_3306.sock -p
# 查看主库状态
show master status;
主库的日志文件是66,Position是 537609。
二、再来看 Slave 从库的状态
# 登录mysql Slave从库,输入登录密码
mysql -S /tmp/mysql_3307.sock -p
# 查看从库状态,格式化一下输出信息
show slave status\G
日志文件如下:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: mysql_sync
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql3306_bin.000066
Read_Master_Log_Pos: 537609
Relay_Log_File: ecs-7f7b-relay-bin.000176
Relay_Log_Pos: 23138574
Relay_Master_Log_File: mysql3306_bin.000063
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reason s are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay lo g, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 23138353
Relay_Log_Space: 23824100
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reason s are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay lo g, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3306
Master_UUID: 79529d27-823f-11eb-9fd0-fa163e773a43
Master_Info_File: /mnt/databases/mysql/mysql_3307/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 221014 17:05:44
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
错误提示:Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem,or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
日志分析:
-
从库两个重要的同步状态指标,Slave_IO_running 的状态是正常的,而Slave_SQL_running已经关闭。
-
从库同步的文件编号是63,Position是 23118574,也就是说主从库同步相差4个文件。
-
Master 主服务器的 binary log 文件损坏,Slave从服务器的Relay log文件损坏。
-
最后一次错误是1594,最后一次 Sql 错误也是1594。
在网上查了一下,错误 1594 一般是由掉电引起。而我使用的是云服务器,怎么会有机房掉电呢?想想我最近几天对服务器做了啥?嗯,做了个压测,把服务器的CPU撑爆了,没想到数据库最先受到了影响。
三、解决问题
1、解决 #1594 错误
# 停止从库状态
stop slave;
# 重置从库
reset slave all;
# 重置同步状态,从主库最新的文件、最新的位置开始同步
CHANGE MASTER TO MASTER_HOST='127.0.0.1',MASTER_USER='mysql_sync',MASTER_PASSWORD='xxxx',MASTER_PORT=3306,MASTER_LOG_FILE='mysql3306_bin.000066',MASTER_LOG_POS=23138353;
# 开启从库状态;
start slave;
重启Slave从库的同步开关后,再次查看slave状态,Slave_IO_Running变成了No,出现了新的报错信息1236。
错误提示:
Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'
2、解决 #1236 错误
# 刷新主库日志
flush logs
# 查看主库状态
show master status;
回到 Slave 从库再次操作:
# 停止从库同步状态
stop slave;
# 修改从库日志同步的开始文件
change master to master_log_file='mysql3306_bin.000068',master_log_pos=154;
master_log_file为主库的最新日志文件,master_log_pos为主库最新的位置。
设置后重启 Slave 从库状态:
从库同步的两个状态都为Yes,主从库同步开关设置成功。3、另一种解决办法 前面两种办法存在一个问题。在数据同步失效期间,新增的数据再次修改时,是无法被同步到 Slave从库的。
其实还可以试一试这个参数:
# 从库 slave 跳过一个事务,一次跳过一个错误信息
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1
https://dev.mysql.com/doc/refman/5.6/en/set-global-sql-slave-skip-counter.html先关闭从库的同步,再跳过错误事件,N便是错误事件的次数,最后再重启从库的同步。这样丢失的数据就可以减到最少。
既然出现了错误,总归要看看错误,才能决定是不是可以跳过,怎么查看出现的错误呢?
# 在Master主机上查看 Position 做了什么操作
show binlog events in 'mysql-bin.000002' from 6840;
mysql-bin.000002 和 6840 是 Last_Error 中的提示信息,可以用此命令查看错误详情,以此来确定出现的错误信息到底能不能跳过,如果可以那就跳过吧。这个方法可以尝试。
4、遗留问题主从库数据同步开关已经Ok。但是在数据同步失效期间,新增的数据在Slave从数据库上还是不存在的,这些数据再次修改时,还是无法被同步到 Slave从库的。这个只能在想其他办法解决。
以上便是我调整本次主从库同步异常的笔记,希望对你有帮助。