Mysql8 MHA高可用搭建之问题及注意事项
1、执行masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf出现问题
Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Feb 19 14:41:23 2021 - [debug] Connecting via SSH from root@10.8.40.77(10.8.40.77:22) to sysadm@10.6.119.241(10.6.119.241:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@10.8.40.77(10.8.40.77:22) to sysadm@10.6.119.241(10.6.119.241:22) failed!
Fri Feb 19 14:41:25 2021 - [debug]
解决:是因为mha的manager和slave在一台机器上,所以/etc/mha/mysql_mha.cnf最后一个注释掉,即把与manager在一台机器上的[server3]注释即可
#[server3]
#hostname=10.6.119.241
#指定该节点不参与master选举
#no_master=1
再次出现1的问题:现象是在sysadm用户下看到的–conf=/etc/mha/mysql_mha.cnf内容与root权限下看到的不一致
解决:原因是从root权限内切换到sysadm 导致ssh免密互通无效,重新打开一个连接session,再次设置ssh-keygen就可以了(这个问题卡了一下午)
2、执行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf出现问题
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 19 14:47:07 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
解决:修改配置文件
[server1]
hostname=10.8.40.77
port=3307
#指定该节点可以参与master选举
candidate_master=1
[server2]
hostname=10.8.40.68
port=3307
candidate_master=1
check_repl_delay=0
3、mysql-如何删除主从同步
stop slave;
reset slave all;
4、重新进行主从同步的时候出现Slave_SQL_Running: NO
解决:进入主库
show master status;
+------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
| mysql_bin.000003 | 2216 | | | 7d3cc0a4-6955-11eb-8bbd-0cc47a9cd887:1-9,
找到File和Position
在从节点执行
stop slave;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_log_file='mysql_bin.mysql_bin.000009',master_log_pos=156;
start slave;
show slave status\G;
5、重新进行主从同步的时候出现Slave_IO_Running: NO
解决:进入错误日志查看具体的报错信息,日志位置/data1/mysql8/log/mariadb/
6、当主库挂了之后,mha进行故障转移的时候出现如下错误
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Monitoring server 10.6.119.241 is NOT reachable!
Mon Feb 22 14:31:33 2021 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Mon Feb 22 14:31:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:33 2021 - [warning] Connection failed 2 time(s)..
Mon Feb 22 14:31:34 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:34 2021 - [warning] Connection failed 3 time(s)..
Mon Feb 22 14:31:35 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:35 2021 - [warning] Connection failed 4 time(s)..
Mon Feb 22 14:31:35 2021 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Mon Feb 22 14:31:36 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:36 2021 - [warning] Connection failed 1 time(s)..
解决:At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
这句话意思是至少有一个监控节点是不可达的,就是上面的10.6.119.241这个节点
这是因为77和68都是物理机不是虚机,所以他们之间是有区别的,
77和68都是用root用户登录的,而241节点是需要先登录sysadm账号的,进而才能切换成root账户
是由配置secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.77 -s 10.8.40.68 -s 10.6.119.241引起的
所以不能直接使用masterha_secondary_check,需要对masterha_secondary_check脚本进行修改
当是241服务器的时候切换登录账号为sysadm
foreach my $monitoring_server (@monitoring_servers) {
###############在脚本中增加这一段###########################
my $slave_ip = '10.6.119.241';
if ( $monitoring_server eq "$slave_ip" ){
$ssh_user = "sysadm";
}
###############在脚本中增加这一段###########################
my $ssh_user_host = $ssh_user . '@' . $monitoring_server;
my $command =
"ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \"perl -e "
. "\\\"use IO::Socket::INET; my \\\\\\\$sock = IO::Socket::INET->new"
. "(PeerAddr => \\\\\\\"$master_host\\\\\\\", PeerPort=> $master_port, "
. "Proto =>'tcp', Timeout => $timeout); if(\\\\\\\$sock) { close(\\\\\\\$sock); "
. "exit 3; } exit 0;\\\" \"";
print "=====$command";
my $ret = system($command);
7、当mha进行了故障转移之后mha会自动停止
注意:在重新启动mha之前一定要先删除mysql_mha.failover.complete 这个文件
a、注意,故障转移完成后, manager将会自动停止, 此时使用 masterha_check_status 命令检测将会遇到错误提示, 如下所示:
[root@myql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha is stopped(2:NOT_RUNNING).
b、提供新的从节点以修复复制集群
原有 master 节点故障后,需要重新准备好一个新的 MySQL 节点。基于来自于master 节点的备份恢复数据后,将其配置为新的 master 的从节点即可。
注意,新加入的节点如果为新增节点,其 IP 地址要配置为原来 master 节点的 IP,否则,还需要修改 mha.cnf 中相应的 ip 地址。随后再次启动 manager ,
并再次检测其状态。
我们就以刚刚关闭的那台主作为新添加的机器,来进行数据库的恢复:
原本的 slave1 已经成为了新的主机器,所以,我们对其进行完全备份,而后把备份的数据发送到我们新添加的机器上:
[root@mysql-slave1 ~]# mkdir /backup
[root@mysql-slave1 ~]# mysqldump --all-database > /backup/mysql-backup-`date +%F-%T`-all.sql
[root@mysql-slave1 ~]# scp /backup/mysql-backup-2017-11-23-09\:57\:09-all.sql root@node2:~
然后在 node2 节点上进行数据恢复:
[root@mysql-master ~]# mysql < mysql-backup-2017-11-23-09\:57\:09-all.sql
接下来就是配置主从。照例查看一下现在的主的二进制日志和位置,然后就进行如下设置:
mysql> change master to master_host='172.16.14.202',master_port=3306,
-> master_user='slave',master_password='slave',
-> master_log_file='mysql-bin.000005',master_log_pos=154;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.14.202
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 358
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 524
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
可以看出,我们的主从已经配置好了。
c、/etc/mha/mysql_mha.cnf 修改配置文件
secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.68 -s 10.6.119.241
修改成除了主节点外的其他节点
d、新节点提供后再次执行检查操作
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
8、新节点上线, 故障转换恢复注意事项
-
1在生产环境中, 当你的主节点挂了后, 一定要在从节点上做一个备份, 拿着备份文件把主节点手动提升为从节点, 并指明从哪一个日志文件的位置开始复制
-
2每一次自动完成转换后, 每一次的(replication health )检测不ok始终都是启动不了必须手动修复主节点, 除非你改配置文件
-
3手动修复主节点提升为从节点后, 再次运行检测命令
[root@mysql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:9561) is running(0:PING_OK), master:192.168.37.133
- 4再次运行起来就恢复成功了
[root@mysql-mha ~]# masterha_manager --conf=/etc/mha_master/mha.cnf
9、第二次启动mha之后,主节点挂掉出错
[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln310] Last failover was done at 2021/02/22 14:31:44. Current time is too early to do failover again. If you want to do failover, manually remove /data1/mysql_mha/mysql_mha.failover.complete and run this script again.
Mon Feb 22 15:41:52 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65.
解决:先删除/data1/mysql_mha/mysql_mha.failover.complete
10、VIP设置问题
解决:vip地址应该选择与服务器地址在同一网段的ip 如:10.8.40.79
10.8.40.80 可以ping通 但是navicat连接不上(这个ip已经被别人占用)
10.8.40.79 成功
ip a 看到如下内容
ip a
inet 10.8.40.79/24 brd 10.8.40.255 scope global secondary eno1:1
11、解决vip漂移问题 网卡名称不一致问题?
解决
修改网卡名称:
1)、vi /etc/default/grub
增加net.ifnames=0 biosdevname=0,最终如下
GRUB_CMDLINE_LINUX=“crashkernel=auto rhgb net.ifnames=0 biosdevname=0 quiet”
2)、重新生成GRUB配置并更新内核参数
grub2-mkconfig -o /boot/grub2/grub.cfg
3)、 进入/etc/sysconfig/network-scripts 对应网卡文件中修改
HWADDR=08:00:27:9f:1d:c5(要修改的eno1的MAC地址)
DEVICE=eno1
NAME=eno1
4)、在/etc/udev/rules.d/70-persistent-net.rules中添加自定义规则,若是没有70-persistent-net.rules新建就可以了
增加UBSYSTEM==“net”, ACTION==“add”, DRIVERS=="?*", ATTR{address}==“mac地址”, NAME=“eno1”
5)、reboot命令重启
12、 两个节点多次进行主从切换,执行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf报错
Mon Feb 22 16:17:21 2021 - [warning] SQL Thread is stopped(no error) on 10.8.40.77(10.8.40.77:3307)
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 10.8.40.68(10.8.40.68:3307), replicating from 10.8.40.77(10.8.40.77:3307)
Master 10.8.40.77(10.8.40.77:3307), replicating from 10.8.40.68(10.8.40.68:3307)
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Feb 22 16:17:21 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
解决:从服务器set global read_only=1;
13、centos7设置mysql开机自启动
vi /etc/rc.local
/etc/init.d/mysqld8 start
14、mysql_mha.failover.complete文件
如果两次切换之间时间太短(8小时),需要将此文件删掉
15、如果前面启动mha时加了--remove_dead_master_conf参数,则会将旧的主库的信息删除
16、运行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf出现下面错误
Thu Feb 25 09:00:28 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012 --slave_pass=xxx
Thu Feb 25 09:00:28 2021 - [info] Connecting to sysadm@10.6.119.241(10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
Checking slave recovery environment settings..
readdir() attempted on invalid dirhandle $dir at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 271.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:00:29 2021 - [info] Got exit code 1 (Not master dead).
解决:给10.6.119.241下面的sysadm用户的/data1/mysql8赋予权限
chown -R sysadm:sysadm /data1/mysql8/data
17、运行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf出现下面错误
Thu Feb 25 09:06:57 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012 --slave_pass=xxx
Thu Feb 25 09:06:57 2021 - [info] Connecting to sysadm@10.6.119.241(10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
Checking slave recovery environment settings..
Relay log found at /data1/mysql8/data, up to relay_bin.000012
Temporary relay log file is /data1/mysql8/data/relay_bin.000012
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1030 (HY000) at line 1: Got error 168 - 'Unknown (generic) error from engine' from storage engine
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 404.
main::check() called at /usr/bin/apply_diff_relay_logs line 536
eval {...} called at /usr/bin/apply_diff_relay_logs line 514
main::main() called at /usr/bin/apply_diff_relay_logs line 121
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:06:58 2021 - [info] Got exit code 1 (Not master dead).
解决:在mysql配置文件my.cnf增加配置
default-storage-engine=innodb
innodb_force_recovery=0
max_allowed_packet=1024M
18、启动10.6.119.241的mysql出现错误
Starting MySQL.... ERROR! The server quit without updating PID file (/data1/mysql8/data/CT-DevOps-DB.pid).
解决:先去查看错误日志
2021-02-25T01:18:56.929627Z 0 [ERROR] [MY-010274] [Server] Could not open unix socket lock file /var/lib/mysql/mysql.sock.lock.
还是权限的问题
给sysadm用户增加/var/lib/mysql/权限
chown -R sysadm:sysadm /var/lib/mysql
19、yum 不能使用时候,可以直接从另外一台相似环境上复制/etc/yum.repo.d目录,就可以使用了
版权声明:本文为CSDN博主「尚天竹」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/qq_15350581/article/details/114090794