Mysql

Mysql8 MHA高可用搭建之问题及注意事项

2022-02-18  本文已影响0人  前浪浪奔浪流

1、执行masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf出现问题

Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Feb 19 14:41:23 2021 - [debug]  Connecting via SSH from root@10.8.40.77(10.8.40.77:22) to sysadm@10.6.119.241(10.6.119.241:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@10.8.40.77(10.8.40.77:22) to sysadm@10.6.119.241(10.6.119.241:22) failed!
Fri Feb 19 14:41:25 2021 - [debug]

解决:是因为mha的manager和slave在一台机器上,所以/etc/mha/mysql_mha.cnf最后一个注释掉,即把与manager在一台机器上的[server3]注释即可

#[server3]
#hostname=10.6.119.241
#指定该节点不参与master选举
#no_master=1

再次出现1的问题:现象是在sysadm用户下看到的–conf=/etc/mha/mysql_mha.cnf内容与root权限下看到的不一致
解决:原因是从root权限内切换到sysadm 导致ssh免密互通无效,重新打开一个连接session,再次设置ssh-keygen就可以了(这个问题卡了一下午)

2、执行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf出现问题

Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 19 14:47:07 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!

解决:修改配置文件

[server1]
hostname=10.8.40.77
port=3307
#指定该节点可以参与master选举
candidate_master=1
[server2]
hostname=10.8.40.68
port=3307
candidate_master=1
check_repl_delay=0

3、mysql-如何删除主从同步

stop slave;
reset slave all;

4、重新进行主从同步的时候出现Slave_SQL_Running: NO

解决:进入主库
show master status;
    +------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                    |
    +------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
    | mysql_bin.000003 |     2216 |              |                  | 7d3cc0a4-6955-11eb-8bbd-0cc47a9cd887:1-9,
    
找到File和Position

在从节点执行
stop slave;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_log_file='mysql_bin.mysql_bin.000009',master_log_pos=156;
start slave;
show slave status\G;

5、重新进行主从同步的时候出现Slave_IO_Running: NO

解决:进入错误日志查看具体的报错信息,日志位置/data1/mysql8/log/mariadb/

6、当主库挂了之后,mha进行故障转移的时候出现如下错误

ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
    *Copyright(c) UIH All rights Reserved*
    Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    Monitoring server 10.6.119.241 is NOT reachable!
    Mon Feb 22 14:31:33 2021 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
    Mon Feb 22 14:31:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
    Mon Feb 22 14:31:33 2021 - [warning] Connection failed 2 time(s)..
    Mon Feb 22 14:31:34 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
    Mon Feb 22 14:31:34 2021 - [warning] Connection failed 3 time(s)..
    Mon Feb 22 14:31:35 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
    Mon Feb 22 14:31:35 2021 - [warning] Connection failed 4 time(s)..
    Mon Feb 22 14:31:35 2021 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
    Mon Feb 22 14:31:36 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
    Mon Feb 22 14:31:36 2021 - [warning] Connection failed 1 time(s)..

解决:At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
这句话意思是至少有一个监控节点是不可达的,就是上面的10.6.119.241这个节点
这是因为77和68都是物理机不是虚机,所以他们之间是有区别的,
77和68都是用root用户登录的,而241节点是需要先登录sysadm账号的,进而才能切换成root账户
是由配置secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.77 -s 10.8.40.68 -s 10.6.119.241引起的
所以不能直接使用masterha_secondary_check,需要对masterha_secondary_check脚本进行修改
当是241服务器的时候切换登录账号为sysadm

  foreach my $monitoring_server (@monitoring_servers) {
  ###############在脚本中增加这一段###########################
      my $slave_ip = '10.6.119.241';
      if ( $monitoring_server eq "$slave_ip" ){
        $ssh_user = "sysadm";
      }
  ###############在脚本中增加这一段###########################
      my $ssh_user_host = $ssh_user . '@' . $monitoring_server;
      my $command =
    "ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \"perl -e "
        . "\\\"use IO::Socket::INET; my \\\\\\\$sock = IO::Socket::INET->new"
        . "(PeerAddr => \\\\\\\"$master_host\\\\\\\", PeerPort=> $master_port, "
        . "Proto =>'tcp', Timeout => $timeout); if(\\\\\\\$sock) { close(\\\\\\\$sock); "
        . "exit 3; } exit 0;\\\" \"";
      print "=====$command";
      my $ret = system($command);

7、当mha进行了故障转移之后mha会自动停止

注意:在重新启动mha之前一定要先删除mysql_mha.failover.complete 这个文件
a、注意,故障转移完成后, manager将会自动停止, 此时使用 masterha_check_status 命令检测将会遇到错误提示, 如下所示:
[root@myql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha is stopped(2:NOT_RUNNING).
    
b、提供新的从节点以修复复制集群
    原有 master 节点故障后,需要重新准备好一个新的 MySQL 节点。基于来自于master 节点的备份恢复数据后,将其配置为新的 master 的从节点即可。
    注意,新加入的节点如果为新增节点,其 IP 地址要配置为原来 master 节点的 IP,否则,还需要修改 mha.cnf 中相应的 ip 地址。随后再次启动 manager ,
    并再次检测其状态。   
    我们就以刚刚关闭的那台主作为新添加的机器,来进行数据库的恢复:   
    原本的 slave1 已经成为了新的主机器,所以,我们对其进行完全备份,而后把备份的数据发送到我们新添加的机器上:
[root@mysql-slave1 ~]# mkdir /backup
[root@mysql-slave1 ~]# mysqldump --all-database > /backup/mysql-backup-`date +%F-%T`-all.sql
[root@mysql-slave1 ~]# scp /backup/mysql-backup-2017-11-23-09\:57\:09-all.sql root@node2:~
然后在 node2 节点上进行数据恢复:

[root@mysql-master ~]# mysql < mysql-backup-2017-11-23-09\:57\:09-all.sql
接下来就是配置主从。照例查看一下现在的主的二进制日志和位置,然后就进行如下设置:

mysql> change master to master_host='172.16.14.202',master_port=3306,
        -> master_user='slave',master_password='slave',
        -> master_log_file='mysql-bin.000005',master_log_pos=154;
    Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> start slave;
    Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 172.16.14.202
                      Master_User: slave
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000005
              Read_Master_Log_Pos: 358
                   Relay_Log_File: localhost-relay-bin.000002
                    Relay_Log_Pos: 524
            Relay_Master_Log_File: mysql-bin.000005
                 Slave_IO_Running: Yes
                 Slave_SQL_Running: Yes
     可以看出,我们的主从已经配置好了。  

c、/etc/mha/mysql_mha.cnf 修改配置文件
secondary_check_script=/usr/bin/masterha_secondary_check  -s 10.8.40.68  -s 10.6.119.241
修改成除了主节点外的其他节点
d、新节点提供后再次执行检查操作
 masterha_check_repl --conf=/etc/mha/mysql_mha.cnf

8、新节点上线, 故障转换恢复注意事项

[root@mysql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:9561) is running(0:PING_OK), master:192.168.37.133
[root@mysql-mha ~]# masterha_manager --conf=/etc/mha_master/mha.cnf

9、第二次启动mha之后,主节点挂掉出错

[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln310] Last failover was done at 2021/02/22 14:31:44. Current time is too early to do failover again. If you want to do failover, manually remove /data1/mysql_mha/mysql_mha.failover.complete and run this script again.
Mon Feb 22 15:41:52 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_manager line 65.

解决:先删除/data1/mysql_mha/mysql_mha.failover.complete
10、VIP设置问题
解决:vip地址应该选择与服务器地址在同一网段的ip 如:10.8.40.79
10.8.40.80 可以ping通 但是navicat连接不上(这个ip已经被别人占用)
10.8.40.79 成功
ip a 看到如下内容

ip a
inet 10.8.40.79/24 brd 10.8.40.255 scope global secondary eno1:1

11、解决vip漂移问题 网卡名称不一致问题?
解决
修改网卡名称:
1)、vi /etc/default/grub
增加net.ifnames=0 biosdevname=0,最终如下
GRUB_CMDLINE_LINUX=“crashkernel=auto rhgb net.ifnames=0 biosdevname=0 quiet”
2)、重新生成GRUB配置并更新内核参数
grub2-mkconfig -o /boot/grub2/grub.cfg
3)、 进入/etc/sysconfig/network-scripts 对应网卡文件中修改
HWADDR=08:00:27:9f:1d:c5(要修改的eno1的MAC地址)
DEVICE=eno1
NAME=eno1
4)、在/etc/udev/rules.d/70-persistent-net.rules中添加自定义规则,若是没有70-persistent-net.rules新建就可以了
增加UBSYSTEM==“net”, ACTION==“add”, DRIVERS=="?*", ATTR{address}==“mac地址”, NAME=“eno1”
5)、reboot命令重启

12、 两个节点多次进行主从切换,执行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf报错

Mon Feb 22 16:17:21 2021 - [warning] SQL Thread is stopped(no error) on 10.8.40.77(10.8.40.77:3307)
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 10.8.40.68(10.8.40.68:3307), replicating from 10.8.40.77(10.8.40.77:3307)
Master 10.8.40.77(10.8.40.77:3307), replicating from 10.8.40.68(10.8.40.68:3307)

Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Feb 22 16:17:21 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

解决:从服务器set global read_only=1;

13、centos7设置mysql开机自启动

vi /etc/rc.local
/etc/init.d/mysqld8 start

14、mysql_mha.failover.complete文件
如果两次切换之间时间太短(8小时),需要将此文件删掉
15、如果前面启动mha时加了--remove_dead_master_conf参数,则会将旧的主库的信息删除

16、运行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf出现下面错误

Thu Feb 25 09:00:28 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012  --slave_pass=xxx
Thu Feb 25 09:00:28 2021 - [info]   Connecting to sysadm@10.6.119.241(10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
  Checking slave recovery environment settings..
readdir() attempted on invalid dirhandle $dir at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 271.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:00:29 2021 - [info] Got exit code 1 (Not master dead).

解决:给10.6.119.241下面的sysadm用户的/data1/mysql8赋予权限
chown -R sysadm:sysadm /data1/mysql8/data

17、运行masterha_check_repl --conf=/etc/mha/mysql_mha.cnf出现下面错误

Thu Feb 25 09:06:57 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012  --slave_pass=xxx
Thu Feb 25 09:06:57 2021 - [info]   Connecting to sysadm@10.6.119.241(10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
  Checking slave recovery environment settings..
    Relay log found at /data1/mysql8/data, up to relay_bin.000012
    Temporary relay log file is /data1/mysql8/data/relay_bin.000012
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1030 (HY000) at line 1: Got error 168 - 'Unknown (generic) error from engine' from storage engine
mysql command failed with rc 1:0!
 at /usr/bin/apply_diff_relay_logs line 404.
        main::check() called at /usr/bin/apply_diff_relay_logs line 536
        eval {...} called at /usr/bin/apply_diff_relay_logs line 514
        main::main() called at /usr/bin/apply_diff_relay_logs line 121
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:06:58 2021 - [info] Got exit code 1 (Not master dead).

解决:在mysql配置文件my.cnf增加配置

default-storage-engine=innodb
innodb_force_recovery=0
max_allowed_packet=1024M

18、启动10.6.119.241的mysql出现错误

 Starting MySQL.... ERROR! The server quit without updating PID file (/data1/mysql8/data/CT-DevOps-DB.pid).

解决:先去查看错误日志
2021-02-25T01:18:56.929627Z 0 [ERROR] [MY-010274] [Server] Could not open unix socket lock file /var/lib/mysql/mysql.sock.lock.
还是权限的问题
给sysadm用户增加/var/lib/mysql/权限
chown -R sysadm:sysadm /var/lib/mysql

19、yum 不能使用时候,可以直接从另外一台相似环境上复制/etc/yum.repo.d目录,就可以使用了

版权声明:本文为CSDN博主「尚天竹」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/qq_15350581/article/details/114090794

上一篇 下一篇

猜你喜欢

热点阅读