MySQL高可用方案 - 一主二从 + MHA自动故障转移搭建

2023-10-16 本文已影响0人尹楷楷

准备工作

四台机器

1、manager 192.168.201.129 安装mha-manager
2、master 192.168.201.130 安装mysql和mha-node
3、slave1 192.168.201.133 安装mysql和mha-node
4、slave2 192.168.201.134 安装mysql和mha-node

4台主机的主机名修改与时间同步

#主节点master服务器
hostnamectl set-hostname mysql1
su
# 修改主机名并切换
ntpdate ntp.aliyun.com      # 设置时间同步对接在线源

# 修改Slave1、Slave2两台从节点服务器的主机名并切换
Slave1：
hostnamectl set-hostname mysql2
su
ntpdate ntp.aliyun.com # 设置时间同步对接在线源

Slave2：
hostnamectl set-hostname mysql3
su
ntpdate ntp.aliyun.com # 设置时间同步对接在线源

# Manager服务器时间同步
hostnamectl set-hostname manager
su
ntpdate ntp.aliyun.com # 设置时间同步对接在线源

3台mysql的配置

ps：所有mysql都请安装在/usr/local/mysql 默认路径下以防一些环境变量找不到！

# master 
[root@mysql1 data]# cat /etc/my.cnf
[mysqld]
server-id = 1
log_bin = mysql-bin
log_slave_updates = true
binlog_format = row
relay-log = relay-log-bin     
relay-log-index = slave-relay-bin.index 

# slave1 
[root@mysql1 data]# cat /etc/my.cnf
[mysqld]
server-id = 2
log_bin = mysql-bin
log_slave_updates = true
binlog_format = row
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index

# slave2
[root@mysql1 data]# cat /etc/my.cnf
[mysqld]
server-id = 3
log_bin = mysql-bin
log_slave_updates = true
binlog_format = row
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index

每个mysql实例都执行下面语句创建myslave（主从复制用户）和mha用户

#创建主从复制账号myslave
grant replication slave on *.* to 'myslave'@'%.%' identified by '123456';

#创建mha账号
grant all privileges on *.* to 'mha'@'%.%' identified by '123456';

#解决从库通过主机名进行登录主库无法登录问题
grant all privileges on *.* to 'mha'@'mysql1' identified by '123456';
grant all privileges on *.* to 'mha'@'mysql2' identified by '123456';
grant all privileges on *.* to 'mha'@'mysql3' identified by '123456';
flush privileges;

两台从机都执行自主复制命令

一定要保证主从复制是正常的才能继续做 mha！

change master to master_host='192.168.201.130',master_user='myslave',master_password='123456',master_log_file='mysql-bin.000007',master_log_pos=2363; 

# 在 Slave1、Slave2 节点执行同步操作
start slave;

#在 Slave1、Slave2 节点查看数据同步结果
show slave status;
#确保 IO 和 SQL 线程都是 Yes，代表同步正常。
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

#两个从库必须设置为只读模式
set global read_only=1;

安装mha

依赖安装

yum install epel-release --nogpgcheck -y \
yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

node安装

采用npm安装的形式

wget --no-check-certificate  http://githubfast.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
rpm -ivh  mha4mysql-node-0.58-0.el7.centos.noarch.rpm

node安装后生成的文件

#node 组件安装后也会在/usr/local/bin 下面会生成几个脚本（这些工具通常由 MHAManager 的脚本触发，无需人为操作）主要如下：
save_binary_logs      #保存和复制 master 的二进制日志
apply_diff_relay_logs #识别差异的中继日志事件并将其差异的事件应用于其他的 slave
filter_mysqlbinlog    #去除不必要的 ROLLBACK 事件（MHA 已不再使用这个工具）
purge_relay_logs      #清除中继日志（不会阻塞 SQL 线程）

manager安装

wget --no-check-certificate http://githubfast.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
rpm -ivh  mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
# 还需要拉个压缩包。里面有我们需要的script
wget --no-check-certificate http://githubfast.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
tar -xvf  mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58

# 把脚本复制过去
cp -rp ./samples/scripts/* /usr/local/bin

复制后的脚本：

# 这几个脚本master和slave都安装下
#拷贝后会有四个执行文件，这四脚本文件都需要自己改的！不能直接用。直接用报错
# 复制相关脚本到/usr/local/bin目录(软件包解压缩后就有了，不是必须，因为这些脚本不完整，需要自己修改，这是软件开发着留给我们自己发挥的,如果开启下面的任何一个脚本对应的参数，而对应这里的脚本又没有修改，则会抛错，自己被坑的很惨)
- master_ip_failover   #自动切换时vip管理的脚本，不是必须，如果我们使用keepalived的，我们可以自己编写脚本完成对vip的管理，比如监控mysql，如果mysql异常，我们停止keepalived就行，这样vip就会自动漂移
- master_ip_online_change        #在线手动切换时vip的管理，不是必须，同样可以可以自行编写简单的shell完成
- power_manager                     #故障发生后关闭主机的脚本，不是必须
- send_report                       #因故障切换后发送报警的脚本，不是必须，可自行编写简单的shell完成。

manager安装完成后生成文件

cd /root/mha4mysql-manager-0.58/bin/

masterha_check_ssh #检查 MHA 的 SSH 配置状况
masterha_check_repl #检查 MySQL 复制状况
masterha_manger #启动 manager的脚本
masterha_check_status #检测当前 MHA 运行状态
masterha_master_monitor #检测 master 是否宕机
masterha_master_switch #控制故障转移（自动或者手动）
masterha_conf_host #添加或删除配置的 server 信息
masterha_stop #关闭manager

使用

[root@manager bin]# masterha_check_status   --conf=/etc/masterha/app1.cnf   #测试命令
app1 (pid:32421) is running(0:PING_OK), master:192.168.201.133

4台机器都要配置ssh互相访问

在manager上配置在所有服务器上配置无密码认证

在 manager 节点上配置到所有数据库节点的无密码认证

ssh-keygen -t rsa   #一路按回车键
#分别对三台主机进行导入
ssh-copy-id 192.168.201.130
ssh-copy-id 192.168.201.134
ssh-copy-id 192.168.201.133

在三个MySQL机器上开启ssh互认

192.168.201.130：

ssh-keygen -t rsa   #一路按回车键
#分别对三台主机进行导入
ssh-copy-id 192.168.201.134
ssh-copy-id 192.168.201.133

192.168.201.134：

ssh-keygen -t rsa   #一路按回车键
#分别对三台主机进行导入
ssh-copy-id 192.168.201.130
ssh-copy-id 192.168.201.133

192.168.201.133：

ssh-keygen -t rsa   #一路按回车键
#分别对三台主机进行导入
ssh-copy-id 192.168.201.130
ssh-copy-id 192.168.201.134

配置manager服务（核心）

1、删除原有内容，直接复制并修改vip相关参数

>  /usr/local/bin/master_ip_failover

2、可在拷贝前输入 :set paste 解决vim粘贴乱序问题,我们将脚本内所有内容删除进行重新编写

vim /usr/local/bin/master_ip_failover

编写内容如下
可以使用:set paste 避免格式错误

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#############################添加内容部分#########################################
my $vip = '192.168.142.200';       #虚拟VIP地址设定
my $brdc = '192.168.142.255';
my $ifdev = 'ens33';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;

#下方注释项为通过IP命令生成网卡和IP地址，并通过IP命令删除对应IP地址，与上方内容选择其中一种即可
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

3、创建 MHA 软件目录并拷贝配置文件
准备一个masterha目录

mkdir /etc/masterha

复制文件

cp /root/mha4mysql-manager-0.58/samples/conf/app1.cnf etc/masterha

manager服务的配置文件app1.cnf复制到准备好的文件目录中
删除原有内容

> /etc/masterha/app1.cnf

vim /etc/masterha/app1.cnf

[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=123456
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=myslave
report_script=/usr/local/bin/send_report　　　　　
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.201.133 -s 192.168.201.134
shutdown_script=""
ssh_user=root
user=mha

[server1]
hostname=192.168.201.130
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.201.133
port=3306

[server3]
hostname=192.168.201.134
port=3306

在master上开启虚拟IP（VIP）

192.168.201.130上执行

#第一次配置需要在 Master 节点上手动
ifconfig ens33:1 192.168.201.200/24

[root@mysql1 data]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:dd:84:c1:71  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.201.130  netmask 255.255.255.0  broadcast 192.168.201.255
        inet6 fe80::f11:981:ab58:83aa  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f3:0c:71  txqueuelen 1000  (Ethernet)
        RX packets 810978  bytes 84148912 (80.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1112910  bytes 248816223 (237.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.201.200  netmask 255.255.255.0  broadcast 192.168.201.255
        ether 00:0c:29:f3:0c:71  txqueuelen 1000  (Ethernet)

ps：手动删除VIP命令

 ip addr del 192.168.201.200/255.255.255.0 dev ens33:1

在manager上启动mha

启动

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

以上命令参数如下：
--conf=/etc/masterha/app1.cnf #指定配置文件位置
--remove_dead_master_conf #该参数代表当发生主从切换后，老的主库的 ip 将会从配置文件中移除。
--manger_log #日志存放位置。
--ignore_last_failover #在缺省情况下，如果 MHA 检测到连续发生宕机，且两次宕机间隔不足 8 小时的话，则不会进行 Failover

查看manager日志

tail -f /var/log/masterha/app1/manager.log

关闭mha


#若要关闭 manager 服务，可以使用如下命令。
masterha_stop --conf=/etc/masterha/app1.cnf

查看mha状态

[root@manager bin]# masterha_check_status   --conf=/etc/masterha/app1.cnf   #测试命令
app1 (pid:29573) is running(0:PING_OK), master:192.168.201.130
[root@manager bin]# masterha_check_ssh --conf=/etc/masterha/app1.cnf   #测试命令
Mon Oct 16 09:09:35 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct 16 09:09:35 2023 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:09:35 2023 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:09:35 2023 - [info] Starting SSH connection tests..
Mon Oct 16 09:09:36 2023 - [debug] 
Mon Oct 16 09:09:35 2023 - [debug]  Connecting via SSH from root@192.168.201.130(192.168.201.130:22) to root@192.168.201.133(192.168.201.133:22)..
Mon Oct 16 09:09:36 2023 - [debug]   ok.
Mon Oct 16 09:09:36 2023 - [debug]  Connecting via SSH from root@192.168.201.130(192.168.201.130:22) to root@192.168.201.134(192.168.201.134:22)..
Mon Oct 16 09:09:36 2023 - [debug]   ok.
Mon Oct 16 09:09:37 2023 - [debug] 
Mon Oct 16 09:09:36 2023 - [debug]  Connecting via SSH from root@192.168.201.133(192.168.201.133:22) to root@192.168.201.130(192.168.201.130:22)..
Mon Oct 16 09:09:36 2023 - [debug]   ok.
Mon Oct 16 09:09:36 2023 - [debug]  Connecting via SSH from root@192.168.201.133(192.168.201.133:22) to root@192.168.201.134(192.168.201.134:22)..
Mon Oct 16 09:09:37 2023 - [debug]   ok.
Mon Oct 16 09:09:38 2023 - [debug] 
Mon Oct 16 09:09:36 2023 - [debug]  Connecting via SSH from root@192.168.201.134(192.168.201.134:22) to root@192.168.201.130(192.168.201.130:22)..
Mon Oct 16 09:09:37 2023 - [debug]   ok.
Mon Oct 16 09:09:37 2023 - [debug]  Connecting via SSH from root@192.168.201.134(192.168.201.134:22) to root@192.168.201.133(192.168.201.133:22)..
Mon Oct 16 09:09:37 2023 - [debug]   ok.
Mon Oct 16 09:09:38 2023 - [info] All SSH connection tests passed successfully.
[root@manager bin]# masterha_check_repl    --conf=/etc/masterha/app1.cnf   #测试命令
Mon Oct 16 09:09:54 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct 16 09:09:54 2023 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:09:54 2023 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:09:54 2023 - [info] MHA::MasterMonitor version 0.58.
Mon Oct 16 09:09:55 2023 - [info] GTID failover mode = 0
Mon Oct 16 09:09:55 2023 - [info] Dead Servers:
Mon Oct 16 09:09:55 2023 - [info] Alive Servers:
Mon Oct 16 09:09:55 2023 - [info]   192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:09:55 2023 - [info]   192.168.201.133(192.168.201.133:3306)
Mon Oct 16 09:09:55 2023 - [info]   192.168.201.134(192.168.201.134:3306)
Mon Oct 16 09:09:55 2023 - [info] Alive Slaves:
Mon Oct 16 09:09:55 2023 - [info]   192.168.201.130(192.168.201.130:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:09:55 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:09:55 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:09:55 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:09:55 2023 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Oct 16 09:09:55 2023 - [info]   192.168.201.134(192.168.201.134:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:09:55 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:09:55 2023 - [info] Current Alive Master: 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:09:55 2023 - [info] Checking slave configurations..
Mon Oct 16 09:09:55 2023 - [info]  read_only=1 is not set on slave 192.168.201.130(192.168.201.130:3306).
Mon Oct 16 09:09:55 2023 - [warning]  relay_log_purge=0 is not set on slave 192.168.201.130(192.168.201.130:3306).
Mon Oct 16 09:09:55 2023 - [info]  read_only=1 is not set on slave 192.168.201.133(192.168.201.133:3306).
Mon Oct 16 09:09:55 2023 - [warning]  relay_log_purge=0 is not set on slave 192.168.201.133(192.168.201.133:3306).
Mon Oct 16 09:09:55 2023 - [warning]  relay_log_purge=0 is not set on slave 192.168.201.134(192.168.201.134:3306).
Mon Oct 16 09:09:55 2023 - [info] Checking replication filtering settings..
Mon Oct 16 09:09:55 2023 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Oct 16 09:09:55 2023 - [info]  Replication filtering check ok.
Mon Oct 16 09:09:55 2023 - [info] GTID (with auto-pos) is not supported
Mon Oct 16 09:09:55 2023 - [info] Starting SSH connection tests..
Mon Oct 16 09:09:58 2023 - [info] All SSH connection tests passed successfully.
Mon Oct 16 09:09:58 2023 - [info] Checking MHA Node version..
Mon Oct 16 09:09:59 2023 - [info]  Version check ok.
Mon Oct 16 09:09:59 2023 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct 16 09:09:59 2023 - [info] HealthCheck: SSH to 192.168.201.130 is reachable.
Mon Oct 16 09:09:59 2023 - [info] Master MHA Node version is 0.58.
Mon Oct 16 09:09:59 2023 - [info] Checking recovery script configurations on 192.168.201.130(192.168.201.130:3306)..
Mon Oct 16 09:09:59 2023 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000009 
Mon Oct 16 09:09:59 2023 - [info]   Connecting to root@192.168.201.130(192.168.201.130:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/data, up to mysql-bin.000009
Mon Oct 16 09:10:00 2023 - [info] Binlog setting check done.
Mon Oct 16 09:10:00 2023 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Oct 16 09:10:00 2023 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.201.130 --slave_ip=192.168.201.130 --slave_port=3306 --workdir=/tmp --target_version=5.7.43-log --manager_version=0.58 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Mon Oct 16 09:10:00 2023 - [info]   Connecting to root@192.168.201.130(192.168.201.130:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000003
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000003
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Oct 16 09:10:00 2023 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.201.133 --slave_ip=192.168.201.133 --slave_port=3306 --workdir=/tmp --target_version=5.7.43-log --manager_version=0.58 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Mon Oct 16 09:10:00 2023 - [info]   Connecting to root@192.168.201.133(192.168.201.133:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000002
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Oct 16 09:10:01 2023 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.201.134 --slave_ip=192.168.201.134 --slave_port=3306 --workdir=/tmp --target_version=5.7.43-log --manager_version=0.58 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Mon Oct 16 09:10:01 2023 - [info]   Connecting to root@192.168.201.134(192.168.201.134:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000002
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Oct 16 09:10:01 2023 - [info] Slaves settings check done.
Mon Oct 16 09:10:01 2023 - [info] 
192.168.201.130(192.168.201.130:3306) (current master)
 +--192.168.201.130(192.168.201.130:3306)
 +--192.168.201.133(192.168.201.133:3306)
 +--192.168.201.134(192.168.201.134:3306)

Mon Oct 16 09:10:01 2023 - [info] Checking replication health on 192.168.201.130..
Mon Oct 16 09:10:01 2023 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln490] Slave IO thread is not running on 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:10:01 2023 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1526]  failed!
Mon Oct 16 09:10:01 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 420.
Mon Oct 16 09:10:01 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Oct 16 09:10:01 2023 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
[root@manager bin]# masterha_master_monitor    --conf=/etc/masterha/app1.cnf   #测试命令
Mon Oct 16 09:11:16 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct 16 09:11:16 2023 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:11:16 2023 - [info] Reading server configuration from /etc/masterha/app1.cnf..

查看当前master是谁,manager执行

# 查看主节点master是谁
masterha_check_status --conf=/etc/masterha/app1.cnf
#查看当前mater状态可以查看VIP所在位置，即当前为主节点的主机
cat /var/log/masterha/app1/manager.log | grep "current master"

测试MHA故障转移功能

1、停掉master的mysqld
service mysqld stop
2、监控manager日志输出

Mon Oct 16 09:11:23 2023 - [info] Slaves settings check done.
Mon Oct 16 09:11:23 2023 - [info] 
192.168.201.130(192.168.201.130:3306) (current master)
 +--192.168.201.130(192.168.201.130:3306)
 +--192.168.201.133(192.168.201.133:3306)
 +--192.168.201.134(192.168.201.134:3306)

Mon Oct 16 09:11:23 2023 - [info] Checking master_ip_failover_script status:
Mon Oct 16 09:11:23 2023 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.201.130 --orig_master_ip=192.168.201.130 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.201.200===

Checking the Status of the script.. OK 
Mon Oct 16 09:11:23 2023 - [info]  OK.
Mon Oct 16 09:11:23 2023 - [warning] shutdown_script is not defined.
Mon Oct 16 09:11:23 2023 - [info] Set master ping interval 1 seconds.
Mon Oct 16 09:11:23 2023 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.201.133 -s 192.168.201.134
Mon Oct 16 09:11:23 2023 - [info] Starting ping health check on 192.168.201.130(192.168.201.130:3306)..
Mon Oct 16 09:11:27 2023 - [warning] Got error when monitoring master:  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 489.
Mon Oct 16 09:11:27 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln491] Target master's advisory lock is already held by someone. Please check whether you monitor the same master from multiple monitoring processes.
Mon Oct 16 09:11:27 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln511] Error happened on health checking.  at /usr/local/bin/masterha_master_monitor line 48.
Mon Oct 16 09:11:27 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Oct 16 09:11:27 2023 - [info] Got exit code 1 (Not master dead).
Mon Oct 16 09:14:04 2023 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon Oct 16 09:14:04 2023 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.201.133 -s 192.168.201.134  --user=root  --master_host=192.168.201.130  --master_ip=192.168.201.130  --master_port=3306 --master_user=mha --master_password=123456 --ping_type=SELECT
Mon Oct 16 09:14:04 2023 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Mon Oct 16 09:14:04 2023 - [info] HealthCheck: SSH to 192.168.201.130 is reachable.
Monitoring server 192.168.201.133 is reachable, Master is not reachable from 192.168.201.133. OK.
Monitoring server 192.168.201.134 is reachable, Master is not reachable from 192.168.201.134. OK.
Mon Oct 16 09:14:05 2023 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Oct 16 09:14:05 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.201.130' (111))
Mon Oct 16 09:14:05 2023 - [warning] Connection failed 2 time(s)..
Mon Oct 16 09:14:06 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.201.130' (111))
Mon Oct 16 09:14:06 2023 - [warning] Connection failed 3 time(s)..
Mon Oct 16 09:14:07 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.201.130' (111))
Mon Oct 16 09:14:07 2023 - [warning] Connection failed 4 time(s)..
Mon Oct 16 09:14:07 2023 - [warning] Master is not reachable from health checker!
Mon Oct 16 09:14:07 2023 - [warning] Master 192.168.201.130(192.168.201.130:3306) is not reachable!
Mon Oct 16 09:14:07 2023 - [warning] SSH is reachable.
Mon Oct 16 09:14:07 2023 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Mon Oct 16 09:14:07 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct 16 09:14:07 2023 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:14:07 2023 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Oct 16 09:14:08 2023 - [info] GTID failover mode = 0
Mon Oct 16 09:14:08 2023 - [info] Dead Servers:
Mon Oct 16 09:14:08 2023 - [info]   192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:08 2023 - [info] Alive Servers:
Mon Oct 16 09:14:08 2023 - [info]   192.168.201.133(192.168.201.133:3306)
Mon Oct 16 09:14:08 2023 - [info]   192.168.201.134(192.168.201.134:3306)
Mon Oct 16 09:14:08 2023 - [info] Alive Slaves:
Mon Oct 16 09:14:08 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:08 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:08 2023 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Oct 16 09:14:08 2023 - [info]   192.168.201.134(192.168.201.134:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:08 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:08 2023 - [info] Checking slave configurations..
Mon Oct 16 09:14:08 2023 - [info]  read_only=1 is not set on slave 192.168.201.133(192.168.201.133:3306).
Mon Oct 16 09:14:08 2023 - [warning]  relay_log_purge=0 is not set on slave 192.168.201.133(192.168.201.133:3306).
Mon Oct 16 09:14:08 2023 - [warning]  relay_log_purge=0 is not set on slave 192.168.201.134(192.168.201.134:3306).
Mon Oct 16 09:14:08 2023 - [info] Checking replication filtering settings..
Mon Oct 16 09:14:08 2023 - [info]  Replication filtering check ok.
Mon Oct 16 09:14:08 2023 - [info] Master is down!
Mon Oct 16 09:14:08 2023 - [info] Terminating monitoring script.
Mon Oct 16 09:14:08 2023 - [info] Got exit code 20 (Master dead).
Mon Oct 16 09:14:08 2023 - [info] MHA::MasterFailover version 0.58.
Mon Oct 16 09:14:08 2023 - [info] Starting master failover.
Mon Oct 16 09:14:08 2023 - [info] 
Mon Oct 16 09:14:08 2023 - [info] * Phase 1: Configuration Check Phase..
Mon Oct 16 09:14:08 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] GTID failover mode = 0
Mon Oct 16 09:14:09 2023 - [info] Dead Servers:
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info] Checking master reachability via MySQL(double check)...
Mon Oct 16 09:14:09 2023 - [info]  ok.
Mon Oct 16 09:14:09 2023 - [info] Alive Servers:
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.133(192.168.201.133:3306)
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.134(192.168.201.134:3306)
Mon Oct 16 09:14:09 2023 - [info] Alive Slaves:
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:09 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.134(192.168.201.134:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:09 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info] Starting Non-GTID based failover.
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Oct 16 09:14:09 2023 - [info] Executing master IP deactivation script:
Mon Oct 16 09:14:09 2023 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.201.130 --orig_master_ip=192.168.201.130 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.201.200===

Disabling the VIP on old master: 192.168.201.130 
Mon Oct 16 09:14:09 2023 - [info]  done.
Mon Oct 16 09:14:09 2023 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Oct 16 09:14:09 2023 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] * Phase 3: Master Recovery Phase..
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:154
Mon Oct 16 09:14:09 2023 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:09 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.134(192.168.201.134:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:09 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:154
Mon Oct 16 09:14:09 2023 - [info] Oldest slaves:
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:09 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Oct 16 09:14:09 2023 - [info]   192.168.201.134(192.168.201.134:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:09 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:09 2023 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Oct 16 09:14:09 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] Fetching dead master's binary logs..
Mon Oct 16 09:14:10 2023 - [info] Executing command on the dead master 192.168.201.130(192.168.201.130:3306): save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=154 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/saved_master_binlog_from_192.168.201.130_3306_20231016091408.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000009 pos 154 to mysql-bin.000009 EOF into /tmp/saved_master_binlog_from_192.168.201.130_3306_20231016091408.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  No need to dump effective binlog data from /usr/local/mysql/data/mysql-bin.000009 (pos starts 154, filesize 154). Skipping.
 Binlog Checksum enabled
 /tmp/saved_master_binlog_from_192.168.201.130_3306_20231016091408.binlog has no effective data events.
Event not exists.
Mon Oct 16 09:14:10 2023 - [info] Additional events were not found from the orig master. No need to save.
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] * Phase 3.3: Determining New Master Phase..
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Mon Oct 16 09:14:10 2023 - [info] All slaves received relay logs to the same position. No need to resync each other.
Mon Oct 16 09:14:10 2023 - [info] Searching new master from slaves..
Mon Oct 16 09:14:10 2023 - [info]  Candidate masters from the configuration file:
Mon Oct 16 09:14:10 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Mon Oct 16 09:14:10 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Mon Oct 16 09:14:10 2023 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Oct 16 09:14:10 2023 - [info]  Non-candidate masters:
Mon Oct 16 09:14:10 2023 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Oct 16 09:14:10 2023 - [info] New master is 192.168.201.133(192.168.201.133:3306)
Mon Oct 16 09:14:10 2023 - [info] Starting master failover..
Mon Oct 16 09:14:10 2023 - [info] 
From:
192.168.201.130(192.168.201.130:3306) (current master)
 +--192.168.201.133(192.168.201.133:3306)
 +--192.168.201.134(192.168.201.134:3306)

To:
192.168.201.133(192.168.201.133:3306) (new master)
 +--192.168.201.134(192.168.201.134:3306)
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] * Phase 3.5: Master Log Apply Phase..
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Oct 16 09:14:10 2023 - [info] Starting recovery on 192.168.201.133(192.168.201.133:3306)..
Mon Oct 16 09:14:10 2023 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Mon Oct 16 09:14:10 2023 - [info]   done.
Mon Oct 16 09:14:10 2023 - [info]  All relay logs were successfully applied.
Mon Oct 16 09:14:10 2023 - [info] Getting new master's binlog name and position..
Mon Oct 16 09:14:10 2023 - [info]  mysql-bin.000009:400
Mon Oct 16 09:14:10 2023 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.201.133', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=400, MASTER_USER='myslave', MASTER_PASSWORD='xxx';
Mon Oct 16 09:14:10 2023 - [info] Executing master IP activate script:
Mon Oct 16 09:14:10 2023 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.201.130 --orig_master_ip=192.168.201.130 --orig_master_port=3306 --new_master_host=192.168.201.133 --new_master_ip=192.168.201.133 --new_master_port=3306 --new_master_user='mha'   --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.201.200===

Enabling the VIP - 192.168.201.200 on the new master - 192.168.201.133 
Mon Oct 16 09:14:10 2023 - [info]  OK.
Mon Oct 16 09:14:10 2023 - [info] ** Finished master recovery successfully.
Mon Oct 16 09:14:10 2023 - [info] * Phase 3: Master Recovery Phase completed.
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] * Phase 4: Slaves Recovery Phase..
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Oct 16 09:14:10 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info] -- Slave diff file generation on host 192.168.201.134(192.168.201.134:3306) started, pid: 51829. Check tmp log /var/log/masterha/app1/192.168.201.134_3306_20231016091408.log if it takes time..
Mon Oct 16 09:14:11 2023 - [info] 
Mon Oct 16 09:14:11 2023 - [info] Log messages from 192.168.201.134 ...
Mon Oct 16 09:14:11 2023 - [info] 
Mon Oct 16 09:14:10 2023 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Oct 16 09:14:11 2023 - [info] End of log messages from 192.168.201.134.
Mon Oct 16 09:14:11 2023 - [info] -- 192.168.201.134(192.168.201.134:3306) has the latest relay log events.
Mon Oct 16 09:14:11 2023 - [info] Generating relay diff files from the latest slave succeeded.
Mon Oct 16 09:14:11 2023 - [info] 
Mon Oct 16 09:14:11 2023 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Oct 16 09:14:11 2023 - [info] 
Mon Oct 16 09:14:11 2023 - [info] -- Slave recovery on host 192.168.201.134(192.168.201.134:3306) started, pid: 51898. Check tmp log /var/log/masterha/app1/192.168.201.134_3306_20231016091408.log if it takes time..
Mon Oct 16 09:14:12 2023 - [info] 
Mon Oct 16 09:14:12 2023 - [info] Log messages from 192.168.201.134 ...
Mon Oct 16 09:14:12 2023 - [info] 
Mon Oct 16 09:14:11 2023 - [info] Starting recovery on 192.168.201.134(192.168.201.134:3306)..
Mon Oct 16 09:14:11 2023 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Mon Oct 16 09:14:11 2023 - [info]   done.
Mon Oct 16 09:14:11 2023 - [info]  All relay logs were successfully applied.
Mon Oct 16 09:14:11 2023 - [info]  Resetting slave 192.168.201.134(192.168.201.134:3306) and starting replication from the new master 192.168.201.133(192.168.201.133:3306)..
Mon Oct 16 09:14:11 2023 - [info]  Executed CHANGE MASTER.
Mon Oct 16 09:14:12 2023 - [info]  Slave started.
Mon Oct 16 09:14:12 2023 - [info] End of log messages from 192.168.201.134.
Mon Oct 16 09:14:12 2023 - [info] -- Slave recovery on host 192.168.201.134(192.168.201.134:3306) succeeded.
Mon Oct 16 09:14:12 2023 - [info] All new slave servers recovered successfully.
Mon Oct 16 09:14:12 2023 - [info] 
Mon Oct 16 09:14:12 2023 - [info] * Phase 5: New master cleanup phase..
Mon Oct 16 09:14:12 2023 - [info] 
Mon Oct 16 09:14:12 2023 - [info] Resetting slave info on the new master..
Mon Oct 16 09:14:12 2023 - [info]  192.168.201.133: Resetting slave info succeeded.
Mon Oct 16 09:14:12 2023 - [info] Master failover to 192.168.201.133(192.168.201.133:3306) completed successfully.
Mon Oct 16 09:14:12 2023 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Mon Oct 16 09:14:12 2023 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.201.130(192.168.201.130:3306) to 192.168.201.133(192.168.201.133:3306) succeeded

# 检测到192.168.201.130宕机了！
Master 192.168.201.130(192.168.201.130:3306) is down!

Check MHA Manager logs at manager:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.201.130(192.168.201.130:3306)
The latest slave 192.168.201.133(192.168.201.133:3306) has all relay logs for recovery.
# 选取了192.168.201.133这台slave升级为新的master
Selected 192.168.201.133(192.168.201.133:3306) as a new master.
192.168.201.133(192.168.201.133:3306): OK: Applying all logs succeeded.
192.168.201.133(192.168.201.133:3306): OK: Activated master IP address.
192.168.201.134(192.168.201.134:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
# 192.168.201.134重新连接新的master：192.168.201.133
192.168.201.134(192.168.201.134:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.201.133(192.168.201.133:3306)
192.168.201.133(192.168.201.133:3306): Resetting slave info succeeded.
Master failover to 192.168.201.133(192.168.201.133:3306) completed successfully.
Mon Oct 16 09:14:12 2023 - [info] Sending mail..
sh: /usr/local/bin/send_report　　　　　: 没有那个文件或目录
Mon Oct 16 09:14:12 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2089] Failed to send mail with return code 127:0

以上日志已经告诉我们故障转移成功了。然后我们还需验证下VIP是不是漂移到133上了：
133上执行ifconfig：

[root@mysql2 ~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:0c:c8:31:55  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.201.133  netmask 255.255.255.0  broadcast 192.168.201.255
        inet6 fe80::29d9:6857:190d:2ad  prefixlen 64  scopeid 0x20<link>
        inet6 fe80::f11:981:ab58:83aa  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:52:09:f8  txqueuelen 1000  (Ethernet)
        RX packets 725131  bytes 75838492 (72.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1009172  bytes 229494515 (218.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.201.200  netmask 255.255.255.0  broadcast 192.168.201.255
        ether 00:0c:29:52:09:f8  txqueuelen 1000  (Ethernet)

ok没问题
再检测下现在master是谁？

[root@manager bin]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:32421) is running(0:PING_OK), master:192.168.201.133
[root@manager bin]# cat /var/log/masterha/app1/manager.log | grep "current master"
Mon Oct 16 09:08:25 2023 - [info] Checking SSH publickey authentication settings on the current master..
192.168.201.130(192.168.201.130:3306) (current master)
Mon Oct 16 09:11:21 2023 - [info] Checking SSH publickey authentication settings on the current master..
192.168.201.130(192.168.201.130:3306) (current master)
Mon Oct 16 09:14:09 2023 - [info] Forcing shutdown so that applications never connect to the current master..
192.168.201.130(192.168.201.130:3306) (current master)
[root@manager bin]#

再进去134（slave2）看下它的主是谁

show slave status;
| Waiting for master to send event | 192.168.201.133

完美~

配置mha故障转移告警邮件

1、首先确保你环境可以上外网
ping 8.8.8.8
2、如果你想修改邮件的收件人，那么需要修改manager节点的/usr/local/bin/send_report文件，将其中的lhrbest@qq.com修改为收件人的邮箱地址即可。文件/usr/local/bin/send_report内容如下所示：

1[root@MHA-LHR-Monitor-ip134 /]# cat /usr/local/bin/send_report
#!/usr/bin/perl
 
#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 
## Note: This is a sample script and is not complete. Modify the script based on your environment.
 
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
 
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.qq.com';
my $mail_from='xx@qq.com';
my $mail_user='xx@qq.com';
my $mail_pass='xxx';
#my $mail_to=['xx@aliyun.com',];
my $mail_to='xx@aliyun.com';
 
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s'  => \$new_master_host,
'new_slave_hosts=s'  => \$new_slave_hosts,
'subject=s'          => \$subject,
'body=s'             => \$body,
);
 
# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
 
sub mailToContacts {
my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
open my $DEBUG, ">/var/log/masterha/app1/mail.log"
        or die "Can't open the debug    file:$!\n";
    my $sender = new Mail::Sender {
        ctype        => 'text/plain;charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed    => '0',
        authid        => $mail_user,
        authpwd        => $mail_pass,
        to        => $mail_to,
        subject        => $subject,
        debug        => $DEBUG
    };
    $sender->MailMsg(
        {
            msg => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}
 
exit 0;

消除windows下的特殊字符

 sed -i 's/\r$//'  /usr/local/bin/send_report

成功后的，收件信息：

image.png

手动恢复宕机的旧master，将它作为从节点重新加入集群

这是我没想到的，MHA居然不支持那种宕机后主节点重新启动后自动恢复成从节点。居然还要手动操作？

# 启动旧master
service mysqld start
# manager上执行检索日志命令，得到change master命令
[root@manager ~]# grep "All other slaves should start replication from here"  /var/log/masterha/app1/manager.log
Tue Oct 17 01:25:16 2023 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.201.133', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000012', MASTER_LOG_POS=154, MASTER_USER='myslave', MASTER_PASSWORD='xxx';
# 旧master执行change master
CHANGE MASTER TO MASTER_HOST='192.168.201.133', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000012', MASTER_LOG_POS=154, MASTER_USER='myslave', MASTER_PASSWORD='123456';
start slave;
show slave status;

# 编辑manager的app1.cnf文件将剔除的mysql连接重进加进去
[server1]
hostname=192.168.201.130
port=3306
# manager上检查mha状态
masterha_check_repl --conf=/etc/masterha/app1.cnf
# 启动mha manager
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

手动进行故障转移（手动指定新主）

测试场景二测试的是，在主库故障后，MHA进程未启动的情况下，我们手动来切换。这种情况为MySQL主从关系中主库因为故障宕机了，但是MHA Master监控并没有开启，这个时候就需要手动来failover了。该情况下，日志打印输出和自动failover是没有什么区别的。需要注意的是，如果主库未宕机，那么不能手动执行故障切换，会报错的。

需要在manager节点中把master_ip_online_change脚本上改改，添加一个addvip和一个removevip函数

vi /usr/local/bin/master_ip_online_change

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $vip = '192.168.201.200';       #虚拟VIP地址设定
my $brdc = '192.168.201.255';
my $ifdev = 'ens33';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;
my $_running_interval = 0.1;

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub start_vip() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}




sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
     # print current_time_us() . " Drpping app user on the orig master..\n";
      print current_time_us() . " drop vip $vip..\n";
      #drop_app_user($orig_master_handler);
     &stop_vip();

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      print current_time_us() . "Add vip $vip on $ifdev..\n";
     # create_app_user($new_master_handler);
      &start_vip();
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

执行命令，把主节点切换为133

[root@manager ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf  --master_state=alive --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0 --new_master_host=192.168.201.133 --new_master_port=3306
Tue Oct 17 03:34:49 2023 - [info] MHA::MasterRotate version 0.58.
Tue Oct 17 03:34:49 2023 - [info] Starting online master switch..
Tue Oct 17 03:34:49 2023 - [info] 
Tue Oct 17 03:34:49 2023 - [info] * Phase 1: Configuration Check Phase..
Tue Oct 17 03:34:49 2023 - [info] 
Tue Oct 17 03:34:49 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Oct 17 03:34:49 2023 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Oct 17 03:34:49 2023 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Oct 17 03:34:50 2023 - [info] GTID failover mode = 0
Tue Oct 17 03:34:50 2023 - [info] Current Alive Master: 192.168.201.130(192.168.201.130:3306)
Tue Oct 17 03:34:50 2023 - [info] Alive Slaves:
Tue Oct 17 03:34:50 2023 - [info]   192.168.201.133(192.168.201.133:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Tue Oct 17 03:34:50 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Tue Oct 17 03:34:50 2023 - [info]   192.168.201.134(192.168.201.134:3306)  Version=5.7.43-log (oldest major version between slaves) log-bin:enabled
Tue Oct 17 03:34:50 2023 - [info]     Replicating from 192.168.201.130(192.168.201.130:3306)
Tue Oct 17 03:34:50 2023 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Oct 17 03:34:50 2023 - [info]  ok.
Tue Oct 17 03:34:50 2023 - [info] Checking MHA is not monitoring or doing failover..
Tue Oct 17 03:34:50 2023 - [info] Checking replication health on 192.168.201.133..
Tue Oct 17 03:34:50 2023 - [info]  ok.
Tue Oct 17 03:34:50 2023 - [info] Checking replication health on 192.168.201.134..
Tue Oct 17 03:34:50 2023 - [info]  ok.
Tue Oct 17 03:34:50 2023 - [info] 192.168.201.133 can be new master.
Tue Oct 17 03:34:50 2023 - [info] 
From:
192.168.201.130(192.168.201.130:3306) (current master)
 +--192.168.201.133(192.168.201.133:3306)
 +--192.168.201.134(192.168.201.134:3306)

To:
192.168.201.133(192.168.201.133:3306) (new master)
 +--192.168.201.134(192.168.201.134:3306)
 +--192.168.201.130(192.168.201.130:3306)
Tue Oct 17 03:34:50 2023 - [info] Checking whether 192.168.201.133(192.168.201.133:3306) is ok for the new master..
Tue Oct 17 03:34:50 2023 - [info]  ok.
Tue Oct 17 03:34:50 2023 - [info] 192.168.201.130(192.168.201.130:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Oct 17 03:34:50 2023 - [info] 192.168.201.130(192.168.201.130:3306): Resetting slave pointing to the dummy host.
Tue Oct 17 03:34:50 2023 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Oct 17 03:34:50 2023 - [info] 
Tue Oct 17 03:34:50 2023 - [info] * Phase 2: Rejecting updates Phase..
Tue Oct 17 03:34:50 2023 - [info] 
Tue Oct 17 03:34:50 2023 - [info] Executing master ip online change script to disable write on the current master:
Tue Oct 17 03:34:50 2023 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.201.130 --orig_master_ip=192.168.201.130 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=192.168.201.133 --new_master_ip=192.168.201.133 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Tue Oct 17 03:34:50 2023 992463 Set read_only on the new master.. ok.
Tue Oct 17 03:34:50 2023 999285 drop vip 192.168.201.200..
Tue Oct 17 03:34:51 2023 182422 Set read_only=1 on the orig master.. ok.
Tue Oct 17 03:34:51 2023 195297 Killing all application threads..
Tue Oct 17 03:34:51 2023 195339 done.
Tue Oct 17 03:34:51 2023 - [info]  ok.
Tue Oct 17 03:34:51 2023 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Oct 17 03:34:51 2023 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Oct 17 03:34:51 2023 - [info]  ok.
Tue Oct 17 03:34:51 2023 - [info] Orig master binlog:pos is mysql-bin.000017:154.
Tue Oct 17 03:34:51 2023 - [info]  Waiting to execute all relay logs on 192.168.201.133(192.168.201.133:3306)..
Tue Oct 17 03:34:51 2023 - [info]  master_pos_wait(mysql-bin.000017:154) completed on 192.168.201.133(192.168.201.133:3306). Executed 0 events.
Tue Oct 17 03:34:51 2023 - [info]   done.
Tue Oct 17 03:34:51 2023 - [info] Getting new master's binlog name and position..
Tue Oct 17 03:34:51 2023 - [info]  mysql-bin.000014:154
Tue Oct 17 03:34:51 2023 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.201.133', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000014', MASTER_LOG_POS=154, MASTER_USER='myslave', MASTER_PASSWORD='xxx';
Tue Oct 17 03:34:51 2023 - [info] Executing master ip online change script to allow write on the new master:
Tue Oct 17 03:34:51 2023 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.201.130 --orig_master_ip=192.168.201.130 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=192.168.201.133 --new_master_ip=192.168.201.133 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Tue Oct 17 03:34:51 2023 314906 Set read_only=0 on the new master.
Tue Oct 17 03:34:51 2023 316411Add vip 192.168.201.200 on ens33..
Tue Oct 17 03:34:51 2023 - [info]  ok.
Tue Oct 17 03:34:51 2023 - [info] 
Tue Oct 17 03:34:51 2023 - [info] * Switching slaves in parallel..
Tue Oct 17 03:34:51 2023 - [info] 
Tue Oct 17 03:34:51 2023 - [info] -- Slave switch on host 192.168.201.134(192.168.201.134:3306) started, pid: 13607
Tue Oct 17 03:34:51 2023 - [info] 
Tue Oct 17 03:34:52 2023 - [info] Log messages from 192.168.201.134 ...
Tue Oct 17 03:34:52 2023 - [info] 
Tue Oct 17 03:34:51 2023 - [info]  Waiting to execute all relay logs on 192.168.201.134(192.168.201.134:3306)..
Tue Oct 17 03:34:51 2023 - [info]  master_pos_wait(mysql-bin.000017:154) completed on 192.168.201.134(192.168.201.134:3306). Executed 0 events.
Tue Oct 17 03:34:51 2023 - [info]   done.
Tue Oct 17 03:34:51 2023 - [info]  Resetting slave 192.168.201.134(192.168.201.134:3306) and starting replication from the new master 192.168.201.133(192.168.201.133:3306)..
Tue Oct 17 03:34:51 2023 - [info]  Executed CHANGE MASTER.
Tue Oct 17 03:34:51 2023 - [info]  Slave started.
Tue Oct 17 03:34:52 2023 - [info] End of log messages from 192.168.201.134 ...
Tue Oct 17 03:34:52 2023 - [info] 
Tue Oct 17 03:34:52 2023 - [info] -- Slave switch on host 192.168.201.134(192.168.201.134:3306) succeeded.
Tue Oct 17 03:34:52 2023 - [info] Unlocking all tables on the orig master:
Tue Oct 17 03:34:52 2023 - [info] Executing UNLOCK TABLES..
Tue Oct 17 03:34:52 2023 - [info]  ok.
Tue Oct 17 03:34:52 2023 - [info] Starting orig master as a new slave..
Tue Oct 17 03:34:52 2023 - [info]  Resetting slave 192.168.201.130(192.168.201.130:3306) and starting replication from the new master 192.168.201.133(192.168.201.133:3306)..
Tue Oct 17 03:34:52 2023 - [info]  Executed CHANGE MASTER.
Tue Oct 17 03:34:52 2023 - [info]  Slave started.
Tue Oct 17 03:34:52 2023 - [info] All new slave servers switched successfully.
Tue Oct 17 03:34:52 2023 - [info] 
Tue Oct 17 03:34:52 2023 - [info] * Phase 5: New master cleanup phase..
Tue Oct 17 03:34:52 2023 - [info] 
Tue Oct 17 03:34:52 2023 - [info]  192.168.201.133: Resetting slave info succeeded.
Tue Oct 17 03:34:52 2023 - [info] Switching master to 192.168.201.133(192.168.201.133:3306) completed successfully.

查看当前mha状态

masterha_check_repl --conf=/etc/masterha/app1.cnf

Tue Oct 17 03:37:42 2023 - [info] 
192.168.201.133(192.168.201.133:3306) (current master)
 +--192.168.201.130(192.168.201.130:3306)
 +--192.168.201.134(192.168.201.134:3306)

手动切换成功！

问题

1、 show slave status 的Slave_IO_Running为 No

Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
由于两个主机是从同一母机克隆下来的，所以mysql环境配置相同，即主从mysql的UUID相同，然而在配置主从模式要求UUID不能相同，所以出现此问题。

解决：将slave虚拟机上的UUID的配置文件（auto.cnf）删除然后重启mysql服务即可

1.查找auto.cnf文件：
locate auto.cnf
2. 删除
rm -rf /var/lib/mysql/auto.cnf
3. 重启mysql
service mysqld restart

2、manager的错误日志。找不到mysqlbinlog 命令

Mon Oct 16 07:18:15 2023 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.201.130 --slave_ip=192.168.201.130 --slave_port=3306 --workdir=/tmp --target_version=5.7.43-log --manager_version=0.58 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Mon Oct 16 07:18:15 2023 - [info]   Connecting to root@192.168.201.130(192.168.201.130:22).. 
Can't exec "mysqlbinlog": 没有那个文件或目录 at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
 at /usr/bin/apply_diff_relay_logs line 532.

解决：cp /usr/local/mysql/bin/mysqlbinlog /usr/bin

2、mysql命令找不到

sh: mysql: 未找到命令
mysql command failed with rc 127:0!

解决： cp /usr/local/mysql/bin/mysql /usr/bin

3、机器重启后vip会丢失，应用不可用

4、manager报错：
Mon Oct 16 21:48:30 2023 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 192.168.201.133(192.168.201.133:3306)
Master 192.168.201.130(192.168.201.130:3306), replicating from 192.168.201.130(192.168.201.130:3306)
解决：从机上必须要设置只读 set global read_only=1;

5、manager报错：
Master failover to 192.168.201.133(192.168.201.133:3306) completed successfully.
Mon Oct 16 22:12:44 2023 - [info] Sending mail..
sh: /usr/local/bin/send_report　　　　　: 没有那个文件或目录
Mon Oct 16 22:12:44 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2089] Failed to send mail with return code 127:0

6、masterha_check_repl 检查mha状态报错

今天我在搭建中，不小心在主库也执行了start slave，结果在manager上执行
masterha_check_repl --conf=/etc/masterha/app1.cnf命令时出现错误，复制环境检查不成功

Sun Jun 23 17:45:58 2019 - [info] Checking replication health on 172.16.8.12..
Sun Jun 23 17:45:58 2019 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln472] Slave IO thread is not running on 172.16.8.12(172.16.8.12:3306)
Sun Jun 23 17:45:58 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln1429] failed!
Sun Jun 23 17:45:58 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 379
Sun Jun 23 17:45:58 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Sun Jun 23 17:45:58 2019 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

因为不小心在master上也执行了start slave 导致问题。再执行下面reset就行了
mysql> reset slave all;
Query OK, 0 rows affected (0.02 sec)