MGR节点加入的几种常见情况

2020-09-17  本文已影响0人  左轮Lee

OS: CentOS Linux release 7.7.1908 (Core)
DB version: 5.7.25-log
MGR MODE:单主模式 (group_replication_single_primary_mode=ON)
Node:192.168.211.128:3310(主) / 192.168.211.128:3311 / 192.168.211.128:3312

目录
一、加入节点3312,可正常从主节点拉取所有binlog
二、加入节点3311,无法从主节点拉取所有binlog
三、加入节点3311,无法从所有donor拉取所有binlog
        > 加入集群时报错
        > 通过备份恢复的方式加入集群
四、加入的节点3311有本地执行事务

一、加入节点3312,可正常从主节点拉取所有binlog

前提:
MGR部署完成且已启动节点3310,3311

主节点执行,查看是否有删除的日志:
(sit)root@localhost [(none)]> show global VARIABLES like '%gtid_purged%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_purged   |       |
+---------------+-------+
1 row in set (0.00 sec)

新节点直接加入:
## 初始化实例
[root@localhost mysql]# bin/mysqld --defaults-file=/etc/my3312.cnf --initialize-insecure --user=mysql  

##  启动
[root@localhost mysql]# mysqld --defaults-file=/etc/my3312.cnf & 

## 登录及检查gtid_executed
[root@localhost mysql]# mysql -P3312 -uroot -S /data/mysql/mgr3312/mysql.sock
(sit)root@localhost [(none)]> show databases ;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

### 查看当前节点的 gtid_executed,有值的话需要分析
(sit)root@localhost [(none)]> show global VARIABLES like '%gtid%';
+---------------------------------------------------+---------+
| Variable_name                                     | Value   |
+---------------------------------------------------+---------+
| gtid_executed                                     |         |
+---------------------------------------------------+---------+

####  安装MGR插件
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so'; 

####  指定恢复渠道
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';

#### 开启组复制
START GROUP_REPLICATION;

#### 数据同步已完成
(sit)root@localhost [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST           | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 8d1369dd-f735-11ea-a075-000c295d8ce8 | localhost.localdomain |        3312 | ONLINE       |
| group_replication_applier | be5b5510-e8ce-11ea-8010-000c295d8ce8 | localhost.localdomain |        3311 | ONLINE       |
| group_replication_applier | f48a43eb-e8ce-11ea-86bc-000c295d8ce8 | localhost.localdomain |        3310 | ONLINE       |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+

(sit)root@localhost [(none)]> show databases ;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test_mgr           |
+--------------------+

此种情况最简单,加入节点完毕!
可查看节点3312的error.log了解其加入集群的过程。

二、加入节点3311,无法从主节点拉取所有binlog

前提:
MGR部署完成且已启动节点3310,3312

主节点3310 purge部分日志,制造binlog丢失情况。(删除1f1f567e-e8f3-11ea-a254-fa163eb74263:1-13日志)
mysql> purge BINARY logs to 'mysql-bin.000004';
mysql> show global VARIABLES like '%gtid%';
+---------------------------------------------------+------------------------------------------------------------------------------------+
| Variable_name                                     | Value                                                                              |
+---------------------------------------------------+------------------------------------------------------------------------------------+
| binlog_gtid_simple_recovery                       | ON                                                                                 |
| enforce_gtid_consistency                          | ON                                                                                 |
| group_replication_allow_local_disjoint_gtids_join | OFF                                                                                |
| group_replication_gtid_assignment_block_size      | 1000000                                                                            |
| gtid_executed                                     | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-26,f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 |
| gtid_executed_compression_period                  | 1000                                                                               |
| gtid_mode                                         | ON                                                                                 |
| gtid_owned                                        |                                                                                    |
| gtid_purged                                       | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-13,
f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 |
| session_track_gtids                               | OFF                                                                                |
+---------------------------------------------------+------------------------------------------------------------------------------------+

后面的加入步骤与第一种情况相同。
当加入节点发现主节点日志不完整后便从其他节点获取日志。
观察3312日志:
2020-09-16T14:24:45.860288+08:00 20 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2020-09-16T14:24:45.860314+08:00 20 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2020-09-16T14:24:45.860323+08:00 20 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2020-09-16T14:24:45.860376+08:00 18 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2020-09-16T14:24:45.860442+08:00 21 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2020-09-16T14:24:45.860773+08:00 21 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2020-09-16T14:24:45.870358+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='localhost.localdomain', master_port= 3310, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T14:24:45.875675+08:00 18 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2020-09-16T14:24:45.882611+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='localhost.localdomain', master_port= 3312, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T14:24:45.887683+08:00 18 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 8d1369dd-f735-11ea-a075-000c295d8ce8 at localhost.localdomain port: 3312.'
2020-09-16T14:24:45.887891+08:00 30 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-09-16T14:24:45.888591+08:00 30 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@localhost.localdomain:3312',replication started in log 'FIRST' at position 4
2020-09-16T14:24:45.897359+08:00 31 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_recovery.000001' position: 4

此种情况也简单,无需人工干预。

三、加入节点3311,无法从所有donor拉取所有binlog

前提:
MGR部署完成且已启动节点3310,3312

主节点3310 purge部分日志,制造binlog缺失情况
mysql> purge BINARY logs to 'mysql-bin.000004';
mysql> show global VARIABLES like '%gtid%';
+---------------------------------------------------+------------------------------------------------------------------------------------+
| Variable_name                                     | Value                                                                              |
+---------------------------------------------------+------------------------------------------------------------------------------------+
| binlog_gtid_simple_recovery                       | ON                                                                                 |
| enforce_gtid_consistency                          | ON                                                                                 |
| group_replication_allow_local_disjoint_gtids_join | OFF                                                                                |
| group_replication_gtid_assignment_block_size      | 1000000                                                                            |
| gtid_executed                                     | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-26,f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 |
| gtid_executed_compression_period                  | 1000                                                                               |
| gtid_mode                                         | ON                                                                                 |
| gtid_owned                                        |                                                                                    |
| gtid_purged                                       | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-13,f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 |
| session_track_gtids                               | OFF                                                                                |
+---------------------------------------------------+------------------------------------------------------------------------------------+

从节点3312 purge部分日志,制造binlog缺失情况
mysql> purge BINARY logs to 'mysql-bin.000003';   
mysql> show global VARIABLES like '%gtid%';
+---------------------------------------------------+------------------------------------------------------------------------------------+
| Variable_name                                     | Value                                                                              |
+---------------------------------------------------+------------------------------------------------------------------------------------+
| binlog_gtid_simple_recovery                       | ON                                                                                 |
| enforce_gtid_consistency                          | ON                                                                                 |
| group_replication_allow_local_disjoint_gtids_join | OFF                                                                                |
| group_replication_gtid_assignment_block_size      | 1000000                                                                            |
| gtid_executed                                     | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-27,f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 |
| gtid_executed_compression_period                  | 1000                                                                               |
| gtid_mode                                         | ON                                                                                 |
| gtid_owned                                        |                                                                                    |
| gtid_purged                                       | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-22,f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 |
| session_track_gtids                               | OFF                                                                                |
+---------------------------------------------------+------------------------------------------------------------------------------------+
        > 节点3311加入集群时报错
DB初始化步骤略
(sit)root@localhost [(none)]> INSTALL PLUGIN group_replication SONAME 'group_replication.so'; 
(sit)root@localhost [(none)]> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
(sit)root@localhost [(none)]> START GROUP_REPLICATION;
(sit)root@localhost [(none)]> select * From `performance_schema`.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST           | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 834617f9-f7ec-11ea-b335-000c295d8ce8 | localhost.localdomain |        3311 | RECOVERING   |
| group_replication_applier | 8d1369dd-f735-11ea-a075-000c295d8ce8 | localhost.localdomain |        3312 | ONLINE       |
| group_replication_applier | f48a43eb-e8ce-11ea-86bc-000c295d8ce8 | localhost.localdomain |        3310 | ONLINE       |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+

此时发现节点3311一直在RECOVERING状态,查看error.log,注意 [ERROR] 报错部分。
2020-09-16T15:18:50.970126+08:00 2 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2020-09-16T15:18:50.970267+08:00 2 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 127.0.0.1/8,192.168.122.1/24,192.168.211.128/24 to the whitelist'
2020-09-16T15:18:50.970389+08:00 2 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2020-09-16T15:18:50.970443+08:00 2 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2020-09-16T15:18:50.970460+08:00 2 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "1f1f567e-e8f3-11ea-a254-fa163eb74263"; group_replication_local_address: "192.168.211.128:33111"; group_replication_group_seeds: "192.168.211.128:33101,192.168.211.128:33111,192.168.211.128:33121"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2020-09-16T15:18:50.970482+08:00 2 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2020-09-16T15:18:50.970487+08:00 2 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2020-09-16T15:18:50.970510+08:00 2 [Note] Plugin group_replication reported: 'Member configuration: member_id: 1283311; member_uuid: "834617f9-f7ec-11ea-b335-000c295d8ce8"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; '
2020-09-16T15:18:50.994080+08:00 4 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:18:51.015131+08:00 7 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_applier.000001' position: 4
2020-09-16T15:18:51.015156+08:00 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2020-09-16T15:18:51.015176+08:00 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2020-09-16T15:18:51.015180+08:00 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1283311'
2020-09-16T15:18:51.107805+08:00 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2020-09-16T15:18:51.107847+08:00 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 33111'
2020-09-16T15:18:52.588328+08:00 2 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address localhost.localdomain:3310.'
2020-09-16T15:18:52.588689+08:00 18 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2020-09-16T15:18:52.589047+08:00 0 [Note] Plugin group_replication reported: 'Group membership changed to localhost.localdomain:3311, localhost.localdomain:3312, localhost.localdomain:3310 on view 15994453452217269:13.'
2020-09-16T15:18:52.602940+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='localhost.localdomain', master_port= 3312, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:18:52.612319+08:00 18 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 8d1369dd-f735-11ea-a075-000c295d8ce8 at localhost.localdomain port: 3312.'
2020-09-16T15:18:52.612666+08:00 20 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-09-16T15:18:52.613377+08:00 20 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@localhost.localdomain:3312',replication started in log 'FIRST' at position 4
2020-09-16T15:18:52.666024+08:00 21 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_recovery.000001' position: 4
2020-09-16T15:18:52.668118+08:00 20 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2020-09-16T15:18:52.668184+08:00 20 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2020-09-16T15:18:52.668199+08:00 20 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2020-09-16T15:18:52.668392+08:00 18 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2020-09-16T15:18:52.668698+08:00 21 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2020-09-16T15:18:52.669409+08:00 21 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2020-09-16T15:18:52.686314+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='localhost.localdomain', master_port= 3312, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:18:52.695158+08:00 18 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2020-09-16T15:18:52.707293+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='localhost.localdomain', master_port= 3310, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:18:52.715254+08:00 18 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor f48a43eb-e8ce-11ea-86bc-000c295d8ce8 at localhost.localdomain port: 3310.'
2020-09-16T15:18:52.715628+08:00 30 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-09-16T15:18:52.716433+08:00 30 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@localhost.localdomain:3310',replication started in log 'FIRST' at position 4
2020-09-16T15:18:52.732114+08:00 31 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_recovery.000001' position: 4
2020-09-16T15:18:52.736314+08:00 30 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2020-09-16T15:18:52.736412+08:00 30 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2020-09-16T15:18:52.736421+08:00 30 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2020-09-16T15:18:52.736587+08:00 18 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2020-09-16T15:18:52.736639+08:00 31 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2020-09-16T15:18:52.737083+08:00 31 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2020-09-16T15:18:52.753778+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='localhost.localdomain', master_port= 3310, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:18:52.762847+08:00 18 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
...
...
...
2020-09-16T15:22:53.213727+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='localhost.localdomain', master_port= 3310, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:22:53.222738+08:00 18 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor f48a43eb-e8ce-11ea-86bc-000c295d8ce8 at localhost.localdomain port: 3310.'
2020-09-16T15:22:53.223098+08:00 100 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-09-16T15:22:53.225062+08:00 100 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@localhost.localdomain:3310',replication started in log 'FIRST' at position 4
2020-09-16T15:22:53.238322+08:00 101 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_recovery.000001' position: 4
2020-09-16T15:22:53.241141+08:00 100 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2020-09-16T15:22:53.241184+08:00 100 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2020-09-16T15:22:53.241189+08:00 100 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2020-09-16T15:22:53.241595+08:00 18 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2020-09-16T15:22:53.241692+08:00 101 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2020-09-16T15:22:53.242487+08:00 101 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2020-09-16T15:22:53.259592+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='localhost.localdomain', master_port= 3310, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:22:53.269993+08:00 18 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/10'
2020-09-16T15:22:53.285775+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='localhost.localdomain', master_port= 3312, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:22:53.294631+08:00 18 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 8d1369dd-f735-11ea-a075-000c295d8ce8 at localhost.localdomain port: 3312.'
2020-09-16T15:22:53.295100+08:00 110 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-09-16T15:22:53.296353+08:00 110 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@localhost.localdomain:3312',replication started in log 'FIRST' at position 4
2020-09-16T15:22:53.316454+08:00 111 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_recovery.000001' position: 4
2020-09-16T15:22:53.319438+08:00 110 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2020-09-16T15:22:53.319518+08:00 110 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2020-09-16T15:22:53.319525+08:00 110 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2020-09-16T15:22:53.319677+08:00 18 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2020-09-16T15:22:53.319773+08:00 111 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2020-09-16T15:22:53.320719+08:00 111 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2020-09-16T15:22:53.341186+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='localhost.localdomain', master_port= 3312, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:22:53.353728+08:00 18 [ERROR] Plugin group_replication reported: 'Maximum number of retries when trying to connect to a donor reached. Aborting group replication recovery.'
2020-09-16T15:22:53.353802+08:00 18 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2020-09-16T15:22:53.368849+08:00 18 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T15:22:53.380519+08:00 18 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.'
2020-09-16T15:22:53.380628+08:00 18 [Note] Plugin group_replication reported: 'Going to wait for view modification'
2020-09-16T15:22:56.670091+08:00 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

大致过程:首先找主节点3310拉取日志,发现日志不完整,于是从其他节点3312中拉取,仍无法拉取。重复尝试10次后节点加入失败,剔除节点3311。
尝试重复连接的次数及重连间隔时间由参数group_replication_recovery_retry_count和group_replication_recovery_reconnect_interval决定。
此时节点已经无法自动加入集群,需通过备份重新恢复DB后再加入集群。

        > 通过备份恢复的方式加入集群
> 主节点备份全部数据,DB较大的话可通过xtrabackup备份
mysqldump -uroot -P3310 -proot123  --single-transaction --master-data=2  -R --default-character-set=utf8  --all-databases  > /tmp/3310.sql  

> 待加入的节点3311 注释参数  ## disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY",重启

> 待加入的节点3311 全量恢复数据库
mysql> source /tmp/3310.sql

> 待加入的节点3311 打开参数  disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY",重启

> 待加入的节点3311 加入集群
(sit)root@localhost [(none)]> INSTALL PLUGIN group_replication SONAME 'group_replication.so'; 
(sit)root@localhost [(none)]> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
(sit)root@localhost [(none)]> show databases ;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test_mgr           |
+--------------------+
(sit)root@localhost [(none)]> START GROUP_REPLICATION;
(sit)root@localhost [(none)]> show databases ;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| lzh                |
| mysql              |
| performance_schema |
| sys                |
| test_mgr           |
+--------------------+
(sit)root@localhost [(none)]> select * From `performance_schema`.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST           | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 834617f9-f7ec-11ea-b335-000c295d8ce8 | localhost.localdomain |        3311 | ONLINE       |
| group_replication_applier | 8d1369dd-f735-11ea-a075-000c295d8ce8 | localhost.localdomain |        3312 | ONLINE       |
| group_replication_applier | f48a43eb-e8ce-11ea-86bc-000c295d8ce8 | localhost.localdomain |        3310 | ONLINE       |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+

四、加入的节点3311有本地执行事务

从上面步骤得到的集群中剔除3311节点,然后在节点3311上执行事务,再加入集群。

> 停止节点3311的组复制
(sit)root@localhost [(none)]> stop group_replication ;
(sit)root@localhost [(none)]> select * From `performance_schema`.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST           | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 7fc69a7d-f7fb-11ea-b918-000c295d8ce8 | localhost.localdomain |        3311 | OFFLINE      |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+

> 打开超级管理员read_only限制,执行事务,制造本地事务
(sit)root@localhost [(none)]> set global super_read_only=0 ;
(sit)root@localhost [(none)]> create table lzh.test_tab(id int); 

> 查看执行事务后的节点3311 gtid值,如所料,本地节点多了一个事务 7fc69a7d-f7fb-11ea-b918-000c295d8ce8:1
(sit)root@localhost [(none)]> select @@server_uuid;
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| 7fc69a7d-f7fb-11ea-b918-000c295d8ce8 |
+--------------------------------------+
(sit)root@localhost [(none)]> show global variables like '%gtid%';
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+
| Variable_name                                     | Value                                                                                                                       |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+
| binlog_gtid_simple_recovery                       | ON                                                                                                                          |
| enforce_gtid_consistency                          | ON                                                                                                                          |
| group_replication_allow_local_disjoint_gtids_join | OFF                                                                                                                         |
| group_replication_gtid_assignment_block_size      | 1000000                                                                                                                     |
| gtid_executed                                     | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-30,7fc69a7d-f7fb-11ea-b918-000c295d8ce8:1,f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3   |
| gtid_executed_compression_period                  | 1000                                                                                                                        |
| gtid_mode                                         | ON                                                                                                                          |
| gtid_owned                                        |                                                                                                                             |
| gtid_purged                                       | 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-28,
f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3                                         |
| session_track_gtids                               | OFF                                                                                                                         |
+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+

> 节点3311 加入集群
(sit)root@localhost [(none)]> start group_replication ;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.

> 加入失败,查看日志
提示此节点包含了组之外的事务,节点将被组排除。
若想强制加入,可使用参数 group_replication_allow_local_disjoint_gtids_join ,此参数在5.7.21版本中已弃用,并在后续版本中移除此参数。
2020-09-16T18:00:12.905134+08:00 32 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2020-09-16T18:00:12.905287+08:00 32 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 127.0.0.1/8,192.168.122.1/24,192.168.211.128/24 to the whitelist'
2020-09-16T18:00:12.905411+08:00 32 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2020-09-16T18:00:12.905481+08:00 32 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2020-09-16T18:00:12.905504+08:00 32 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "1f1f567e-e8f3-11ea-a254-fa163eb74263"; group_replication_local_address: "192.168.211.128:33111"; group_replication_group_seeds: "192.168.211.128:33101,192.168.211.128:33111,192.168.211.128:33121"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2020-09-16T18:00:12.905527+08:00 32 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2020-09-16T18:00:12.905535+08:00 32 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2020-09-16T18:00:12.905558+08:00 32 [Note] Plugin group_replication reported: 'Member configuration: member_id: 1283311; member_uuid: "7fc69a7d-f7fb-11ea-b918-000c295d8ce8"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; '
2020-09-16T18:00:12.906078+08:00 35 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 373, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-09-16T18:00:12.939699+08:00 38 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/mgr3311/logs/mysql-relay-bin-group_replication_applier.000002' position: 4
2020-09-16T18:00:12.940142+08:00 32 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2020-09-16T18:00:12.940169+08:00 32 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2020-09-16T18:00:12.940174+08:00 32 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1283311'
2020-09-16T18:00:12.940404+08:00 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2020-09-16T18:00:12.940444+08:00 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 33111'
2020-09-16T18:00:14.470626+08:00 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-30,7fc69a7d-f7fb-11ea-b918-000c295d8ce8:1,
f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3 > Group transactions: 1f1f567e-e8f3-11ea-a254-fa163eb74263:1-30,
f48a43eb-e8ce-11ea-86bc-000c295d8ce8:1-3'
2020-09-16T18:00:14.470686+08:00 0 [ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.'
2020-09-16T18:00:14.470697+08:00 0 [Note] Plugin group_replication reported: 'To force this member into the group you can use the group_replication_allow_local_disjoint_gtids_join option'
2020-09-16T18:00:14.470719+08:00 0 [Note] Plugin group_replication reported: 'Group membership changed to localhost.localdomain:3311, localhost.localdomain:3312, localhost.localdomain:3310 on view 15994453452217269:19.'
2020-09-16T18:00:14.470878+08:00 32 [Note] Plugin group_replication reported: 'Going to wait for view modification'
2020-09-16T18:00:18.001589+08:00 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2020-09-16T18:00:23.002210+08:00 32 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2020-09-16T18:00:23.002237+08:00 32 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2020-09-16T18:00:23.002331+08:00 38 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2020-09-16T18:00:23.002789+08:00 38 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 65
2020-09-16T18:00:23.010274+08:00 35 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'

> 根据提示修改节点3311的参数,再加入集群
(sit)root@localhost [(none)]> set global group_replication_allow_local_disjoint_gtids_join=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
(sit)root@localhost [(none)]> show warnings ;
+---------+------+------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                    |
+---------+------+------------------------------------------------------------------------------------------------------------+
| Warning | 1681 | 'group_replication_allow_local_disjoint_gtids_join' is deprecated and will be removed in a future release. |
+---------+------+------------------------------------------------------------------------------------------------------------+
(sit)root@localhost [(none)]> start group_replication ;
Query OK, 0 rows affected, 1 warning (3.29 sec)

(sit)root@localhost [(none)]> select * From `performance_schema`.replication_group_members;
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST           | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+
| group_replication_applier | 7fc69a7d-f7fb-11ea-b918-000c295d8ce8 | localhost.localdomain |        3311 | ONLINE       |
| group_replication_applier | 8d1369dd-f735-11ea-a075-000c295d8ce8 | localhost.localdomain |        3312 | ONLINE       |
| group_replication_applier | f48a43eb-e8ce-11ea-86bc-000c295d8ce8 | localhost.localdomain |        3310 | ONLINE       |
+---------------------------+--------------------------------------+-----------------------+-------------+--------------+

强烈不建议用此种方式将节点加入集群,官方也不推荐,而且后续此参数会被移除。
遇到数据不一致时建议通过备份重新恢复数据库,再加入集群。

上一篇 下一篇

猜你喜欢

热点阅读