015. Redis Cluster Scale-Out and Scale-In: Principles and Practice

2020-03-24 CoderJed

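1. Redis Cluster Scale-Out

1.1 How Scale-Out Works

Redis Cluster lets nodes be brought online and taken offline flexibly.
The 3 master nodes each maintain the slots they are responsible for and the corresponding data. To scale out by adding a node, part of the slots and data must be migrated to the new node.

Each master migrates part of its slots and data to the new node, node04.

1.2 Scale-Out Procedure

Prepare the new nodes

Prepare two configuration files, redis_6379.conf and redis_6380.conf: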

daemonize yes
port 6379
logfile "/var/log/redis/redis_6379.log"
pidfile /var/run/redis/redis_6379.pid
dir /data/redis/6379
bind 10.0.0.103
protected-mode no
# requirepass 123456
appendonly yes
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file /opt/redis/conf/nodes-6379.conf

# The second configuration file
daemonize yes
port 6380
logfile "/var/log/redis/redis_6380.log"
pidfile /var/run/redis/redis_6380.pid
dir /data/redis/6380
bind 10.0.0.103
protected-mode no
# requirepass 123456
appendonly yes
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file /opt/redis/conf/nodes-6380.conf
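
The two files differ only in values derived from the port number, so one way to keep them consistent is to render both from a single template. The following is a minimal sketch under that assumption; `CONF_TEMPLATE` and `render_conf` are hypothetical names used only for illustration, and the commented-out requirepass line is left out.

```python
# Template for one cluster node; every port-dependent value uses {port}.
CONF_TEMPLATE = """\
daemonize yes
port {port}
logfile "/var/log/redis/redis_{port}.log"
pidfile /var/run/redis/redis_{port}.pid
dir /data/redis/{port}
bind {bind}
protected-mode no
appendonly yes
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file /opt/redis/conf/nodes-{port}.conf
"""


def render_conf(port: int, bind: str = "10.0.0.103") -> str:
    """Return the text of redis_<port>.conf for one cluster node."""
    return CONF_TEMPLATE.format(port=port, bind=bind)


if __name__ == "__main__":
    for port in (6379, 6380):
        print(f"# redis_{port}.conf")
        print(render_conf(port))
```

Rendering from a template makes it harder for a copied file to keep a path that belongs to the other port.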

Create the directories

mkdir -p /var/log/redis
touch /var/log/redis/redis_6379.log
touch /var/log/redis/redis_6380.log
mkdir -p /var/run/redis
mkdir -p /data/redis/6379
mkdir -p /data/redis/6380
mkdir -p /opt/redis/conf

Start the Redis service on the new node

[root@node04 redis]# bin/redis-server conf/redis_6379.conf
[root@node04 redis]# bin/redis-server conf/redis_6380.conf
[root@node04 opt]# ps -ef | grep redis
root       1755      1  0 19:06 ?        00:00:00 redis-server 10.0.0.103:6379 [cluster]
root       1757      1  0 19:06 ?        00:00:00 redis-server 10.0.0.103:6380 [cluster]

Add the new nodes to the cluster

Run the following commands on any node of the existing cluster:

[root@node01 opt]# redis-cli -c -h 10.0.0.100 -p 6380
10.0.0.100:6380> cluster meet 10.0.0.103 6379
OK
10.0.0.100:6380> cluster meet 10.0.0.103 6380
OK

After the old and new nodes in the cluster have communicated for a while, all nodes update their state and save it locally.

10.0.0.100:6380> cluster nodes
# The two newly added instances (10.0.0.103:6379 and 10.0.0.103:6380) are both masters and do not manage any slots yet
4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380@16380 master - 0 1585048391000 7 connected 0-5460
690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379@16379 master - 0 1585048389000 3 connected 10923-16383
1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379@16379 master - 0 1585048392055 2 connected 5461-10922
724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379@16379 master - 0 1585048388000 8 connected
ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380@16380 master - 0 1585048391046 0 connected
89f52bfbb8803db19ab0c5a90adc4099df8287f7 10.0.0.100:6379@16379 slave 4fb4c538d5f29255f6212f2eae8a761fbe364a89 0 1585048388000 7 connected
86e1881611440012c87fbf3fa98b7b6d79915e25 10.0.0.102:6380@16380 slave 1be5d1aaaa9e9542224554f461694da9cba7c2b8 0 1585048389033 6 connected
8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380@16380 myself,slave 690b2e1f604a0227068388d3e5b1f1940524c565 0 1585048390000 4 connected
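
Output like the above can also be checked programmatically. The sketch below parses a few of the lines shown and lists the masters that own no slots yet. It assumes the field order id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, then slot ranges; `parse_cluster_nodes` and `empty_masters` are hypothetical helper names, not part of any Redis tool.

```python
from typing import Dict, List

# A few lines copied from the cluster nodes output above.
SAMPLE = """\
4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380@16380 master - 0 1585048391000 7 connected 0-5460
724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379@16379 master - 0 1585048388000 8 connected
ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380@16380 master - 0 1585048391046 0 connected
8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380@16380 myself,slave 690b2e1f604a0227068388d3e5b1f1940524c565 0 1585048390000 4 connected
"""


def parse_cluster_nodes(text: str) -> List[Dict]:
    """Parse cluster nodes output into a list of dicts (simplified)."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and annotation comments
        fields = line.split()
        flags = fields[2].split(",")
        nodes.append({
            "id": fields[0],
            "addr": fields[1].split("@")[0],  # drop the cluster bus port
            "role": "master" if "master" in flags else "slave",
            "master_id": None if fields[3] == "-" else fields[3],
            "slots": fields[8:],  # empty list when no slots are assigned
        })
    return nodes


def empty_masters(nodes: List[Dict]) -> List[str]:
    """Addresses of masters that do not own any slot."""
    return [n["addr"] for n in nodes if n["role"] == "master" and not n["slots"]]


if __name__ == "__main__":
    print(empty_masters(parse_cluster_nodes(SAMPLE)))
```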

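New nodes always start out as masters, but because they are not responsible for any slots they cannot serve any reads or writes. There are generally two options for what to do with a new node next:
- Migrate slots and data from other nodes to the new node
- Make it a slave of another node so that it can take part in failover
The redis-trib.rb tool also provides a command for adding a new node to an existing cluster, and it supports adding the node directly as a slave: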
```
# Add a new node to the cluster
redis-trib.rb add-node new_host:new_port old_host:old_port
# Add a new node to the cluster as a slave of the specified master
redis-trib.rb add-node new_host:new_port old_host:old_port --slave --master-id <master-id>
```
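We recommend adding new nodes with redis-trib.rb add-node. That command checks the state of the new node and reports an error if the node has already joined another cluster or already contains data. The cluster meet command performs no such check: if the new node already holds data, that data is merged into the cluster and causes data inconsistency.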
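
The two conditions described above can be expressed as a small pre-check to run before falling back to cluster meet. This is only a sketch of the idea, not the code redis-trib.rb runs: `check_new_node` is a hypothetical helper, and it assumes the caller has already read the `cluster_known_nodes` field of `CLUSTER INFO` and the result of `DBSIZE` from the new node.

```python
from typing import List


def check_new_node(cluster_known_nodes: int, key_count: int) -> List[str]:
    """Return the reasons why a node should NOT be added; empty means OK.

    cluster_known_nodes: nodes the candidate already knows, itself included.
    key_count: number of keys the candidate currently stores.
    """
    problems = []
    if cluster_known_nodes > 1:
        problems.append("node already knows other nodes (joined a cluster?)")
    if key_count > 0:
        problems.append("node is not empty: it already contains keys")
    return problems


if __name__ == "__main__":
    print(check_new_node(1, 0))    # a fresh node
    print(check_new_node(7, 12))   # a node that belongs elsewhere and has data
```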

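Migrate slots and data

Slot migration is the core step of scaling a cluster.
Assume there were originally 3 masters, each responsible for 16384 / 3 ≈ 5461 slots.
After one new master joins, each master is responsible for 16384 / 4 = 4096 slots.
Once the migration plan is decided (for example, each master moves the slots it holds beyond 4096 to the new master), the migration proceeds slot by slot.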
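
The arithmetic of such a plan can be sketched as follows. The slot counts per master are taken from the check output shown later in this article (5461, 5462 and 5461); `plan_scale_out` is a hypothetical helper, and the rule "give away everything above the new average" is just the example plan described above, not necessarily what a tool would choose.

```python
TOTAL_SLOTS = 16384


def plan_scale_out(owned: dict) -> dict:
    """Slots each existing master hands to ONE new master.

    owned maps a master address to the number of slots it holds now.
    """
    target = TOTAL_SLOTS // (len(owned) + 1)  # 16384 // 4 == 4096
    return {node: count - target
            for node, count in owned.items() if count > target}


if __name__ == "__main__":
    current = {
        "10.0.0.101:6380": 5461,
        "10.0.0.101:6379": 5462,
        "10.0.0.102:6379": 5461,
    }
    plan = plan_scale_out(current)
    print(plan, "total:", sum(plan.values()))
```
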
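Each slot is migrated as follows:
- Send cluster setslot {slot_id} importing {sourceNodeId} to the target node. The target node marks the slot as "importing" and prepares to import its data.
- Send cluster setslot {slot_id} migrating {targetNodeId} to the source node. The source node marks the slot as "migrating" and prepares to move its data out.
- On the source node, run cluster getkeysinslot {slot_id} {count} to list the keys in the slot (fetched in batches; count is the number of keys returned per call), then migrate each key.
- On the source node, run migrate {targetIp} {targetPort} "" 0 {timeout} keys {keys} to move the keys to the target node batch by batch (before redis-3.0.6 only one key could be migrated per call). Concretely, the source node runs dump on the key to obtain its serialized content and then, acting as a client, sends the target node a restore command carrying that content. The target node deserializes the content, stores it in its own memory and returns "OK"; the source node then deletes the key. That completes the migration of one key.
- When all keys have been migrated, the migration of that slot is finished.
- Repeat this for every slot that is to be migrated. Once all of them are done, the slots of the new cluster have been redistributed. Send cluster setslot {slot_id} node {targetNodeId} to every master in the cluster to tell them which slots have moved to which master so that they update their own information.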
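
The steps above can be traced with a purely in-memory simulation. The sketch below talks to no Redis server: `Node`, `migrate_slot` and the toy key-to-slot table are all hypothetical stand-ins, and copying a value between dicts stands in for dump/restore.

```python
class Node:
    """In-memory stand-in for one master; not a real Redis client."""

    def __init__(self, name):
        self.name = name
        self.data = {}        # key -> value held by this node
        self.importing = {}   # slot -> name of the node it is imported from
        self.migrating = {}   # slot -> name of the node it is migrated to
        self.owner = {}       # slot -> owner name, as seen by this node


def migrate_slot(slot, source, target, masters, slot_of, batch=100):
    """Move one slot from source to target; returns the number of keys moved."""
    # Step 1: the target marks the slot as importing from the source.
    target.importing[slot] = source.name
    # Step 2: the source marks the slot as migrating to the target.
    source.migrating[slot] = target.name
    moved = 0
    while True:
        # Step 3: fetch up to `batch` keys of the slot (cluster getkeysinslot).
        keys = [k for k in source.data if slot_of(k) == slot][:batch]
        if not keys:
            break
        # Step 4: copy each key to the target, then delete it on the source.
        for key in keys:
            target.data[key] = source.data[key]
            del source.data[key]
            moved += 1
    # Last step: every master records the new owner; transient marks are cleared.
    for node in masters:
        node.owner[slot] = target.name
        node.importing.pop(slot, None)
        node.migrating.pop(slot, None)
    return moved


if __name__ == "__main__":
    toy_slots = {"k1": 4096, "k2": 4096, "k3": 7}  # made-up key -> slot table
    src, dst, other = Node("src"), Node("dst"), Node("other")
    src.data = {"k1": "a", "k2": "b", "k3": "c"}
    print(migrate_slot(4096, src, dst, [src, dst, other], toy_slots.get, batch=1))
    print(src.data, dst.data)
```
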
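Additional notes on slot migration
- Migration is synchronous: from the moment the target node executes restore until the source node deletes the key, the main thread of the source node is blocked.
- If a network failure interrupts the migration when a slot is only half migrated, both nodes remain marked with the intermediate transitional states "migrating" and "importing". The next time the migration tool connects, it continues the migration.
- If every key is small, migration is fast and does not affect normal client access.
- If a key is large, the blocking migration of that single key stalls both the source node and the target node and hurts cluster stability. In a cluster environment, business logic should therefore avoid producing big keys as far as possible.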

Performing the slot migration manually

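# Target node 690b2e1f604a0227068388d3e5b1f1940524c565 (10.0.0.102:6379) prepares to import slot 4096
# Node IDs can be looked up with the cluster nodes command
# Run on the target node; the argument is the ID of the source node
10.0.0.102:6379> cluster setslot 4096 importing 4fb4c538d5f29255f6212f2eae8a761fbe364a89

# Source node 4fb4c538d5f29255f6212f2eae8a761fbe364a89 (10.0.0.101:6380, owner of slots 0-5460) prepares to export slot 4096
# Run on the source node; the argument is the ID of the target node
10.0.0.101:6380> cluster setslot 4096 migrating 690b2e1f604a0227068388d3e5b1f1940524c565

# Fetch up to 100 keys of slot 4096 in one batch (on the source node)
10.0.0.101:6380> cluster getkeysinslot 4096 100

# Migrate these keys in one batch (on the source node, to the address of the target node)
10.0.0.101:6380> migrate 10.0.0.102 6379 "" 0 5000 keys key1 key2 ... key100

# Notify all masters that slot 4096 has moved to target node 690b2e1f604a0227068388d3e5b1f1940524c565
10.0.0.101:6379> cluster setslot 4096 node 690b2e1f604a0227068388d3e5b1f1940524c565
10.0.0.101:6380> cluster setslot 4096 node 690b2e1f604a0227068388d3e5b1f1940524c565
10.0.0.102:6379> cluster setslot 4096 node 690b2e1f604a0227068388d3e5b1f1940524c565
10.0.0.103:6379> cluster setslot 4096 node 690b2e1f604a0227068388d3e5b1f1940524c565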

Performing the slot migration with redis-trib.rb

redis-trib.rb reshard host:port --from <arg> --to <arg> --slots <arg> --yes --timeout <arg> --pipeline <arg>

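host:port: the host:port of any node in the cluster, used to fetch the information of the whole cluster
--from: ID of the source node; prompted for interactively if omitted
--to: ID of the target node; prompted for interactively if omitted
--slots: total number of slots to migrate; prompted for interactively if omitted
--yes: execute the slot migration plan right after it is printed, without waiting for the user to type yes
--timeout: timeout of each migrate operation, default 60000 ms
--pipeline: number of keys migrated per batch, default 10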

[root@node01 redis]# redis-trib.rb reshard 10.0.0.100:6379

>>> Performing Cluster Check (using node 10.0.0.100:6379)
S: 89f52bfbb8803db19ab0c5a90adc4099df8287f7 10.0.0.100:6379
   slots: (0 slots) slave
   replicates 4fb4c538d5f29255f6212f2eae8a761fbe364a89
S: 8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380
   slots: (0 slots) slave
   replicates 690b2e1f604a0227068388d3e5b1f1940524c565
M: 690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380
   slots: (0 slots) master
   0 additional replica(s)
S: 86e1881611440012c87fbf3fa98b7b6d79915e25 10.0.0.102:6380
   slots: (0 slots) slave
   replicates 1be5d1aaaa9e9542224554f461694da9cba7c2b8
M: 1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379
   slots: (0 slots) master
   0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# How many slots should be migrated?
How many slots do you want to move (from 1 to 16384)? 4096
# Which master receives them?
What is the receiving node ID? 724a8a15f4efe5a01454cb971d7471d6e84279f3
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
# Which nodes are they taken from?
Source node #1:4fb4c538d5f29255f6212f2eae8a761fbe364a89
Source node #2:690b2e1f604a0227068388d3e5b1f1940524c565
Source node #3:1be5d1aaaa9e9542224554f461694da9cba7c2b8
Source node #4:done

Ready to move 4096 slots.
  Source nodes:
    M: 4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
    M: 690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
    M: 1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
  Destination node:
    M: 724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379
   slots: (0 slots) master
   0 additional replica(s)
  Resharding plan:
    Moving slot 5461 from 1be5d1aaaa9e9542224554f461694da9cba7c2b8
    Moving slot 5462 from 1be5d1aaaa9e9542224554f461694da9cba7c2b8
    Moving slot 5463 from 1be5d1aaaa9e9542224554f461694da9cba7c2b8
    ......
    
10.0.0.100:6380> cluster nodes
1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379@16379 master - 0 1585053959158 2 connected 6827-10922
# One of the newly added nodes has now been assigned slots
724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379@16379 master - 0 1585053957000 8 connected 0-1364 5461-6826 10923-12287
4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380@16380 master - 0 1585053960166 7 connected 1365-5460
ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380@16380 master - 0 1585053957000 0 connected
690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379@16379 master - 0 1585053959000 3 connected 12288-16383
89f52bfbb8803db19ab0c5a90adc4099df8287f7 10.0.0.100:6379@16379 slave 4fb4c538d5f29255f6212f2eae8a761fbe364a89 0 1585053958149 7 connected
86e1881611440012c87fbf3fa98b7b6d79915e25 10.0.0.102:6380@16380 slave 1be5d1aaaa9e9542224554f461694da9cba7c2b8 0 1585053958000 6 connected
8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380@16380 myself,slave 690b2e1f604a0227068388d3e5b1f1940524c565 0 1585053954000 4 connected

The slot numbers held by a master do not have to be contiguous; it is enough that the number of slots managed by each master is balanced.
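
To confirm that non-contiguous ranges still add up to a balanced share, the ranges printed by cluster nodes can simply be summed. A minimal sketch, with `count_slots` as a hypothetical helper name; the range strings are copied from the output above.

```python
def count_slots(ranges: str) -> int:
    """Count the slots in a space-separated list such as '0-1364 5461-6826'."""
    total = 0
    for part in ranges.split():
        start, _, end = part.partition("-")
        # A single slot is printed without a dash, so end may be empty.
        total += int(end or start) - int(start) + 1
    return total


if __name__ == "__main__":
    after_reshard = {
        "10.0.0.103:6379": "0-1364 5461-6826 10923-12287",
        "10.0.0.101:6380": "1365-5460",
        "10.0.0.101:6379": "6827-10922",
        "10.0.0.102:6379": "12288-16383",
    }
    for addr, ranges in after_reshard.items():
        print(addr, count_slots(ranges))
```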

Add a slave

We initially added 10.0.0.103:6379 and 10.0.0.103:6380, and both are currently masters. 10.0.0.103:6380 should become the slave of 10.0.0.103:6379.

# First connect a client to 10.0.0.103:6380
redis-cli -c -h 10.0.0.103 -p 6380
# Then make it a slave of 10.0.0.103:6379
10.0.0.103:6380> cluster replicate 724a8a15f4efe5a01454cb971d7471d6e84279f3
OK
10.0.0.103:6380> cluster nodes
1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379@16379 master - 0 1585054332556 2 connected 6827-10922
4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380@16380 master - 0 1585054332000 7 connected 1365-5460
690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379@16379 master - 0 1585054332000 3 connected 12288-16383
724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379@16379 master - 0 1585054334000 8 connected 0-1364 5461-6826 10923-12287
89f52bfbb8803db19ab0c5a90adc4099df8287f7 10.0.0.100:6379@16379 slave 4fb4c538d5f29255f6212f2eae8a761fbe364a89 0 1585054333565 7 connected
8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380@16380 slave 690b2e1f604a0227068388d3e5b1f1940524c565 0 1585054334574 3 connected
86e1881611440012c87fbf3fa98b7b6d79915e25 10.0.0.102:6380@16380 slave 1be5d1aaaa9e9542224554f461694da9cba7c2b8 0 1585054332000 2 connected
# 10.0.0.103:6380 has become a slave
ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380@16380 myself,slave 724a8a15f4efe5a01454cb971d7471d6e84279f3 0 1585054333000 0 connected

Check that the slots are balanced

[root@node01 redis]# redis-trib.rb rebalance 10.0.0.100:6379
>>> Performing Cluster Check (using node 10.0.0.100:6379)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# The slot counts of all masters differ by less than 2%, so no rebalancing is needed
*** No rebalancing needed! All nodes are within the 2.0% threshold.
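
The idea behind the threshold can be illustrated with a simplified check. This is an assumption-laden sketch rather than the formula redis-trib.rb uses: `max_deviation_percent` and `needs_rebalance` are hypothetical helpers that measure how far each master is from an equal share.

```python
def max_deviation_percent(slot_counts: dict) -> float:
    """Largest distance of any master from the equal share, in percent."""
    expected = sum(slot_counts.values()) / len(slot_counts)
    return max(abs(count - expected) / expected * 100
               for count in slot_counts.values())


def needs_rebalance(slot_counts: dict, threshold: float = 2.0) -> bool:
    """True when at least one master deviates by more than the threshold."""
    return max_deviation_percent(slot_counts) > threshold


if __name__ == "__main__":
    balanced = {"101:6379": 4096, "101:6380": 4096,
                "102:6379": 4096, "103:6379": 4096}
    skewed = {"101:6379": 5462, "101:6380": 5461,
              "102:6379": 5461, "103:6379": 0}
    print(needs_rebalance(balanced), needs_rebalance(skewed))
```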

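2. Redis Cluster Scale-In

2.1 How Scale-In Works

If the node going offline is a slave, tell the other nodes to forget it.
If the node going offline is a master, first migrate its slots to the other masters, then tell the other nodes to forget it.
Once all other nodes have forgotten the node, it can be shut down normally.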
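
The decision flow above can be written down as a small planning function. This is a sketch only: `decommission_plan` is a hypothetical helper, and splitting the slots as evenly as possible among the remaining masters is an assumption made for illustration; any split that empties the master works.

```python
from typing import List, Tuple


def decommission_plan(role: str, slots_owned: int,
                      remaining_masters: List[str]) -> List[Tuple]:
    """Ordered steps for taking one node offline."""
    steps = []
    if role == "master" and slots_owned > 0:
        # A master must hand over all of its slots before it is forgotten.
        base, extra = divmod(slots_owned, len(remaining_masters))
        for i, master in enumerate(remaining_masters):
            steps.append(("reshard", master, base + (1 if i < extra else 0)))
    steps.append(("forget",))    # every other node forgets the node
    steps.append(("shutdown",))  # only then is the service stopped
    return steps


if __name__ == "__main__":
    masters = ["10.0.0.101:6379", "10.0.0.101:6380", "10.0.0.102:6379"]
    for step in decommission_plan("master", 4096, masters):
        print(step)
    print(decommission_plan("slave", 0, masters))
```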

2.2 Scale-In Procedure

We added two nodes above, 10.0.0.103:6379 and 10.0.0.103:6380. Now we take both of them offline.

Confirm the roles of the nodes to be removed

10.0.0.103:6380> cluster nodes
...
# 10.0.0.103:6380 is a slave
# 10.0.0.103:6379 is a master
724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379@16379 master - 0 1585055101000 8 connected 0-1364 5461-6826 10923-12287
ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380@16380 slave 724a8a15f4efe5a01454cb971d7471d6e84279f3 0 1585055099000 0 connected

Migrate the slots of the master being removed to the other masters

[root@node01 redis]# redis-trib.rb reshard 10.0.0.100:6379
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1364
What is the receiving node ID? 1be5d1aaaa9e9542224554f461694da9cba7c2b8
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:724a8a15f4efe5a01454cb971d7471d6e84279f3
Source node #2:done

Ready to move 1364 slots.
  Source nodes:
    M: 724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
  Destination node:
    M: 1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
  Resharding plan:
  .......
  
  
 [root@node01 redis]# redis-trib.rb reshard 10.0.0.100:6379
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1364
What is the receiving node ID? 4fb4c538d5f29255f6212f2eae8a761fbe364a89
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:724a8a15f4efe5a01454cb971d7471d6e84279f3
Source node #2:done
  .......
  
 [root@node01 redis]# redis-trib.rb reshard 10.0.0.100:6379
......
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1365
What is the receiving node ID? 690b2e1f604a0227068388d3e5b1f1940524c565
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:724a8a15f4efe5a01454cb971d7471d6e84279f3
Source node #2:done
  .......
  
10.0.0.103:6380> cluster nodes
1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379@16379 master - 0 1585056902000 9 connected 0-1363 6827-10922
4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380@16380 master - 0 1585056903544 12 connected 2729-6826 10923-12287
690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379@16379 master - 0 1585056903000 11 connected 1364-2728 12288-16383
# All slots of 10.0.0.103:6379 have been migrated away
724a8a15f4efe5a01454cb971d7471d6e84279f3 10.0.0.103:6379@16379 master - 0 1585056903000 10 connected
ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 10.0.0.103:6380@16380 myself,slave 4fb4c538d5f29255f6212f2eae8a761fbe364a89 0 1585056898000 0 connected
89f52bfbb8803db19ab0c5a90adc4099df8287f7 10.0.0.100:6379@16379 slave 4fb4c538d5f29255f6212f2eae8a761fbe364a89 0 1585056901000 12 connected
8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380@16380 slave 690b2e1f604a0227068388d3e5b1f1940524c565 0 1585056900000 11 connected
86e1881611440012c87fbf3fa98b7b6d79915e25 10.0.0.102:6380@16380 slave 1be5d1aaaa9e9542224554f461694da9cba7c2b8 0 1585056904551 9 connected

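Forget the nodes

Redis provides the cluster forget {downNodeId} command to tell the other nodes to forget an offline node. When a node receives cluster forget {downNodeId}, it adds the node identified by that ID to its ban list and stops exchanging messages with nodes on the list. A ban-list entry is valid for 60 seconds; after 60 seconds the node takes part in message exchange again. In other words, once the first forget command has been sent, we have 60 seconds to make every node in the cluster forget the offline node.

For production operations, using cluster forget directly to take a node offline is not recommended, because the command has to be sent to a large number of nodes. Use redis-trib.rb del-node {host:port} {downNodeId} instead.

In addition, taking the slave offline first and the master afterwards prevents unnecessary data replication.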
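
Why the 60-second window matters can be shown with a toy model of one node's view of the cluster. This is a simplified illustration, not Redis source code: `NodeView`, `forget` and `on_gossip` are hypothetical names, and the model assumes a forgotten node can be learned again from gossip sent by a node that has not forgotten it yet.

```python
class NodeView:
    """What one node knows about the others, plus its ban list."""

    BAN_SECONDS = 60

    def __init__(self, known):
        self.known = set(known)   # node IDs this node currently knows
        self.banned_until = {}    # node ID -> time at which the ban expires

    def forget(self, node_id, now):
        """Model of cluster forget: drop the node and ban it for 60 seconds."""
        self.known.discard(node_id)
        self.banned_until[node_id] = now + self.BAN_SECONDS

    def on_gossip(self, node_id, now):
        """Another node mentions node_id; re-add it unless it is still banned."""
        if self.banned_until.get(node_id, 0) > now:
            return False          # still banned: the mention is ignored
        self.banned_until.pop(node_id, None)
        self.known.add(node_id)   # ban expired: the node is learned again
        return True


if __name__ == "__main__":
    view = NodeView({"A", "B", "X"})
    view.forget("X", now=0)
    print(view.on_gossip("X", now=30), sorted(view.known))  # inside the window
    print(view.on_gossip("X", now=61), sorted(view.known))  # window expired
```

If some node still knows X after the window has passed, X can find its way back into the views of the nodes that had already forgotten it, which is why all nodes need to receive the command within 60 seconds.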

# First take the slave 10.0.0.103:6380 offline
[root@node01 redis]# redis-trib.rb del-node 10.0.0.100:6379 ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011
>>> Removing node ed9b72fffd04b8a7e5ad20afdaf1f53e0eb95011 from cluster 10.0.0.100:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
# Then take the master 10.0.0.103:6379 offline
[root@node01 redis]# redis-trib.rb del-node 10.0.0.100:6379 724a8a15f4efe5a01454cb971d7471d6e84279f3
>>> Removing node 724a8a15f4efe5a01454cb971d7471d6e84279f3 from cluster 10.0.0.100:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

10.0.0.100:6379> cluster nodes
8c13a2afa76194ef9582bb06675695bfef76b11d 10.0.0.100:6380@16380 slave 690b2e1f604a0227068388d3e5b1f1940524c565 0 1585057049247 11 connected
690b2e1f604a0227068388d3e5b1f1940524c565 10.0.0.102:6379@16379 master - 0 1585057048239 11 connected 1364-2728 12288-16383
4fb4c538d5f29255f6212f2eae8a761fbe364a89 10.0.0.101:6380@16380 master - 0 1585057048000 12 connected 2729-6826 10923-12287
89f52bfbb8803db19ab0c5a90adc4099df8287f7 10.0.0.100:6379@16379 myself,slave 4fb4c538d5f29255f6212f2eae8a761fbe364a89 0 1585057047000 1 connected
86e1881611440012c87fbf3fa98b7b6d79915e25 10.0.0.102:6380@16380 slave 1be5d1aaaa9e9542224554f461694da9cba7c2b8 0 1585057048000 9 connected
1be5d1aaaa9e9542224554f461694da9cba7c2b8 10.0.0.101:6379@16379 master - 0 1585057048000 9 connected 0-1363 6827-10922

redis-trib.rb del-node also stops the service of the removed node automatically.