Creating Consul Nodes and Building Clusters (Including a Single-Machine Multi-Node Cluster)
I. About Consul
1. Introduction to Consul
Consul is a distributed, highly available, horizontally scalable service discovery and configuration tool from HashiCorp for building distributed systems. It ships with a service registration and discovery framework, a distributed consensus protocol implementation, health checking, key/value storage, and a multi-datacenter solution. Consul is feature-complete, simple to deploy, and easy to use.
Similar service discovery and registration frameworks include:
- Etcd
- Apache ZooKeeper
- Eureka
2. Consul Architecture
[Figure: Consul architecture diagram (consul-arch.png)]
3. Consul Features
- Service Discovery: Consul provides service registration and discovery through DNS or an HTTP interface. External services can easily find the services they depend on.
- Health Checking: a Consul client can provide any number of health checks, associated either with a given service ("is the webserver returning 200 OK?") or with the local node ("is CPU utilization above 90%?"). Operators can use this information to monitor cluster health, and service discovery components can use it to route traffic away from unhealthy hosts.
- Key/Value Storage: applications can use Consul's key/value store however they need. Consul exposes a simple, easy-to-use HTTP interface, and combined with other tools it enables dynamic configuration, feature flagging, leader election, and more.
- Secure Service Communication: Consul can generate and distribute TLS certificates so that services can establish mutual TLS connections. Intentions define which services are allowed to communicate, so service segmentation is easy to manage, and intentions can be changed in real time instead of relying on complex network topologies and static firewall rules.
- Multi-Datacenter: Consul supports multiple datacenters out of the box, so users do not need to build extra abstraction layers to grow into additional regions.
4. Common Consul Use Cases
Consul's use cases include service discovery, service segmentation, and service configuration:
- Service discovery: Consul serves as the registry. Once service addresses are registered in Consul, they can be looked up through its DNS and HTTP interfaces, and Consul supports health checks (a sketch follows this list).
- Service segmentation: Consul supports access policies at the granularity of individual services, works with both classic and emerging platforms, and supports TLS certificate distribution and service-to-service encryption.
- Service configuration: Consul provides key/value storage and pushes changes out quickly, enabling shared configuration; services that need configuration can read accurate settings from Consul.
- Consul also helps administrators get a clearer picture of the architecture inside a complex system; operators can treat it as a monitoring tool or as an asset (resource) management system.
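As a taste of the discovery and KV interfaces, here is a minimal sketch against an agent listening on the default local ports; the service name web, its port, and the KV key are made up for illustration:
# register a service named "web" with an HTTP health check
curl --request PUT --data '{"Name": "web", "Port": 80, "Check": {"HTTP": "http://localhost:80", "Interval": "10s"}}' http://127.0.0.1:8500/v1/agent/service/register
# discover it through the DNS interface (agents serve DNS on port 8600)
dig @127.0.0.1 -p 8600 web.service.consul SRV
# store and read a configuration value in the key/value store
consul kv put app/config/db_host 10.0.0.10
consul kv get app/config/db_host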
II. Starting a Single Node in Dev Mode
This mode is suited to day-to-day debugging in development environments. If you have higher requirements for data persistence and service reliability, skip this mode.
1. Download and Install Consul
Use wget to download the package into the /data/pkgs directory:
cd /data/pkgs
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip
Extract the package into the /data/services directory, under a directory named consul:
unzip consul_1.7.3_linux_amd64.zip -d /data/services/consul
Archive: consul_1.7.3_linux_amd64.zip
inflating: /data/services/consul/consul
Check the directory contents:
ls -l /data/services/consul
total 105444
-rwxr-xr-x 1 centos centos 107970750 May 6 06:50 consul
2. Configure Consul
The previous step essentially completes the installation of consul. For convenience, though, we still need to add the directory containing the consul binary to PATH so it can be invoked from anywhere, create a configuration directory so startup does not require a long list of flags, and direct logs to a dedicated directory so consul does not flood the system log.
a. Create the directories and set environment variables
Create the bin, log, conf, and data directories:
cd /data/services/consul
mkdir {bin,log,conf,data}
Move the consul binary into the bin directory:
mv consul bin/
Add consul's bin directory to PATH. The following steps require sudo or root privileges:
sudo vim /etc/profile.d/consul.sh # To avoid breaking system commands through a mistyped environment variable, create a per-application script named after the service under /etc/profile.d; it is easy to maintain, can simply be removed when no longer needed, and has minimal impact on the system
The script contents:
export CONSUL_HOME=/data/services/consul
export PATH=${PATH}:${CONSUL_HOME}/bin
# Source the script so the settings take effect
source /etc/profile.d/consul.sh
At this point consul is on the system PATH and the consul command can be invoked from any directory, for example:
cd /data/services
consul help
Usage: consul [--version] [--help] <command> [<args>]
Available commands are:
acl Interact with Consul's ACLs
agent Runs a Consul agent
catalog Interact with the catalog
config Interact with Consul's Centralized Configurations
connect Interact with Consul Connect
debug Records a debugging archive for operators
event Fire a new event
exec Executes a command on Consul nodes
force-leave Forces a member of the cluster to enter the "left" state
info Provides debugging information for operators.
intention Interact with Connect service intentions
join Tell Consul agent to join cluster
keygen Generates a new encryption key
keyring Manages gossip layer encryption keys
kv Interact with the key-value store
leave Gracefully leaves the Consul cluster and shuts down
lock Execute a command holding a lock
login Login to Consul using an auth method
logout Destroy a Consul token created with login
maint Controls node or service maintenance mode
members Lists the members of a Consul cluster
monitor Stream logs from a Consul agent
operator Provides cluster-level tools for Consul operators
reload Triggers the agent to reload configuration files
rtt Estimates network round trip time between nodes
services Interact with services
snapshot Saves, restores and inspects snapshots of Consul server state
tls Builtin helpers for creating CAs and certificates
validate Validate config files/directories
version Prints the Consul version
watch Watch for changes in Consul
b. Create the configuration file
Note: the # annotations in this and the later JSON examples are explanatory only; JSON itself does not allow comments, so strip them before using the files.
cd /data/services/consul/conf
vim dev.json
{
"bind_addr": "10.100.0.2",
"client_addr": "10.100.0.2",
"datacenter": "dc1",
"data_dir": "/data/services/consul/data",
"log_level": "INFO",
"log_file": "/data/services/consul/log/consul.log", #配置日志文件与目录
"log_rotate_duration": "24h", #设置日志轮转
"enable_syslog": false, #禁止consul日志写入系统日志
"enable_debug": true,
"node_name": "Consul",
"ui": true
}
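After stripping the # annotations, you can sanity-check the configuration before starting; consul validate accepts a file or a directory:
consul validate /data/services/consul/conf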
3. Start the Consul Service in Dev Mode
consul agent -dev -config-dir=/data/services/consul/conf
==> Starting Consul agent...
Version: 'v1.7.3'
Node ID: '0e2d44c2-af33-e222-5eb5-58b2c1f903d5'
Node name: 'Consul'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [10.100.0.2] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 10.100.0.2 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-06-18T15:11:48.435+0800 [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:0e2d44c2-af33-e222-5eb5-58b2c1f903d5 Address:10.100.0.2:8300}]"
2020-06-18T15:11:48.435+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 10.100.0.2:8300 [Follower]" leader=
2020-06-18T15:11:48.435+0800 [INFO] agent.server.serf.wan: serf: EventMemberJoin: Consul.dc1 10.100.0.2
2020-06-18T15:11:48.436+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: Consul 10.100.0.2
2020-06-18T15:11:48.436+0800 [INFO] agent.server: Adding LAN server: server="Consul (Addr: tcp/10.100.0.2:8300) (DC: dc1)"
2020-06-18T15:11:48.436+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=Consul.dc1 area=wan
2020-06-18T15:11:48.436+0800 [INFO] agent: Started DNS server: address=10.100.0.2:8600 network=tcp
2020-06-18T15:11:48.436+0800 [INFO] agent: Started DNS server: address=10.100.0.2:8600 network=udp
2020-06-18T15:11:48.436+0800 [INFO] agent: Started HTTP server: address=10.100.0.2:8500 network=tcp
2020-06-18T15:11:48.436+0800 [INFO] agent: Started gRPC server: address=10.100.0.2:8502 network=tcp
2020-06-18T15:11:48.437+0800 [INFO] agent: started state syncer
==> Consul agent running!
2020-06-18T15:11:48.489+0800 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-06-18T15:11:48.490+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 10.100.0.2:8300 [Candidate]" term=2
2020-06-18T15:11:48.490+0800 [INFO] agent.server.raft: election won: tally=1
2020-06-18T15:11:48.490+0800 [INFO] agent.server.raft: entering leader state: leader="Node at 10.100.0.2:8300 [Leader]"
2020-06-18T15:11:48.490+0800 [INFO] agent.server: cluster leadership acquired
2020-06-18T15:11:48.490+0800 [INFO] agent.server: New leader elected: payload=Consul
2020-06-18T15:11:48.501+0800 [INFO] agent.server.connect: initialized primary datacenter CA with provider: provider=consul
2020-06-18T15:11:48.501+0800 [INFO] agent.leader: started routine: routine="CA root pruning"
2020-06-18T15:11:48.501+0800 [INFO] agent.server: member joined, marking health alive: member=Consul
2020-06-18T15:11:48.615+0800 [INFO] agent: Synced node info
Output like the above means the consul service is now running successfully in dev mode.
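As a quick check from another terminal, list the members and confirm a leader exists; the -http-addr value matches the client_addr configured above:
consul members -http-addr=http://10.100.0.2:8500
curl http://10.100.0.2:8500/v1/status/leader # prints the leader's server address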
4. Configure Graceful Start and Restart for Consul
The previous step launched consul from the command line with flags. For easier management, add the consul service to systemd so that it can be started and stopped gracefully. For better security, first create a user with a nologin shell to manage the consul service:
sudo useradd -M -s /sbin/nologin consul
Change the ownership of the consul service directory:
sudo chown -R consul:consul /data/services/consul
Add a systemd management unit:
sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul
Documentation=https://www.consul.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
User=consul
Group=consul
Type=simple
ExecStart=/data/services/consul/bin/consul agent -dev -config-dir=/data/services/consul/conf
[Install]
WantedBy=multi-user.target
Reload the systemd configuration:
sudo systemctl daemon-reload
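If consul should also come up automatically after a reboot, enable the unit as well:
sudo systemctl enable consul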
5. Gracefully Start, Stop, and Restart the Consul Service
Start the consul service with systemd:
sudo systemctl start consul
Check the consul service status with systemd:
sudo systemctl status consul
● consul.service - Consul
Loaded: loaded (/usr/lib/systemd/system/consul.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2020-06-18 15:41:48 CST; 18s ago
Docs: https://www.consul.io/docs/
Main PID: 2217 (consul)
CGroup: /system.slice/consul.service
└─2217 /data/services/consul/bin/consul agent -dev -config-dir=/data/services/consul/conf
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.732+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=Consul.dc1 area=wan
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.733+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: Consul 10.100.0.2
Jun 18 15:41:52 localhost consul[2217]: 2020-06-18T15:41:52.582+0800 [INFO] agent.server: Adding LAN server: server="Consul (Addr: tcp/10.100.0.2:8300) (DC: dc1)"
Hint: Some lines were ellipsized, use -l to show in full.
Stop the consul service with systemd:
sudo systemctl stop consul
Restart the consul service with systemd:
sudo systemctl restart consul
III. Starting a Single Server Node
This mode suits test environments, and development environments that require consul data persistence. If you need cluster mode, skip straight to the next part.
1. Download and Install Consul
Use wget to download the package into the /data/pkgs directory:
cd /data/pkgs
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip
Extract the package into the /data/services directory, under a directory named consul:
unzip consul_1.7.3_linux_amd64.zip -d /data/services/consul
Archive: consul_1.7.3_linux_amd64.zip
inflating: /data/services/consul/consul
Check the directory contents:
ls -l /data/services/consul
total 105444
-rwxr-xr-x 1 centos centos 107970750 May 6 06:50 consul
2. Configure Consul
The previous step essentially completes the installation of consul. For convenience, we still need to add the directory containing the consul binary to PATH so it can be invoked from anywhere, create a configuration directory so startup does not require a long list of flags, and direct logs to a dedicated directory so consul does not flood the system log.
a. Create the directories and set environment variables
Create the bin, log, conf, and data directories:
cd /data/services/consul
mkdir {bin,log,conf,data}
Move the consul binary into the bin directory:
mv consul bin/
Add consul's bin directory to PATH. The following steps require sudo or root privileges:
sudo vim /etc/profile.d/consul.sh # To avoid breaking system commands through a mistyped environment variable, create a per-application script named after the service under /etc/profile.d; it is easy to maintain, can simply be removed when no longer needed, and has minimal impact on the system
The script contents:
export CONSUL_HOME=/data/services/consul
export PATH=${PATH}:${CONSUL_HOME}/bin
# Source the script so the settings take effect
source /etc/profile.d/consul.sh
At this point consul is on the system PATH and the consul command can be invoked from any directory.
cd /data/services
consul version
Consul v1.7.3
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
b. Create the configuration file
cd /data/services/consul/conf
vim server.json
{
"bind_addr": "10.100.0.2",
"client_addr": "10.100.0.2",
"datacenter": "dc1",
"data_dir": "/data/services/consul/data",
"encrypt": "EXz7LFN8hpQ4id8EDYiFoQ==",
"log_level": "INFO",
"log_file": "/data/services/consul/log/consul.log", #配置日志文件与目录
"log_rotate_duration": "24h", #设置日志轮转
"enable_syslog": false, #禁止consul日志写入系统日志
"enable_debug": true,
"node_name": "Consul",
"server": true,
"ui": true,
"bootstrap_expect": 1, #此处设置为1,标识只需要一个投票即可成为leader,数字改太大会报错,提示集群中没有leader
"leave_on_terminate": false,
"skip_leave_on_interrupt": true,
"rejoin_after_leave": true,
"retry_join": [
"10.100.0.2:8301"
]
}
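The encrypt value is a Base64-encoded 16-byte gossip encryption key. Rather than reusing the sample key above, you can generate a fresh one and paste its output into the "encrypt" field:
consul keygen # prints a new Base64 key; every node must use the same value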
3. Start Consul in Server Mode
The startup command is similar to dev mode; just drop the -dev flag:
consul agent -config-dir=/data/services/consul/conf
==> Starting Consul agent...
Version: 'v1.7.3'
Node ID: '0e2d44c2-af33-e222-5eb5-58b2c1f903d5'
Node name: 'Consul'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [10.100.0.2] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 10.100.0.2 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-06-18T15:11:48.435+0800 [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:0e2d44c2-af33-e222-5eb5-58b2c1f903d5 Address:10.100.0.2:8300}]"
2020-06-18T15:11:48.435+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 10.100.0.2:8300 [Follower]" leader=
2020-06-18T15:11:48.435+0800 [INFO] agent.server.serf.wan: serf: EventMemberJoin: Consul.dc1 10.100.0.2
2020-06-18T15:11:48.436+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: Consul 10.100.0.2
2020-06-18T15:11:48.436+0800 [INFO] agent.server: Adding LAN server: server="Consul (Addr: tcp/10.100.0.2:8300) (DC: dc1)"
2020-06-18T15:11:48.436+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=Consul.dc1 area=wan
2020-06-18T15:11:48.436+0800 [INFO] agent: Started DNS server: address=10.100.0.2:8600 network=tcp
2020-06-18T15:11:48.436+0800 [INFO] agent: Started DNS server: address=10.100.0.2:8600 network=udp
2020-06-18T15:11:48.436+0800 [INFO] agent: Started HTTP server: address=10.100.0.2:8500 network=tcp
2020-06-18T15:11:48.436+0800 [INFO] agent: Started gRPC server: address=10.100.0.2:8502 network=tcp
2020-06-18T15:11:48.437+0800 [INFO] agent: started state syncer
==> Consul agent running!
2020-06-18T15:11:48.489+0800 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2020-06-18T15:11:48.490+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 10.100.0.2:8300 [Candidate]" term=2
2020-06-18T15:11:48.490+0800 [INFO] agent.server.raft: election won: tally=1
2020-06-18T15:11:48.490+0800 [INFO] agent.server.raft: entering leader state: leader="Node at 10.100.0.2:8300 [Leader]"
2020-06-18T15:11:48.490+0800 [INFO] agent.server: cluster leadership acquired
2020-06-18T15:11:48.490+0800 [INFO] agent.server: New leader elected: payload=Consul
2020-06-18T15:11:48.501+0800 [INFO] agent.server.connect: initialized primary datacenter CA with provider: provider=consul
2020-06-18T15:11:48.501+0800 [INFO] agent.leader: started routine: routine="CA root pruning"
2020-06-18T15:11:48.501+0800 [INFO] agent.server: member joined, marking health alive: member=Consul
2020-06-18T15:11:48.615+0800 [INFO] agent: Synced node info
4. Configure Graceful Start and Restart for Consul
The previous step launched consul from the command line with flags. For easier management, add the consul service to systemd so that it can be started and stopped gracefully. For better security, first create a user with a nologin shell to manage the consul service:
sudo useradd -M -s /sbin/nologin consul
Change the ownership of the consul service directory:
sudo chown -R consul:consul /data/services/consul
Add a systemd management unit; this is similar to the dev-mode configuration, just with the -dev flag removed from the start command:
sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul
Documentation=https://www.consul.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
User=consul
Group=consul
Type=simple
ExecStart=/data/services/consul/bin/consul agent -config-dir=/data/services/consul/conf
[Install]
WantedBy=multi-user.target
Reload the systemd configuration:
sudo systemctl daemon-reload
5. Gracefully Start, Stop, and Restart the Consul Service
Start the consul service with systemd:
sudo systemctl start consul
Check the consul service status with systemd:
sudo systemctl status consul
● consul.service - Consul
Loaded: loaded (/usr/lib/systemd/system/consul.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2020-06-18 15:41:48 CST; 18s ago
Docs: https://www.consul.io/docs/
Main PID: 2217 (consul)
CGroup: /system.slice/consul.service
└─2217 /data/services/consul/bin/consul agent -config-dir=/data/services/consul/conf
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.732+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=Consul.dc1 area=wan
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.733+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: Consul 10.100.0.2
Jun 18 15:41:52 localhost consul[2217]: 2020-06-18T15:41:52.582+0800 [INFO] agent.server: Adding LAN server: server="Consul (Addr: tcp/10.100.0.2:8300) (DC: dc1)"
Hint: Some lines were ellipsized, use -l to show in full.
Stop the consul service with systemd:
sudo systemctl stop consul
Restart the consul service with systemd:
sudo systemctl restart consul
IV. Building a 3-Node Cluster
This mode suits production environments with high service reliability requirements. If you do not plan to use it in production, or are just learning, skip this part. It is also the most expensive setup in this document.
1. Planning and Preparation
Host plan:

Host role | Host IP
---|---
Consul-server1 | 10.100.0.2
Consul-server2 | 10.100.0.3
Consul-server3 | 10.100.0.4
Consul-agent | 10.100.0.5

All of the hosts above must be allowed to reach each other in your security groups or firewall.
2. Download and Install Consul
Use wget to download the package into the /data/pkgs directory:
cd /data/pkgs
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip
Extract the package into the /data/services directory, under a directory named consul:
unzip consul_1.7.3_linux_amd64.zip -d /data/services/consul
Archive: consul_1.7.3_linux_amd64.zip
inflating: /data/services/consul/consul
Check the directory contents:
ls -l /data/services/consul
total 105444
-rwxr-xr-x 1 centos centos 107970750 May 6 06:50 consul
Note: perform the steps above on every machine; each one needs the Consul binary installed.
3. Configure Consul
The previous step essentially completes the installation of consul. For convenience, we still need to add the directory containing the consul binary to PATH so it can be invoked from anywhere, create a configuration directory so startup does not require a long list of flags, and direct logs to a dedicated directory so consul does not flood the system log.
a. Create the directories and set environment variables
Create the bin, log, conf, and data directories:
cd /data/services/consul
mkdir {bin,log,conf,data}
Move the consul binary into the bin directory:
mv consul bin/
Add consul's bin directory to PATH. The following steps require sudo or root privileges:
sudo vim /etc/profile.d/consul.sh # To avoid breaking system commands through a mistyped environment variable, create a per-application script named after the service under /etc/profile.d; it is easy to maintain, can simply be removed when no longer needed, and has minimal impact on the system
The script contents:
export CONSUL_HOME=/data/services/consul
export PATH=${PATH}:${CONSUL_HOME}/bin
# Source the script so the settings take effect
source /etc/profile.d/consul.sh
At this point consul is on the system PATH and the consul command can be invoked from any directory.
cd /data/services
consul version
Consul v1.7.3
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
b. Create the Consul server configuration file
cd /data/services/consul/conf
vim server.json
{
"bind_addr": "10.100.0.2", #写server所在服务器的IP
"client_addr": "10.100.0.2", #写serve所在服务器的IP,或者直接写127.0.0.1,如果写127.0.0.1,就不能直接从外部使用该server提供的客户端访问集群
"datacenter": "dc1",
"data_dir": "/data/services/consul/data",
"encrypt": "EXz7LFN8hpQ4id8EDYiFoQ==", #此处配置的加密字符串所有节点必须统一,否则通讯会异常
"log_level": "INFO",
"log_file": "/data/services/consul/log/consul.log", #配置日志文件与目录
"log_rotate_duration": "24h", #设置日志轮转
"enable_syslog": false, #禁止consul日志写入系统日志
"enable_debug": true,
"node_name": "Consul",
"server": true,
"ui": true,
"bootstrap_expect": 3,
"leave_on_terminate": false,
"skip_leave_on_interrupt": true,
"rejoin_after_leave": true,
"retry_join": [
"10.100.0.2",
"10.100.0.3",
"10.100.0.4"
]
}
Note: the configuration above applies only to the server nodes; the agent configuration differs in places.
c. Create the Consul agent configuration file
cd /data/services/consul/conf
vim agent.json
{
"bind_addr": "10.100.0.5", #此处为服务的监听地址,可以写127.0.0.1
"client_addr": "10.100.0.5", #此处写节点的网卡地址,便于外部访问,此IP将会是访问集群的统一入口
"datacenter": "dc1",
"data_dir": "/data/services/consul/agent/data",
"encrypt": "EXz7LFN8hpQ4id8EDYiFoQ==", #此处加密字符串应当与server端保持一致,不然会导致通讯异常
"log_level": "INFO",
"log_file": "/data/services/consul/agent/log/consul.log",
"log_rotate_duration": "24h",
"enable_syslog": false,
"enable_debug": true,
"node_name": "ConsulClient",
"ui": true,
"server": false,
"rejoin_after_leave": true,
"retry_join": [
"10.100.0.2",
"10.100.0.3",
"10.100.0.4"
]
}
4. Start the Server and Agent Nodes
Start each with the following command:
consul agent -config-dir=/data/services/consul/conf
==> Starting Consul agent...
Version: 'v1.7.3'
Node ID: '0e2d44c2-af33-e222-5eb5-58b2c1f903d5'
Node name: 'Consul'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [10.100.0.2] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 10.100.0.2 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-06-18T15:11:48.435+0800 [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:0e2d44c2-af33-e222-5eb5-58b2c1f903d5 Address:10.100.0.2:8300}]"
# remaining output trimmed for brevity; only the beginning is shown
Note: run the above on every node.
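Once all four nodes are up, membership and leadership can be checked from any of them; a quick check through the agent from the plan above (10.100.0.5):
consul members -http-addr=http://10.100.0.5:8500 # should list 3 servers and 1 client
consul operator raft list-peers -http-addr=http://10.100.0.5:8500 # should show exactly one leader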
5. Configure Graceful Start and Restart for Consul
The previous step launched consul from the command line with flags. For easier management, add the consul service to systemd so that it can be started and stopped gracefully. For better security, first create a user with a nologin shell to manage the consul service:
sudo useradd -M -s /sbin/nologin consul
Change the ownership of the consul service directory:
sudo chown -R consul:consul /data/services/consul
Add a systemd management unit; as with dev mode, just drop the -dev flag from the start command:
sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul-node1
Documentation=https://www.consul.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
User=consul
Group=consul
Type=simple
ExecStart=/data/services/consul/bin/consul agent -config-dir=/data/services/consul/conf
[Install]
WantedBy=multi-user.target
Reload the systemd configuration:
sudo systemctl daemon-reload
6. Gracefully Start, Stop, and Restart the Consul Service
Start the consul service with systemd:
sudo systemctl start consul
Check the consul service status with systemd:
sudo systemctl status consul
● consul.service - Consul
Loaded: loaded (/usr/lib/systemd/system/consul.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2020-06-18 15:41:48 CST; 18s ago
Docs: https://www.consul.io/docs/
Main PID: 2217 (consul)
CGroup: /system.slice/consul.service
└─2217 /data/services/consul/bin/consul agent -config-dir=/data/services/consul/conf
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.732+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=Consul.dc1 area=wan
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.733+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: Consul 10.100.0.2
Jun 18 15:41:52 localhost consul[2217]: 2020-06-18T15:41:52.582+0800 [INFO] agent.server: Adding LAN server: server="Consul (Addr: tcp/10.100.0.2:8300) (DC: dc1)"
Hint: Some lines were ellipsized, use -l to show in full.
Stop the consul service with systemd:
sudo systemctl stop consul
Restart the consul service with systemd:
sudo systemctl restart consul
V. Building a Single-Machine 3-Node Cluster
The previous part built a 3-node cluster, but it requires quite a few servers, which is not cost-friendly. I needed a 3-node cluster while keeping costs down, and since I could not find a tutorial online for a single-machine 3-node setup, I worked one out from the configuration reference in the official documentation. Here is the approach.
1. A Simple Plan
Because all nodes share one host, every listening port (except the agent's, which keeps the defaults) must be remapped to avoid conflicts:

Node role | Node host IP | Client HTTP port | DNS port | serf_lan port | serf_wan port | server port
---|---|---|---|---|---|---
Consul-Server1 | 10.100.0.2 | 8501 | 8601 | 8001 | 8002 | 8000
Consul-Server2 | 10.100.0.2 | 8502 | 8602 | 8101 | 8102 | 8100
Consul-Server3 | 10.100.0.2 | 8503 | 8603 | 8201 | 8202 | 8200
Consul-Agent | 10.100.0.2 | 8500 (default) | 8600 (default) | - | - | -
2. Download and Install Consul
Use wget to download the package into the /data/pkgs directory:
cd /data/pkgs
wget https://releases.hashicorp.com/consul/1.7.3/consul_1.7.3_linux_amd64.zip
Extract the package into the /data/services directory, under a directory named consul:
unzip consul_1.7.3_linux_amd64.zip -d /data/services/consul
Archive: consul_1.7.3_linux_amd64.zip
inflating: /data/services/consul/consul
Check the directory contents:
ls -l /data/services/consul
total 105444
-rwxr-xr-x 1 centos centos 107970750 May 6 06:50 consul
3. Configure Consul
a. Create the directories and configure multiple nodes
Create each node's directories:
cd /data/services/consul
mkdir -p node{1..3}/{bin,conf,data,log}
mkdir -p agent/{bin,conf,data,log}
Once created, the directory tree looks roughly like this:
tree
.
├── agent
│ ├── bin
│ ├── conf
│ ├── data
│ │ └── serf
│ └── log
├── node1
│ ├── bin
│ ├── conf
│ ├── data
│ │ ├── raft
│ │ │ └── snapshots
│ │ └── serf
│ └── log
├── node2
│ ├── bin
│ ├── conf
│ ├── data
│ │ ├── raft
│ │ │ └── snapshots
│ │ └── serf
│ └── log
└── node3
├── bin
├── conf
├── data
│ ├── raft
│ │ └── snapshots
│   └── serf
└── log
Copy the consul binary into each node's bin directory:
cd /data/services/consul
cp consul node1/bin/
cp consul node2/bin/
cp consul node3/bin/
cp consul agent/bin/
b. Create the Consul server configuration files
Using the Server1 node as an example:
cd /data/services/consul/node1/conf
vim server.json
{
"bind_addr": "10.100.0.2",
"client_addr": "127.0.0.1",
"ports": {
"http": 8501, #其余server节点需要按照规划的端口进行配置
"dns": 8601, #其余server节点需要按照规划的端口进行配置
"serf_lan": 8001, #其余server节点需要按照规划的端口进行配置
"serf_wan": 8002, #其余server节点需要按照规划的端口进行配置
"server": 8000 #其余server节点需要按照规划的端口进行配置
},
"datacenter": "dc1",
"data_dir": "/data/services/consul/node1/data", #此处注意目录名称,写对应server节点的目录名称,如:/data/services/consul/node2/data
"encrypt": "EXz7LFN8hpQ4id8EDYiFoQ==", #此处需要与其他节点一致
"log_level": "INFO",
"log_file": "/data/services/consul/node1/log/consul.log", #此处注意目录名称,每个节点目录名称不一样
"log_rotate_duration": "24h",
"enable_syslog": false,
"enable_debug": true,
"node_name": "ConsulServer1", #此处需要注意,按照规划的名称填写即可
"disable_host_node_id": true, #禁用主机信息生成节点ID
"server": true,
"ui": true,
"bootstrap_expect": 3,
"leave_on_terminate": false,
"skip_leave_on_interrupt": true,
"rejoin_after_leave": true,
"retry_join": [
"10.100.0.2:8001",
"10.100.0.2:8101",
"10.100.0.2:8201"
]
}
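For Server2 and Server3 only the ports, the per-node paths, and the node name change; everything else stays identical to Server1. A sketch of the fields that differ for Server2, taken from the plan above:
"ports": {
"http": 8502,
"dns": 8602,
"serf_lan": 8101,
"serf_wan": 8102,
"server": 8100
},
"data_dir": "/data/services/consul/node2/data",
"log_file": "/data/services/consul/node2/log/consul.log",
"node_name": "ConsulServer2"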
c. Create the Consul agent configuration file
cd /data/services/consul/agent/conf
vim agent.json
{
"bind_addr": "0.0.0.0",
"client_addr": "0.0.0.0",
"datacenter": "dc1",
"data_dir": "/data/services/consul/agent/data",
"encrypt": "EXz7LFN8hpQ4id8EDYiFoQ==",
"log_level": "INFO",
"log_file": "/data/services/consul/agent/log/consul.log",
"log_rotate_duration": "24h",
"enable_syslog": false,
"enable_debug": true,
"node_name": "ConsulClient",
"ui": true,
"disable_host_node_id": true, #禁用主机信息生成的节点ID
"server": false,
"rejoin_after_leave": true,
"retry_join": [
"10.100.0.2:8001",
"10.100.0.2:8101",
"10.100.0.2:8201"
]
}
4. Start the Consul Nodes
Using the Server1 node as an example, start it with the following commands:
cd /data/services/consul/node1/bin # mind the directory: switch into the matching node's directory before starting that node
./consul agent -config-dir=/data/services/consul/node1/conf
==> Starting Consul agent...
Version: 'v1.7.3'
Node ID: '0e2d44c2-af33-e222-5eb5-58b2c1f903d5'
Node name: 'Consul'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [10.100.0.2] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 10.100.0.2 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020-06-18T15:11:48.435+0800 [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:0e2d44c2-af33-e222-5eb5-58b2c1f903d5 Address:10.100.0.2:8300}]"
# remaining output trimmed for brevity; only the beginning is shown
Note: do this for every node, switching into the matching node's directory each time.
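With all nodes running, the agent's default ports become the convenient entry point to the cluster; two quick checks (the agent's client_addr of 0.0.0.0 means localhost works, and dig is assumed to be installed):
consul members -http-addr=http://127.0.0.1:8500 # should list ConsulServer1-3 and ConsulClient
dig @10.100.0.2 -p 8600 consul.service.consul # the built-in consul service resolves to the server nodes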
5. Configure Graceful Start and Restart for Consul
The previous step launched the nodes from the command line with flags. For easier management, add the consul services to systemd so that they can be started and stopped gracefully. For better security, first create a user with a nologin shell to manage the consul services:
sudo useradd -M -s /sbin/nologin consul
Change the ownership of the consul service directory:
sudo chown -R consul:consul /data/services/consul
Using the Server1 node as an example, add a systemd management unit; as with dev mode, just drop the -dev flag from the start command:
sudo vim /usr/lib/systemd/system/consul-node1.service # mind the file name: Server1 maps to consul-node1.service, Server2 to consul-node2.service, and so on; for the agent, use consul-agent.service
[Unit]
# Service description; adjust for each node (Consul-node2, Consul-node3, Consul-agent)
Description=Consul-node1
Documentation=https://www.consul.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
User=consul
Group=consul
Type=simple
# Mind the per-node paths here
ExecStart=/data/services/consul/node1/bin/consul agent -config-dir=/data/services/consul/node1/conf
[Install]
WantedBy=multi-user.target
Reload the systemd configuration:
sudo systemctl daemon-reload
Note: when adding systemd management units, the file name must match the node being managed. In this document, Server1 corresponds to consul-node1.service, Server2 to consul-node2.service, Server3 to consul-node3.service, and the agent to consul-agent.service.
6. Gracefully Start, Stop, and Restart the Consul Services
Using Server1 as an example, start the consul service with systemd:
sudo systemctl start consul-node1 # to start Server2, use sudo systemctl start consul-node2
Check the consul service status with systemd:
sudo systemctl status consul-node1 # to check Server2, use sudo systemctl status consul-node2
● consul-node1.service - Consul-node1
Loaded: loaded (/usr/lib/systemd/system/consul-node1.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2020-06-18 15:41:48 CST; 18s ago
Docs: https://www.consul.io/docs/
Main PID: 2217 (consul)
CGroup: /system.slice/consul-node1.service
└─2217 /data/services/consul/node1/bin/consul agent -config-dir=/data/services/consul/node1/conf
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.732+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=Consul.dc1 area=wan
Jun 18 15:41:50 localhost consul[2217]: 2020-06-18T15:41:50.733+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: Consul 10.100.0.2
Jun 18 15:41:52 localhost consul[2217]: 2020-06-18T15:41:52.582+0800 [INFO] agent.server: Adding LAN server: server="Consul (Addr: tcp/10.100.0.2:8300) (DC: dc1)"
Hint: Some lines were ellipsized, use -l to show in full.
Stop the consul service with systemd:
sudo systemctl stop consul-node1 # to stop Server2, use sudo systemctl stop consul-node2
Restart the consul service with systemd:
sudo systemctl restart consul-node1 # to restart Server2, use sudo systemctl restart consul-node2
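To bring the whole single-machine cluster up (or down) in one pass, a small shell loop over the unit names defined above is convenient:
# start the three server nodes, then the agent
for n in 1 2 3; do sudo systemctl start consul-node${n}; done
sudo systemctl start consul-agent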
VI. Pitfalls and Lessons Learned
1. A single server node cannot elect a leader at startup
To fix this, add the following parameter to the configuration file (the examples in this document already include it):
bootstrap_expect
and set its value to 1, as follows:
"bootstrap_expect": 1
2. A single-machine 3-node cluster starts, but the log reports that no leader has been elected and client requests fail with 500
The cause is that every node in the cluster used the same node_id (discovered by analyzing the logs: inter-node communication was tagged with one and the same node_id), while the configuration set bootstrap_expect to 3, so the cluster never had enough votes to elect a leader. There are two fixes.
a. Method one:
Edit the node-id file under each node's data directory. Using the Server1 node as an example:
cd /data/services/consul/node1/data
The tree command shows the following structure:
tree
.
├── checkpoint-signature
├── node-id
├── raft
│ ├── peers.info
│ ├── raft.db
│ └── snapshots
│ ├── 7-131089-1592693864841
│ │ ├── meta.json
│ │ └── state.bin
│ └── 7-147478-1592808873974
│ ├── meta.json
│ └── state.bin
└── serf
Edit the node-id file:
vim node-id
6905298b-fd50-6423-2c42-1ddaf123e120
Note: give each server node a unique ID; it must not repeat another server node's.
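One way to produce fresh unique IDs is uuidgen (assuming the util-linux uuidgen tool is available); stop the node, overwrite its node-id, and start it again, e.g. for Server2:
sudo systemctl stop consul-node2
uuidgen | sudo tee /data/services/consul/node2/data/node-id # write a new random UUID as the node ID
sudo systemctl start consul-node2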
b. Method two:
Address the root cause. All the nodes end up with the same node_id because, by default, Consul derives the node_id from the host's hardware information using a specific algorithm; since the three server nodes are deployed on the same host, they all compute the same node_id.
To fix this, the following parameter must be supplied when the service starts:
disable-host-node-id
In the configuration file, add the corresponding key with its value set to true, as follows:
"disable_host_node_id": true
The single-machine 3-node cluster configuration in this document already includes this setting.
3. Every node is marked as leader in the web UI when viewing node information through the agent
Cause: the web UI marks the leader by IP. Our three nodes run on one host with a single NIC, so every node listens on the same IP, and the web UI therefore shows each node marked as leader. This does not affect operation.
How to verify and pin down the actual state:
Run the following on the host's agent node to inspect the cluster from the command line:
cd /data/services/consul/agent/bin
./consul operator raft list-peers
Node ID Address State Voter RaftProtocol
ConsulServer1 6905298b-fd50-6423-2c42-1ddaf123e120 10.100.0.2:8000 follower true 3
ConsulServer3 e927bbfa-e067-a84f-93ea-6712cf1db7f8 10.100.0.2:8200 follower true 3
ConsulServer2 38e5b263-b848-dfb6-d197-115ca2da40e7 10.100.0.2:8100 leader true 3
The command output shows that of the three nodes in the cluster, only one is the leader.
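The HTTP status endpoint, queried through the agent, gives the same answer:
curl http://10.100.0.2:8500/v1/status/leader # prints the leader's server address, matching the raft listing above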
A final note:
Given the limits of my knowledge and skills, this document may well contain inaccurate or poorly worded passages; please bear with me if so.