Linux - drbd9+corosync+pacemaker

2020-03-03  找呀找提莫

1.1 Deploy drbd9

1. Install drbd-utils

yum install -y gcc gcc-c++ make autoconf automake flex kernel-devel libxslt libxslt-devel asciidoc

tar -zxvf drbd-utils-9.10.0.tar.gz

cd drbd-utils-9.10.0

./autogen.sh

./configure --prefix=/usr/local/drbd-utils-9.10.0 \
--localstatedir=/var \
--sysconfdir=/etc \
--without-83support \
--without-84support \
--without-manual

make KDIR=/usr/src/kernels/3.10.0-693.el7.x86_64

make install

cp scripts/drbd-overview.pl /usr/bin/drbd-overview
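As a quick sanity check (assuming the install prefix's sbin directory is on PATH; otherwise call the binary via its full path under /usr/local/drbd-utils-9.10.0), verify the tools respond:

drbdadm --version
# prints build information such as DRBDADM_VERSION=9.10.0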

2. Install the drbd kernel module


tar -xf drbd-9.0.19-1.tar.gz

cd drbd-9.0.19-1

make

make install

3. Load the module

depmod

modprobe drbd

lsmod | grep drbd
drbd 555120 0
libcrc32c 12644 2 xfs,drbd
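To make sure the loaded module is the freshly built 9.0.19 rather than an older copy left in /lib/modules, a quick check with modinfo:

modinfo drbd | grep -E '^(filename|version)'
# version should report 9.0.19-1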

4. Configure drbd

vim /etc/drbd.d/global_common.conf
global {
    usage-count yes;
    udev-always-use-vnr;
}
common {
    handlers {
    }
    startup {
        wfc-timeout 100;
        degr-wfc-timeout 120;
    }
    options {
        auto-promote yes;
    }
    disk {
    }
    net {
        protocol C;
        # transport "tcp";
    }
}
vim /etc/drbd.d/data.res
resource data {
    on node1 {
        node-id 0;
        device /dev/drbd0 minor 0;
        disk /dev/sdb;
        meta-disk internal;
        address ipv4 192.168.111.129:7788;
    }
    on node2 {
        node-id 1;
        device /dev/drbd0 minor 0;
        disk /dev/sdb;
        meta-disk internal;
        address ipv4 192.168.111.130:7788;
    }
    connection-mesh {
        hosts node1 node2;
    }
    disk {
        resync-rate 100M;
    }
}
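DRBD needs identical configuration on both nodes, and the steps above only create it on node1; assuming SSH access between the nodes, copy the files to node2:

scp /etc/drbd.d/global_common.conf /etc/drbd.d/data.res node2:/etc/drbd.d/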

5. Start drbd

systemctl start drbd
systemctl enable drbd

6. Initialize the drbd device

1) Create the metadata

dd if=/dev/zero of=/dev/sdb bs=1M count=100   # wipe any existing metadata/filesystem signatures so create-md succeeds
drbdadm create-md data
drbdadm primary --force data                  # on node1 only: force it to become the initial sync source

2) Check the sync progress

drbdadm status data

While the resource is syncing (as seen on node1):
data role:Primary
 disk:UpToDate
 node2 role:Secondary
replication:SyncSource peer-disk:Inconsistent done:20.30

After the sync completes (as seen on node2):
data role:Secondary
 disk:UpToDate
 node1 role:Primary
 peer-disk:UpToDate

3) Format the drbd device (on the Primary node)

mkfs.xfs /dev/drbd0
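Before handing the device over to Pacemaker, the new filesystem can be test-mounted on the Primary node (/mydata matches the Filesystem resource configured in section 1.4); unmount afterwards so the cluster can manage it:

mkdir -p /mydata
mount /dev/drbd0 /mydata
df -h /mydata    # confirm the mount
umount /mydata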

1.2 Deploy corosync

1. Install corosync

yum install -y corosync

2. Configure corosync (on CentOS 7 the pcs service generates this file automatically, so this step can be skipped when using pcs)

1) Edit the corosync configuration file

cd /etc/corosync
cp corosync.conf.example.udpu corosync.conf

Notes:
corosync.conf.example: sample configuration for multicast communication
corosync.conf.example.udpu: sample configuration for unicast (UDPU) communication
vim corosync.conf
totem {
    version: 2
    crypto_cipher: aes256
    crypto_hash: sha1
    token: 10000
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.111.0
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}
logging {
    fileline: off
    to_logfile: no
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
nodelist {
    node {
        ring0_addr: 192.168.111.129
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.111.130
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
}

2) Generate the corosync authentication key and copy it to node2

corosync-keygen -l   # -l reads /dev/urandom instead of /dev/random, avoiding entropy blocking
scp corosync.conf authkey node2:/etc/corosync/

3. Start corosync

systemctl start corosync
systemctl enable corosync
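With corosync running on both nodes, membership can be verified (a quick check; both node addresses should be listed):

corosync-cmapctl | grep members
# expect entries for 192.168.111.129 and 192.168.111.130 with status joined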

1.3 Deploy pacemaker

1. Install pacemaker

yum install -y pacemaker

2. Start pacemaker

systemctl start pacemaker
systemctl enable pacemaker

1.4 Deploy pcs

1. Notes

To manage corosync+pacemaker with pcs, corosync and pacemaker must not be started or enabled on their own; the pcsd service takes over starting and stopping them, as shown below.
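In other words, undo the start/enable done in sections 1.2 and 1.3:

systemctl stop pacemaker corosync
systemctl disable pacemaker corosync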

2. Install pcs

yum install -y pcs

3. Start pcsd

systemctl start pcsd
systemctl enable pcsd

4. Configure pcsd

1) Set the password for the hacluster account

passwd hacluster
Changing password for user hacluster.
New password: Password@123
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: Password@123
passwd: all authentication tokens updated successfully.

2) Authenticate the nodes

pcs cluster auth node1 node2
Username: hacluster
Password: Password@123
node1: Authorized
node2: Authorized

3) Create the cluster and sync the corosync configuration

pcs cluster setup --name hacluster node1 node2
Error: node1: node is already in a cluster
Error: node2: node is already in a cluster
Error: nodes availability check failed, use --force to override. WARNING: This will destroy existing cluster on the nodes.
pcs cluster setup --name hacluster node1 node2 --force
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node2: Successfully destroyed cluster
node1: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded
Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success

4) Start the cluster

pcs cluster start --all
node1: Starting Cluster...
node2: Starting Cluster...

5) Enable the cluster at boot

pcs cluster enable --all
node1: Cluster Enabled
node2: Cluster Enabled

6) Verify the corosync rings (output below is from node1 and then node2)

corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
 id = 192.168.111.129
 status = ring 0 active with no faults
Printing ring status.
Local node ID 2
RING ID 0
 id = 192.168.111.130
 status = ring 0 active with no faults

7) Disable STONITH
If no fence devices are available, STONITH should be disabled;

pcs property set stonith-enabled=false

8) Ignore loss of quorum
A cluster normally needs more than half of the votes to keep quorum; on a two-node cluster, set the policy to ignore quorum loss;

pcs property set no-quorum-policy=ignore
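For reference, corosync's votequorum also offers a dedicated two-node mode; clusters created by recent pcs versions usually get this set automatically in /etc/corosync/corosync.conf:

quorum {
    provider: corosync_votequorum
    two_node: 1
}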

9) Verify the configuration; it should report no errors

crm_verify -L -V

10) Show the cluster properties

pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: hacluster
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false

5. Add cluster resources

1) Add a VIP

pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.111.131 cidr_netmask=24 op monitor interval=30s --group group

The --group <group name> option places the resource in a resource group; members of a group run on the same node and start in the order listed. Membership can be checked as shown below.
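The group and its members can be listed afterwards (pcs 0.9.x syntax as shipped with CentOS 7):

pcs resource group list
# group: vip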

2) Add the drbd mount
Because auto-promote is enabled in the DRBD configuration above, mounting /dev/drbd0 promotes that node to Primary automatically, so no separate drbd master/slave resource is needed;

pcs resource create drbdstone ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/mydata fstype=xfs op monitor interval=30s --group group

3) Add the mysql resource
The MySQL installation itself is not covered here;

pcs resource create mysqld service:mysqld op monitor interval=30s --group group

A PostgreSQL resource can be added the same way:

pcs resource create pgsql service:postgresql op monitor interval=30s --group group
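After adding the resources, confirm that they started and on which node:

pcs status resources
# the group should show vip, drbdstone and mysqld as Started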

6. Troubleshooting

1) Check the cluster status

pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu Jul 18 01:48:50 2019
Last change: Thu Jul 18 01:34:24 2019 by hacluster via crmd on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
 Resource Group: group
 vip  (ocf::heartbeat:IPaddr2): Started node2
 drbdstone (ocf::heartbeat:Filesystem): Started node2
 mysqld  (service:mysqld): Started node2
Daemon Status:
 corosync: active/disabled
 pacemaker: active/disabled
 pcsd: active/enabled

2) A resource fails with 'unknown error'
Symptom:

pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu Jul 18 01:32:22 2019
Last change: Thu Jul 18 01:31:57 2019 by root via cibadmin on node1
2 nodes configured
3 resources configured
Node node1: standby
Online: [ node2 ]
Full list of resources:
 Resource Group: group
 vip  (ocf::heartbeat:IPaddr2): Started node2
 drbdstone (ocf::heartbeat:Filesystem): Started node2
 mysqld  (service:mysqld): Stopped
Failed Actions:
* mysqld_start_0 on node2 'unknown error' (1): call=56, status=complete, exitreason='none',
 last-rc-change='Thu Jul 18 01:32:05 2019', queued=0ms, exec=1036ms
Daemon Status:
 corosync: active/disabled
 pacemaker: active/disabled
 pcsd: active/enabled

Solution:

(1) Find out why the resource failed to start;

(2) Run pcs resource cleanup mysqld and the resource recovers automatically:
Cleaning up vip on node1, removing fail-count-vip
Cleaning up vip on node2, removing fail-count-vip
Cleaning up drbdstone on node1, removing fail-count-drbdstone
Cleaning up drbdstone on node2, removing fail-count-drbdstone
Cleaning up mysqld on node1, removing fail-count-mysqld
Cleaning up mysqld on node2, removing fail-count-mysqld
Waiting for 6 replies from the CRMd...... OK

3) Simulate a node failure and recovery

Put one node into standby (its resources fail over to the other node), then bring it back online:
pcs node standby node1
pcs node unstandby node1

4) Delete a misconfigured resource

pcs resource delete mysqld

1.5 pcsd web UI

1. Change the pcsd web UI listening address

pcsd listens on port 2224 over tcp6 only by default; making it listen on IPv4 requires a manual change;

netstat -antp | grep 2224 | grep LIST
tcp6 0 0 :::2224 :::*              LISTEN 3831/ruby
vim /usr/lib/pcsd/ssl.rb
webrick_options = {
    :Port => 2224,
    #:BindAddress => primary_addr,
    #:Host => primary_addr,
    :BindAddress => '0.0.0.0',
    :Host => '0.0.0.0',

systemctl restart pcsd

netstat -antp | grep 2224 | grep LIST
tcp 0 0 0.0.0.0:2224 0.0.0.0:* LISTEN 96985/ruby

2. Open https://IP:2224 in a browser

(Screenshots: pcs1.png, pcs2.png, pcs3.png, pcs4.png)

You can then see the cluster that was configured from the command line.

1.6 crmsh
