Small-Scale Architecture in Practice -- MySQL Dual-Master + corosync + NFS
IP plan:
Master01:192.168.40.100
Master02:192.168.40.101
NFS:192.168.40.110
VIP:192.168.40.150
Prerequisites:
1. NFS is already set up
2. Firewall configured (ports 5404, 5405 and 5406 open)
iptables -I INPUT -i eth0 -p udp -m multiport --dports 5404,5405,5406 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
3. SELinux disabled
setenforce 0
sed -i 's/enforcing/disabled/g' /etc/sysconfig/selinux
Reference article: Small-Scale Architecture in Practice -- Setting Up the NFS Environment
4. Be sure to read the CRM configuration article
The CRM configuration itself can wait until later, but the important note at the beginning of that article concerns an error caused by installation order, so read that part first.
#############################################
A quick digression on MySQL distribution formats
MySQL ships in three forms:
source, binary tarball, and RPM package
The source build needs cmake to compile and reportedly gives the tightest fit and the best performance.
The binary tarball needs only unpacking and a little configuration before it is usable -- the Linux equivalent of a portable app, and the most portable of the three.
The RPM package installs via yum/rpm and is the least hassle.
Judging from the install guides currently available online, an installation on shared storage must use the binary tarball!!!
Two MySQL download addresses are provided.
Source 1 carries newer versions than source 2.
#############################################
(Since the two nodes will need to exchange quite a few files later, configuring SSH mutual trust first is a good idea.)
1. Configure SSH mutual trust
On Master01:
echo '192.168.40.100 Master01' >> /etc/hosts
echo '192.168.40.101 Master02' >> /etc/hosts
On Master02:
echo '192.168.40.100 Master01' >> /etc/hosts
echo '192.168.40.101 Master02' >> /etc/hosts
Back on Master01:
ssh-keygen -P '' -f /root/.ssh/id_rsa -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub Master02
On Master02:
ssh-keygen -P '' -f /root/.ssh/id_rsa -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub Master01
SSH mutual trust is now configured.
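A quick sanity check (my own habit, not part of the original steps) that the trust works in both directions:
ssh Master02 hostname #run from Master01; should print Master02 without asking for a password
ssh Master01 hostname #run from Master02; likewise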
2. MySQL deployment
On Master01:
One necessary condition for sharing the NFS storage between the MySQL instances is that the output of id mysql is identical on every host -- both the UID and the GID must match -- so both are fixed at 502 here; make the same settings on the NFS server and on Master02 (not repeated below).
groupadd -g 502 mysql
useradd -u 502 mysql -g mysql -G mysql
mkdir /data
chown -R mysql.mysql /data
mount -t nfs 192.168.40.110:/data /data
(add 192.168.40.110:/data /data nfs rw 0 0 to /etc/fstab)
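One way to append that entry without opening an editor (a small sketch of my own; assumes the line is not already in /etc/fstab):
echo '192.168.40.110:/data /data nfs rw 0 0' >> /etc/fstab
mount | grep /data #confirm the NFS share is actually mounted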
mkdir -p /root/qimo/tools/
cd /root/qimo/tools/
wget http://ftp.ntu.edu.tw/MySQL/Downloads/MySQL-5.6/mysql-5.6.38-linux-glibc2.12-x86_64.tar.gz
tar -xf mysql-5.6.38-linux-glibc2.12-x86_64.tar.gz
mv mysql-5.6.38-linux-glibc2.12-x86_64 mysql
cp -r mysql /usr/local/
cd /usr/local/mysql
Before the next step, run rpm -qa|grep mysql to check: some systems ship with a MySQL 5.1 package, and out of habit I yum remove it.
cp support-files/my-default.cnf /etc/my.cnf
cp support-files/mysql.server /etc/rc.d/init.d/mysqld
vi /etc/my.cnf
Add the following under [mysqld]:
log_bin=/usr/local/mysql/log/binlog
binlog_format= mixed
log-error=/usr/local/mysql/log/mysql.err
basedir =/usr/local/mysql
datadir = /data
Save.
vi /etc/rc.d/init.d/mysqld
datadir=/data
Save.
yum -y install numactl (without this package, initialization may fail with: error while loading shared libraries: libnuma.so.1)
mkdir -p /usr/local/mysql/log
touch /usr/local/mysql/log/mysql.err
chown -R mysql.mysql /usr/local/mysql
Run the initialization:
scripts/mysql_install_db --user=mysql --basedir=/usr/local/mysql --datadir=/data
A word on initialization: my.cnf defines basedir and datadir, so it is best to spell them out here as well; otherwise the configuration and the actual layout can easily diverge and the service will fail to start.
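A simple way to double-check (an extra verification of my own, not in the original steps) what mysqld will actually read from /etc/my.cnf, so the values passed to mysql_install_db really do match:
/usr/local/mysql/bin/my_print_defaults mysqld #should list --basedir=/usr/local/mysql and --datadir=/data among the options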
service mysqld start
Starting MySQL. SUCCESS!
service mysqld stop
umount /data
echo 'export MySQL_HOME=/usr/local/mysql' >> /etc/profile
echo 'export PATH=$PATH:$MySQL_HOME/bin' >> /etc/profile
source /etc/profile
Add it to chkconfig (do not perform this step in this lab):
chkconfig --add /etc/init.d/mysqld
In the pacemaker configuration, the virtual IP (VIP), the shared storage (NFS) and the database (MySQL) are defined as three resources. Their start order matters and is controlled entirely by pacemaker: MySQL is only allowed to start after NFS is up, so MySQL must never be set to start at boot.
chkconfig mysqld off #must not start at boot
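To double-check (an extra verification of my own; assumes the service has been registered with chkconfig, as in the boot-time listing at the end of this article) that no runlevel will start mysqld at boot:
chkconfig --list mysqld #every runlevel should read off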
Master01 is done; copy the reusable pieces over to Master02:
scp /etc/my.cnf 192.168.40.101:/etc/
scp /etc/rc.d/init.d/mysqld 192.168.40.101:/etc/rc.d/init.d/
scp -r /root/qimo/tools/mysql 192.168.40.101:/usr/local/
On Master02:
cd /usr/local/mysql
mkdir -p log
touch log/mysql.err
chown -R mysql.mysql /usr/local/mysql
(make sure /etc/fstab already contains: 192.168.40.110:/data /data nfs rw 0 0)
mount -a
Master02 does not need to be initialized; it simply uses the /data that Master01 just generated.
service mysqld start
Starting MySQL. SUCCESS!
service mysqld stop
umount /data
echo 'export MySQL_HOME=/usr/local/mysql' >> /etc/profile
echo 'export PATH=$PATH:$MySQL_HOME/bin' >> /etc/profile
source /etc/profile
chkconfig mysqld off #must not start at boot
At this point, the MySQL configuration on both nodes is complete.
###################
A few words on the problems hit while installing MySQL (the highlighted parts above are exactly where I went wrong):
①. The parameters defined in my.cnf and in the initialization command must match; otherwise service mysqld start will report errors.
②. MySQL must be installed from the binary tarball. I originally used a source build: Master01 installed fine, but Master02 must not be initialized, and with a source build there is simply no clean way to handle that.
....
3. corosync configuration
Normally the next step would be NTP synchronization, but since corosync needs to generate a rather peculiar authkey, I moved this step forward.
On Master01:
yum install corosync -y
vi /etc/corosync/corosync.conf
compatibility: whitetank
aisexec {
# Run as root - this is necessary to be able to manage resources with Pacemaker
user: root
group: root
}
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 0
use_mgmtd: yes
use_logd: yes
}
totem {
version: 2
crypto_cipher: none
crypto_hash: none
interface {
ringnumber: 0
bindnetaddr: 192.168.40.0
mcastaddr: 239.255.1.1
mcastport: 5405
ttl: 1
}
}
nodelist {
node {
ring0_addr: 192.168.40.100
nodeid: 1
}
node {
ring0_addr: 192.168.40.101
nodeid: 2
}
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: no
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
A note on the highlighted parts of the configuration above: 192.168.40.0 is the subnet that Master01 and Master02 are on.
239.255.1.1 is the multicast address; copy it as-is, no need to change it.
Now the key part: generating the authkey.
corosync-keygen
You will then see output like this:
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 104).
This means the key needs 1024 bits of random data and so far only 104 bits are available. Interestingly, the randomness does not come from what you type but from the system entropy pool.
The workaround is to open another session on Master01 and do some other work -- for example, carry on with step 4 of this article (NTP sync) and step 5 (yum install pacemaker -y). You will see the bit count keep climbing, and once it reaches 1024 an authkey appears under /etc/corosync.
By the time steps 4 and 5 are finished, the authkey should have been generated ^_^
Here is the progress output from a previous install, updating as entropy accumulated:
Press keys on your keyboard to generate entropy (bits = 224).
Press keys on your keyboard to generate entropy (bits = 272).
Press keys on your keyboard to generate entropy (bits = 320).
....
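If the entropy pool fills too slowly even with other work running, an alternative workaround (my own addition, assuming the rng-tools package is available in your yum repositories) is to feed the pool from /dev/urandom in a second session:
yum install -y rng-tools
rngd -r /dev/urandom #run while corosync-keygen is waiting; kill rngd once the authkey has been generated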
(Jump down to steps 4 and 5, work through them, then come back here.)
On Master01:
ll /etc/corosync/
...
-r--------. 1 root root 128 Jan 30 22:20 authkey
The default permissions are 400.
Install corosync on node 2 and copy the two configuration files from node 1 over to node 2:
ssh Master02 yum install -y corosync
scp -r /etc/corosync/authkey 192.168.40.101:/etc/corosync/
scp -r /etc/corosync/corosync.conf 192.168.40.101:/etc/corosync/
4. Configure NTP synchronization (all in the service of generating that authkey~)
On Master01:
yum install -y ntp ntpdate
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime #set the time zone to UTC+8
cp: overwrite `/etc/localtime'? y
[root@Master01 ~]# service ntpdate start
ntpdate: Synchronizing with time server: [ OK ]
[root@Master01 ~]# date -R
Tue, 30 Jan 2018 14:45:34 +0800 #+0800 means UTC+8
On Master02:
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
cp: overwrite `/etc/localtime'? y
[root@Master02 ~]# service ntpdate start
ntpdate: Synchronizing with time server: [ OK ]
[root@Master02 Asia]# date -R
Tue, 30 Jan 2018 14:43:41 +0800
(PS: to sync manually: [root@Master02 corosync]# ntpdate pool.ntp.org)
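If the clocks keep drifting, a hedged option (not in the original steps; it assumes the CentOS 6 ntpdate init script reads its server list from /etc/ntp/step-tickers) is to pin the server and resync from cron:
echo 'pool.ntp.org' > /etc/ntp/step-tickers
(crontab -l 2>/dev/null; echo '*/30 * * * * /usr/sbin/ntpdate pool.ntp.org >/dev/null 2>&1') | crontab -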
5. Installing pacemaker, pssh and crmsh
yum install -y pacemaker
Copy the pssh and crmsh packages to Master01.
Download address:
crmsh depends on pssh and python, so pssh must be installed first.
The install may complain about missing python packages; all of them can be resolved with yum.
yum install -y pssh-2.3.1-5.el6.noarch.rpm
yum install -y crmsh-1.2.6-4.el6.x86_64.rpm
Note that besides the two packages above, the linked resources also include a crmsh 3.0.0 package. It requires python 2.7, which is not the default python on CentOS 6.x, so that source package is of no use for this installation.
Once these installs are done, the authkey from earlier should have been generated -- go back to step 3 and finish the remaining work!
######################
Side notes:
1. Installing from the source RPM
After unpacking the earlier resource bundle, two files relate to crmsh:
crmsh-1.2.6-4.el6.x86_64.rpm (CentOS 6)
crmsh-3.0.0-6.1.src.rpm (source RPM, usable on any system)
yum install -y rpm-build
rpmbuild --rebuild --clean crmsh-3.0.0-6.1.src.rpm
(you may be prompted to install certain python dependencies along the way)
...
After the rebuild finishes, an rpmbuild directory appears in the current directory.
cd rpmbuild/RPMS/noarch;ll
...
-rw-r--r-- 1 root root 795920 Feb 1 09:11 crmsh-3.0.0-6.1.noarch.rpm
-rw-r--r-- 1 root root 92928 Feb 1 09:11 crmsh-scripts-3.0.0-6.1.noarch.rpm
-rw-r--r-- 1 root root 220784 Feb 1 09:11 crmsh-test-3.0.0-6.1.noarch.rpm
The 3.0 packages need python 2.7 or newer, yet the existing yum does not work with python 2.7..... which is rather awkward.
2. A strange missing-python-package error during installation
Error: Package: pssh-1.4.3-1.noarch (/pssh-1.4.3-1.noarch)
Requires: python(abi) = 2.5
The fix is: yum install python25 -y
Do not naively run yum install python(abi) -y. There are many python versions, and their packages do not appear to be backward or forward compatible, so you have to install the specific version that is required.
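If you are not sure which package actually provides a capability such as python(abi), yum can look it up (an extra lookup of my own, not part of the original steps):
yum provides 'python(abi)' #lists every package that provides the capability, together with its version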
######################
6. Start corosync
pacemaker has already been declared in the corosync.conf above, so there is no need to start it separately here.
On Master01:
service corosync start
cibadmin --modify --xml-text '<cib validate-with="pacemaker-1.2"/>' #explained further below; running it on node 1 is enough, node 2 does not need it
ssh Master02 service corosync start #start node 2
#######################################
Errors encountered during setup
1. Starting corosync without having generated the authkey
Starting Corosync Cluster Engine (corosync): [FAILED]
cat /var/log/cluster/corosync.log
...
Jan 30 15:35:02 corosync [MAIN ] Could not open /etc/corosync/authkey: No such file or directory
2. If the pacemaker-related stanzas are missing from corosync.conf, service pacemaker start still works on its own, but if corosync is started first, pacemaker then fails to start. I did not dig into the exact cause -- just configure it as in my corosync.conf above.
3. Commands to check whether the services started correctly
Check that the corosync engine started:
grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Check that the initial membership notifications were sent out:
grep TOTEM /var/log/cluster/corosync.log
Check that pacemaker started normally:
grep pcmk_startup /var/log/cluster/corosync.log
Check for errors:
grep ERROR: /var/log/cluster/corosync.log
Jan 30 17:26:37 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jan 30 17:26:37 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Jan 30 17:26:38 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=6015, rc=100)
These errors can be ignored.
#######################################
Later on, when running crm commands, you will also hit a schema-version check error:
ERROR: CIB not supported: validator 'pacemaker-2.5', release '3.0.10'
ERROR: You may try the upgrade command
So fix this error before starting the crm configuration:
cibadmin --modify --xml-text '<cib validate-with="pacemaker-1.2"/>'
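To confirm the downgrade took effect (an extra check of my own):
cibadmin --query | grep validate-with #should now show validate-with="pacemaker-1.2"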
7. crm configuration
This is the really tricky part of the heartbeat configuration; just follow the pattern below.
[root@Master01 RPMS]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node Master01
node Master02
property $id="cib-bootstrap-options" \
have-watchdog="false" \
dc-version="1.1.15-5.el6-e174ec8" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="1" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Resource configuration
[root@Master01 ~]# crm
crm(live)# configure
Define the NFS resource
crm(live)configure# primitive mynfs ocf:heartbeat:Filesystem params device="192.168.40.110:/data" directory="/data" fstype="nfs" op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# commit
Define the VIP
crm(live)configure# primitive myvip ocf:heartbeat:IPaddr params ip="192.168.40.150" op monitor interval=20 timeout=20 on-fail=restart
crm(live)configure# verify
crm(live)configure# commit
Define MySQL
crm(live)configure# primitive myserver lsb:mysqld op monitor interval=20 timeout=20 on-fail=restart
crm(live)configure# verify
crm(live)configure# commit
Add a colocation constraint
crm(live)configure# colocation myserver_with_mynfs inf: myserver mynfs
Set the start order
crm(live)configure# order mynfs_before_myserver mandatory: mynfs:start myserver:start
Another colocation constraint
crm(live)configure# colocation myvip_with_myserver inf: myvip myserver
And its start order
crm(live)configure# order myvip_before_myserver mandatory: myvip myserver
crm(live)configure# verify
crm(live)configure# commit
If all goes well, finishing this sequence means the job is done. Check with crm_mon and make sure all three resources are started on the same node.
If some resource has started on node 2, run the following on node 2:
crm node standby
crm node online
After that, all the resources should converge onto one node.
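An alternative to standby/online (a sketch based on the crmsh 1.2.x resource commands; it works by adding a temporary location constraint) is to move one resource directly and let the inf colocation constraints pull the others along:
crm resource migrate myserver Master01 #move myserver, and with it whatever is colocated, to Master01
crm resource unmigrate myserver #remove the temporary location constraint afterwards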
The ideal state looks like this:
[root@Master01 ~]# crm status
Stack: classic openais (with plugin)
Current DC: Master01 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Fri Feb 2 01:25:05 2018 Last change: Fri Feb 2 01:23:35 2018 by root via cibadmin on Master01, 2 expected votes
2 nodes and 3 resources configured
Online: [ Master01 Master02 ]
Active resources:
myvip (ocf::heartbeat:IPaddr): Started Master01
mynfs (ocf::heartbeat:Filesystem): Started Master01
myserver (lsb:mysqld): Started Master01
At this point you will notice that MySQL has started up on its own. If you now run crm node standby,
the session on node 1 gets cut off; open node 2 and run crm_mon, and you will see all three resources have floated over to Master02, with MySQL running there automatically as well.
This is exactly why MySQL must never be set to start at boot: from now on, which node MySQL runs on is left entirely to corosync to arrange!
###################################
Problems encountered and some observations
1. With the firewall enabled but not configured accordingly, the error below is easy to hit (at this point only one resource had been configured):
[root@Master01 ~]# crm status
Stack: classic openais (with plugin)
Current DC: Master01 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Feb 1 11:25:52 2018 Last change: Thu Feb 1 11:13:23 2018 by root via cibadmin on Master01, 1 expected votes
2 nodes and 1 resource configured
Online: [ Master01 ]
OFFLINE: [ Master02 ]
Active resources:
myip (ocf::heartbeat:IPaddr): Started Master01
2. Checking the status: the IP is on Master01 and NFS is on Master02, and I want them consolidated onto one node.
[root@Master01 ~]# crm status
Stack: classic openais (with plugin)
Current DC: Master02 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Feb 1 14:50:50 2018 Last change: Thu Feb 1 14:50:21 2018 by root via cibadmin on Master01, 2 expected votes
2 nodes and 2 resources configured
Online: [ Master01 Master02 ]
Active resources:
myip (ocf::heartbeat:IPaddr): Started Master01
mynfs (ocf::heartbeat:Filesystem): Started Master02
Switch over to Master02:
[root@Master02 ~]# crm node standby
Back on Master01:
[root@Master01 ~]# crm status
Stack: classic openais (with plugin)
Current DC: Master02 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Feb 1 14:51:34 2018 Last change: Thu Feb 1 14:51:29 2018 by root via crm_attribute on Master02, 2 expected votes
2 nodes and 2 resources configured
Node Master02: standby
Online: [ Master01 ]
Active resources:
myip (ocf::heartbeat:IPaddr): Started Master01
mynfs (ocf::heartbeat:Filesystem): Started Master01
3. When a service dies
[root@Master01 ~]# crm status
Stack: classic openais (with plugin)
Current DC: Master01 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Fri Feb 2 01:25:05 2018 Last change: Fri Feb 2 01:23:35 2018 by root via cibadmin on Master01, 2 expected votes
2 nodes and 3 resources configured
Online: [ Master01 Master02 ]
Active resources:
myvip (ocf::heartbeat:IPaddr): Started Master01
mynfs (ocf::heartbeat:Filesystem): Started Master01
myserver (lsb:mysqld): Started Master01
Failed Actions:
* myserver_start_0 on Master02 'unknown error' (1): call=28, status=complete, exitreason='none',
last-rc-change='Fri Feb 2 01:23:35 2018', queued=0ms, exec=3182ms
This error appeared because corosync on Master02 had died; restarting it brought things back to normal. Done!
4. About the VIP
Back when I built RAC, there were private IPs and VIPs in addition to the public IP, which left me wondering during this build whether I needed an extra NIC.
The VIP is a virtual (floating) IP; it needs no dedicated NIC and is conjured up entirely in software. Do not confuse it with a private IP.
Also, some documents say to check it with ifconfig, but in my tests, once the VIP resource was up, ifconfig did not show the VIP at all.
The way to view the VIP is: ip addr show
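A quick way (my own shortcut) to see which node currently holds the VIP:
ip addr show | grep 192.168.40.150
ssh Master02 'ip addr show | grep 192.168.40.150'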
5. About the MySQL two-node setup
I used to assume a MySQL two-node setup ran both nodes concurrently, like Oracle RAC. It turns out corosync only does the arbitration: in fact only one node is ever active, and when it goes standby the other one takes over and carries on.
Watching from the NFS server, roughly 3 seconds after one node goes standby, the other node takes over the shared storage and writes its own pid file.
###################################
Follow-up:
6. Boot-time configuration
iptables, corosync and ntpdate are enabled at boot; mysqld is disabled.
Because pacemaker is launched from within my corosync configuration, I leave it disabled here as well.
[root@Master02 ~]# chkconfig --list|egrep "iptables|mysqld|corosync|pacemaker|ntpdate"
corosync 0:off 1:off 2:on 3:on 4:on 5:on 6:off
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
mysqld 0:off 1:off 2:off 3:off 4:off 5:off 6:off
ntpdate 0:off 1:off 2:on 3:on 4:on 5:on 6:off
pacemaker 0:off 1:off 2:off 3:off 4:off 5:off 6:off
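For reference, the chkconfig commands that produce the layout above (mysqld and pacemaker deliberately left off):
chkconfig iptables on
chkconfig corosync on
chkconfig ntpdate on
chkconfig mysqld off
chkconfig pacemaker off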