Postgres-XL Installation
Standard architecture
Layout:
In a standard deployment, the GTM and GTM slave run on two separate hosts, while a GTM proxy, a Coordinator, and a Datanode run together on each worker host, with each pair of hosts hosting the other's Datanode slave. Alternatively, spread Datanodes and Datanode slaves across more hosts for better HA and lower I/O pressure.
HA:
Postgres-XL does not currently implement automatic master/slave failover internally; on failure you must switch manually. Automatic failover can be achieved with external tooling (to be investigated).
Hardware requirements:
The GTM needs little I/O and mainly consumes CPU and memory (the exact load is yet to be measured). Coordinators parse SQL queries, distribute work, and merge results, so they need CPU and memory. Datanodes are primarily I/O-bound.
Test environment
The test environment consists of three CentOS 7.x hosts: two local virtual machines, plus one Alibaba Cloud instance used to test adding and removing nodes dynamically.
Specs:
Virtual machines
Alibaba Cloud
Setting up
1 VMware network settings
Open VMware's Virtual Network Editor and note the key values for NAT mode:
- Starting IP address
- Subnet mask
- Ending IP address
(screenshot: IP information)
2 Installing CentOS
VMware can clone virtual machines, so we set up one host first, clone it, and then adjust the copies.
2.1 Installation options
No GUI is needed; choose the minimal install and add packages later as required. Set the root password and do not create extra users.
2.2 Configuring the VM network
Usually the network is not up yet, and even when it is, it still needs configuration. CentOS 7.x has switched to the ip command, and ifconfig may not even be installed. Check the IP address with:
# as root
ip address
At this point you may have a DHCP-assigned address or none at all; the main thing is to note the name of your network interface, which varies by platform, e.g.:
(screenshot: VMware interface name)
(screenshot: Alibaba Cloud interface name)
Then edit the interface configuration file:
# as root
vi /etc/sysconfig/network-scripts/ifcfg-<interface-name>
# Change or add the following; the values depend on what the Virtual Network Editor shows.
# The DNS servers here are Alibaba Cloud's public resolvers.
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.209.131
NETMASK=255.255.255.0
GATEWAY=192.168.209.2
DNS1=223.5.5.5
DNS2=223.6.6.6
Then restart the network:
# as root
service network restart
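The edits above can also be scripted. A minimal sketch, assuming the example addresses from this section; it writes to a demo path by default so it is safe to try, and on a real host you would point IFCFG at /etc/sysconfig/network-scripts/ifcfg-<interface-name> and run it as root:

```shell
# Demo target path; on a real host use /etc/sysconfig/network-scripts/ifcfg-<interface-name>.
IFCFG=${IFCFG:-/tmp/ifcfg-demo}
# Example values from this section; adjust them to your Virtual Network Editor settings.
cat > "$IFCFG" <<'EOF'
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.209.131
NETMASK=255.255.255.0
GATEWAY=192.168.209.2
DNS1=223.5.5.5
DNS2=223.6.6.6
EOF
# Then apply with: service network restart (as root)
```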
2.3 Firewall settings
Ideally, define rules that allow just the traffic the cluster needs; otherwise, simply turn the firewall off.
# as root
systemctl stop firewalld.service
systemctl disable firewalld.service
# disable SELinux
vi /etc/selinux/config
# set SELINUX=disabled (takes effect after a reboot; setenforce 0 disables it immediately)
SELINUX=disabled
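If you would rather keep firewalld running, opening only the cluster's ports is enough. A sketch under the assumption that the range 20001-20013 covers every port in the pgxc_ctl.conf used later in this guide; the commands are only echoed here so they can be reviewed before running them as root:

```shell
# Port range assumed from the pgxc_ctl.conf used later in this guide
# (GTM 20001, proxies 20002, coordinators 20004/20010, datanodes 20008-20013).
PORTS="20001-20013/tcp"
# Build the commands first so they can be reviewed; run them as root.
ADD_CMD="firewall-cmd --permanent --add-port=$PORTS"
RELOAD_CMD="firewall-cmd --reload"
echo "$ADD_CMD"
echo "$RELOAD_CMD"
```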
2.4 Editing the hosts file
Edit the hosts file according to the cluster plan:
# as root
vi /etc/hosts
# add the following
192.168.209.131 node1
192.168.209.132 node2
xxx.xxx.xxx.xxx aliyun1
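These entries can also be appended non-interactively. A sketch that writes to a demo file by default so it is safe to try; on a real host set HOSTS_FILE=/etc/hosts and run it as root. The aliyun1 address is left as the placeholder from above:

```shell
# Demo target; on a real host use HOSTS_FILE=/etc/hosts (as root).
HOSTS_FILE=${HOSTS_FILE:-/tmp/hosts-demo}
cat >> "$HOSTS_FILE" <<'EOF'
192.168.209.131 node1
192.168.209.132 node2
xxx.xxx.xxx.xxx aliyun1
EOF
# On a real host, verify the names resolve:
# getent hosts node1 node2 aliyun1
```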
2.5 Creating a user and configuring ssh
# as root
adduser pgxl
passwd pgxl
# enter the password
Consider granting the pgxl user sudo rights:
# as root
visudo
# add
%pgxl ALL=(ALL) ALL
Switch to the pgxl user; privileged operations now need a sudo prefix. Unless noted otherwise, all commands below are run as pgxl.
ssh itself may not even be installed; if not, install it first. The minimal CentOS image lacks many common packages, such as rsync, netstat, and telnet; check for them and install whatever is missing.
Set up passwordless login:
su pgxl
mkdir ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# On the VMs, skip the next command for now and run it after cloning.
# In fact, since the clones end up with identical keys, it can be skipped entirely;
# it is shown here as the normal deployment procedure.
# nodex is a host name and must exist in the hosts file
scp -p ~/.ssh/authorized_keys pgxl@nodex:~/.ssh/
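To push the key to every host in one pass, ssh-copy-id is a convenient alternative to scp. A sketch that only prints the commands by default (DRY_RUN=1) so they can be reviewed; unset DRY_RUN on a real cluster to execute them. The host list is an assumption matching the hosts file above:

```shell
# Host names assumed from the /etc/hosts entries above.
HOSTS="node1 node2 aliyun1"
DRY_RUN=${DRY_RUN:-1}
PLAN=""
for h in $HOSTS; do
  cmd="ssh-copy-id pgxl@$h"
  PLAN="$PLAN $cmd"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"          # review first
  else
    $cmd                 # prompts for pgxl's password on each host
  fi
done
```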
3 Installing Postgres-XL
3.1 Download and build
sudo yum update
sudo yum install gcc gcc-c++ kernel-devel readline-devel flex bison bison-devel zlib zlib-devel make docbook-style-dsssl jade
# Download a Postgres-XL source tarball and extract it; assume the extracted directory is postgres-xl
cd postgres-xl
./configure --prefix=/usr/local/pgxl/
make
sudo make install
3.2 Environment variables
Strictly speaking, you should edit pgxl's ~/.bashrc; but if you keep hitting "bash: xx: command not found" errors you cannot track down, you can write the variables into /etc/environment instead.
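A minimal sketch of the lines to append to pgxl's ~/.bashrc (or /etc/environment), assuming the --prefix used in the build step above:

```shell
# Assumes the --prefix used at configure time (/usr/local/pgxl).
PGHOME=/usr/local/pgxl
PATH=$PGHOME/bin:$PATH
LD_LIBRARY_PATH=$PGHOME/lib:${LD_LIBRARY_PATH:-}
export PGHOME PATH LD_LIBRARY_PATH
# Verify the binaries are found once make install has run:
# pgxc_ctl --help
echo "$PATH"
```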
3.3 Generating and editing the configuration file
pgxc_ctl can run in interactive mode or take its commands as arguments:
pgxc_ctl prepare
Edit the generated pgxc_ctl.conf (written under ~/pgxc_ctl/ by default); the settings changed from the template are listed below:
#---- GTM ------------------------------------------------------------------------------------
# GTM is mandatory. You must have at least (and only) one GTM master in your Postgres-XC cluster.
# If GTM crashes and you need to reconfigure it, you can do it by pgxc_update_gtm command to update
# GTM master with others. Of course, we provide pgxc_remove_gtm command to remove it. This command
# will not stop the current GTM. It is up to the operator.
#---- Overall -------
gtmName=gtm
#---- GTM Master -----------------------------------------------
#---- Overall ----
gtmMasterServer=node1
gtmMasterPort=20001
gtmMasterDir=/data1/pgxc/nodes/gtm
#---- Configuration ---
gtmExtraConfig=none # Will be added to gtm.conf for both Master and Slave (done at initialization only)
gtmMasterSpecificExtraConfig=none # Will be added to Master's gtm.conf (done at initialization only)
#---- GTM Slave -----------------------------------------------
# Because GTM is a key component to maintain database consistency, you may want to configure GTM slave
# for backup.
#---- Overall ------
gtmSlave=y # Specify y if you configure GTM Slave. Otherwise, GTM slave will not be configured and
# all the following variables will be reset.
gtmSlaveServer=node2 # value none means GTM slave is not available. Give none if you don't configure GTM Slave.
gtmSlavePort=20001 # Not used if you don't configure GTM slave.
gtmSlaveDir=/data1/pgxc/nodes/gtm # Not used if you don't configure GTM slave.
# Please note that when you have GTM failover, then there will be no slave available until you configure the slave
# again. (pgxc_add_gtm_slave function will handle it)
#---- Configuration ----
gtmSlaveSpecificExtraConfig=none # Will be added to Slave's gtm.conf (done at initialization only)
#---- GTM Proxy -------------------------------------------------------------------------------------------------------
# GTM proxy will be selected based upon which server each component runs on.
# When the master fails over to the slave, the slave inherits its master's gtm proxy. It should be
# reconfigured based upon the new location.
#
# To do so, slave should be restarted. So pg_ctl promote -> (edit postgresql.conf and recovery.conf) -> pg_ctl restart
#
# You don't have to configure GTM Proxy if you don't configure GTM slave, or if you are happy with every component
# connecting to GTM Master directly. If you configure GTM slave, you must configure GTM proxy too.
#---- Shortcuts ------
gtmProxyDir=/data1/pgxc/nodes/gtm_pxy
#---- Overall -------
gtmProxy=y # Specify y if you configure at least one GTM proxy. You may omit gtm proxies
# only when you don't configure GTM slaves.
# If you specify a value other than y, the following parameters will be set to default empty values.
# If there are no valid proxy server names (that is, every server is specified
# as none), then gtmProxy will be set to "n" and all the entries will be set to
# empty values.
gtmProxyNames=(gtm_pxy1 gtm_pxy2) # Not used if it is not configured
gtmProxyServers=(node1 node2) # Specify none if you don't configure it.
gtmProxyPorts=(20002 20002) # Not used if it is not configured.
gtmProxyDirs=($gtmProxyDir $gtmProxyDir) # Not used if it is not configured.
#---- Configuration ----
gtmPxyExtraConfig=none # Extra configuration parameter for gtm_proxy. Coordinator section has an example.
gtmPxySpecificExtraConfig=(none none)
#---- Coordinators ----------------------------------------------------------------------------------------------------
#---- shortcuts ----------
coordMasterDir=/data1/pgxc/nodes/coord
coordSlaveDir=/data1/pgxc/nodes/coord_slave
coordArchLogDir=/data1/pgxc/nodes/coord_archlog
#---- Overall ------------
coordNames=(coord1 coord2) # Master and slave use the same name
coordPorts=(20004 20004) # Master and slave use the same port
poolerPorts=(20010 20010) # Master and slave use the same pooler port
coordPgHbaEntries=(192.168.209.0/24) # Assumes that all the coordinators (master/slave) accept
# the same connection
# This entry allows only $pgxcOwner to connect.
# If you'd like to setup another connection, you should
# supply these entries through files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust". If you don't want
# such setups, specify the value () for this variable and supply what you want using coordExtraPgHba
# and/or coordSpecificExtraPgHba variables.
#---- Master -------------
coordMasterServers=(node1 node2) # none means this master is not available
coordMasterDirs=($coordMasterDir $coordMasterDir)
coordMaxWALsernder=5 # max_wal_senders: needed to configure slave. If zero value is specified,
# it is expected to supply this parameter explicitly by external files
# specified in the following. If you don't configure slaves, leave this value to zero.
coordMaxWALSenders=($coordMaxWALsernder $coordMaxWALsernder)
# max_wal_senders configuration for each coordinator.
#---- Slave -------------
coordSlave=n
#---- Configuration files---
# Need these when you'd like setup specific non-default configuration
# These files will go to corresponding files for the master.
# You may supply your bash script to setup extra config lines and extra pg_hba.conf entries
# Or you may supply these files manually.
coordExtraConfig=coordExtraConfig # Extra configuration file for coordinators.
# This file will be added to all the coordinators'
# postgresql.conf
# Please note that the following sets up minimum parameters which you may want to change.
# You can put your postgresql.conf lines here.
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 100
EOF
# Additional Configuration file for specific coordinator master.
# You can define each setting by similar means as above.
coordSpecificExtraConfig=(none none none none)
coordExtraPgHba=none # Extra entry for pg_hba.conf. This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none none none)
#----- Additional Slaves -----
#
# Please note that this section is just a suggestion how we extend the configuration for
# multiple and cascaded replication. They're not used in the current version.
#
coordAdditionalSlaves=n # Additional slaves can be specified as follows:
coordAdditionalSlaveSet=(cad1) # Each entry specifies a set of slaves. In this case, one set of slaves is
# configured
cad1_Sync=n # All the slaves at "cad1" are connected with asynchronous mode.
# If not, specify "y"
# The following lines specifies detailed configuration for each
# slave tag, cad1. You can define cad2 similarly.
cad1_Servers=(node08 node09 node06 node07) # Hosts
cad1_dir=/data1/pgxc/nodes/coord_slave_cad1
cad1_Dirs=($cad1_dir $cad1_dir $cad1_dir $cad1_dir)
cad1_ArchLogDir=/data1/pgxc/nodes/coord_archlog_cad1
cad1_ArchLogDirs=($cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir)
#---- Datanodes -------------------------------------------------------------------------------------------------------
#---- Shortcuts --------------
datanodeMasterDir=/data1/pgxc/nodes/dn_master
datanodeSlaveDir=/data1/pgxc/nodes/dn_slave
datanodeArchLogDir=/data1/pgxc/nodes/datanode_archlog
#---- Overall ---------------
#primaryDatanode=datanode1 # Primary Node.
# At present, XC has a problem issuing ALTER NODE against the primary node. Until it is fixed, the test will be done
# without this feature.
primaryDatanode=datanode1 # Primary Node.
datanodeNames=(datanode1 datanode2)
datanodePorts=(20008 20009) # Master and slave use the same port!
datanodePoolerPorts=(20012 20013) # Master and slave use the same port!
datanodePgHbaEntries=(192.168.209.0/24) # Assumes that all the datanodes (master/slave) accept
# the same connection
# This list sets up pg_hba.conf for $pgxcOwner user.
# If you'd like to setup other entries, supply them
# through extra configuration files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust". If you don't want
# such setups, specify the value () for this variable and supply what you want using datanodeExtraPgHba
# and/or datanodeSpecificExtraPgHba variables.
#---- Master ----------------
datanodeMasterServers=(node1 node2) # none means this master is not available.
# This means that there should be the master but is down.
# The cluster is not operational until the master is
# recovered and ready to run.
datanodeMasterDirs=($datanodeMasterDir $datanodeMasterDir)
datanodeMaxWalSender=5 # max_wal_senders: needed to configure slave. If zero value is
# specified, it is expected this parameter is explicitly supplied
# by external configuration files.
# If you don't configure slaves, leave this value zero.
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
# max_wal_senders configuration for each datanode
#---- Slave -----------------
datanodeSlave=y # Specify y if you configure at least one datanode slave. Otherwise, the following
# configuration parameters will be set to empty values.
# If no effective server names are found (that is, every server is specified as none),
# then datanodeSlave value will be set to n and all the following values will be set to
# empty values.
datanodeSlaveServers=(node2 node1) # value none means this slave is not available
datanodeSlavePorts=(20009 20008)
datanodeSlavePoolerPorts=(20013 20012)
datanodeSlaveSync=y # If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=( $datanodeArchLogDir $datanodeArchLogDir)
3.4 Initialization
Run the cluster initialization; once it completes, all components are started automatically. If errors are reported, the cluster will not run; every error must be resolved first.
The usual causes are incorrect settings in the conf file, wrong environment variables, or missing utility programs (install whatever is missing).
pgxc_ctl init all
At this point the cluster installation is complete; pgxc_ctl monitor all reports the status of each component.
Testing
On any node that runs a Coordinator, connect with the port configured earlier. PostgreSQL's default port is not 20004, and the default database name matches the current user name, so both must be specified explicitly. Because Postgres-XL was installed as the pgxl user, pgxc_ctl has already created a pgxl database user with the necessary privileges, as the output log shows.
psql -U pgxl -p 20004 -d postgres
After logging in, inspect the cluster topology:
select oid, * from pgxc_node;
Create a distributed table and insert test data:
create table test (col1 int, col2 text);
insert into test select generate_series(1, 100), 'test';
Check how the rows are distributed across datanodes:
SELECT xc_node_id, count(*) FROM test GROUP BY xc_node_id;