Installing Postgres-XL

2018-10-28 · CocoAdapter

Standard Architecture

Layout:
In the standard setup, the GTM master and GTM slave run on two separate hosts; a GTM proxy, a Coordinator, and a Datanode are installed together on each data host, with each pair of data hosts holding the other's Datanode slave. Alternatively, Datanode masters and their slaves can be spread over different hosts for better HA and lower I/O pressure.
HA:
Postgres-XL currently has no built-in automatic master/slave failover; if a component fails, the switchover must be done manually. Automatic failover can be achieved with external tooling (still to be investigated).
Hardware requirements:
The GTM does little I/O and mostly needs CPU and memory (the actual load remains to be measured). Coordinators handle SQL queries, dispatching work and merging results, so they need CPU and memory. Datanodes are mainly I/O bound.

Test Environment

The test environment consists of three hosts running CentOS 7.x: two VMware virtual machines, plus one Aliyun instance used later to test dynamic node addition and removal.
The configurations are as follows:


(Screenshots: VM specs and Aliyun instance specs.)
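
Only the two VMs form the initial cluster; the Aliyun host (aliyun1) is reserved for the later add/remove-node tests. According to the pgxc_ctl.conf used below, the components end up laid out roughly like this:

node1: GTM master, gtm_pxy1, coord1, datanode1 master, datanode2 slave
node2: GTM slave,  gtm_pxy2, coord2, datanode2 master, datanode1 slave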

Setting Up the Cluster

1 VMware Network Settings

Open VMware's Virtual Network Editor and note a few key settings of the NAT network:

(Screenshots: NAT gateway address and subnet/IP range from the Virtual Network Editor.)

2 Installing CentOS

VMware supports cloning virtual machines, so we only need to set up one host, clone it, and then adjust the clone.

2.1 Installation Options

A GUI is not needed; choose the minimal install and add any missing packages later. Set the root password and do not create any extra users.

2.2 Configuring the VM Network

The network usually does not work out of the box, and even when it does, it still needs to be configured. CentOS 7.x has switched to the ip command and ifconfig may not be installed at all; check the IP address with:

// as root
ip address

At this point the machine may have a DHCP-assigned address or no address at all. The main thing is to note which network interface is yours; the name differs between environments, for example:


(Screenshots: interface name as shown on the VMware VM and on the Aliyun host.)
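
Without the screenshots, the device names can be listed directly; names such as ens33 (VMware) or eth0 (Aliyun) are typical but not guaranteed:

// as root: one line per device, then the same with addresses
ip -br link show
ip -br address show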

Next, edit the interface configuration file:

// as root
vi /etc/sysconfig/network-scripts/ifcfg-<interface name>

// modify or add the following; the values depend on what the Virtual Network Editor shows. The DNS servers here are Aliyun's public resolvers.
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.209.131
NETMASK=255.255.255.0
GATEWAY=192.168.209.2
DNS1=223.5.5.5
DNS2=223.6.6.6

Then restart the network service:

// as root
service network restart
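
To confirm the static configuration took effect, a quick check against the values set above:

// as root: show the address, then check the gateway and an external DNS server
ip address
ping -c 3 192.168.209.2
ping -c 3 223.5.5.5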

2.3 Firewall Settings

Ideally you would define firewall rules for the cluster's connections; otherwise, you can simply disable the firewall.

// as root
systemctl stop firewalld.service
systemctl disable firewalld.service

// configure SELinux
vi /etc/selinux/config
// set SELINUX=disabled
SELINUX=disabled
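
The change in /etc/selinux/config only takes effect after a reboot; to switch to permissive mode immediately for the current session:

// as root
setenforce 0
getenforce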

2.4 Editing the hosts File

Edit the hosts file according to the cluster plan:

// as root
vi /etc/hosts
// add the following entries
192.168.209.131 node1
192.168.209.132 node2
xxx.xxx.xxx.xxx aliyun1
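
If the machines' own hostnames do not yet match these entries, they can be set with hostnamectl (run the matching command on each host, e.g. node2 on the second VM after cloning):

// as root, on the corresponding host
hostnamectl set-hostname node1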

2.5 Creating the User and Configuring SSH

// as root
adduser pgxl
passwd pgxl
// enter the password

Consider granting the pgxl user sudo privileges:

// as root
sudo visudo
// add the following line
%pgxl ALL=(ALL) ALL

Switch to the pgxl user and prefix commands with sudo whenever elevated privileges are needed. Unless noted otherwise, all commands below are run as the pgxl user.

SSH itself may not even be installed; install it first if it is missing. The minimal CentOS image omits many common packages such as rsync, netstat, and telnet; check for them and install whatever is missing, for example as shown below.
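
A possible one-liner for the packages mentioned above (package names as found in the standard CentOS 7 repositories; net-tools provides netstat):

// as pgxl
sudo yum install -y openssh-server openssh-clients rsync net-tools telnet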

Configure passwordless SSH login:

su pgxl
mkdir ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
// on the VM, skip the following command for now and run it after cloning.
// in fact, since the cloned machines share identical keys, it can be skipped entirely; it is shown here as the normal deployment step.
// nodex is a hostname and must already exist in the hosts file
scp -p ~/.ssh/authorized_keys pgxl@nodex:~/.ssh/
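
Once the keys are in place (or after cloning), passwordless login can be verified with a quick test such as:

ssh pgxl@node2 hostname
// should print node2 without asking for a password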

3 Installing Postgres-XL

3.1 Download and Build

sudo yum update
sudo yum install gcc gcc-c++ kernel-devel readline-devel flex bison bison-devel zlib zlib-devel make docbook-style-dsssl jade
// download a Postgres-XL source tarball and extract it; assume the extracted directory is named postgres-xl
cd postgres-xl
./configure --prefix=/usr/local/pgxl/
make
sudo make install
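
If the build succeeded, the binaries should now sit under the prefix chosen above; a quick sanity check:

ls /usr/local/pgxl/bin
/usr/local/pgxl/bin/psql --version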

3.2 Environment Variables

Strictly speaking, you should edit the pgxl user's ~/.bashrc. But if you keep hitting errors like "bash: xx: command not found" and cannot track down the cause, you can write the variables into /etc/environment instead.
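
A minimal sketch of what to append to the pgxl user's ~/.bashrc, assuming the --prefix used above (PGHOME is just a local shorthand; note that /etc/environment does not expand variables, so full paths would be needed there):

// as pgxl, append to ~/.bashrc
export PGHOME=/usr/local/pgxl
export PATH=$PGHOME/bin:$PATH
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH
// reload the shell configuration
source ~/.bashrc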

3.3 Generating and Editing the Configuration File

pgxc_ctl can run in interactive mode or take its commands as arguments:

pgxc_ctl prepare

Edit pgxc_ctl.conf (by default pgxc_ctl prepare generates it under $HOME/pgxc_ctl/); the settings changed from the template are shown below:

#---- GTM ------------------------------------------------------------------------------------
 
# GTM is mandatory.  You must have at least (and only) one GTM master in your Postgres-XC cluster.
# If GTM crashes and you need to reconfigure it, you can do it by pgxc_update_gtm command to update
# GTM master with others.   Of course, we provide pgxc_remove_gtm command to remove it.  This command
# will not stop the current GTM.  It is up to the operator.
 
#---- Overall -------
gtmName=gtm
 
#---- GTM Master -----------------------------------------------
 
#---- Overall ----
gtmMasterServer=node1
gtmMasterPort=20001
gtmMasterDir=/data1/pgxc/nodes/gtm
 
#---- Configuration ---
gtmExtraConfig=none         # Will be added gtm.conf for both Master and Slave (done at initilization only)
gtmMasterSpecificExtraConfig=none   # Will be added to Master's gtm.conf (done at initialization only)
 
#---- GTM Slave -----------------------------------------------
 
# Because GTM is a key component to maintain database consistency, you may want to configure GTM slave
# for backup.
 
#---- Overall ------
gtmSlave=y                  # Specify y if you configure GTM Slave.   Otherwise, GTM slave will not be configured and
                            # all the following variables will be reset.
gtmSlaveServer=node2        # value none means GTM slave is not available.  Give none if you don't configure GTM Slave.
gtmSlavePort=20001          # Not used if you don't configure GTM slave.
gtmSlaveDir=/data1/pgxc/nodes/gtm   # Not used if you don't configure GTM slave.
# Please note that when you have GTM failover, then there will be no slave available until you configure the slave
# again. (pgxc_add_gtm_slave function will handle it)
 
#---- Configuration ----
gtmSlaveSpecificExtraConfig=none # Will be added to Slave's gtm.conf (done at initialization only)
 
#---- GTM Proxy -------------------------------------------------------------------------------------------------------
# GTM proxy will be selected based upon which server each component runs on.
# When fails over to the slave, the slave inherits its master's gtm proxy.  It should be
# reconfigured based upon the new location.
#
# To do so, slave should be restarted.   So pg_ctl promote -> (edit postgresql.conf and recovery.conf) -> pg_ctl restart
#
# You don't have to configure GTM Proxy if you dont' configure GTM slave or you are happy if every component connects
# to GTM Master directly.  If you configure GTL slave, you must configure GTM proxy too.
 
#---- Shortcuts ------
gtmProxyDir=/data1/pgxc/nodes/gtm_pxy
 
#---- Overall -------
gtmProxy=y              # Specify y if you conifugre at least one GTM proxy.   You may not configure gtm proxies
                        # only when you dont' configure GTM slaves.
                        # If you specify this value not to y, the following parameters will be set to default empty values.
                        # If we find there're no valid Proxy server names (means, every servers are specified
                        # as none), then gtmProxy value will be set to "n" and all the entries will be set to
                        # empty values.
gtmProxyNames=(gtm_pxy1 gtm_pxy2)   # No used if it is not configured
gtmProxyServers=(node1 node2)           # Specify none if you dont' configure it.
gtmProxyPorts=(20002 20002)             # Not used if it is not configured.
gtmProxyDirs=($gtmProxyDir $gtmProxyDir)    # Not used if it is not configured.
 
#---- Configuration ----
gtmPxyExtraConfig=none      # Extra configuration parameter for gtm_proxy.  Coordinator section has an example.
gtmPxySpecificExtraConfig=(none none)
 
#---- Coordinators ----------------------------------------------------------------------------------------------------
 
#---- shortcuts ----------
coordMasterDir=/data1/pgxc/nodes/coord
coordSlaveDir=/data1/pgxc/nodes/coord_slave
coordArchLogDir=/data1/pgxc/nodes/coord_archlog
 
#---- Overall ------------
coordNames=(coord1 coord2)      # Master and slave use the same name
coordPorts=(20004 20004)            # Master and slave use the same port
poolerPorts=(20010 20010)           # Master and slave use the same pooler port
coordPgHbaEntries=(192.168.209.0/24)               # Assumes that all the coordinator (master/slave) accepts
                                                # the same connection
                                                # This entry allows only $pgxcOwner to connect.
                                                # If you'd like to setup another connection, you should
                                                # supply these entries through files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust".   If you don't want
# such setups, specify the value () to this variable and suplly what you want using coordExtraPgHba
# and/or coordSpecificExtraPgHba variables.
 
#---- Master -------------
coordMasterServers=(node1 node2)        # none means this master is not available
coordMasterDirs=($coordMasterDir $coordMasterDir)
coordMaxWALsernder=5    # max_wal_senders: needed to configure slave. If zero value is specified,
                        # it is expected to supply this parameter explicitly by external files
                        # specified in the following.   If you don't configure slaves, leave this value to zero.
coordMaxWALSenders=($coordMaxWALsernder $coordMaxWALsernder)
                        # max_wal_senders configuration for each coordinator.
 
#---- Slave -------------
coordSlave=n 
 
#---- Configuration files---
# Need these when you'd like setup specific non-default configuration 
# These files will go to corresponding files for the master.
# You may supply your bash script to setup extra config lines and extra pg_hba.conf entries 
# Or you may supply these files manually.
coordExtraConfig=coordExtraConfig   # Extra configuration file for coordinators.  
                        # This file will be added to all the coordinators'
                        # postgresql.conf
# Pleae note that the following sets up minimum parameters which you may want to change.
# You can put your postgresql.conf lines here.
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 100
EOF
 
# Additional Configuration file for specific coordinator master.
# You can define each setting by similar means as above.
coordSpecificExtraConfig=(none none none none)
coordExtraPgHba=none    # Extra entry for pg_hba.conf.  This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none none none)
 
#----- Additional Slaves -----
#
# Please note that this section is just a suggestion how we extend the configuration for
# multiple and cascaded replication.   They're not used in the current version.
#
coordAdditionalSlaves=n     # Additional slave can be specified as follows: where you
coordAdditionalSlaveSet=(cad1)      # Each specifies set of slaves.   This case, two set of slaves are
                                            # configured
cad1_Sync=n             # All the slaves at "cad1" are connected with asynchronous mode.
                            # If not, specify "y"
                            # The following lines specifies detailed configuration for each
                            # slave tag, cad1.  You can define cad2 similarly.
cad1_Servers=(node08 node09 node06 node07)  # Hosts
cad1_dir=/data1/pgxc/nodes/coord_slave_cad1
cad1_Dirs=($cad1_dir $cad1_dir $cad1_dir $cad1_dir)
cad1_ArchLogDir=/data1/pgxc/nodes/coord_archlog_cad1
cad1_ArchLogDirs=($cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir)
 
 
#---- Datanodes -------------------------------------------------------------------------------------------------------
 
#---- Shortcuts --------------
datanodeMasterDir=/data1/pgxc/nodes/dn_master
datanodeSlaveDir=/data1/pgxc/nodes/dn_slave
datanodeArchLogDir=/data1/pgxc/nodes/datanode_archlog
 
#---- Overall ---------------
#primaryDatanode=datanode1              # Primary Node.
# At present, xc has a priblem to issue ALTER NODE against the primay node.  Until it is fixed, the test will be done
# without this feature.
primaryDatanode=datanode1               # Primary Node.
datanodeNames=(datanode1 datanode2)
datanodePorts=(20008 20009) # Master and slave use the same port!
datanodePoolerPorts=(20012 20013)   # Master and slave use the same port!
datanodePgHbaEntries=(192.168.209.0/24)    # Assumes that all the coordinator (master/slave) accepts
                                        # the same connection
                                        # This list sets up pg_hba.conf for $pgxcOwner user.
                                        # If you'd like to setup other entries, supply them
                                        # through extra configuration files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust".   If you don't want
# such setups, specify the value () to this variable and suplly what you want using datanodeExtraPgHba
# and/or datanodeSpecificExtraPgHba variables.
 
#---- Master ----------------
datanodeMasterServers=(node1 node2) # none means this master is not available.
                                                    # This means that there should be the master but is down.
                                                    # The cluster is not operational until the master is
                                                    # recovered and ready to run.   
datanodeMasterDirs=($datanodeMasterDir $datanodeMasterDir)
datanodeMaxWalSender=5                              # max_wal_senders: needed to configure slave. If zero value is 
                                                    # specified, it is expected this parameter is explicitly supplied
                                                    # by external configuration files.
                                                    # If you don't configure slaves, leave this value zero.
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
                        # max_wal_senders configuration for each datanode
 
#---- Slave -----------------
datanodeSlave=y         # Specify y if you configure at least one coordiantor slave.  Otherwise, the following
                        # configuration parameters will be set to empty values.
                        # If no effective server names are found (that is, every servers are specified as none),
                        # then datanodeSlave value will be set to n and all the following values will be set to
                        # empty values.
datanodeSlaveServers=(node2 node1)  # value none means this slave is not available
datanodeSlavePorts=(20009 20008)
datanodeSlavePoolerPorts=(20013 20012)
datanodeSlaveSync=y     # If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=( $datanodeArchLogDir $datanodeArchLogDir)

3.4 Initialization

Initialize the cluster; when initialization completes, all components are started automatically. If errors are reported, the cluster will not run until they are resolved.
The usual culprits are incorrect settings in the conf file, wrong environment variables, or missing utilities (install whatever is missing).

pgxc_ctl init all
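
Once initialization finishes, the status of every component can be checked with pgxc_ctl's monitor command (the same configuration file is assumed):

pgxc_ctl monitor all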

At this point, the cluster installation is complete.

Testing

On any node where a Coordinator is installed, connect with the port configured earlier; PostgreSQL's default port is not 20004, and the default database name is the current user name, so both have to be given explicitly. Since Postgres-XL was installed as the pgxl user, pgxc_ctl has already created a pgxl database role with the necessary privileges, as can be seen in the initialization log output.

psql -U pgxl -p 20004 -d postgres

After logging in, inspect the cluster topology:

select oid, * from pgxc_node;

Create a distributed table and insert some test data:

create table test (col1 int, col2 text);
insert into test select generate_series(1, 100), 'test';
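
When no distribution clause is given, Postgres-XL picks one automatically (typically hash on a suitable column). The distribution can also be declared explicitly; the table names below are just examples:

-- hash-distribute rows across datanodes by col1
create table disttab2 (col1 int, col2 text) distribute by hash(col1);
-- keep a full copy of the table on every datanode
create table repltab (col1 int, col2 text) distribute by replication;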

Check how the rows are distributed across the datanodes:

SELECT xc_node_id, count(*) FROM test GROUP BY xc_node_id;
