001. Building a CM 6.3 Big Data Platform on Alibaba Cloud CentOS 7.6
1. Cluster Planning and Configuration Notes
OS | Hostname | Internal IP | Memory | CPU | Role | System Disk | Data Disk | Data Disk Mount Point |
---|---|---|---|---|---|---|---|---|
CentOS-7.6 | node01 | 172.26.0.53 | 16GB | 4 cores | Management node | 50GB | 100GB | /data |
CentOS-7.6 | node02 | 172.26.0.50 | 16GB | 4 cores | Data node | 40GB | 100GB | /data |
CentOS-7.6 | node03 | 172.26.0.51 | 16GB | 4 cores | Data node | 40GB | 100GB | /data |
CentOS-7.6 | node04 | 172.26.0.52 | 16GB | 4 cores | Data node | 40GB | 100GB | /data |
2. System Environment Preparation
2.1 Disk Configuration Requirements for Production
- System disk: RAID 1 recommended, at least 200 GB, managed with LVM so the system disk can be grown later; CM itself is installed on the system disk.
- Management node: data disks in RAID 5; all of the management node's data goes on the data disk.
- Data nodes: data disks as RAID 0 (one disk per RAID 0 volume, hardware RAID), formatted as xfs and mounted with noatime, no LVM; ideally all data nodes use identical hardware (a sketch of the disk setup follows this list).
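As a rough sketch of the data-disk setup on a data node, assuming the data disk shows up as /dev/vdb (common on Alibaba Cloud, but confirm with lsblk first):
# Identify the data disk
lsblk
# Format it as xfs (this destroys any existing data on the disk)
mkfs.xfs /dev/vdb
mkdir -p /data
# Mount with noatime and persist the mount across reboots
mount -o noatime /dev/vdb /data
echo '/dev/vdb /data xfs defaults,noatime 0 0' >> /etc/fstab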
2.2 Network Configuration
- Make sure IPv6 is disabled, on all nodes. Edit the /etc/sysctl.conf file:
# Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
After editing, run sysctl -p, then add the following to the /etc/sysconfig/network file:
NETWORKING_IPV6=no
IPV6INIT=no
- Hostname configuration
[root@node01 ~]# cat /etc/hostname
node01
[root@node02 ~]# cat /etc/hostname
node02
[root@node03 ~]# cat /etc/hostname
node03
[root@node04 ~]# cat /etc/hostname
node04
- /etc/hosts settings, identical on all nodes (a sketch of applying these settings follows this list):
172.26.0.53 node01
172.26.0.50 node02
172.26.0.51 node03
172.26.0.52 node04
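A minimal sketch of applying the two items above. Run the hostname command on each node with its own name; the /etc/hosts block can be pasted on every node by hand, or copied over with scp once passwordless SSH is set up in the next section:
# Example for node01; use node02/node03/node04 on the other machines
hostnamectl set-hostname node01
# Append the cluster entries to /etc/hosts
cat >> /etc/hosts <<'EOF'
172.26.0.53 node01
172.26.0.50 node02
172.26.0.51 node03
172.26.0.52 node04
EOF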
2.3 Passwordless SSH Login
Set up passwordless SSH from node01 to the other three machines.
[root@node01 ~]# ssh-keygen
Generating public/private rsa key pair.
# Press Enter
Enter file in which to save the key (/root/.ssh/id_rsa):
# Press Enter
Enter passphrase (empty for no passphrase):
# Press Enter
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:wjErOh9znaB9DV6zQkFqvIWgAqABRI9XywCefACA+p4 root@node01
The key's randomart image is:
+---[RSA 2048]----+
|/=.... . |
|*.=.+o.+ |
|+=.+ oB o |
|..o o * . |
| . . * S o |
| .. + * = o |
| .o.+ o * o |
| Eo + . . |
| . |
+----[SHA256]-----+
[root@node01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'node01 (172.26.0.53)' can't be established.
ECDSA key fingerprint is SHA256:RjnwwbdyitVDZL8ZBDSIchP6NIzcUgvnd+jItwp3D00.
ECDSA key fingerprint is MD5:8f:ea:10:8f:cf:3d:83:e2:e9:cc:af:ec:70:bf:1c:af.
# Type yes
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
# Enter the root user's password
root@node01's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@node01'"
and check to make sure that only the key(s) you wanted were added.
[root@node01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node02
[root@node01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node03
[root@node01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node04
2.4 Disable SELinux
SELinux is already disabled on Alibaba Cloud servers, so no action is needed there; if you do need to disable it, apply the change on all nodes.
# Disable SELinux (takes effect immediately)
setenforce 0
# Check whether SELinux is enabled
[root@node01 ~]# getenforce
Disabled
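Note that setenforce 0 only lasts until the next reboot; on machines where SELinux is actually enabled, a sketch of disabling it permanently:
# Persist the change by setting SELINUX=disabled in /etc/selinux/config
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# Confirm the setting
grep '^SELINUX=' /etc/selinux/config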
2.5 Disable the Firewall
The firewall is already disabled on Alibaba Cloud servers, so no action is needed there; if you do need to disable it, apply the change on all nodes.
systemctl stop firewalld
systemctl disable firewalld
[root@node01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
2.6 Minimize Swapping
Apply on all nodes.
# First, lower swappiness temporarily (takes effect immediately)
echo 1 > /proc/sys/vm/swappiness
# Then add the following line to the /etc/sysctl.conf file so it persists
vm.swappiness = 1
# Finally, run sysctl -p
2.7 Disable Transparent Huge Pages
Apply on all nodes.
# First run the following two commands (takes effect immediately)
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Make /etc/rc.d/rc.local executable so the change is reapplied at boot
chmod +x /etc/rc.d/rc.local
# Add the following to the rc.local file
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
2.8 Cluster Time Synchronization
- Make sure every machine in the cluster is in the Asia/Shanghai time zone.
[root@node01 ~]# timedatectl
      Local time: Thu 2020-06-04 16:18:43 CST
  Universal time: Thu 2020-06-04 08:18:43 UTC
        RTC time: Thu 2020-06-04 16:18:42
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: yes
      DST active: n/a
# If it needs to be changed, use:
timedatectl set-timezone Asia/Shanghai
- Install the NTP service on all nodes
yum -y install ntp
- NTP server configuration
node01 acts as the NTP server; edit its /etc/ntp.conf file. Note that on Alibaba Cloud servers, the configuration installed by yum already syncs against Alibaba Cloud's own time servers, so it barely needs changing; just confirm the following settings:
# Fall back to the local hardware clock when external time sources are unreachable
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# Which network segments may sync time from this server; set it to the cluster's internal subnet and netmask
# (use ifconfig to check the machine's internal subnet and netmask)
restrict 172.26.0.0 mask 255.255.240.0 nomodify notrap nopeer noquery
On a physical server, however, the default configuration points at the CentOS pool servers. Besides the settings above, you should also replace the time server addresses with ones inside China, for example:
# Comment out the default CentOS time servers
# server 0.centos.pool.ntp.org iburst
# server 1.centos.pool.ntp.org iburst
# server 2.centos.pool.ntp.org iburst
# server 3.centos.pool.ntp.org iburst
# Replace them with a server located in China (search for more domestic addresses if you need several)
server cn.pool.ntp.org iburst
- NTP client configuration
On the remaining nodes (node02 through node04), edit the /etc/ntp.conf file; the complete client configuration is:
driftfile /var/lib/ntp/drift
pidfile /var/run/ntpd.pid
logfile /var/log/ntp.log
restrict 172.26.0.53 nomodify notrap nopeer noquery
server 172.26.0.53 iburst minpoll 4 maxpoll 10
server 127.127.1.0
fudge 127.127.1.0 stratum 10
- Start the NTP service on all nodes
systemctl start ntpd
systemctl enable ntpd
- Check the sync status on every client
# A leading * is what you want to see: it marks the server currently being synced from
# After the client's NTP service starts, it can take a while before this shows up
[root@node02 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*node01          100.100.61.88    2 u    3   64   17    0.153  -31.350   6.619
 LOCAL(0)        .LOCL.          10 l  157   64    4    0.000    0.000   0.000
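If a client's clock is far off, ntpd can take a long time to converge. A one-shot sync against node01 before starting ntpd can help; this assumes the ntpdate package is installed (yum -y install ntpdate):
# Run on a client node while ntpd is stopped
systemctl stop ntpd
ntpdate -u 172.26.0.53
systemctl start ntpd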
3. Base Software Preparation
The official documentation lists the hardware configurations, operating systems, and databases supported by CM 6.x; check it before proceeding.
3.1 JDK
Install JDK 1.8 on all nodes. Note: "JDK 8u40, 8u45, 8u60, and 8u242 are not supported due to JDK issues impacting CDH functionality", and the newest Oracle JDK 1.8 build tested by Cloudera is 8u181. Install the JDK under the /usr/java directory; my JAVA_HOME is /usr/java/jdk.
# Gotcha: after extracting the tarball, the owner and group show up as numeric IDs
# Change them manually to root (or to your own user)
# Other extracted packages can have the same problem and also need fixing
[root@node01 java]# ll
total 4
lrwxrwxrwx 1 root root 12 Jun 4 18:56 jdk -> jdk1.8.0_181
drwxr-xr-x 7 10 143 4096 Jul 7 2018 jdk1.8.0_181
# Run on all nodes
chown -R root:root /usr/java
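For reference, a sketch of the install itself, assuming the Oracle tarball (e.g. jdk-8u181-linux-x64.tar.gz) has been uploaded to /root on each node; the symlink keeps JAVA_HOME stable across minor JDK upgrades:
mkdir -p /usr/java
tar -xzf /root/jdk-8u181-linux-x64.tar.gz -C /usr/java
ln -s /usr/java/jdk1.8.0_181 /usr/java/jdk
# Set JAVA_HOME globally
cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile.d/java.sh
java -version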
3.2 Install httpd
In my environment, all of the single-node software that CM needs is installed on node01.
yum -y install httpd
systemctl start httpd
systemctl enable httpd
3.3 Install MySQL 5.7
Note: I have written a separate introductory article on installing MySQL 5.7; it does not cover complex production configuration. Since this database is only used for the CM cluster's metadata and holds no business data, a simple setup is enough. I installed it on node01.
Create the databases and users that CM and its services need:
create database am default character set utf8;
create database cm default character set utf8;
create database rm default character set utf8;
create database hue default character set utf8;
create database hive default character set utf8;
create database oozie default character set utf8;
create database nav_as default character set utf8;
create database nav_ms default character set utf8;
create database sentry default character set utf8;
CREATE USER 'am'@'%' IDENTIFIED BY 'password';
CREATE USER 'cm'@'%' IDENTIFIED BY 'password';
CREATE USER 'rm'@'%' IDENTIFIED BY 'password';
CREATE USER 'hue'@'%' IDENTIFIED BY 'password';
CREATE USER 'hive'@'%' IDENTIFIED BY 'password';
CREATE USER 'oozie'@'%' IDENTIFIED BY 'password';
CREATE USER 'nav_as'@'%' IDENTIFIED BY 'password';
CREATE USER 'nav_ms'@'%' IDENTIFIED BY 'password';
CREATE USER 'sentry'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON am.* TO 'am'@'%';
GRANT ALL PRIVILEGES ON cm.* TO 'cm'@'%';
GRANT ALL PRIVILEGES ON rm.* TO 'rm'@'%';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%';
GRANT ALL PRIVILEGES ON nav_as.* TO 'nav_as'@'%';
GRANT ALL PRIVILEGES ON nav_ms.* TO 'nav_ms'@'%';
FLUSH PRIVILEGES;
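A quick sanity check that the accounts are reachable from other nodes (assuming a MySQL client is installed on the node you test from):
# Run from node02, for example; enter the password you set above
mysql -h node01 -u cm -p -e "SHOW DATABASES;"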
3.4 Install the JDBC Driver
Install the MySQL JDBC driver on all nodes. The commands below show node01; a sketch for pushing it to the other nodes follows the listing.
[root@node01 ~]# mkdir -p /usr/share/java/
[root@node01 ~]# mv ~/mysql-connector-java-5.1.48.jar /usr/share/java/
[root@node01 ~]# cd /usr/share/java/
[root@node01 java]# ln -s mysql-connector-java-5.1.48.jar mysql-connector-java.jar
[root@node01 java]# ll
total 984
-rw-r--r-- 1 root root 1006956 Jun 4 18:50 mysql-connector-java-5.1.48.jar
lrwxrwxrwx 1 root root 31 Jun 4 18:51 mysql-connector-java.jar -> mysql-connector-java-5.1.48.jar
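One way to push the driver to the remaining nodes, assuming the passwordless SSH set up in section 2.3:
for host in node02 node03 node04; do
  ssh root@${host} "mkdir -p /usr/share/java"
  scp /usr/share/java/mysql-connector-java-5.1.48.jar root@${host}:/usr/share/java/
  ssh root@${host} "ln -sf /usr/share/java/mysql-connector-java-5.1.48.jar /usr/share/java/mysql-connector-java.jar"
done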
4. Cloudera Manager Installation
4.1 Package Preparation
Download the CM 6.3.1 packages (the RPMs listed further below, plus allkeys.asc):
https://archive.cloudera.com/cm6/6.3.1/allkeys.asc
Download the CDH 6.3.2 packages. Note: the official site provides CM 6.3.1 RPMs but no CDH 6.3.1 parcels, so we have to use the CDH 6.3.2 parcels instead; I have tested this combination and it works:
https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1
https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha256
https://archive.cloudera.com/cdh6/6.3.2/parcels/manifest.json
Rename the CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1 file to CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha, then upload all of the packages to the server. I put the CM packages in the /root/cm6.3.1 directory and the CDH packages in the /root/cdh6.3.2 directory:
[root@node01 ~]# ll cm6.3.1/
total 1199784
-rw-r--r-- 1 root root 14041 Oct 11 2019 allkeys.asc
-rw-r--r-- 1 2001 2001 10483568 Sep 25 2019 cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 2001 2001 1203832464 Sep 25 2019 cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 2001 2001 11488 Sep 25 2019 cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 2001 2001 10996 Sep 25 2019 cloudera-manager-server-db-2-6.3.1-1466458.el7.x86_64.rpm
-rw-r--r-- 1 2001 2001 14209868 Sep 25 2019 enterprise-debuginfo-6.3.1-1466458.el7.x86_64.rpm
[root@node01 ~]# ll cdh6.3.2/
total 2033436
-rw-r--r-- 1 root root 2082186246 Jun 9 18:57 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
-rw-r--r-- 1 root root 40 Jun 9 17:20 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
-rw-r--r-- 1 root root 64 Jun 9 17:21 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha256
-rw-r--r-- 1 root root 33887 Jun 9 17:20 manifest.json
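Optionally, verify the parcel download before going further; the hash printed by sha256sum should match the contents of the .sha256 file:
cd /root/cdh6.3.2
sha256sum CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
cat CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha256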
Go into the CM package directory and generate the RPM repository metadata:
[root@node01 cm6.3.1]# yum install createrepo -y
[root@node01 cm6.3.1]# createrepo .
Spawning worker 0 with 2 pkgs
Spawning worker 1 with 1 pkgs
Spawning worker 2 with 1 pkgs
Spawning worker 3 with 1 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
4.2 HTTP Server Configuration
[root@node01 ~]# mv cm6.3.1 /var/www/html/
Note: on Alibaba Cloud, port 80 must be opened in the instance's security group.
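A quick check, from any node, that httpd is actually serving the repository (repodata/repomd.xml was created by createrepo above):
# Expect an HTTP 200 response
curl -sI http://node01/cm6.3.1/repodata/repomd.xml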
4.3 Create a Local yum Repository for CM
[root@node01 ~]# vim /etc/yum.repos.d/cm.repo
[cmrepo]
name = cm_repo
baseurl = http://node01/cm6.3.1
enabled = 1
gpgcheck = 0
# If the cmrepo line shows up here, the repo is working
[root@node01 ~]# yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
cmrepo | 2.9 kB 00:00:00
cmrepo/primary_db | 6.8 kB 00:00:00
repo id repo name status
base/7/x86_64 CentOS-7 10,070
cmrepo cm_repo 5
epel/x86_64 Extra Packages for Enterprise Linux 7 - x86_64 13,314
extras/7/x86_64 CentOS-7 397
mysql-connectors-community/x86_64 MySQL Connectors Community 153
mysql-tools-community/x86_64 MySQL Tools Community 110
mysql57-community/x86_64 MySQL 5.7 Community Server 424
updates/7/x86_64 CentOS-7 743
repolist: 25,216
4.4 Install Cloudera Manager Server
Install Cloudera Manager Server on node01:
[root@node01 ~]# yum -y install cloudera-manager-server
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package cloudera-manager-server.x86_64 0:6.3.1-1466458.el7 will be installed
--> Processing Dependency: cloudera-manager-daemons = 6.3.1 for package: cloudera-manager-server-6.3.1-1466458.el7.x86_64
--> Running transaction check
---> Package cloudera-manager-daemons.x86_64 0:6.3.1-1466458.el7 will be installed
--> Finished Dependency Resolution
......
Installed:
cloudera-manager-server.x86_64 0:6.3.1-1466458.el7
Dependency Installed:
cloudera-manager-daemons.x86_64 0:6.3.1-1466458.el7
Complete!
After the installation finishes, a cloudera directory has been created under /opt:
[root@node01 cloudera]# ll /opt/cloudera/
total 12
drwxr-xr-x 27 cloudera-scm cloudera-scm 4096 Jun 10 15:48 cm
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 Sep 25 2019 csd
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 Sep 25 2019 parcel-repo
Move the four files from the cdh6.3.2 directory into /opt/cloudera/parcel-repo:
[root@node01 ~]# mv cdh6.3.2/* /opt/cloudera/parcel-repo/
[root@node01 ~]# ll /opt/cloudera/parcel-repo/
total 2033436
-rw-r--r-- 1 root root 2082186246 Jun 9 18:57 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
-rw-r--r-- 1 root root 40 Jun 9 17:20 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
-rw-r--r-- 1 root root 64 Jun 9 17:21 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha256
-rw-r--r-- 1 root root 33887 Jun 9 17:20 manifest.json
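Since the files were moved in as root, it doesn't hurt to hand them over to the cloudera-scm user that the CM server runs as, matching the ownership of the parcel-repo directory itself:
chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/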
4.5 Initialize the Database
[root@node01 ~]# /opt/cloudera/cm/schema/scm_prepare_database.sh mysql cm cm your_password
JAVA_HOME=/usr/java/jdk
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
......
INFO Successfully connected to database.
All done, your SCM database is configured correctly!
4.6 Start Cloudera Manager Server
[root@node01 ~]# systemctl start cloudera-scm-server
[root@node01 ~]# systemctl status cloudera-scm-server
● cloudera-scm-server.service - Cloudera CM Server Service
Loaded: loaded (/usr/lib/systemd/system/cloudera-scm-server.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-06-10 15:59:34 CST; 5s ago
Process: 2051 ExecStartPre=/opt/cloudera/cm/bin/cm-server-pre (code=exited, status=0/SUCCESS)
Main PID: 2054 (java)
CGroup: /system.slice/cloudera-scm-server.service
└─2054 /usr/java/jdk1.8.0_181/bin/java -cp .:/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-j...
......
Startup takes a few minutes. Watch the startup log; the server is only really up once lines like the following appear:
2020-06-10 16:00:45,469 INFO WebServerImpl:org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@7dd12982{HTTP/1.1,[http/1.1]}{0.0.0.0:7180}
2020-06-10 16:00:45,479 INFO WebServerImpl:org.eclipse.jetty.server.Server: Started @70548ms
2020-06-10 16:00:45,479 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
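By default the server log lives under /var/log/cloudera-scm-server/; it can be followed until the Jetty lines above show up:
tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log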
Access the web UI. Note: on Alibaba Cloud, port 7180 must be opened in the security group. Both the username and the password are admin.
5. CM Cluster Initialization
6. Resolving the CM Cluster's Warnings
After clicking Finish you land on the cluster monitoring home page. In my case it showed a lot of warnings; you have to read each alert patiently and adjust the configuration until the cluster meets CM's requirements before they go away.
Looking through the warnings, many were about insufficient space for log directories and for heap dumps.
So change each component's log directory and heap dump directory.
The heap dump warning details point at the key directory, /tmp.
Beyond that, there were warnings about other storage locations running low on disk space.
For these, simply search the configuration for every flagged directory and move it under /data.
There were also warnings about unreasonable memory settings.
For those, click into each linked setting and change it to the size CM suggests.
There was also a warning about having too few DataNodes.
With only a few machines, the only option is to adjust the check policy so that CM stops checking the DataNode count; relaxing the alert condition is the only way to clear this one.
Finally, there were warnings about memory overcommitment.
These two can only really be fixed by adding memory. You could raise the check thresholds so they stop firing, but that is just fooling yourself; with our limited machine resources I simply left them in place. By patiently reading the alerts and adjusting the configuration, most issues can be resolved, and that is the state my CM cluster ended up in.
That completes the CM 6.3 setup on Alibaba Cloud CentOS 7.6. I will continue exploring CM's usage in future articles.