Hadoop - Data Warehouse - Big Data - CentOS - Cluster Setup
2020-04-22 · _Unique_楠楠
Base environment: see the companion reference on common CentOS cluster setup configuration.
Download and extract the installation package
tar -zxvf /tmp/hadoop-3.1.3.tar.gz -C /opt
Prerequisites
- At least three nodes with passwordless SSH configured between them (a sketch follows this list)
- A JDK
- A running ZooKeeper ensemble
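A minimal sketch of the passwordless-SSH setup, assuming everything runs as root (matching the /root/.ssh/id_dsa key that the fencing configuration below expects) and that the node02-node06 hostnames resolve; DSA is used here only because that key path names id_dsa:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
for n in node02 node03 node04 node05 node06; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub root@$n
done
Run this on each node that needs to log in to the others (at minimum both NameNode hosts).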
System environment variables
vi /etc/profile
export HADOOP_HOME=/opt/hadoop-3.1.3
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
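A quick check that the variables are in effect for the current shell:
echo $HADOOP_HOME
hadoop version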
Hadoop runtime environment configuration - defines the runtime environment for the Hadoop daemons
vi /opt/hadoop-3.1.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk  -- add this line
Cluster-wide configuration - defines system-level parameters such as the HDFS URL and Hadoop's temporary directory
vi /opt/hadoop-3.1.3/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/ha</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node04:2181,node05:2181,node06:2181</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
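After saving, `hdfs getconf` (a stock Hadoop command) can confirm the value is being read:
hdfs getconf -confKey fs.defaultFS
-- expected output: hdfs://mycluster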
HDFS configuration - NameNode and DataNode storage locations, number of file replicas, file access permissions, etc.
vi /opt/hadoop-3.1.3/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node03:8020</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node04:8485;node05:8485;node06:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/hadoop/ha/jnn</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
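The sshfence method above only works if /root/.ssh/id_dsa exists and the two NameNodes can SSH to each other; a quick check from node02:
ls -l /root/.ssh/id_dsa
ssh root@node03 hostname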
Resource manager configuration - ResourceManager and NodeManager communication ports, web monitoring ports, etc.
vi /opt/hadoop-3.1.3/etc/hadoop/yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rmhacluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>node02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>node03:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node04:2181,node05:2181,node06:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
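Hadoop 3 includes a `hadoop conftest` subcommand that validates the XML syntax of the files in the configuration directory; it is worth running after editing these files:
hadoop conftest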
MapReduce configuration - covers the JobHistory Server and application settings, such as the default number of reduce tasks and default memory limits for tasks
vi /opt/hadoop-3.1.3/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
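These three *.env properties let MapReduce tasks locate their jars under Hadoop 3; without them, jobs commonly fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". To confirm the MapReduce jars are on the classpath:
hadoop classpath | tr ':' '\n' | grep mapreduce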
Configure the DataNode hosts (when start-dfs.sh starts the cluster, it launches the DataNodes listed here)
vi /opt/hadoop-3.1.3/etc/hadoop/workers
-- remove localhost and add the DataNode hosts
node04
node05
node06
Distribute
-- copy the configured Hadoop directory to the other nodes (node04 shown; repeat for each node, or see the loop below)
scp -r hadoop-3.1.3/ root@node04:/opt
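A small loop covers the remaining nodes, assuming the configuration was done on node02 and the hostnames used throughout this article:
for n in node03 node04 node05 node06; do
  scp -r /opt/hadoop-3.1.3/ root@$n:/opt
done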
Start - JournalNodes
hdfs --daemon start journalnode
-- run on node04, node05, and node06
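Confirm a JournalNode process is running on each of the three hosts:
jps | grep JournalNode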
Format and start - format the NameNode (on node02)
hdfs namenode -format
-- start the NameNode that was just formatted
hdfs --daemon start namenode
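The NameNode should now show up in jps and on its web UI (Hadoop 3.x serves it on port 9870 by default):
jps | grep NameNode
curl -s http://node02:9870 >/dev/null && echo NameNode UI up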
Sync - synchronize the standby NameNode with the active one
-- on the other NameNode (node03), run the sync to copy over the formatted metadata
hdfs namenode -bootstrapStandby
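After the sync succeeds, this standby NameNode can be started the same way (start-dfs.sh below will also start it):
hdfs --daemon start namenode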
Sync - initialize the HA state in ZooKeeper; run on a NameNode
hdfs zkfc -formatZK
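This creates the HA znode in ZooKeeper. It can be confirmed with the ZooKeeper CLI (the zkCli.sh location depends on your ZooKeeper install):
zkCli.sh -server node04:2181 ls /hadoop-ha
-- expected output includes: [mycluster]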
The steps above start each daemon individually. The whole cluster can instead be started with the bundled scripts, which first need the following configuration.
HDFS start/stop script configuration
vi /opt/hadoop-3.1.3/sbin/start-dfs.sh
vi /opt/hadoop-3.1.3/sbin/stop-dfs.sh
-- add the following settings
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
-- start DFS
start-dfs.sh
-- stop DFS
stop-dfs.sh
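With DFS up, `hdfs haadmin` reports which NameNode holds the active role, using the logical IDs from hdfs-site.xml:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
-- one should report active, the other standby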
YARN start/stop script configuration
vi /opt/hadoop-3.1.3/sbin/start-yarn.sh
vi /opt/hadoop-3.1.3/sbin/stop-yarn.sh
-- add the following settings
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=yarn
YARN_NODEMANAGER_USER=root
-- start YARN (run the second command on the standby ResourceManager host if it does not come up automatically)
start-yarn.sh
yarn --daemon start resourcemanager
-- stop YARN
stop-yarn.sh
yarn --daemon stop resourcemanager
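As a final smoke test, check the ResourceManager HA state and submit the bundled pi example (the jar path assumes the 3.1.3 layout used in this article):
yarn rmadmin -getServiceState rm1
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 2 10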