Hadoop/HBase/Spark Installation
2017-03-31 曾小俊爱睡觉
Install Linux on the host machine
Download CentOS-6.5-x86_64-bin-minimal.iso and install it from a USB stick.
Install the KVM virtualization packages
yum install kvm libvirt python-virtinst qemu-kvm virt-viewer bridge-utils # install the packages
/etc/init.d/libvirtd start # start the libvirtd service
You can also install KVM by searching for kvm in the software manager.
Create the virtual machine cluster
Create a VM with bridged networking:
# Create the VM: 4096 MB of RAM, 4 vCPUs, disk image at /home/kvm/gateway.img,
# install from the ISO, VNC console on port 5920, bridged to br0, autostarted
virt-install \
--name=gateway \
--ram 4096 \
--vcpus=4 \
-f /home/kvm/gateway.img \
--cdrom /root/CentOS-6.5-x86_64-bin-minimal.iso \
--graphics vnc,listen=0.0.0.0,port=5920 \
--network bridge=br0 --force --autostart
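The command above assumes a bridge br0 already exists on the host. A minimal sketch for CentOS 6, assuming eth0 is the host's physical NIC (the host IP 192.168.0.230 here is a placeholder):
# /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.0.230
PREFIX=24
GATEWAY=192.168.0.1
# /etc/sysconfig/network-scripts/ifcfg-eth0 (attach the NIC to the bridge)
DEVICE=eth0
ONBOOT=yes
BRIDGE=br0
# apply the change:
service network restart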
You can also create VMs from the graphical interface. A minimally installed system needs a desktop environment first:
yum -y groupinstall Desktop
yum -y groupinstall "X Window System"
startx # start the graphical session
To boot into the graphical interface by default, edit /etc/inittab:
id:5:initdefault: # runlevel 3 is the command line, 5 is graphical; the others are rarely used
Preparation before building the environment
1. Network configuration
Edit /etc/sysconfig/network-scripts/ifcfg-eth0:
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
HWADDR=52:54:00:00:9d:f1
IPADDR=192.168.0.231
PREFIX=24
GATEWAY=192.168.0.1
DNS1=192.168.0.1
DNS2=8.8.8.8
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
2. Disable SELinux
Edit /etc/selinux/config:
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
3. Raise the file handle limits
Add the following to /etc/security/limits.conf, replacing root with the user you want to grant the higher limits:
root soft nofile 65535
root hard nofile 65535
root soft nproc 32000
root hard nproc 32000
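The new limits apply to fresh login sessions; a quick check after logging in again:
ulimit -n # open-file limit, should now report 65535
ulimit -u # max user processes, should now report 32000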
4. Disable the firewall
service iptables stop # stop iptables now
chkconfig --level 35 iptables off # keep iptables disabled across reboots
5. Set the hostname and hosts entries
cat > /etc/sysconfig/network << EOF
NETWORKING=yes
HOSTNAME=spark-1
GATEWAY=192.168.0.1
EOF
cat >> /etc/hosts << EOF
192.168.0.231 spark-1
192.168.0.232 spark-2
192.168.0.233 spark-3
192.168.0.234 spark-4
EOF
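The network file above only takes effect after a reboot; to rename the running system immediately:
hostname spark-1 # apply the new name to the current session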
Environment setup
- Passwordless SSH between all nodes
Run the following on every machine:
ssh-keygen # accept the defaults to generate ~/.ssh/id_rsa and id_rsa.pub
touch ~/.ssh/authorized_keys
Then copy every machine's id_rsa.pub into every machine's ~/.ssh/authorized_keys; the ssh-copy-id command automates this, as sketched below.
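A minimal sketch using ssh-copy-id, run once on every node (assuming the root account and the hostnames defined in /etc/hosts above):
for h in spark-1 spark-2 spark-3 spark-4; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h # appends this node's key to the remote authorized_keys
done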
- Install, configure, and start Hadoop
Download hadoop-2.7.3.tar.gz from the official site into ~, then:
tar zxvf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 /usr/local
echo 'export HADOOP_HOME=/usr/local/hadoop-2.7.3' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile
cd $HADOOP_HOME/etc/hadoop # the config files edited below live here
# Edit core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.7.3/var</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://spark-1:9000</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>2880</value>
</property>
# Edit hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>spark-1:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>spark-2:50090</value>
</property>
# Edit mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>spark-1:8021</value>
</property>
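Note that mapred.job.tracker is an MR1-era property. Since this cluster runs YARN (started below with start-yarn.sh), MapReduce jobs are normally routed through YARN by also setting the following in mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>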
# Edit slaves (list the worker hostnames, one per line)
spark-1
spark-2
spark-3
spark-4
# Start Hadoop (format the NameNode only once, before the first start)
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/bin/hdfs dfsadmin -safemode leave
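To confirm the cluster came up, jps on each node should show the expected daemons (NameNode on spark-1, SecondaryNameNode on spark-2, a DataNode on every slave), and the NameNode web UI is at http://spark-1:50070 as configured above. A quick check:
jps # lists the running Hadoop JVMs on this node
hdfs dfsadmin -report # summarizes live DataNodes and capacity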
Install ZooKeeper
As with Hadoop above, extract zookeeper-3.4.6.tar.gz into /usr/local, add ZOOKEEPER_HOME to /etc/profile, and update PATH.
Create the data directory, the myid file, and the logs directory:
mkdir -p $ZOOKEEPER_HOME/data
echo 0 > $ZOOKEEPER_HOME/data/myid # on spark-1; myid lives inside dataDir, and each node's unique id must match its server.N line in zoo.cfg below. An odd number of nodes is usually deployed.
mkdir -p $ZOOKEEPER_HOME/logs
Configure ZooKeeper
cd $ZOOKEEPER_HOME/conf
cp zoo_sample.cfg zoo.cfg
cat > zoo.cfg << EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
clientPort=2181
server.0=spark-1:2888:3888
server.1=spark-2:2888:3888
server.2=spark-3:2888:3888
EOF
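On the remaining nodes, write the id that matches their server.N entries above:
echo 1 > $ZOOKEEPER_HOME/data/myid # on spark-2 (server.1)
echo 2 > $ZOOKEEPER_HOME/data/myid # on spark-3 (server.2)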
Start ZooKeeper on every node where it is installed; once all are up, one node should report itself as leader and the rest as followers:
zkServer.sh start
zkServer.sh status # check this node's status and mode
Install HBase
Extract it and configure the environment variables as above.
Edit hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://spark-1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>spark-1:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>spark-1,spark-2,spark-3</value>
</property>
Edit hbase-env.sh and add the following:
export JAVA_HOME=/usr/local/jdk # Java installation directory
export HBASE_LOG_DIR=/usr/local/hbase-1.2.1/logs # HBase log directory
export HBASE_MANAGES_ZK=false # true to use HBase's bundled ZooKeeper; false for an external ZooKeeper, as we use here
Edit regionservers: add the hostname of every HBase node and remove localhost. Then start HBase:
$HBASE_HOME/bin/start-hbase.sh
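To verify the cluster, open the HBase shell and ask for the status (HMaster should be running on spark-1, with one region server per host listed in regionservers):
$HBASE_HOME/bin/hbase shell
# inside the shell, run:
# status  -- reports the number of live and dead region servers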
Install Spark
Extract it and configure the environment variables as above.
Copy Hadoop's core-site.xml and hdfs-site.xml, plus HBase's hbase-site.xml, into Spark's conf directory.
Edit spark-defaults.conf:
spark.executor.memory 6g
spark.eventLog.enabled true
spark.eventLog.dir hdfs://spark-1:9000/spark-history
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.compress true
spark.scheduler.mode FAIR
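The event log directory configured above must exist in HDFS before Spark applications or the history server start, so create it once:
hdfs dfs -mkdir -p /spark-history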
Edit spark-env.sh:
export HBASE_HOME=/usr/local/hbase-1.2.1
export HIVE_HOME=/usr/local/hive-1.2.1
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/spark-2.0.1-hadoop2.7/jars/hbase/*
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_IP=spark-1
export SPARK_WORKER_MEMORY=11g
export SPARK_WORKER_CORES=4
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=6g
export SPARK_DAEMON_MEMORY=11g
export HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
# export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=s1:2181,s2:2181,s3:2181 -Dspark.deploy.zookeeper.dir=/spark" # ZooKeeper-based master recovery (HA) mode
export SPARK_LOG_DIR=/usr/local/spark-2.0.1-hadoop2.7/logs
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://spark-1:9000/spark-history"
Copy the following jars from HBase's lib directory into Spark's jars/hbase directory:
guava-12.0.1.jar hbase-client-1.2.1.jar hbase-common-1.2.1.jar hbase-protocol-1.2.1.jar hbase-server-1.2.1.jar htrace-core-3.1.0-incubating.jar metrics-core-2.2.0.jar protobuf-java-2.5.0.jar
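One way to do the copy, assuming HBASE_HOME and SPARK_HOME are set as in the steps above:
mkdir -p $SPARK_HOME/jars/hbase
cd $HBASE_HOME/lib
cp guava-12.0.1.jar hbase-client-1.2.1.jar hbase-common-1.2.1.jar \
  hbase-protocol-1.2.1.jar hbase-server-1.2.1.jar \
  htrace-core-3.1.0-incubating.jar metrics-core-2.2.0.jar \
  protobuf-java-2.5.0.jar $SPARK_HOME/jars/hbase/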
Start Spark:
$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/sbin/start-history-server.sh
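As a quick smoke test: the standalone master listens on port 7077 by default and its web UI is at http://spark-1:8080, so a trivial job from the shell should run to completion:
$SPARK_HOME/bin/spark-shell --master spark://spark-1:7077
# inside the shell, run:
# sc.parallelize(1 to 1000).count()  -- should return 1000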
Tips for a faster installation:
- Make good use of rsync: since almost all of the configuration is identical across nodes, complete the setup above on one machine, sync the directories to the others with rsync -avz (see the sketch below), then adjust the per-node differences.
- Use Fabric: script the whole cluster installation with Fabric; this demands stronger shell skills.
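A sketch of the rsync approach, assuming passwordless SSH from spark-1 and the install paths used throughout this post:
# run on spark-1 after finishing the setup above
for h in spark-2 spark-3 spark-4; do
  rsync -avz /usr/local/hadoop-2.7.3 /usr/local/zookeeper-3.4.6 \
    /usr/local/hbase-1.2.1 /usr/local/spark-2.0.1-hadoop2.7 $h:/usr/local/
done
# then fix the per-node differences: IPADDR in ifcfg-eth0, HOSTNAME, and the ZooKeeper myid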