
Installing Hadoop (standalone and pseudo-distributed modes) and Spark, and running wordcount

2019-04-02  MountSong

Hadoop installation (standalone mode and pseudo-distributed mode):

The steps below are for an Ubuntu system.

1. Install the JDK and configure the environment variables (in .bashrc).
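A minimal sketch of the .bashrc entries, assuming the JDK is unpacked to /home/hp/jdk1.8.0_191 (the same path used later in hadoop-env.sh); run source ~/.bashrc afterwards so they take effect:

#set java environment
export JAVA_HOME=/home/hp/jdk1.8.0_191
export PATH=$PATH:$JAVA_HOME/bin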

2. Install SSH (not needed for standalone mode).
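Pseudo-distributed mode needs passwordless SSH to localhost; a typical setup on Ubuntu looks like this:

sudo apt-get install openssh-server
ssh-keygen -t rsa                                  # accept the defaults
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost                                      # should now log in without a password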

3. Download the Hadoop package and extract it.
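One possible way to fetch and unpack the package (the URL below assumes the Apache archive layout for Hadoop 2.8.5; the target directory matches the paths used in the rest of this article):

mkdir -p /home/hp/hadoop_env
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xzf hadoop-2.8.5.tar.gz -C /home/hp/hadoop_env/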

4. Hadoop configuration:

4.1. Standalone mode: if the package extracts successfully, the standalone installation is done; no further configuration is required.

Browse the examples bundled with Hadoop; the wordcount example is the one used below.
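To see the bundled examples, run the examples jar with no arguments; it prints the list of available program names, including wordcount:

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar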

Create an input directory under the Hadoop directory, put the files to be tested into input, then run the following command to do a word-frequency count with MapReduce:

 ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount ./input/ ./output

The results are written to the output directory on the local file system.
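To inspect them (each line of the part file is a word followed by its count):

cat ./output/*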

4.2. Pseudo-distributed mode:

The Hadoop installation path here is /home/hp/hadoop_env/hadoop-2.8.5.

Create a tmp directory for temporary files under the installation directory.
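With the installation path above, that is:

mkdir /home/hp/hadoop_env/hadoop-2.8.5/tmp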

Edit the configuration file core-site.xml under etc/hadoop in the Hadoop installation directory:

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hp/hadoop_env/hadoop-2.8.5/tmp</value>
                <description>A base for other temporary directories.</description>
        </property>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

Edit the configuration file hdfs-site.xml:

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hp/hadoop_env/hadoop-2.8.5/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/hp/hadoop_env/hadoop-2.8.5/tmp/dfs/data</value>
        </property>
</configuration>

Edit the configuration file mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template):

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Edit the configuration file yarn-site.xml:

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>localhost</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

Set JAVA_HOME manually by editing hadoop-env.sh under etc/hadoop in the installation directory, adding:

export JAVA_HOME=/home/hp/jdk1.8.0_191

Format the NameNode by running the following in the installation directory (do this only once; reformatting wipes the NameNode metadata and can leave existing DataNodes with a mismatched cluster ID):

./bin/hdfs namenode -format

Start the HDFS daemons (NameNode and DataNode):

 ./sbin/start-dfs.sh

Start the YARN daemons:

./sbin/start-yarn.sh

Start the job history server:

 ./sbin/mr-jobhistory-daemon.sh start historyserver  

Verify: run jps; if all of the daemons started above appear, startup succeeded.
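A successful startup looks roughly like this in jps (the PIDs are illustrative and will differ):

jps
12341 NameNode
12342 DataNode
12343 SecondaryNameNode
12344 ResourceManager
12345 NodeManager
12346 JobHistoryServer
12347 Jps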

Open "127.0.0.1:50070" in a browser to view the HDFS file system.

Running the wordcount program:

1. Create a user directory in HDFS:

./bin/hdfs dfs -mkdir -p /user/hadoop

2. Create the input directory, upload the local text file (./input/inputWords here), and list it:

./bin/hdfs dfs -mkdir /user/hadoop/input

./bin/hdfs dfs -put ./input/inputWords /user/hadoop/input

./bin/hdfs dfs -ls /user/hadoop/input

3. Run the wordcount program (the output directory must not already exist in HDFS):

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /user/hadoop/input /user/hadoop/output

4. View the results:

./bin/hdfs dfs -ls /user/hadoop/output

./bin/hdfs dfs -cat /user/hadoop/output/part-r-00000

Shut down Hadoop:

./sbin/mr-jobhistory-daemon.sh stop historyserver

./sbin/stop-yarn.sh

./sbin/stop-dfs.sh

Spark installation

1. Download the Scala and Spark packages:

https://www.scala-lang.org/download/

http://spark.apache.org/downloads.html

Notes:

1) With the current Spark 2.4.0 release, the Scala 2.12 package cannot be used; Scala 2.11 is used here.

2) Since Hadoop is already installed, use the Spark package that does not bundle Hadoop, i.e. spark-2.4.0-bin-without-hadoop.

2. Environment configuration:

2.1. Edit .bashrc and add:

#set spark environment
export SPARK_HOME=/home/hp/hadoop_env/spark-2.4.0-bin-without-hadoop
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
#set scala environment
export SCALA_HOME=/home/hp/hadoop_env/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin
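After editing, reload the shell configuration and check that the paths are picked up:

source ~/.bashrc
scala -version          # should report Scala 2.11.12
echo $SPARK_HOME        # should print the Spark installation path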

2.2. Edit conf/spark-env.sh under the Spark directory (copy it from spark-env.sh.template if it does not exist) and add:

export SCALA_HOME=/home/hp/hadoop_env/scala-2.11.12
export SPARK_WORKER_MEMORY=2g
export SPARK_MASTER_IP=hp-notebook        # the hostname
export MASTER=spark://hp-notebook:7077
export JAVA_HOME=/home/hp/jdk1.8.0_191
export HADOOP_HOME=/home/hp/hadoop_env/hadoop-2.8.5
export SPARK_DIST_CLASSPATH=$CLASSPATH:$($HADOOP_HOME/bin/hadoop classpath)
export HADOOP_CONF_DIR=/home/hp/hadoop_env/hadoop-2.8.5/etc/hadoop

3. Run sbin/start-all.sh from the Spark directory to start the master and worker processes, then open "hostname:8080" (hp-notebook:8080 here) in a browser.
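jps should now additionally list the Spark standalone daemons alongside the Hadoop ones (PIDs illustrative):

13012 Master
13123 Worker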

4. Test the wordcount program.

4.1 Create the input file in HDFS:

./bin/hdfs dfs -mkdir -p /spark

vim spark.txt

./bin/hdfs dfs -put spark.txt /spark
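Here spark.txt can be any local text file; as a purely hypothetical example, it could be created without vim like this:

printf "hello spark\nhello hadoop\nhello spark\n" > spark.txt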

4.2 Start spark-shell and run wordcount:

spark-shell

scala> sc.textFile("/spark/spark.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("/spark/out")

./bin/hdfs dfs -cat /spark/out/p*
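With the hypothetical spark.txt above, the saved result would contain one (word,count) tuple per line, in no particular order, for example:

(hello,3)
(spark,2)
(hadoop,1)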
