Setting Up a Distributed Hadoop Environment (Mac)
-
Setting Up HDFS
-
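HDFS is a highly fault-tolerant file system designed to run on low-cost hardware. It provides high-throughput data access and is well suited to applications that work with large data sets.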
-
Install Hadoop:
➜ bin brew install hadoop
-
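Configure passwordless SSH login
Generate a public/private key pair with an empty passphrase. RSA is used here because OpenSSH 7.0 and later disable DSA keys by default:
➜ hadoop ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
-
Append the public key to the file of authorized keys:
➜ hadoop cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
-
Verify that it works (on macOS, Remote Login must be enabled under System Preferences > Sharing):
➜ hadoop ssh localhost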
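The key setup above can be wrapped in a small idempotent shell function, so running it again does not append duplicate keys. This is a sketch: `setup_ssh_key` is a hypothetical helper, and it assumes an RSA key named `id_rsa` inside the given directory (default `~/.ssh`).

```shell
#!/bin/sh
# Sketch: idempotent passwordless-SSH setup (setup_ssh_key is a hypothetical helper).
# Assumes an RSA key named id_rsa inside the given directory (default: ~/.ssh).
setup_ssh_key() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir"

  # Generate a key pair with an empty passphrase only if none exists yet.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -t rsa -P '' -f "$ssh_dir/id_rsa" || return 1
  fi

  touch "$ssh_dir/authorized_keys" || return 1
  pub_key=$(cat "$ssh_dir/id_rsa.pub") || return 1

  # Append the public key only if that exact line is not already present.
  if ! grep -qxF -e "$pub_key" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$pub_key" >> "$ssh_dir/authorized_keys"
  fi

  # With the default StrictModes setting, sshd ignores group/world-writable key files.
  chmod 600 "$ssh_dir/authorized_keys"
}
```

Usage: `setup_ssh_key && ssh localhost`.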
-
Configure core-site.xml (in /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
-
Configure hdfs-site.xml (a single DataNode, so one replica per block):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
-
Configure mapred-site.xml (in Hadoop 2.x, create it from mapred-site.xml.template if it does not exist yet):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
-
Configure yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
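If you prefer to script the four configuration files above, a sketch like the following writes each one with its single property. The helper names `write_property_file` and `write_hadoop_conf` are hypothetical; the script overwrites the target files and keeps a `.bak` copy of any file that already exists.

```shell
#!/bin/sh
# Sketch: write the four single-property Hadoop config files shown above.
# write_property_file and write_hadoop_conf are hypothetical helpers.

# Usage: write_property_file FILE NAME VALUE
write_property_file() {
  # Keep a backup of any existing file before overwriting it.
  if [ -f "$1" ]; then
    cp "$1" "$1.bak" || return 1
  fi
  cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Usage: write_hadoop_conf CONF_DIR
write_hadoop_conf() {
  conf_dir="$1"
  mkdir -p "$conf_dir" || return 1
  write_property_file "$conf_dir/core-site.xml" fs.defaultFS hdfs://localhost:9000 &&
  write_property_file "$conf_dir/hdfs-site.xml" dfs.replication 1 &&
  write_property_file "$conf_dir/mapred-site.xml" mapreduce.framework.name yarn &&
  write_property_file "$conf_dir/yarn-site.xml" yarn.nodemanager.aux-services mapreduce_shuffle
}
```

For the Homebrew install above: `write_hadoop_conf /usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop`.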
-
Change to the Hadoop directory (/usr/local/Cellar/hadoop/2.8.0/libexec):
-
Format the file system (only needed before the first start; reformatting wipes the HDFS metadata):
➜ libexec bin/hdfs namenode -format
-
Start the NameNode and DataNode daemons:
➜ libexec sbin/start-dfs.sh
-
Start the ResourceManager and NodeManager daemons:
➜ libexec sbin/start-yarn.sh
-
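Open localhost:50070 (NameNode web UI) and localhost:8088 (YARN ResourceManager web UI) to check that everything is running:
[Screenshot: 50070.png]
[Screenshot: 8080.png]
-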
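The same check can be run from the terminal. A minimal sketch using `curl`; `check_ui` is a hypothetical helper, and it follows redirects and treats only a final HTTP 200 as success.

```shell
#!/bin/sh
# Sketch: check that a web UI answers with HTTP 200 (check_ui is a hypothetical helper).
check_ui() {
  # -s: silent, -L: follow redirects, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the final status code ("000" if nothing answered).
  code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  if [ "$code" = "200" ]; then
    echo "OK   $1"
    return 0
  fi
  echo "FAIL $1 (HTTP $code)"
  return 1
}
```

Usage: `check_ui http://localhost:50070 && check_ui http://localhost:8088`.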
Create the HDFS directories:
➜ libexec bin/hdfs dfs -mkdir /user
➜ libexec bin/hdfs dfs -mkdir /user/lkc
-
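Copy some files into HDFS (input is a relative path, so it resolves against the HDFS home directory /user/lkc):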
➜ libexec bin/hdfs dfs -put /Users/lkc/Desktop/Other input
Check that the copy succeeded (for example with bin/hdfs dfs -ls input):
[Screenshot: hadoop.png]
-
-
Setting Up ZooKeeper
-
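ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.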
-
Download it from the official site: ZooKeeper website
-
Extract it:
tar zxvf zookeeper-3.4.10.tar.gz
-
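In the ZooKeeper directory, create conf/zoo.cfg (the file bin/zkServer.sh reads by default) with the following content, replacing the paths with your own data and log directories:
tickTime=2000
dataDir=/Users/lkc/Desktop/Other/zookeeper-3.4.10/data
dataLogDir=/Users/lkc/Desktop/Other/zookeeper-3.4.10/logs
clientPort=2181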
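A sketch that creates the data and log directories and writes `conf/zoo.cfg` in one step, assuming both directories live inside the ZooKeeper home as in the example above. `write_zoo_cfg` is a hypothetical helper; `bin/zkServer.sh` looks for `conf/zoo.cfg` by default.

```shell
#!/bin/sh
# Sketch: write a standalone zoo.cfg (write_zoo_cfg is a hypothetical helper).
# Assumes data/ and logs/ live inside the ZooKeeper home directory.
# Usage: write_zoo_cfg ZK_HOME
write_zoo_cfg() {
  zk_home="$1"
  mkdir -p "$zk_home/conf" "$zk_home/data" "$zk_home/logs" || return 1
  cat > "$zk_home/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$zk_home/data
dataLogDir=$zk_home/logs
clientPort=2181
EOF
}
```

Usage: `write_zoo_cfg /Users/lkc/Desktop/Other/zookeeper-3.4.10`.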
-
Start ZooKeeper:
./bin/zkServer.sh start
-
Run
telnet 127.0.0.1 2181
to check that it started successfully.
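Instead of an interactive telnet session, ZooKeeper's `ruok` four-letter command can be sent with `nc`; a healthy server replies `imok`. A sketch, where `zk_ruok` is a hypothetical helper. Depending on the ZooKeeper version, four-letter commands may be restricted by the `4lw.commands.whitelist` setting, so if no reply comes back, `ruok` may need to be whitelisted in zoo.cfg.

```shell
#!/bin/sh
# Sketch: send ZooKeeper's "ruok" command and expect "imok" (zk_ruok is hypothetical).
# Usage: zk_ruok [HOST] [PORT]
zk_ruok() {
  host="${1:-127.0.0.1}"
  port="${2:-2181}"
  # nc -w 2 gives up after 2 seconds without activity.
  reply=$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}
```

Usage: `zk_ruok && echo "ZooKeeper is up"`.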
-
To stop it:
./bin/zkServer.sh stop
-
-
Setting Up HBase
-
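HBase is the Hadoop database, while Hive is a data warehouse tool for managing and querying data. HBase is distributed, scalable, and column-oriented (modeled on Google's Bigtable). It can store its data on either the local file system or HDFS, and it holds loosely structured data as key-value mappings. HBase sits on top of HDFS, using it as the storage layer below while serving data to the computation layers above.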
-
Install:
brew install hbase
-
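Configuration:
In conf/hbase-env.sh, set JAVA_HOME. It must point to the JDK home directory, not to the java executable; on macOS, /usr/libexec/java_home prints that directory:
export JAVA_HOME="$(/usr/libexec/java_home)"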
-
In conf/hbase-site.xml, set HBase's core configuration:
<configuration>
  <!-- Where HBase stores its files -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///Users/lkc/Desktop/Other/hbase</value>
  </property>
  <!-- Where HBase's built-in ZooKeeper stores its files -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/Users/lkc/Desktop/Other/zookeeper-3.4.10</value>
  </property>
</configuration>
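The same hbase-site.xml can be generated from a script. A sketch, with the hypothetical helper `write_hbase_site` taking the conf directory, the HBase root directory, and the ZooKeeper data directory as arguments; it keeps a `.bak` copy of an existing file.

```shell
#!/bin/sh
# Sketch: write hbase-site.xml (write_hbase_site is a hypothetical helper).
# Usage: write_hbase_site CONF_DIR ROOT_DIR ZK_DATA_DIR
write_hbase_site() {
  conf_dir="$1"
  root_dir="$2"
  zk_data_dir="$3"
  mkdir -p "$conf_dir" || return 1
  # Keep a backup of any existing file before overwriting it.
  if [ -f "$conf_dir/hbase-site.xml" ]; then
    cp "$conf_dir/hbase-site.xml" "$conf_dir/hbase-site.xml.bak" || return 1
  fi
  cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Where HBase stores its files -->
  <property>
    <name>hbase.rootdir</name>
    <value>$root_dir</value>
  </property>
  <!-- Where HBase's built-in ZooKeeper stores its files -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$zk_data_dir</value>
  </property>
</configuration>
EOF
}
```

Usage, assuming HBASE_HOME points to the HBase install: `write_hbase_site "$HBASE_HOME/conf" file:///Users/lkc/Desktop/Other/hbase /Users/lkc/Desktop/Other/zookeeper-3.4.10`.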
-
Start HBase with /usr/local/Cellar/hbase/1.2.2/bin/start-hbase.sh.
-
Verify that it started (an HMaster process should appear in the jps output):
➜ bin ./start-hbase.sh
starting master, logging to /usr/local/var/log/hbase/hbase-lkc-master-localhost.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
➜ bin jps
15171 Jps
7907 DataNode
7828 NameNode
15093 HMaster
8006 SecondaryNameNode
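Rather than scanning the `jps` list by eye, a small filter can report which expected daemons are missing. A sketch, where `missing_daemons` is a hypothetical helper; the daemon names passed to it are the ones from the output above.

```shell
#!/bin/sh
# Sketch: read `jps` output on stdin and print each required daemon that is missing.
# missing_daemons is a hypothetical helper; it returns 0 only if all are present.
# Usage: jps | missing_daemons NAME...
missing_daemons() {
  jps_out=$(cat)
  rc=0
  for name in "$@"; do
    # jps prints "<pid> <MainClassName>", so compare the second field exactly.
    if ! printf '%s\n' "$jps_out" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
      rc=1
    fi
  done
  return $rc
}
```

Usage: `jps | missing_daemons NameNode DataNode SecondaryNameNode HMaster`. The variable is named `rc` rather than `status` because `status` is read-only in zsh, the shell used in the prompts above.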
-
Start the HBase shell:
➜ bin ./hbase shell
2017-06-11 23:18:38,460 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hbase/1.2.2/libexec/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/lkc/Downloads/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.2, r3f671c1ead70d249ea4598f1bbcc5151322b3a13, Fri Jul 1 08:28:55 CDT 2016

hbase(main):001:0> exit
-
To stop HBase:
./bin/stop-hbase.sh
-