Installing, Configuring, and Running Hadoop on Windows

2020-04-26  又语

This article walks through installing, configuring, and running Hadoop on the Windows operating system.


Version notes

  1. Windows 10
  2. JDK 8 or later
  3. Hadoop 3.2.1


Installation

  1. Hadoop 3 requires JDK 8 or later, so first download, install, and configure the JDK (steps omitted).

  2. Download Hadoop; this example uses hadoop-3.2.1.tar.gz.

  3. Extract the archive into the chosen installation directory. Note that extraction takes two steps: first decompress the .tar.gz file, then extract the .tar file. If Windows 10 has trouble extracting the .tar.gz file, convert it to .zip first (conversion method omitted).

  4. Download the files from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and place them in the bin folder under the Hadoop installation root.

  5. Create a folder named hadoop-env; it will hold the HDFS NameNode and DataNode data directories configured later.
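At this point the install tree can be sanity-checked. A minimal sketch, assuming the layout described above (check_layout is a made-up helper, not part of Hadoop):

```python
from pathlib import Path

def check_layout(hadoop_home):
    """Return the paths missing from a Hadoop-on-Windows install tree."""
    root = Path(hadoop_home)
    required = [
        root / "bin" / "winutils.exe",             # from the winutils repository
        root / "bin" / "hadoop.dll",               # also from the winutils repository
        root / "etc" / "hadoop" / "core-site.xml", # shipped with the Hadoop archive
    ]
    return [str(p) for p in required if not p.exists()]
```

On a correctly prepared installation this returns an empty list.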


Configuration

Configure environment variables

  1. Add a system environment variable HADOOP_HOME pointing to the Hadoop installation root.

  2. Edit the system environment variable Path and append %HADOOP_HOME%\bin.

  3. Run cmd to open a Windows Command Prompt, then run hadoop version to check the version:

C:\Users\...>hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /D:/Dev/Hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
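The version line in that output is easy to check from a script; a small sketch (parse_hadoop_version is an illustrative name):

```python
def parse_hadoop_version(output):
    """Extract the version number from `hadoop version` output."""
    for line in output.splitlines():
        if line.startswith("Hadoop "):
            return line.split(" ", 1)[1].strip()
    raise ValueError("no 'Hadoop <version>' line found")
```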

Configure the Hadoop cluster

Four files need to be modified, and the NameNode must then be formatted:

  1. Edit \etc\hadoop\core-site.xml and add the fs.defaultFS property.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
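All four *-site.xml files share this <configuration>/<property> shape, so the fragments can also be generated from a dict. A sketch using the standard library (write_site_xml is a made-up name):

```python
import xml.etree.ElementTree as ET

def write_site_xml(props):
    """Render a Hadoop *-site.xml <configuration> body from name/value pairs."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")
```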
  2. Edit \etc\hadoop\mapred-site.xml and add the mapreduce.framework.name property.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
  3. Edit \etc\hadoop\hdfs-site.xml and add the dfs.replication, dfs.namenode.name.dir, and dfs.datanode.data.dir properties.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///D:/Dev/Hadoop/hadoop-env/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///D:/Dev/Hadoop/hadoop-env/data/datanode</value>
    </property>
</configuration>
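Note that dfs.namenode.name.dir and dfs.datanode.data.dir take file: URIs, so on Windows the backslash path must be rewritten with forward slashes, as in the values above. The standard library can do the conversion (to_file_uri is an illustrative wrapper around pathlib's as_uri):

```python
from pathlib import PureWindowsPath

def to_file_uri(win_path):
    """Convert an absolute Windows path into a file:/// URI for hdfs-site.xml."""
    return PureWindowsPath(win_path).as_uri()
```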
  4. Edit \etc\hadoop\yarn-site.xml and add the yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce.shuffle.class properties.
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
  5. Run hdfs namenode -format to format the NameNode. Due to a bug in Hadoop 3.2.1 on Windows, the first attempt fails with the following exception:
...
2020-04-26 17:46:55,871 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsupportedOperationException
        at java.nio.file.Files.setPosixFilePermissions(Files.java:2044)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1649)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
2020-04-26 17:46:55,877 INFO util.ExitUtil: Exiting with status 1: java.lang.UnsupportedOperationException
2020-04-26 17:46:55,881 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CTUY7JWX6208621/10.62.58.79
************************************************************/

Fix: this is a known Hadoop 3.2.1 issue on Windows (reportedly tracked as HDFS-14890, fixed in later releases); a commonly used workaround is to replace hadoop-hdfs-3.2.1.jar under %HADOOP_HOME%\share\hadoop\hdfs with a patched build.

Then run hdfs namenode -format again; this time it succeeds:

...
2020-04-26 18:04:31,888 INFO namenode.FSImageFormatProtobuf: Image file D:\Dev\Hadoop\hadoop-env\data\namenode\current\fsimage.ckpt_0000000000000000000 of size 404 bytes saved in 0 seconds .
2020-04-26 18:04:31,904 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-04-26 18:04:31,920 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2020-04-26 18:04:31,920 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CTUY7JWX6208621/10.62.58.79
************************************************************/

Startup

  1. Go to the %HADOOP_HOME%\sbin directory and run start-dfs.cmd. Two windows open: one for the NameNode and one for the DataNode.

  2. In the same %HADOOP_HOME%\sbin directory, run start-yarn.cmd to start the Hadoop YARN services. Again two windows open: one for the ResourceManager and one for the NodeManager.

D:\Dev\Hadoop\hadoop-3.2.1\sbin>start-yarn.cmd
starting yarn daemons
  3. Run jps to check that all services started successfully.
D:\Dev\Hadoop\hadoop-3.2.1\sbin>jps
13140 DataNode
16596 NameNode
9956 Jps
10712 ResourceManager
11864
1132 NodeManager
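The jps output above can be checked mechanically; a sketch (missing_daemons is a made-up helper; the daemon names are the ones jps actually prints):

```python
EXPECTED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def missing_daemons(jps_output):
    """Return the expected Hadoop daemons absent from `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:        # "<pid> <name>"; bare-PID lines are skipped
            running.add(parts[1])
    return EXPECTED - running
```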

Besides the commands above, start-all.cmd under %HADOOP_HOME%\sbin can start everything at once, though its use is deprecated.


Web UI

Hadoop provides three web UIs:

  1. NameNode: http://localhost:9870/dfshealth.html#tab-overview
  2. DataNode: http://localhost:9864/datanode.html
  3. YARN: http://localhost:8088/cluster
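These endpoints can also be probed from a script once the daemons are up; a sketch using only the standard library (ui_url and ui_is_up are illustrative names; the ports are the Hadoop 3 defaults listed above):

```python
from urllib.request import urlopen
from urllib.error import URLError

UI_PORTS = {"namenode": 9870, "datanode": 9864, "yarn": 8088}

def ui_url(component, host="localhost"):
    """Base URL of one of the three Hadoop web UIs."""
    return "http://%s:%d/" % (host, UI_PORTS[component])

def ui_is_up(component, timeout=2.0):
    """True if the web UI answers an HTTP request (requires a running cluster)."""
    try:
        with urlopen(ui_url(component), timeout=timeout):
            return True
    except (URLError, OSError):
        return False
```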

Shutdown

Run stop-all.cmd under %HADOOP_HOME%\sbin to stop all services at once.
