Note22: LZO Compression Support for Hadoop 2.7.2

2020-07-14  K__3f8b

Preparing the Build

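Hadoop does not support LZO compression out of the box (LZO is GPL-licensed and is not bundled with Hadoop), so we use hadoop-lzo, an open-source component provided by Twitter.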

hadoop-lzo has to be compiled against Hadoop and LZO. You need the following:

  1. LZO: http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz

  2. hadoop-lzo source archive: https://github.com/twitter/hadoop-lzo/archive/master.zip

  3. JDK

  4. Maven

  1. Install the JDK
[root@hadoop115 software]# tar -zxvf jdk-8u241-linux-x64.tar.gz -C /opt/module/

[root@hadoop115 software]# vim /etc/profile
#JAVA_HOME:
export JAVA_HOME=/opt/module/jdk1.8.0_241
export PATH=$PATH:$JAVA_HOME/bin

[root@hadoop115 software]# source /etc/profile

Verify with: java -version
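The post edits /etc/profile by hand in vim. If you script this setup across several machines, a helper that appends each export line only when it is not already present keeps repeated runs from piling up duplicate entries. This is a minimal sketch; `append_once` is a hypothetical name, not part of the original steps.

```shell
#!/usr/bin/env bash
# append_once FILE LINE: append LINE to FILE only if FILE does not already
# contain exactly that line. Hypothetical helper, not from the original post.
append_once() {
  local file=$1 line=$2
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex).
  # A missing FILE makes grep fail, so the line is appended and FILE created.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root). Single quotes keep $PATH and $JAVA_HOME literal,
# exactly as they should appear in /etc/profile:
# append_once /etc/profile 'export JAVA_HOME=/opt/module/jdk1.8.0_241'
# append_once /etc/profile 'export PATH=$PATH:$JAVA_HOME/bin'
```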

  2. Install Maven
[root@hadoop115 software]# tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /opt/module/
[root@hadoop115 apache-maven-3.6.3]# vim /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/opt/module/apache-maven-3.6.3
export PATH=$PATH:$MAVEN_HOME/bin

[root@hadoop115 software]# source /etc/profile

Verify with: mvn -version
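Since hadoop-lzo is built with Maven and needs the JDK, it can help to confirm that both tools resolve from the current shell before building. A small sketch with a hypothetical `require_cmds` helper:

```shell
#!/usr/bin/env bash
# require_cmds CMD...: report every command that is not on PATH and
# return non-zero if any is missing. Hypothetical helper for illustration.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# After `source /etc/profile`, both build tools should resolve:
# require_cmds java mvn && java -version && mvn -version
```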

Edit settings.xml to use the Aliyun mirror (faster from within China):

[root@hadoop115 apache-maven-3.6.3]# vi conf/settings.xml

# Inside <settings>, set localRepository and add the <mirror> entry to the existing <mirrors> element:

<localRepository>/opt/module/apache-maven-3.6.3/Local_Repository</localRepository>

<mirrors>
     <mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>central</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
     </mirror>
</mirrors>
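Instead of editing the global conf/settings.xml, the same two settings can live in a per-user ~/.m2/settings.xml, which Maven merges with the global file (the user file takes precedence). The sketch below writes a minimal but complete file; `write_maven_settings` is a hypothetical helper, and the repository path and mirror URL are copied from the snippet above.

```shell
#!/usr/bin/env bash
# write_maven_settings OUT REPO_DIR: write a minimal, complete settings.xml
# with the same localRepository and Aliyun mirror as the snippet above.
# Hypothetical helper; the post edits conf/settings.xml by hand instead.
write_maven_settings() {
  local out=$1 repo=$2
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>${repo}</localRepository>
  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
}

# Usage: a per-user file that Maven merges over conf/settings.xml
# write_maven_settings "$HOME/.m2/settings.xml" /opt/module/apache-maven-3.6.3/Local_Repository
```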
  3. Install the remaining build dependencies
[root@hadoop115 apache-maven-3.6.3]# yum -y install  lzo-devel  zlib-devel  gcc autoconf automake libtool

Compile LZO

[root@hadoop115 software]# tar -zxvf lzo-2.10.tar.gz
[root@hadoop115 software]# cd lzo-2.10/
[root@hadoop115 lzo-2.10]# ./configure --prefix=/usr/local/hadoop/lzo/
[root@hadoop115 lzo-2.10]# make
[root@hadoop115 lzo-2.10]# make install
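The hadoop-lzo build below locates LZO through C_INCLUDE_PATH=/usr/local/hadoop/lzo/include and LIBRARY_PATH=/usr/local/hadoop/lzo/lib, so it is worth checking that `make install` actually populated those directories. A sketch, assuming LZO 2.x's usual layout (headers under include/lzo/, library named liblzo2); `check_lzo_install` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_lzo_install PREFIX: sanity-check that `make install` placed the LZO
# headers and library under PREFIX, where hadoop-lzo's build will look.
# Hypothetical helper; assumes the include/lzo/ + lib/liblzo2.* layout.
check_lzo_install() {
  local prefix=${1%/} ok=0 lib
  if [ -f "$prefix/include/lzo/lzo1x.h" ]; then
    echo "header: $prefix/include/lzo/lzo1x.h"
  else
    echo "missing header: $prefix/include/lzo/lzo1x.h" >&2
    ok=1
  fi
  # Accept either a static or a shared build of liblzo2.
  for lib in "$prefix"/lib/liblzo2.a "$prefix"/lib/liblzo2.so*; do
    [ -e "$lib" ] && { echo "library: $lib"; return "$ok"; }
  done
  echo "missing library: $prefix/lib/liblzo2.*" >&2
  return 1
}

# Usage: check_lzo_install /usr/local/hadoop/lzo/
```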

Compile the hadoop-lzo source

[root@hadoop115 software]# unzip hadoop-lzo-master.zip
[root@hadoop115 software]# cd hadoop-lzo-master
[root@hadoop115 hadoop-lzo-master]# vim pom.xml

# Set the Hadoop version to build against, matching the cluster:
<hadoop.current.version>2.7.2</hadoop.current.version>

[root@hadoop115 hadoop-lzo-master]# export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
[root@hadoop115 hadoop-lzo-master]# export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
[root@hadoop115 hadoop-lzo-master]# mvn package -Dmaven.test.skip=true
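The pom.xml change can also be made without an editor, which is convenient when rebuilding. A minimal sketch using sed, assuming the `<hadoop.current.version>` property sits on a single line as shown above; `set_hadoop_version` is a hypothetical helper, and it keeps the original file as pom.xml.bak.

```shell
#!/usr/bin/env bash
# set_hadoop_version POM VERSION: rewrite the <hadoop.current.version>
# property in place (backup saved as POM.bak), then confirm the new value.
# Hypothetical helper; assumes the property is on one line.
set_hadoop_version() {
  local pom=$1 version=$2
  # -i.bak works with both GNU and BSD sed.
  sed -i.bak "s|<hadoop.current.version>[^<]*</hadoop.current.version>|<hadoop.current.version>${version}</hadoop.current.version>|" "$pom"
  # Exit status: 0 only if the new value is now present.
  grep -q "<hadoop.current.version>${version}</hadoop.current.version>" "$pom"
}

# Usage, from inside hadoop-lzo-master:
# set_hadoop_version pom.xml 2.7.2 && mvn package -Dmaven.test.skip=true
```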

Using LZO Compression

Copy the compiled hadoop-lzo-0.4.21-SNAPSHOT.jar (Maven writes it to hadoop-lzo-master/target/) into /opt/module/hadoop-2.7.2/share/hadoop/common/

Distribute the jar to the other nodes

[kevin@hadoop112 module]$ cd /opt/module/hadoop-2.7.2/share/hadoop/common/
[kevin@hadoop112 common]$ xsync.sh hadoop-lzo-0.4.21-SNAPSHOT.jar
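xsync.sh is the author's own distribution script and its source is not shown. Scripts of this kind typically rsync a path to the same directory on every other node. The sketch below only prints those rsync commands so they can be reviewed first; `xsync_plan` and the host names in the example are assumptions, not the author's script.

```shell
#!/usr/bin/env bash
# xsync_plan PATH HOST...: print the rsync command an xsync-style script
# would run for each host (dry run only). Hypothetical stand-in for the
# author's xsync.sh, whose contents the post does not show.
xsync_plan() {
  local path=$1; shift
  local dir name host
  # Resolve the absolute, symlink-free directory so the remote path matches.
  dir=$(cd -P "$(dirname "$path")" && pwd) || return 1
  name=$(basename "$path")
  for host in "$@"; do
    printf 'rsync -av %s/%s %s:%s/\n' "$dir" "$name" "$host" "$dir"
  done
}

# Example with assumed host names; review the output, then pipe it to `sh`:
# xsync_plan hadoop-lzo-0.4.21-SNAPSHOT.jar hadoop113 hadoop114
```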
[kevin@hadoop112 module]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/
[kevin@hadoop112 hadoop]$ vim core-site.xml

Contents:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:9000</value>
    </property>

    <!-- Directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>

    <!-- Register the compression codecs, including LZO -->
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>
    <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
</configuration>
[kevin@hadoop112 hadoop]$ xsync.sh core-site.xml
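To double-check that the LZO codecs made it into the distributed file, a crude text check is enough for a file laid out like the one above. This sketch (hypothetical `has_codec` helper) is not an XML parser and assumes each `<value>` sits on the line right after its `<name>`. On a configured node, `hdfs getconf -confKey io.compression.codecs` prints the value Hadoop actually loads.

```shell
#!/usr/bin/env bash
# has_codec CONF_FILE CLASS: succeed if CLASS is one of the comma-separated
# entries in io.compression.codecs. Text-based and layout-dependent:
# assumes <value> is on the line right after <name>. Hypothetical helper.
has_codec() {
  local conf=$1 class=$2
  grep -A1 '<name>io.compression.codecs</name>' "$conf" \
    | grep -o '<value>[^<]*</value>' \
    | tr ',' '\n' \
    | sed 's/<[^>]*>//g; s/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qxF "$class"
}

# Usage: has_codec core-site.xml com.hadoop.compression.lzo.LzopCodec && echo ok
```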

Create an Index

Install the LZO libraries first (this needs root privileges, hence sudo):

[kevin@hadoop112 module]$ sudo yum -y install lzo-devel zlib-devel gcc autoconf automake libtool

Otherwise the indexer fails with: ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library

[kevin@hadoop112 module]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo
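DistributedLzoIndexer runs as a MapReduce job and writes bigtable.lzo.index next to bigtable.lzo; that index is what lets a single .lzo file be split across several mappers. To find files that still lack an index, a sketch like the following (hypothetical `lzo_missing_index` helper) can filter a file listing:

```shell
#!/usr/bin/env bash
# lzo_missing_index: read file paths (one per line) on stdin and print,
# sorted, each .lzo file that has no matching .lzo.index alongside it.
# Hypothetical helper for illustration.
lzo_missing_index() {
  awk '
    /\.lzo$/        { lzo[$0] = 1 }
    # Strip the 6-character ".index" suffix to get the data file name.
    /\.lzo\.index$/ { idx[substr($0, 1, length($0) - 6)] = 1 }
    END { for (f in lzo) if (!(f in idx)) print f }
  ' | sort
}

# Usage against HDFS (last column of `ls` output is the path):
# hadoop fs -ls -R /input | awk '{print $NF}' | lzo_missing_index
```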