Sqoop Installation and Basic Usage
2018-02-24
百里香香
Download Sqoop
Since the official documentation does not recommend Sqoop2 (i.e. 1.99.7) for production use, I am using Sqoop1 here, i.e. 1.4.7.
Download from: http://mirror.bit.edu.cn/apache/sqoop/
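If you prefer to download directly on the server, something like the following should work; the exact path under the mirror is my assumption based on the usual Apache directory layout, so check the mirror for the actual file name first:
# Assumed mirror path for the 1.4.7 binary tarball -- verify before running
wget http://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz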
Extract and Upload
I used Xshell as a helper tool: I extracted the archive and renamed the directory locally, then uploaded it to the Linux machine.
You can also upload the tarball and extract it on the server by running:
cd /home/hadoop
tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar
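Note that the directory extracted from the tarball is named sqoop-1.4.7.bin__hadoop-2.6.0, while the paths used below assume /home/hadoop/sqoop-1.4.7 (I renamed the directory locally before uploading). If you extract on the server instead, you may want to rename it, for example:
# Rename the extracted directory so it matches the paths used in the configuration below
mv /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0 /home/hadoop/sqoop-1.4.7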
Configure Environment Variables
1. Configure environment variables
Edit the /etc/profile file, add a SQOOP_HOME variable, and append $SQOOP_HOME/bin to the PATH variable.
The profile file after the additions:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HBASE_HOME=/home/hadoop/hbase-1.2.2
export HIVE_HOME=/home/hadoop/hive-2.1.0
export SQOOP_HOME=/home/hadoop/sqoop-1.4.7
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$PATH
After editing, run: source /etc/profile
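For a quick sanity check that the variables took effect, you can run, for example:
echo $SQOOP_HOME      # should print /home/hadoop/sqoop-1.4.7
which sqoop           # should resolve to $SQOOP_HOME/bin/sqoop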
2. Modify the Sqoop configuration file
2.1 The sqoop-env.sh file
Go to the /home/hadoop/sqoop-1.4.7/conf directory, i.e. run:
cd /home/hadoop/sqoop-1.4.7/conf
Copy sqoop-env-template.sh and name the copy sqoop-env.sh, i.e. run:
cp sqoop-env-template.sh sqoop-env.sh
Append the following configuration at the end of the file:
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.2
export HIVE_HOME=/home/hadoop/hive-2.1.0
export HBASE_HOME=/home/hadoop/hbase-1.2.2
Note: change the paths above to match your own environment.
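If HADOOP_HOME, HIVE_HOME and HBASE_HOME are already exported in /etc/profile as shown above, an equivalent way to fill in sqoop-env.sh is to reuse those variables, for example:
# Reuse the variables already exported in /etc/profile (assumes they are set in the shell that runs sqoop)
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME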
3. Put the MySQL JDBC driver jar into Sqoop's lib directory
The driver version I used worked fine here in my test.
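The jar file name below is only an example (use whichever mysql-connector-java version you downloaded); copy it into Sqoop's lib directory, for example:
# Example file name -- substitute your actual MySQL connector jar
cp mysql-connector-java-5.1.38-bin.jar /home/hadoop/sqoop-1.4.7/lib/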
4. Using Sqoop
After installation, commands that do not involve Hive or HDFS can be run without starting those services, for example sqoop help.
This command shows which commands Sqoop provides:
[hadoop@master ~]$ sqoop help
Warning: /home/hadoop/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/02/24 16:39:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
5. Use Sqoop to list the databases in MySQL:
Go to the $SQOOP_HOME/bin directory and run the following command to connect to MySQL and list its databases:
./sqoop list-databases --connect jdbc:mysql://192.168.1.34:3306/test?characterEncoding=UTF-8 --username root --password '123'
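Similarly, to see which tables a specific database contains, the list-tables command from the help output above can be used with the same connection options, for example:
./sqoop list-tables --connect jdbc:mysql://192.168.1.34:3306/test?characterEncoding=UTF-8 --username root --password '123'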
6. Import a MySQL table into HDFS
Prerequisite: both HDFS and YARN must be running. I forgot to start YARN, kept seeing connection timeouts, and it took me quite a while to realize why.
sqoop import -m 1 --connect jdbc:mysql://192.168.1.34:3306/test --username root --password 123 --table name --target-dir /user/sqoop/datatest
Output:
18/02/24 15:52:32 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/02/24 15:52:33 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/02/24 15:52:33 INFO client.RMProxy: Connecting to ResourceManager at master.hadoop/192.168.139.128:8032
18/02/24 15:52:38 INFO db.DBInputFormat: Using read commited transaction isolation
18/02/24 15:52:38 INFO mapreduce.JobSubmitter: number of splits:1
18/02/24 15:52:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1519458728102_0001
18/02/24 15:52:39 INFO impl.YarnClientImpl: Submitted application application_1519458728102_0001
18/02/24 15:52:39 INFO mapreduce.Job: The url to track the job: http://master.hadoop:8088/proxy/application_1519458728102_0001/
18/02/24 15:52:39 INFO mapreduce.Job: Running job: job_1519458728102_0001
18/02/24 15:52:51 INFO mapreduce.Job: Job job_1519458728102_0001 running in uber mode : false
18/02/24 15:52:51 INFO mapreduce.Job: map 0% reduce 0%
18/02/24 15:52:57 INFO mapreduce.Job: map 100% reduce 0%
18/02/24 15:52:57 INFO mapreduce.Job: Job job_1519458728102_0001 completed successfully
18/02/24 15:52:58 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=136482
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=16
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=3648
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3648
Total vcore-milliseconds taken by all map tasks=3648
Total megabyte-milliseconds taken by all map tasks=3735552
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=88
CPU time spent (ms)=630
Physical memory (bytes) snapshot=100278272
Virtual memory (bytes) snapshot=2060853248
Total committed heap usage (bytes)=16719872
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=16
18/02/24 15:52:58 INFO mapreduce.ImportJobBase: Transferred 16 bytes in 24.5469 seconds (0.6518 bytes/sec)
18/02/24 15:52:58 INFO mapreduce.ImportJobBase: Retrieved 1 records.
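To check the imported data, you can list and view the target directory in HDFS; with -m 1 the job writes a single map output file, typically named part-m-00000:
hdfs dfs -ls /user/sqoop/datatest
hdfs dfs -cat /user/sqoop/datatest/part-m-00000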