HDFS I/O Testing on a Big Data Cluster
2018-12-13
润土1030
In day-to-day development you often need to benchmark the write performance of an HDFS cluster, either to decide from the results whether the cluster needs more nodes, or simply to estimate how long it would take to write, say, 1 TB of data.
Hadoop ships with a tool for exactly this, TestDFSIO. I'm on a CDH cluster, so the command looks like this:
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
This command writes 10 files of 100 MB each to HDFS. Adjust the parameters to suit your needs; since this is only a demo run, I'm not writing a large volume of data.
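For completeness, the same jar also provides a read benchmark and a cleanup pass (-read and -clean are standard TestDFSIO flags; the jar path below assumes the same CDH layout as above). Run the read test after a write test, since it reads back the files the write pass left under the benchmark directory (/benchmarks/TestDFSIO by default, as the log below confirms):

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 100
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean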
Running the write command produces the output below; the summary statistics are at the end:
[hdfs@dlbdn3 ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
18/12/13 17:11:07 INFO fs.TestDFSIO: TestDFSIO.1.7
18/12/13 17:11:07 INFO fs.TestDFSIO: nrFiles = 10
18/12/13 17:11:07 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
18/12/13 17:11:07 INFO fs.TestDFSIO: bufferSize = 1000000
18/12/13 17:11:07 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
18/12/13 17:11:08 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
18/12/13 17:11:09 INFO fs.TestDFSIO: created control files for: 10 files
18/12/13 17:11:09 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
18/12/13 17:11:10 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
18/12/13 17:11:10 INFO mapred.FileInputFormat: Total input paths to process : 10
18/12/13 17:11:10 INFO mapreduce.JobSubmitter: number of splits:10
18/12/13 17:11:10 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
18/12/13 17:11:10 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
18/12/13 17:11:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543800485319_1066
18/12/13 17:11:11 INFO impl.YarnClientImpl: Submitted application application_1543800485319_1066
18/12/13 17:11:11 INFO mapreduce.Job: The url to track the job: http://dlbdn3:8088/proxy/application_1543800485319_1066/
18/12/13 17:11:11 INFO mapreduce.Job: Running job: job_1543800485319_1066
18/12/13 17:11:18 INFO mapreduce.Job: Job job_1543800485319_1066 running in uber mode : false
18/12/13 17:11:18 INFO mapreduce.Job: map 0% reduce 0%
18/12/13 17:11:38 INFO mapreduce.Job: map 67% reduce 0%
18/12/13 17:11:40 INFO mapreduce.Job: map 73% reduce 0%
18/12/13 17:11:45 INFO mapreduce.Job: map 83% reduce 0%
18/12/13 17:11:46 INFO mapreduce.Job: map 87% reduce 0%
18/12/13 17:11:47 INFO mapreduce.Job: map 90% reduce 0%
18/12/13 17:11:48 INFO mapreduce.Job: map 100% reduce 0%
18/12/13 17:11:55 INFO mapreduce.Job: map 100% reduce 100%
18/12/13 17:11:55 INFO mapreduce.Job: Job job_1543800485319_1066 completed successfully
18/12/13 17:11:55 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=405
FILE: Number of bytes written=1411227
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2320
HDFS: Number of bytes written=1048576079
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=244225
Total time spent by all reduces in occupied slots (ms)=4729
Total time spent by all map tasks (ms)=244225
Total time spent by all reduce tasks (ms)=4729
Total vcore-milliseconds taken by all map tasks=244225
Total vcore-milliseconds taken by all reduce tasks=4729
Total megabyte-milliseconds taken by all map tasks=250086400
Total megabyte-milliseconds taken by all reduce tasks=4842496
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=752
Map output materialized bytes=996
Input split bytes=1200
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=996
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1236
CPU time spent (ms)=77520
Physical memory (bytes) snapshot=6496624640
Virtual memory (bytes) snapshot=18602020864
Total committed heap usage (bytes)=9065988096
Peak Map Physical memory (bytes)=626966528
Peak Map Virtual memory (bytes)=1701945344
Peak Reduce Physical memory (bytes)=334086144
Peak Reduce Virtual memory (bytes)=1710272512
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=79
18/12/13 17:11:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/12/13 17:11:55 INFO fs.TestDFSIO: Date & time: Thu Dec 13 17:11:55 CST 2018
18/12/13 17:11:55 INFO fs.TestDFSIO: Number of files: 10
18/12/13 17:11:55 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
18/12/13 17:11:55 INFO fs.TestDFSIO: Throughput mb/sec: 5.411636099942095
18/12/13 17:11:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.548243522644043
18/12/13 17:11:55 INFO fs.TestDFSIO: IO rate std deviation: 0.9476594426012499
18/12/13 17:11:55 INFO fs.TestDFSIO: Test exec time sec: 45.924
18/12/13 17:11:55 INFO fs.TestDFSIO:
[hdfs@dlbdn3 ~]$
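A note on reading the summary (this is my understanding of how TestDFSIO computes these figures): "Throughput mb/sec" is the aggregate number, total MB written divided by the sum of all map tasks' I/O time (here 1000 MB / ~184.8 s ≈ 5.41 MB/s), while "Average IO rate mb/sec" is the plain mean of each file's individual rate, and the std deviation is taken over those per-file rates, which is why the two headline numbers differ slightly. TestDFSIO also appends each run's summary to a TestDFSIO_results.log file in the directory you ran it from (the -resFile option changes this).

Back to the 1 TB question from the beginning: this run wrote 1000 MB in 45.9 s of wall-clock time (about 21.8 MB/s), so a naive extrapolation puts 1 TB at roughly 13 hours on this cluster, assuming the rate holds at larger scale. To measure it directly instead of extrapolating, a command along these lines writes roughly 1 TB (100 files of 10 GB each; -fileSize is in MB):

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 100 -fileSize 10240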
Hadoop ships with many other useful tools; I'll leave those for a future post.