Benchmarking Spark with HiBench

2018-09-20 breeze_lsw

Overview

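HiBench is Intel's open-source big data benchmark suite. It measures the speed, throughput and system resource utilization of different big data frameworks. Its workloads include Sort, WordCount, TeraSort, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, and it also supports streaming benchmarks for Spark Streaming, Flink, Storm and Gearpump.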

Environment setup

Installation

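# Download the source code
wget https://github.com/intel-hadoop/HiBench/archive/HiBench-7.0.zip
# Extract and enter the source tree (the exact directory name depends on the archive)
unzip HiBench-7.0.zip && cd HiBench-*/
# Build only the Hadoop and Spark benchmark modules
mvn -Phadoopbench -Psparkbench -Dspark=2.2 -Dscala=2.11 clean package
# Install bc, which is used to generate the report
yum install bc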
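The build needs Maven, the download uses wget, and bc is required when the report is generated. A missing tool tends to surface only late in a run, so it can help to check up front. The helper below, `missing_tools`, is a hypothetical sketch and not part of HiBench; it only checks whether each command is on PATH.

```shell
# missing_tools TOOL... (hypothetical helper, not part of HiBench):
# print every tool that is not found on PATH; return 1 if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"   # report the missing command by name
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools wget mvn bc || echo "install the tools listed above first"` prints nothing when everything is available.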

Configuration

conf/spark.conf
Set the relevant paths and parameters as described by the comments in the file.

conf/hadoop.conf

# Hadoop home
hibench.hadoop.home     /opt/cloudera/parcels/CDH/lib/hadoop

# The path of hadoop executable
hibench.hadoop.executable     /opt/cloudera/parcels/CDH/bin/hadoop

# Hadoop configuration directory
hibench.hadoop.configure.dir  /etc/hadoop/conf

# HDFS NameNode address or nameservice
hibench.hdfs.master       hdfs://localhost:8020

# Hadoop release provider. Supported value: apache, cdh5, hdp
hibench.hadoop.release    cdh5
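These conf files use a simple layout: one property per line as `key<whitespace>value`, with `#` starting a comment. When scripting sanity checks around a benchmark run, a small reader such as the hypothetical `conf_get` below can pull a value back out. It is a sketch that assumes exactly that layout (no line continuations, no variable expansion) and is not part of HiBench.

```shell
# conf_get FILE KEY (hypothetical helper, not part of HiBench):
# print the value of KEY from a HiBench-style "key value" conf file.
conf_get() {
  local file="$1" key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                   # skip comment lines
    $1 == k && NF >= 2 {
      sub(/^[ \t]*[^ \t]+[ \t]+/, "")     # strip the key and the whitespace after it
      print
      exit                                # first match wins
    }' "$file"
}
```

With the conf/hadoop.conf shown above, `conf_get conf/hadoop.conf hibench.hdfs.master` would print `hdfs://localhost:8020`.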

conf/hibench.conf

# Scale of the generated test data. Built-in profiles are tiny, small, large, huge, gigantic and bigdata; the actual sizes can be changed in each benchmark's conf file
hibench.scale.profile                tiny

# Mapper number in hadoop, partition number in Spark
hibench.default.map.parallelism         8

# Reducer number in hadoop, shuffle partition number in Spark
hibench.default.shuffle.parallelism     8

Running

Running the TeraSort benchmark

Configuration

The data size can be customized in:
conf/workloads/micro/terasort.conf

#datagen
hibench.terasort.tiny.datasize          32000
hibench.terasort.small.datasize         3200000
hibench.terasort.large.datasize         32000000
hibench.terasort.huge.datasize          320000000
hibench.terasort.gigantic.datasize      3200000000
hibench.terasort.bigdata.datasize       6000000000
# Add a custom data size
hibench.terasort.myscale.datasize 5242880

hibench.workload.datasize       ${hibench.terasort.${hibench.scale.profile}.datasize}
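`hibench.workload.datasize` is resolved in two steps: the inner `${hibench.scale.profile}` is substituted first (e.g. `myscale`), and the resulting key `hibench.terasort.myscale.datasize` is then looked up. The sketch below mirrors that lookup with hypothetical shell helpers (not part of HiBench), using the values from the terasort.conf above. The byte estimate assumes that the value is the row count handed to Hadoop's TeraGen and that each TeraGen row is 100 bytes; treat it as an approximation.

```shell
# terasort_datasize PROFILE (hypothetical helper): row count for PROFILE,
# i.e. the value of hibench.terasort.<PROFILE>.datasize from terasort.conf above.
terasort_datasize() {
  case "$1" in
    tiny)     echo 32000 ;;
    small)    echo 3200000 ;;
    large)    echo 32000000 ;;
    huge)     echo 320000000 ;;
    gigantic) echo 3200000000 ;;
    bigdata)  echo 6000000000 ;;
    myscale)  echo 5242880 ;;
    *)        return 1 ;;  # no hibench.terasort.<profile>.datasize entry
  esac
}

# approx_bytes ROWS (hypothetical helper): assumes 100-byte TeraGen rows.
approx_bytes() {
  echo $(( $1 * 100 ))
}
```

Under these assumptions, `approx_bytes "$(terasort_datasize myscale)"` gives 524288000 bytes (500 MiB), and the tiny profile gives 3200000 bytes.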

# export for shell script
hibench.workload.input          ${hibench.hdfs.data.dir}/Terasort/Input
hibench.workload.output         ${hibench.hdfs.data.dir}/Terasort/Output

Then set hibench.scale.profile to myscale in conf/hibench.conf.
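Switching the profile is a one-line edit, but when several scales are benchmarked in a row it can be scripted. The hypothetical `set_conf` helper below (not part of HiBench) rewrites the line whose first field is the key, or appends the property if it is absent. It is a sketch assuming the whitespace-separated `key value` layout of the HiBench conf files; comment lines are left untouched.

```shell
# set_conf FILE KEY VALUE (hypothetical helper, not part of HiBench):
# replace the "KEY value" line in a HiBench-style conf file, or append it if absent.
set_conf() {
  local file="$1" key="$2" value="$3" tmp status=0
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    $1 == k { print k "    " v; found = 1; next }  # replace the existing property line
    { print }                                      # keep all other lines unchanged
    END { if (!found) print k "    " v }           # key missing: append it
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || status=1  # write back in place, keeping permissions
  rm -f "$tmp"
  return "$status"
}
```

For example: `set_conf conf/hibench.conf hibench.scale.profile myscale`.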

# Generate the input data on HDFS
bin/workloads/micro/terasort/prepare/prepare.sh
# Run the Spark version of TeraSort
bin/workloads/micro/terasort/spark/run.sh
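The two scripts have to run in order, because the Spark job reads the data that prepare.sh generates. A small wrapper can make that ordering explicit and stop at the first failure. The `run_workload` function below is a hypothetical sketch, not part of HiBench; it assumes the `bin/workloads/<category>/<name>/{prepare/prepare.sh,spark/run.sh}` layout used by the commands above.

```shell
# run_workload HIBENCH_HOME WORKLOAD (hypothetical helper, not part of HiBench):
# run prepare/prepare.sh, then spark/run.sh, for a workload such as micro/terasort.
run_workload() {
  local home="$1" workload="$2"
  local dir="$home/bin/workloads/$workload"
  # Data generation must succeed before the Spark job is started.
  "$dir/prepare/prepare.sh" || { echo "prepare failed: $workload" >&2; return 1; }
  "$dir/spark/run.sh" || { echo "spark run failed: $workload" >&2; return 1; }
}
```

Usage, from inside the HiBench directory: `run_workload "$PWD" micro/terasort`.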

Viewing the report

The results are written to report/hibench.report.
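Each run adds a row to this file (bc is what computes the duration and throughput figures). Assuming the whitespace-separated column order Type, Date, Time, Input_data_size, Duration(s), Throughput(bytes/s), Throughput/node, a short awk filter can extract the numbers that are usually compared across runs. `summarize_report` is a hypothetical helper, not part of HiBench; check the header line of your own report before relying on the column positions.

```shell
# summarize_report FILE (hypothetical helper, not part of HiBench):
# print workload type, duration and throughput for every result row.
# Assumed columns: Type Date Time Input_data_size Duration(s) Throughput(bytes/s) Throughput/node
summarize_report() {
  awk '$1 != "Type" && NF >= 7 {
    # $5 = duration in seconds; $6 = throughput in bytes/s, shown as MB/s (1 MB = 10^6 bytes)
    printf "%s\t%ss\t%.2f MB/s\n", $1, $5, $6 / 1000000
  }' "$1"
}
```

Usage: `summarize_report report/hibench.report`.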
