hive-testbench

2022-03-09  你的努力时光不会辜负

GitHub: https://github.com/hortonworks/hive-testbench/

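TPC-DS: a decision-support benchmark that provides a fair and honest business and data model, with 99 queries.
TPC-H: a decision-support benchmark modeled on the commodity retail industry; it defines 8 tables and 22 queries.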
wget https://github.com/hortonworks/hive-testbench/archive/hive14.zip
unzip hive14.zip
cd hive-testbench-hive14/
./tpcds-build.sh    # compile the TPC-DS data generator
./tpcds-setup.sh 1000    # generate a 1000 GB (1 TB) dataset as Hive tables
FORMAT=parquet ./tpcds-setup.sh 10    # generate a 10 GB dataset as Parquet-format Hive tables
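To compare storage formats at the same scale, the FORMAT override above can be looped. The sketch below is a hypothetical helper (generate_formats and the DRY_RUN switch are not part of hive-testbench); it assumes it is run from hive-testbench-hive14/ after ./tpcds-build.sh has succeeded. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hive-testbench.
# Assumes the current directory is hive-testbench-hive14/ and ./tpcds-build.sh has already run.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# generate_formats SCALE FORMAT...
# Calls tpcds-setup.sh once per storage format at the same scale (in GB).
generate_formats() {
  local scale=$1
  shift
  local fmt
  for fmt in "$@"; do
    # env passes FORMAT to the script, like "FORMAT=parquet ./tpcds-setup.sh 10"
    run env FORMAT="$fmt" ./tpcds-setup.sh "$scale"
  done
}
```

For example, `DRY_RUN=1 generate_formats 10 orc parquet` prints the two setup commands; drop `DRY_RUN=1` to actually run them one after another.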

Example run with an explicit data directory:

[root@ip-172-31-16-68 hive-testbench]# ./tpcds-setup.sh 10 /extwarehouse/tpcds

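Parameters:

10: the amount of data to generate, in GB.

/extwarehouse/tpcds: the HDFS directory where the table data is generated. It is created automatically if it does not exist; if no directory is given, the data is generated under /tmp/tpcds by default.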
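To catch a mistyped argument before a long generation run, the two positional parameters can be checked first. check_setup_args is a hypothetical pre-flight helper that encodes the parameter list above (scale in GB, optional directory defaulting to /tmp/tpcds). Requiring a positive whole number and an absolute path are assumptions of this sketch, not rules stated by hive-testbench.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for "./tpcds-setup.sh SCALE [DIR]"; not part of hive-testbench.
# Prints the resolved values, or an error on stderr with a non-zero status.
check_setup_args() {
  local scale=${1:-}
  # Default directory as described above when DIR is omitted.
  local dir=${2:-/tmp/tpcds}

  # Assumption: the scale (in GB) must be a positive whole number.
  case $scale in
    '' | *[!0-9]*)
      echo "scale must be a whole number of GB, got '$scale'" >&2
      return 1
      ;;
  esac
  if [ "$scale" -eq 0 ]; then
    echo "scale must be greater than 0" >&2
    return 1
  fi

  # Assumption: require an absolute HDFS path so the target is unambiguous.
  case $dir in
    /*) ;;
    *)
      echo "directory should be an absolute path, got '$dir'" >&2
      return 1
      ;;
  esac

  echo "scale=$scale dir=$dir"
}
```

Usage: `check_setup_args 10 /extwarehouse/tpcds && ./tpcds-setup.sh 10 /extwarehouse/tpcds`.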

When the script finishes, check the tables in Hive.


The data has been generated and loaded into Hive.
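To confirm the load from the command line instead of the interactive shell, list the tables of the generated database. The helpers below are hypothetical: tpcds_db_name builds a name of the form tpcds_bin_partitioned_<format>_<scale> (ORC when no format is given), and list_tables assumes the hive CLI is on the PATH and uses its --database and -e options. DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of hive-testbench.

# tpcds_db_name SCALE [FORMAT]  -> name of the partitioned database for that run
tpcds_db_name() {
  local scale=$1 format=${2:-orc}   # ORC when FORMAT is not set
  echo "tpcds_bin_partitioned_${format}_${scale}"
}

# list_tables SCALE [FORMAT]  -> show the tables created by tpcds-setup.sh
list_tables() {
  local db
  db=$(tpcds_db_name "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "hive --database $db -e 'show tables;'"
  else
    hive --database "$db" -e 'show tables;'
  fi
}
```

For the 10 GB Parquet run, `list_tables 10 parquet` queries tpcds_bin_partitioned_parquet_10.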

Testing:

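cd sample-queries-tpcds/

Start the Hive CLI from this directory so that source can find the query files. The database name ends with the format and scale factor, so the 10 GB ORC dataset generated above is tpcds_bin_partitioned_orc_10; the example below uses a 100 GB dataset.

hive> use tpcds_bin_partitioned_orc_100;

hive> source query1.sql;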

Check the query results.
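Running queries one at a time is fine for a spot check; to time the whole sample set, a small loop works. run_queries is a hypothetical helper that assumes the hive CLI is on the PATH (it uses hive --database and -f) and that it is given the sample-queries-tpcds directory or started inside it. It measures wall-clock seconds per query, which includes Hive startup time, and writes each query's output to a matching .out file.

```shell
#!/usr/bin/env bash
# Hypothetical batch runner, not part of hive-testbench.
# run_queries DATABASE [QUERY_DIR]
# Runs every query*.sql in QUERY_DIR (default: current directory) with the hive CLI,
# saves each query's output to a matching .out file and prints name, status, seconds.
# With DRY_RUN=1 it only prints the hive command for each file.
run_queries() {
  local db=$1 dir=${2:-.}
  local q status start end
  for q in "$dir"/query*.sql; do
    [ -e "$q" ] || continue   # no matching files: the glob stays literal
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "hive --database $db -f $q"
      continue
    fi
    start=$(date +%s)
    if hive --database "$db" -f "$q" > "${q%.sql}.out" 2>&1; then
      status=ok
    else
      status=failed
    fi
    end=$(date +%s)
    printf '%s\t%s\t%ss\n' "$(basename "$q")" "$status" "$((end - start))"
  done
}
```

Usage: `run_queries tpcds_bin_partitioned_orc_10 . | tee timings.tsv` keeps a tab-separated record of the run.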

————————————————
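Copyright notice: this is an original article by CSDN blogger "无影风Victorz", licensed under CC 4.0 BY-SA. When republishing, include a link to the original source and this notice.
Original article: https://blog.csdn.net/victorzzzz/article/details/88741767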

If the download fails, the TPC-DS tools can be downloaded from https://public-repo-1.hortonworks.com/hive-testbench/tpcds/TPCDS_Tools.zip instead.
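The fallback can be scripted so that a failed download moves on to the next source. fetch_with_fallback is a hypothetical helper that assumes curl is installed; it tries each URL in order and keeps the first successful download. Where the build expects TPCDS_Tools.zip is not covered here, so the output path in the usage line is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hive-testbench. Requires curl.
# fetch_with_fallback OUTPUT URL...
# Tries each URL in order; stops at the first one that downloads successfully.
fetch_with_fallback() {
  local out=$1
  shift
  local url
  for url in "$@"; do
    # -f: treat HTTP errors as failures, -sS: quiet but show errors, -L: follow redirects
    if curl -fsSL -o "$out" "$url"; then
      echo "downloaded $url"
      return 0
    fi
    rm -f "$out"   # drop any partial file before trying the next source
    echo "failed: $url" >&2
  done
  return 1
}
```

Usage: `fetch_with_fallback TPCDS_Tools.zip "$PRIMARY_URL" https://public-repo-1.hortonworks.com/hive-testbench/tpcds/TPCDS_Tools.zip`, where PRIMARY_URL is a placeholder for whichever source failed.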

If compilation fails, see: https://www.jianshu.com/p/6be3e51256f4
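Build failures often come down to a tool missing from the PATH, so a quick check before ./tpcds-build.sh can save a round trip. check_tools is a hypothetical pre-flight helper; the tool names in the usage line (gcc, javac, mvn, unzip) are assumptions about what the build needs, so adjust them to the error you actually see.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hive-testbench.
# check_tools NAME...  -> reports every tool that is not on the PATH; status 1 if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_tools gcc javac mvn unzip && ./tpcds-build.sh`. Every missing tool is reported, not just the first, so one run shows everything that needs installing.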


hive-testbench

A testbench for experimenting with Apache Hive at any data scale.

Overview

The hive-testbench is a data generator and set of queries that lets you experiment with Apache Hive at scale. The testbench allows you to experience base Hive performance on large datasets, and gives an easy way to see the impact of Hive tuning parameters and advanced settings.

Prerequisites

You will need:

Install and Setup

All of these steps should be carried out on your Hadoop cluster.

Feedback

If you have questions, comments or problems, visit the Hortonworks Hive forum.

If you have improvements, pull requests are accepted.
