Kylin Series 3 - Quick Start
2022-01-21 · 只是甲
1. Data Preparation
We use the emp and dept tables under Oracle's scott schema as the example.
Reference blog: MySQL version of the scott schema.
Sync the data from the MySQL version into Hive.
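The sync step itself is not shown here; the Hive side can be sketched roughly as below, assuming the scott data has been exported from MySQL to local CSV files (the table layout follows the classic scott schema, and the file paths are placeholders, not from the original post):

```sql
-- Sketch only: Hive DDL for the classic scott tables (column types assumed)
CREATE TABLE dept (
  deptno INT,
  dname  STRING,
  loc    STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE emp (
  empno    INT,
  ename    STRING,
  job      STRING,
  mgr      INT,
  hiredate STRING,
  sal      DECIMAL(7,2),
  comm     DECIMAL(7,2),
  deptno   INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load the exported CSVs (placeholder paths)
LOAD DATA LOCAL INPATH '/tmp/dept.csv' INTO TABLE dept;
LOAD DATA LOCAL INPATH '/tmp/emp.csv' INTO TABLE emp;
```

Any equivalent route (e.g. Sqoop) works as well; Kylin only needs the tables to be visible in the Hive metastore.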
2. Creating a Project
Kylin's object hierarchy is: Project -> Model -> Cube.
- Reload the metadata.
- Create a project.
3. Selecting the Data Source
- Go to Model -> Data Source -> Load Table From Tree.
- Select the emp and dept tables we need.
4. Creating the Model
- Choose Model -> Models -> New Model.
- Enter the model name.
- Select the fact table.
- Select the dimension (lookup) table via Add Lookup Table.
  ![](https://img.haomeiwen.com/i2638478/9a26ce35a9762e99.png)
- Select the dimensions.
- Select the measures.
- There is no partition column, so skip this step.
5. Creating the Cube
- Choose Model -> New Cube.
- Select the model name and enter a name for the cube.
- Select the dimensions. Note that dimensions from the lookup table default to Derived; it is usually better to change them to Normal.
  ![](https://img.haomeiwen.com/i2638478/91a030684af902bd.png)
- Select the measures. A count(*) measure exists by default; here we add a sum(sal) measure.
  ![](https://img.haomeiwen.com/i2638478/89a4a80ffcb0835a.png)
- For the HBase refresh strategy, keep the defaults.
- Skip all of the advanced settings for now.
  ![](https://img.haomeiwen.com/i2638478/422a3e0fcd81bcd4.png)
- Review the final cube description.
  ![](https://img.haomeiwen.com/i2638478/c30084e916108cbd.png)
- Build the newly created cube.
  ![](https://img.haomeiwen.com/i2638478/a94199cae197ee22.png)
- Monitor the build progress.
  ![](https://img.haomeiwen.com/i2638478/f16c04c22ee3ad21.png)
- Wait a few minutes until the build completes.
6. Kylin vs. Hive
The SQL script:

```sql
select dname, sum(sal)
from emp e
join dept d
on e.deptno = d.deptno
group by dname;
```
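Because the scott data set is tiny and well known, the expected result of this query can be sanity-checked offline before comparing engines; a minimal sketch in Python with the classic scott rows hard-coded (not read from Hive; department names lowercased to match the Hive output below):

```python
from collections import defaultdict

# Classic scott data: dept maps deptno -> dname; emp rows are (ename, sal, deptno)
dept = {10: "accounting", 20: "research", 30: "sales", 40: "operations"}
emp = [
    ("SMITH", 800, 20), ("ALLEN", 1600, 30), ("WARD", 1250, 30),
    ("JONES", 2975, 20), ("MARTIN", 1250, 30), ("BLAKE", 2850, 30),
    ("CLARK", 2450, 10), ("SCOTT", 3000, 20), ("KING", 5000, 10),
    ("TURNER", 1500, 30), ("ADAMS", 1100, 20), ("JAMES", 950, 30),
    ("FORD", 3000, 20), ("MILLER", 1300, 10),
]

# Equivalent of: select dname, sum(sal) from emp join dept ... group by dname
totals = defaultdict(int)
for _, sal, deptno in emp:
    totals[dept[deptno]] += sal

for dname in sorted(totals):
    print(dname, totals[dname])
```

The totals (accounting 8750, research 10875, sales 9400) agree with the Hive result shown next.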
6.1 Running on Hive
```
hive>
    > select dname, sum(sal)
    > from emp e
    > join dept d
    > on e.deptno = d.deptno
    > group by dname;
Query ID = root_20211231110115_de250e84-2a0a-4c12-bca0-12c17c2de113
Total jobs = 1
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-12-31 11:01:22 Starting to launch local task to process map join; maximum memory = 1908932608
2021-12-31 11:01:24 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/5f296d47-5178-464c-b2e4-28fdba9439ec/hive_2021-12-31_11-01-15_353_7571014424349893653-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile01--.hashtable
2021-12-31 11:01:24 Uploaded 1 File to: file:/tmp/root/5f296d47-5178-464c-b2e4-28fdba9439ec/hive_2021-12-31_11-01-15_353_7571014424349893653-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile01--.hashtable (373 bytes)
2021-12-31 11:01:24 End of local task; Time Taken: 1.475 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
21/12/31 11:01:26 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1640913930054_0012, Tracking URL = http://hp3:8088/proxy/application_1640913930054_0012/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1640913930054_0012
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2021-12-31 11:01:33,600 Stage-2 map = 0%, reduce = 0%
2021-12-31 11:01:40,867 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 7.36 sec
2021-12-31 11:01:46,037 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 9.91 sec
MapReduce Total cumulative CPU time: 9 seconds 910 msec
Ended Job = job_1640913930054_0012
MapReduce Jobs Launched:
Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 9.91 sec HDFS Read: 21315 HDFS Write: 174 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 910 msec
OK
accounting 8750.00
research 10875.00
sales 9400.00
Time taken: 31.859 seconds, Fetched: 3 row(s)
hive>
```
6.2 Running on Kylin
The first query takes 1.79 seconds, far faster than the 31.9 seconds the same query took in Hive.
![](https://img.haomeiwen.com/i2638478/862b4299a34c90c1.png)
The second run is faster still.
![](https://img.haomeiwen.com/i2638478/08ff373ff9a77325.png)