大数据

Impala--查询优化COMPUTE STATS指令

2018-11-24  本文已影响192人  撸码小丑

问题描述

impala在查询一些大表的时候,查询特别慢

优化前

Query: select count(a.sn) from usermodel_inter_total_label a join usermodel_inter_total_info bon a.sn = b.sn where a.label = 'porn' and a.heat > 0.1 and b.platform = 'android' 

Returned 1 row(s) in 36.86s

寻路

按SQL调优的套路来,explain一下发现了一个很隐蔽的warning:

WARNING: The following tables are missing relevant table and/or column statistics.default.usermodel_inter_total_info, default.usermodel_inter_total_label

神奇指令

COMPUTE STATS usermodel_inter_total_info;
COMPUTE STATS usermodel_inter_total_label;

优化后

Query: select count(a.sn) from usermodel_inter_total_label a join usermodel_inter_total_info bon a.sn = b.sn where a.label = 'porn' and a.heat > 0.1 and b.platform = 'android' Returned 1 row(s) in 3.15s

Cool!10倍的提升,相对Hive20倍的提升,和单表查询一样的迅速!

分析

「COMPUTE STATS」前
指令:

show table stats usermodel_inter_total_label;

返回:

log_date #Rows #Files Size Bytes Cached Format
2014-12-13 -1 15 1.18GB NOT CACHED TEXT
2014-12-14 -1 3 1.80GB NOT CACHED TEXT
2014-12-15 -1 4 2.96GB NOT CACHED TEXT
Total -1 22 5.93GB 0B -

指令:

show column stats usermodel_inter_total_label;

返回:

Column Type #Distinct Values #Nulls Max Size Avg Size
sn STRING -1 -1 -1 -1
label STRING -1 -1 -1 -1
heat DOUBLE -1 -1 -1 -1
active_record STRING -1 -1 -1 -1
log_date STRING 3 0 -1 -1

「COMPUTE STATS」后
指令:

show table stats usermodel_inter_total_label;

返回:

log_date #Rows #Files Size Bytes Cached Format
2014-12-13 9498438 2 469.76MB NOT CACHED TEXT
2014-12-14 17891595 1 893.44MB NOT CACHED TEXT
2014-12-15 27885473 2 1.37GB NOT CACHED TEXT
Total 55275506 5 2.71GB 0B -

指令:

show column stats usermodel_inter_total_label;

返回:

Column Type #Distinct Values #Nulls Max Size Avg Size
sn STRING 13984716 -1 30 24.0039005279541
label STRING 36 -1 13 4.26140022277832
heat DOUBLE 382126 -1 8 8
active_record STRING 7 -1 3 1.667400002479553
log_date STRING 3 0 -1 -1

看来「COMPUTE STATS」的作用就是得出Impala原先不知道的值(-1)。

上一篇 下一篇

猜你喜欢

热点阅读