Impala--查询优化COMPUTE STATS指令
2018-11-24 本文已影响192人
撸码小丑
问题描述
impala在查询一些大表的时候,查询特别慢
优化前
Query: select count(a.sn) from usermodel_inter_total_label a join usermodel_inter_total_info bon a.sn = b.sn where a.label = 'porn' and a.heat > 0.1 and b.platform = 'android'
Returned 1 row(s) in 36.86s
寻路
按SQL调优的套路来,explain一下发现了一个很隐蔽的warning:
WARNING: The following tables are missing relevant table and/or column statistics.default.usermodel_inter_total_info, default.usermodel_inter_total_label
神奇指令
COMPUTE STATS usermodel_inter_total_info;
COMPUTE STATS usermodel_inter_total_label;
优化后
Query: select count(a.sn) from usermodel_inter_total_label a join usermodel_inter_total_info bon a.sn = b.sn where a.label = 'porn' and a.heat > 0.1 and b.platform = 'android' Returned 1 row(s) in 3.15s
Cool!10倍的提升,相对Hive20倍的提升,和单表查询一样的迅速!
分析
「COMPUTE STATS」前
指令:
show table stats usermodel_inter_total_label;
返回:
log_date | #Rows | #Files | Size | Bytes Cached | Format |
---|---|---|---|---|---|
2014-12-13 | -1 | 15 | 1.18GB | NOT CACHED | TEXT |
2014-12-14 | -1 | 3 | 1.80GB | NOT CACHED | TEXT |
2014-12-15 | -1 | 4 | 2.96GB | NOT CACHED | TEXT |
Total | -1 | 22 | 5.93GB | 0B | - |
指令:
show column stats usermodel_inter_total_label;
返回:
Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
---|---|---|---|---|---|
sn | STRING | -1 | -1 | -1 | -1 |
label | STRING | -1 | -1 | -1 | -1 |
heat | DOUBLE | -1 | -1 | -1 | -1 |
active_record | STRING | -1 | -1 | -1 | -1 |
log_date | STRING | 3 | 0 | -1 | -1 |
「COMPUTE STATS」后
指令:
show table stats usermodel_inter_total_label;
返回:
log_date | #Rows | #Files | Size | Bytes Cached | Format |
---|---|---|---|---|---|
2014-12-13 | 9498438 | 2 | 469.76MB | NOT CACHED | TEXT |
2014-12-14 | 17891595 | 1 | 893.44MB | NOT CACHED | TEXT |
2014-12-15 | 27885473 | 2 | 1.37GB | NOT CACHED | TEXT |
Total | 55275506 | 5 | 2.71GB | 0B | - |
指令:
show column stats usermodel_inter_total_label;
返回:
Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
---|---|---|---|---|---|
sn | STRING | 13984716 | -1 | 30 | 24.0039005279541 |
label | STRING | 36 | -1 | 13 | 4.26140022277832 |
heat | DOUBLE | 382126 | -1 | 8 | 8 |
active_record | STRING | 7 | -1 | 3 | 1.667400002479553 |
log_date | STRING | 3 | 0 | -1 | -1 |
看来「COMPUTE STATS」的作用就是得出Impala原先不知道的值(-1)。