ClickHouse 21.7.3.14-2(九) 语法优化规则

2021-11-08  本文已影响0人  _大叔_

ClickHouse 的 SQL 优化规则是基于 RBO(Rule Based Optimization),下面是一些优化规则

上传测试用例

将官方提供的测试集 visits_v1.tar 和 hits_v1.tar下载并上传到虚拟机,解压到 clickhouse 数据路径下。数据下载地址 ,当然官方也可以在线的方式,不需要下载就测试sql语句查询。在线地址

# 解压到 clickhouse 数据路径
sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse
sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse
# 修改所属用户
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets

重启 clickhouse-server,执行查询

clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"

注意:官方的 tar 包,包含了建库、建表语句、数据内容,这种方式不需要手动建库、建表,最方便。hits_v1 表有 130 多个字段,880 多万条数据,visits_v1 表有 180 多个字段,160 多万条数据。

COUNT 优化

在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows。以下语句可以使用 explain plan select.. 来看变化

#  默认会查询 count 文件直接拿到数据
select count() from hits_v1
# 会转变成 count()
select count(*) from hits_v1
# 会转变成 count()
select count(1) from hits_v1
# 只要不是写具体的字段不会触发冲洗计算
select count(UserID) from hits_v1

消除子查询重复字段或无用字段

# 实际查询语句
EXPLAIN SYNTAX SELECT
 a.UserID,
 b.VisitID,
 a.URL,
 b.UserID
 FROM
 hits_v1 AS a
 LEFT JOIN (
 SELECT
 UserID,
 UserID as HaHa,
 VisitID
 FROM visits_v1) AS b
 USING (UserID)
 limit 3;

# 返回优化语句
SELECT
 UserID,
 VisitID,
 URL,
 b.UserID
FROM hits_v1 AS a
ALL LEFT JOIN
(
 SELECT
 UserID,
 VisitID
 FROM visits_v1
) AS b USING (UserID)
LIMIT 3

谓词下推

当 group by 有 having 子句,但是没有 with cube、with rollup 或者 with totals 修饰的时候,having 过滤会下推到 where 提前过滤。例如下面的查询,having name 变成了 where name,在 group by 之前过滤。

# 查询语句
EXPLAIN SYNTAX SELECT UserID FROM hits_v1 GROUP BY UserID HAVING UserID ='8585742290196126178';

# 返回优化语句
SELECT UserID FROM hits_v1 WHERE UserID = '8585742290196126178' GROUP BY UserID
# 查询语句
EXPLAIN SYNTAX SELECT * FROM ( SELECT UserID FROM visits_v1 ) WHERE UserID = '8585742290196126178'

# 返回优化后的语句
SELECT UserID FROM ( SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178' ) WHERE UserID = '8585742290196126178'
# 查询语句
EXPLAIN SYNTAX SELECT * FROM ( 
  SELECT * FROM ( SELECT UserID FROM visits_v1)  
    UNION ALL 
  SELECT *   FROM ( SELECT UserID FROM visits_v1) 
) WHERE UserID = '8585742290196126178'

# 优化后的
SELECT UserID FROM ( 
  SELECT UserID FROM ( 
    SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178' 
  ) WHERE UserID = '8585742290196126178' 
    UNION ALL 
  SELECT UserID FROM ( 
    SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178'  
  ) WHERE UserID = '8585742290196126178' 
) WHERE UserID = '8585742290196126178'

聚合计算外推

聚合函数内的计算,会外推

# 查询语句
EXPLAIN SYNTAX SELECT sum(UserID * 2) FROM visits_v1

# 优化后
SELECT sum(UserID) * 2 FROM visits_v1

聚合函数消除

如果对聚合键,也就是 group by key 使用 min、max、any 聚合函数,则将函数消除

# 查询语句
EXPLAIN SYNTAX SELECT sum(UserID * 2), max(VisitID), max(UserID) FROM visits_v1 GROUP BY UserID

# 优化后
SELECT sum(UserID) * 2, max(VisitID), UserID FROM visits_v1 GROUP BY UserID

删除重复的 order by key

# 查询语句
EXPLAIN SYNTAX SELECT * FROM visits_v1 ORDER BY  UserID ASC, UserID ASC, VisitID ASC,VisitID ASC

# 优化后
select …… FROM visits_v1 ORDER BY UserID ASC,VisitID ASC

删除重复的 limit by key

# 查询语句
EXPLAIN SYNTAX SELECT * FROM visits_v1 LIMIT 3 BY VisitID, VisitID LIMIT 10

# 返回优化后的语句:
select …… FROM visits_v1 LIMIT 3 BY VisitID LIMIT 10

删除重复的 USING Key

# 查询语句
EXPLAIN SYNTAX SELECT a.UserID, a.UserID, b.VisitID, a.URL, b.UserID FROM hits_v1 AS a 
LEFT JOIN visits_v1 AS b USING (UserID, UserID)

# 返回优化后的语句:
SELECT  UserID, UserID, VisitID, URL, b.UserID FROM hits_v1 AS a ALL LEFT JOIN visits_v1 AS b USING (UserID)

标量替换

如果子查询只返回一行数据,在被引用的时候用标量替换,例如下面语句中的 total_disk_usage 字段

# 查询语句
EXPLAIN SYNTAX WITH (  SELECT sum(bytes) FROM system.parts WHERE active ) AS total_disk_usage SELECT
 (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage, 
table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10;

# 返回优化后的语句
WITH CAST(0, 'UInt64') AS total_disk_usage SELECT  (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10

三元运算优化

如果开启了 optimize_if_chain_to_multiif 参数,三元运算符会被替换成 multiIf 函数

# 查询语句
EXPLAIN SYNTAX SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 'atguigu') FROM numbers(10) 
settings optimize_if_chain_to_multiif = 1;

# 返回优化后的语句:
SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu') FROM numbers(10)
上一篇 下一篇

猜你喜欢

热点阅读