clickhouse-物化视图
2022-06-09 本文已影响0人
越狱的灵感
参考文档
https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#
介绍
物化视图可以理解为一个预聚合触发器,数据在控制好触发的汇聚条件,几乎是实时的
物化视图会存储一份计算好的聚合数据,是一种空间换时间的绝妙方法,对集群的稳定性和很重要。
1.png
使用
物化视图的建立有两种方法
1,使用TO关键字(推荐使用),可以控制TTL,不能使用POPULATE
例:
--创建本地存储表,物化视图本地表和普通的建表没啥区别,inner_物化视图本地表前缀
create table IF NOT EXISTS demo_db.inner_demo_table on CLUSTER demo0201oddp_v1 (
`day` date default toDate(now())
,`its` datetime default now()
,`channel` String
,`appName` String
,`stream` String
,`requestUrl` String
) ENGINE = ReplicatedReplacingMergeTree (
'/clickhouse/tables/{cluster_demo0201oddp_v1}/{shard_demo0201oddp_v1}/inner_demo_table'
,'{replica_demo0201oddp_v1}'
) partition by toYYYYMMDD (its)
order by (channel,appName,stream) TTL its + toIntervalDay(20) SETTINGS index_granularity = 8192,storage_policy = 'demo_default_policy'
--创建物化视图,mview_ 物化视图前缀,to 后面接本地表,as 后面添加计算逻辑
create MATERIALIZED view demo_db.mview_demo_table on CLUSTER demo0201oddp_v1 to demo_db.inner_demo_table as
select channel
,appName
,stream
,requestUrl
from demo_db.inner_demo_table
group by channel
,appName
,stream
,requestUrl
2,使用默认表
此方案建议是数据量小的表,因为无法控制TTL,后期数据运维不方便。默认存储表在clickhouse中是.inner_id.uuid值作为表名
例:
--物化视图原始数据表
create table IF NOT EXISTS demo_db.inner_demo_table ON CLUSTER demo0201oddp_v1 (
`day` date default toDate(now()),
`its` datetime default now(),
`_cnt` UInt8 default 1,
`channel` String,
`eventTime` UInt32,
`clientId` UInt32,
`scheduleStr` String,
`ispCode` UInt32,
`hitFlag` Int16,
`reqCnt` UInt32,
`fileSize` UInt64,
`clientIp` String,
`guid` String,
`status` String
) ENGINE = ReplicatedMergeTree (
'/clickhouse/tables/{cluster1}/{shard}/inner_demo_table',
'{replica}'
) partition by toYYYYMMDD (toDate(eventTime))
order by
its TTL its + INTERVAL 1 hour SETTINGS index_granularity = 8192
--使用默认存储表
CREATE MATERIALIZED VIEW IF NOT EXISTS demo_db.mview_inner_demo_table_uv ON CLUSTER demo0201oddp_v1 ENGINE = ReplicatedAggregatingMergeTree(
'/clickhouse/tables/{cluster1}/{shard}/mview_inner_demo_table_uv',
'{replica}'
) PARTITION BY t_day
ORDER BY
(channel) SETTINGS index_granularity = 8192,
use_minimalistic_part_header_in_zookeeper = 1 AS
SELECT
sumState(1) as uv,
day as t_day,
toStartOfHour(toDateTime(eventTime)) as __time,
channel,
clientIp
FROM
demo_db.inner_demo_table
GROUP BY
channel,
clientIp,
day,
__time;
总结
1,物化视图是一种空间换时间的预聚合方式,聚合后的数据将存储在新表中,一般于SummingMergeTree,AggregatingMergeTree等聚合引擎一起使用。
2,物化视图因为是写入触发器,所以as select只对每批次的insert data有效果,所以即使是where条件也是对这批写入数据起效果(https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized)
3,物化视图只有在原表insert的时候才会触发
4,POPULATE关键字,不建议使用,会把原始表中的已存在数据全部物化一遍,老数据的同步,建议直接insert到mv中
5,多表join生成物化视图,左表插入数据时才更新
6,源表数据的改变不会影响物化视图,如update, delete, drop partition