clickhouse 20.x实践小计

2021-11-25  本文已影响0人  架构师老狼

1 列式存储与行式存储对比

2 数据类型-典型的cpp

3表引擎

3.1 TinyLog
3.2 Memory
3.3 MergeTree:ClickHouse 中最强大的表引擎当属 MergeTree(合并树)引擎
create table t_order_mt(
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime
) engine =MergeTree
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id,sku_id);
1) partition by 分区(可选)
2) primary key 主键(可选)
3) order by(必选)
4)二级索引
create table t_order_mt2(
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time Datetime,
INDEX a total_amount TYPE minmax GRANULARITY 5
) engine =MergeTree
partition by toYYYYMMDD(create_time) 
primary key (id)
order by (id, sku_id); --GRANULARITY N 是设定二级索引对于一级索引粒度的粒度
5)TTL
create table t_order_mt3(
id UInt32,
sku_id String,
total_amount Decimal(16,2) TTL create_time+interval 10 SECOND,
create_time Datetime
) engine =MergeTree
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id, sku_id);
alter table t_order_mt3 MODIFY TTL create_time + INTERVAL 10 SECOND;
-- 涉及判断的字段必须是 Date 或者 Datetime 类型,推荐使用分区的日期字段。
3.4 ReplacingMergeTree - 去重
1) 去重时机
2) 去重范围
create table t_order_rmt(
id UInt32,
sku_id String,
total_amount Decimal(16,2) ,
create_time Datetime
) engine =ReplacingMergeTree(create_time)
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id, sku_id);
3)ReplacingMergeTree()
3.5 SummingMergeTree - 预聚合
4 sql
1)Update 和 Delete
5 分片与副本集群

ClickHouse 的集群是表级别的,实际企业中,大部分做了高可用,但是没有用分片,避免降低查询性能以及操作集群的复杂性。

5.1 高可用集群
clickhouse高可用
<yandex>
<zookeeper-servers>
 <node index="1">
     <host>hadoop01</host>
     <port>2181</port>
 </node>
 <node index="2">
     <host>hadoop02</host>
     <port>2181</port>
 </node>
 <node index="3">
     <host>hadoop03</host>
     <port>2181</port>
 </node>
</zookeeper-servers>
</yandex>
create table t_order_rep2 (
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine =ReplicatedMergeTree('/clickhouse/table/01/t_order_rep','rep_01')
 partition by toYYYYMMDD(create_time)
 primary key (id)
 order by (id,sku_id);
5.2 带有路由高可用集群
<?xml version="1.0"?>
<yandex>
<remote_servers>
<ck_cluster> <!-- 集群名称--> 
 <shard> <!--集群的第一个分片--> 
 <internal_replication>true</internal_replication>
   <replica> <!--该分片的第一个副本-->
     <host>hadoop102</host>
     <port>9000</port>
   </replica>
   <replica> <!--该分片的第二个副本-->
     <host>hadoop103</host>
     <port>9000</port>
   </replica>
</shard>

 <shard> <!--集群的第二个分片-->
     <internal_replication>true</internal_replication>
     <replica> <!--该分片的第一个副本-->
         <host>hadoop104</host>
         <port>9000</port>
     </replica>
 </shard>
</ck_cluster>
</remote_servers>

<zookeeper-servers>
    <node index="1">
        <host>hadoop102</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>hadoop103</host>
         <port>2181</port>
    </node>
    <node index="3">
         <host>hadoop104</host>
         <port>2181</port>
    </node>
</zookeeper-servers>

<macros>
    <shard>01</shard> <!--不同机器放的分片数不一样-->
    <replica>rep_1_1</replica> <!--不同机器放的副本数不一样-->
</macros>
</yandex>
create table st_order_mt on cluster gmall_cluster (
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine 
= ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt','{replica}')
 partition by toYYYYMMDD(create_time)
 primary key (id)
 order by (id,sku_id);
create table st_order_mt_all2 on cluster ck_cluster
(
 id UInt32,
 sku_id String,
 total_amount Decimal(16,2),
 create_time Datetime
) engine = Distributed(ck_cluster,default, st_order_mt,hiveHash(sku_id));
6 OLAP技术演进与选型
主流OLAP引擎对比
6.1 ClickHouse优点
6.2 ClickHouse缺点
6.3 ClickHouse的应用场景:
上一篇 下一篇

猜你喜欢

热点阅读