ClickHouse 安装和基本操作

2022-02-14  本文已影响0人  明翼

一 官网安装脚本

用的是centos版本,命令如下:

sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client

sudo /etc/init.d/clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you set up a password.

速度还挺快的顺利安装完毕。

更改下数据存储目录,便于测试大数据:

[root@localhost test]# grep test  /etc/clickhouse-server/config.xml
    <path>/home/test/clickhouse</path>
    <tmp_path>/home/test/clickhouse/tmp/</tmp_path>
    <user_files_path>/home/test/clickhouse/user_files/</user_files_path>
            <path>/home/test/clickhouse/access/</path>
    <format_schema_path>/home/test/clickhouse/format_schemas/</format_schema_path>

注意下,test目录需要更改下权限,简单点操作:

chown -R clickhouse:clickhouse /home/test/clickhouse

二 启动服务器

[root@localhost miaohq]# service clickhouse-server start
[root@localhost miaohq]# clickhouse-client
ClickHouse client version 22.1.3.7 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.1.3 revision 54455.

localhost :) show databases;

SHOW DATABASES

Query id: 83b6b5ae-3108-4c2f-970e-9091ab843052

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ system             │
└────────────────────┘

4 rows in set. Elapsed: 0.001 sec. 

localhost :) 

三 基本操作命令

  1. 新建数据库
create database flow;
use flow;
  1. 创建表
create table flow_org(
    flow_id String,
    print_timestamp DateTime,
    create_date Date,
    timestamp UInt32,
    src_mac String,
    dst_mac String,
    ipversion UInt8,
    src_ip String,
    dst_ip String,
    proto UInt8,
    src_port UInt16,
    dst_port UInt16
)engine=MergeTree(create_date,(src_ip,dst_ip,src_port,dst_port,timestamp),8192);

建表必须有一个date字段默认用于分区,还有其他很多标引擎,这里面用的是MergeTree,最常用。
含义:create_date 时间字段默认用于分区,如果不写这个字段,这个MergeTree里面不能有其他值了(见下面),(src_ip,dst_ip,src_port,dst_port,timestamp)构成主键。

如果不必须定义主键,可以用其他字段得到分区字段。

CREATE TABLE stats
(
    `datetime`      DateTime('UTC') DEFAULT now(),
    `worker_id`     UInt64,
    `project_id`    UInt64,
    `platform_type` UInt8,
    `temp`          Array(Int32),
    `fan`           Array(Int32),
    `units`         Int32,
    `power`         Array(Float32),
    `power_total`   Float32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(datetime)
ORDER BY (worker_id, datetime);
  1. 导入数据
clickhouse-client -m --format_csv_delimiter="," --query="insert into flow.flow_org format CSV" < 1644541860.csv 

  1. 删除数据
    虽然可以实现修改和删除,但是和一般的OLTP数据库不一样,Mutation语句是一种很“重”的操作,而且不支持事务。
    “重”的原因主要是每次修改或者删除都会导致放弃目标数据的原有分区,重建新分区。所以尽量做批量的变更,不要进行频繁小数据的操作。
# 推荐删除方法
alter table flow_org drop partition '202202';
# 不推荐
alter table flow_org delete WHERE id=56;
  1. 查看数据的压缩比:
SELECT 
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts;

 

SELECT 
    table AS `表名`,
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM system.parts
WHERE table IN ('flow_org')
GROUP BY table

粗略看了下,我们的数据压缩的后的数据占原来数据的大小为17%。

  1. 查看表结构
    和mysql的很多命令都差不多:
#比如查看表结构的定义:
desc test;
  1. 导出数据
clickhouse-client  --query "select * from test where create_time = '2022-02-14 12:00:00'"  --format  CSVWithNames>  /opt/module/data/rs1.csv

四 参考网址

比较基础比较详细
https://www.cnblogs.com/shengyang17/p/10894490.html#_label3_0_0_3

操作函数
https://www.cnblogs.com/pengpenghuhu/p/14546305.html

手册:
https://clickhouse.com/docs/zh/sql-reference/aggregate-functions/reference/topk/

上一篇 下一篇

猜你喜欢

热点阅读