
ClickHouse Data Compression [Translation]

2017-11-28  JackpGao

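Original: https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse
Altinity is a company that provides ClickHouse consulting and services. Its leadership team consists of ClickHouse developers and experts from Percona. A beta version of Altinity's ClickHouse cloud service is now online.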

Overview

It might not be obvious at first, but ClickHouse supports two compression methods: LZ4 and ZSTD.

Both methods are evaluated here: https://www.percona.com/blog/2016/04/13/evaluating-database-compression-methods-update/
In short, LZ4 is fast but provides a lower compression ratio than ZSTD. ZSTD is slower than LZ4, but it is often faster than traditional Zlib and compresses better, so it can be considered a replacement for Zlib compression.

Benchmark

To get some real numbers with ClickHouse, let's review a table compressed with both methods.
For this, we will take the lineorder table from the benchmark described in https://www.altinity.com/blog/2017/6/16/clickhouse-in-a-general-analytical-workload-based-on-star-schema-benchmark
The uncompressed data size of the lineorder table at scale 1000 is 680G.

Data size comparison

Now let's load this table into ClickHouse. With the default compression (LZ4), we have:
184G lineorderlz4
And with ZSTD:
135G lineorderzstd
Here we need to mention how to make ClickHouse use ZSTD. To do so, we add the following lines to the config:

<compression incl="clickhouse_compression">
        <case>
                <method>zstd</method>  
        </case>
</compression>

Compression ratio comparison

The resulting compression ratios for this table (uncompressed size divided by size on disk):

Compression    Ratio
LZ4            3.7
ZSTD           5.0
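
The ratios follow directly from the sizes reported earlier (680G uncompressed, 184G with LZ4, 135G with ZSTD). A minimal Python sketch of the arithmetic is shown here; the helper name `compression_ratio` and the variable names are illustrative and are not part of the original post or of ClickHouse.

```python
# Sizes in GB, taken from the figures quoted in this post.
UNCOMPRESSED_GB = 680
COMPRESSED_GB = {"LZ4": 184, "ZSTD": 135}


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Return the uncompressed size divided by the compressed size."""
    return uncompressed / compressed


# Ratio per method, rounded to one decimal place as in the table.
ratios = {
    method: round(compression_ratio(UNCOMPRESSED_GB, size), 1)
    for method, size in COMPRESSED_GB.items()
}

# Fraction of disk space ZSTD saves relative to LZ4 for this table.
zstd_saving_vs_lz4 = 1 - COMPRESSED_GB["ZSTD"] / COMPRESSED_GB["LZ4"]

for method, ratio in ratios.items():
    print(f"{method}: {ratio}")
print(f"ZSTD uses {zstd_saving_vs_lz4:.0%} less disk space than LZ4")
```

For this table, ZSTD needs roughly 27% less disk space than LZ4.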

What about performance? To find out, let's run the following query:

SELECT toYear(LO_ORDERDATE) AS yod, sum(LO_REVENUE) FROM lineorder 
GROUP BY yod;

We will execute this query in a "cold" run (no data is cached), followed by a "hot" run, when some data is already cached in OS memory after the first run.

Query results for LZ4 compression:

Cold run:
7 rows in set. Elapsed: 19.131 sec. Processed 6.00 billion rows, 
36.00 GB (313.63 million rows/s., 1.88 GB/s.)

Hot run:
7 rows in set. Elapsed: 4.531 sec. Processed 6.00 billion rows, 
36.00 GB (1.32 billion rows/s., 7.95 GB/s.)

Query results for ZSTD compression:

Cold run:
7 rows in set. Elapsed: 20.990 sec. Processed 6.00 billion rows, 
36.00 GB (285.85 million rows/s., 1.72 GB/s.)

Hot run:
7 rows in set. Elapsed: 7.965 sec. Processed 6.00 billion rows, 
36.00 GB (753.26 million rows/s., 4.52 GB/s.)

While the cold run times differ by only about 10% (IO time dominates decompression time), in hot runs LZ4 is much faster, at 4.5 sec versus 8.0 sec for ZSTD (there are far fewer IO operations, so decompression performance becomes the major factor).
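
To make the comparison concrete, the timings above can be turned into throughput and relative slowdown figures. The following Python sketch only reuses the numbers printed in the query output; the names `elapsed`, `throughput`, and `slowdown` are illustrative, and the row count is the rounded "6.00 billion" from the output, so the results can differ slightly in the last digits from what the client printed.

```python
# Figures as printed in the query output above (rounded by the client).
ROWS = 6.00e9      # rows processed
DATA_GB = 36.00    # data volume processed, in GB

# Elapsed time in seconds for each (method, run) pair.
elapsed = {
    ("LZ4", "cold"): 19.131,
    ("LZ4", "hot"): 4.531,
    ("ZSTD", "cold"): 20.990,
    ("ZSTD", "hot"): 7.965,
}


def throughput(seconds: float) -> tuple[float, float]:
    """Return (million rows per second, GB per second) for one run."""
    return ROWS / seconds / 1e6, DATA_GB / seconds


for (method, run), seconds in elapsed.items():
    mrows, gbps = throughput(seconds)
    print(f"{method:4} {run:4}: {mrows:8.2f} million rows/s, {gbps:.2f} GB/s")

# How many times slower ZSTD is than LZ4 in each kind of run.
slowdown = {
    run: elapsed[("ZSTD", run)] / elapsed[("LZ4", run)]
    for run in ("cold", "hot")
}
for run, factor in slowdown.items():
    print(f"ZSTD / LZ4 elapsed time, {run} run: {factor:.2f}x")
```

With these inputs, ZSTD is about 1.10x slower than LZ4 in the cold run and about 1.76x slower in the hot run.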

Conclusion

ClickHouse offers two compression methods, LZ4 and ZSTD, so you can choose whichever suits your case.
With LZ4 you may get better execution times at the cost of worse compression, so the data takes more space on storage. ZSTD is the reverse trade-off: in this test it took less space on disk but ran slower once the data was cached.

