Druid Query Optimization

2019-07-25 zfylin

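Caching

When the cluster is small (the official documentation recommends fewer than 20 machines), configure the cache on the broker nodes. You can increase the cache size as needed, or switch the cache type from local to memcached.

When the cluster is large, configure caching only on the historical nodes to reduce the load on the broker nodes.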

# Query cache
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000
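
The two variations described above could be sketched as follows. This is a minimal sketch, not a tested configuration: the memcached host names are hypothetical placeholders, and the property names are the ones documented for Druid's cache settings, so check them against the Druid version you run.

```properties
# Variation 1 (small cluster): keep caching on the broker, backed by memcached.
# The host names below are hypothetical placeholders.
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=memcached
druid.cache.hosts=memcached1.example.com:11211,memcached2.example.com:11211

# Variation 2 (large cluster): cache on the historical nodes instead.
# These lines belong in the historical runtime.properties; on the broker,
# druid.broker.cache.useCache and druid.broker.cache.populateCache
# would then be set to false.
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000
```

The two variations are alternatives for separate files, which is why druid.cache.type appears twice in the sketch.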

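Hot and Cold Data Tiering

Queries are CPU-intensive, so use a sensible data distribution strategy to spread the data across as many historical nodes as possible.

Split the data into hot and cold according to how often it is queried. For example, if queries mostly touch the last 60 days, then data from the last 60 days is hot data and anything older than 60 days is cold data.

Assign the Historical nodes to different tiers so that each tier loads data from a different time range. The Historical nodes that load hot data hold less data, which speeds up queries. The screenshots below define two load rules: the loadByPeriod rule keeps two replicas of the last 60 days of data in the hot tier, and the loadForever rule keeps two replicas of all data in _default_tier.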

[Screenshots of the two load rules: 选区_434.png, 选区_435.png]

Tuning Historical Node Configuration Parameters

druid.server.priority

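The default is 0. On the broker, druid.broker.select.tier defaults to highestPriority, which means that when the same data is replicated on several historical nodes, the broker queries the nodes with the highest priority first.

Set druid.server.priority to 100 on the machines in the hot tier and leave the _default_tier machines at the default of 0. Queries for hot data then go only to the hot tier and never to _default_tier, and because the hot tier has more machines, those queries run faster.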
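
On the broker side, the selection strategy described above might look like this in runtime.properties. It is only a sketch: highestPriority is already the default, so the line simply makes the behaviour explicit.

```properties
# Broker runtime.properties (sketch)
# When a segment is served by more than one historical node, prefer the
# nodes with the highest druid.server.priority. This is the default value.
druid.broker.select.tier=highestPriority
```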

druid.processing.numThreads

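Tune this to your hardware. If the machine is dedicated to a historical node, set it to (number of cores - 1), so a 32-core machine can use 31. The default is also (cores - 1), so on such a machine you can leave it unset and still get 31.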

druid.segmentCache.locations

Spreading segment storage across several disks reduces the read/write pressure on each disk.

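# Tier this node belongs to; defaults to _default_tier when omitted
druid.server.tier=hot
# Query priority; defaults to 0 when omitted. The two _default_tier nodes stay at 0 and every hot node is set to 100, so queries for hot data only go to the hot nodes.
druid.server.priority=100
# Processing buffer size; this line can be commented out, since the default is 1 GB
druid.processing.buffer.sizeBytes=1073741824
# Processing threads; by default (cores - 1) threads do the processing under load, and the remaining one handles ZooKeeper communication, segment pulling, and similar work
druid.processing.numThreads=31
# Segment cache paths and the space given to each, for example 200 GB on each of disk1, disk2, and disk3
druid.segmentCache.locations=[{"path": "/disk1/druid/persistent", "maxSize": 200000000000},{"path": "/disk2/druid/persistent", "maxSize": 200000000000},{"path": "/disk3/druid/persistent", "maxSize": 200000000000}]
# Total segment space in bytes = the sum of the sizes above
druid.server.maxSize=600000000000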
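
For comparison, the matching settings on a _default_tier node might look like the sketch below. Both values are the defaults mentioned above, so the lines could be left out entirely; they are written out here only to contrast with the hot-tier configuration.

```properties
# Historical runtime.properties on a _default_tier node (sketch)
# Both values equal the defaults and are shown only for contrast with the hot tier.
druid.server.tier=_default_tier
druid.server.priority=0
```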

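Rolling Up Historical Data

During real-time and batch indexing, roll-up is usually configured at hourly granularity, so the stored data is aggregated by hour.

For data older than 60 days, submit a batch indexing task that rolls it up again by day, which reduces the number of rows that queries over old data have to scan.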
