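Kibana fails to start with "all shards failed" and cannot connect
Symptoms:
I started a local cluster with 3 nodes. Every ES node started normally and search-head could connect to all of them, but the log contained a warning: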
[2019-12-31T08:54:46,320][WARN ][o.e.c.r.a.DiskThresholdMonitor] [node1] high disk watermark [90%] exceeded on [wYsY5n5QRduREAAZvA5Biw][vipnode2][/node-2/data/nodes/0] free: 17.8gb[7.6%], shards will be relocated away from this node
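The warning above already contains the numbers that explain it: 7.6% free means about 92.4% of the disk is used, which is over the 90% high watermark. Here is a minimal Python sketch of that arithmetic. The regular expression and the name `parse_watermark_warning` are my own illustration, written against this one log line, and are not part of Elasticsearch.

```python
import re

# Pattern written against the warning line shown above; other Elasticsearch
# versions may word this message differently (assumption).
WARNING_RE = re.compile(
    r"high disk watermark \[(?P<watermark>[\d.]+)%\] exceeded on "
    r"\[(?P<node_id>[^\]]+)\]\[(?P<node_name>[^\]]+)\]\[(?P<path>[^\]]+)\] "
    r"free: (?P<free>[^\[]+)\[(?P<free_pct>[\d.]+)%\]"
)


def parse_watermark_warning(line):
    """Return watermark details from a DiskThresholdMonitor warning, or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    free_pct = float(match.group("free_pct"))
    used_pct = round(100.0 - free_pct, 1)  # used = 100% - free%
    watermark = float(match.group("watermark"))
    return {
        "node_name": match.group("node_name"),
        "path": match.group("path"),
        "free": match.group("free").strip(),
        "used_pct": used_pct,
        "watermark_pct": watermark,
        "exceeded": used_pct > watermark,
    }


line = (
    "[2019-12-31T08:54:46,320][WARN ][o.e.c.r.a.DiskThresholdMonitor] [node1] "
    "high disk watermark [90%] exceeded on "
    "[wYsY5n5QRduREAAZvA5Biw][vipnode2][/node-2/data/nodes/0] "
    "free: 17.8gb[7.6%], shards will be relocated away from this node"
)
info = parse_watermark_warning(line)
print(info["node_name"], info["used_pct"], info["exceeded"])
```

Running this against the warning shows which node and data path are over the threshold, which is the first thing to check before looking at Kibana itself.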
Then I started Kibana. It printed a wall of red log output on startup and the console would not open. The key error log:
elasticsearch - SearchPhaseExecutionException[Failed to execute phase [query], all shards failed]
{ statusCode: 503,
payload:
{ statusCode: 503,
error: 'Service Unavailable',
message: 'Request Timeout after 30000ms' },
headers: {} },
reformat: [Function],
[Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/esUnavailable' }
log [00:44:10.647] [info][plugins-system] Stopping all plugins.
log [00:44:10.648] [info][plugins][translations] Stopping plugin
Solution:
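I worked through the problem with the help of https://www.jianshu.com/p/443cf6ce87d5 and https://www.elastic.co/guide/en/elasticsearch/reference/5.5/cluster-allocation-explain.html,
and finally identified the key parameter, cluster.routing.allocation.disk.threshold_enabled
(ES can decide whether to keep allocating shards based on disk usage. This setting is enabled by default).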
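My own machine had very little disk space left, so for local single-machine testing I edited elasticsearch.yml and set cluster.routing.allocation.disk.threshold_enabled: false.
Then I deleted the files under data and logs and restarted ES and Kibana. Everything worked, and the cluster went from red to green.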
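The cluster allocation explain API from the Elastic link above, together with the cluster health API, can confirm the diagnosis before and after the change. Here is a rough Python sketch using only the standard library. The address http://localhost:9200 and the helper names `build_request` and `fetch_json` are assumptions for a local test cluster, not something taken from my setup.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node


def build_request(path, base_url=ES_URL):
    """Build a GET request for an Elasticsearch REST endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + path, method="GET")


def fetch_json(path, base_url=ES_URL, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_request(path, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        # Cluster status: red, yellow, or green.
        health = fetch_json("/_cluster/health")
        print("status:", health["status"])
        # With no request body, this explains an unassigned shard if one
        # exists; Elasticsearch answers with an HTTP error when none does.
        explain = fetch_json("/_cluster/allocation/explain")
        print(json.dumps(explain, indent=2))
    except OSError as exc:  # covers URLError, HTTPError, and timeouts
        print("request failed:", exc)
```

Before the fix, the health call should report red. Once the shards are allocated, it should report green and the explain call should return an error because nothing is left unassigned.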
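Summary:
1. WARN-level logs at system startup matter too. Paying attention to every detail helps locate the problem quickly.
2. The key parameters behind this problem, whose exact meaning can be looked up in the official documentation: cluster.routing.allocation.disk.threshold_enabled, cluster.routing.allocation.disk.watermark.low, cluster.routing.allocation.disk.watermark.high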
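To make the three parameters concrete, here is a simplified Python model of how they interact. It is a sketch of the documented behavior, not Elasticsearch's actual code. The 85% and 90% values are the documented default percentage watermarks, and the function name `disk_allocation_decision` is hypothetical.

```python
def disk_allocation_decision(used_pct, low=85.0, high=90.0, threshold_enabled=True):
    """Simplified model of the disk-based shard allocation decision.

    used_pct: percentage of disk used on the node
    low / high: cluster.routing.allocation.disk.watermark.low / .high
    threshold_enabled: cluster.routing.allocation.disk.threshold_enabled
    """
    if not threshold_enabled:
        # Disk usage is not considered at all.
        return "allocate normally (disk usage ignored)"
    if used_pct > high:
        # Above the high watermark: ES tries to move shards off the node.
        return "relocate shards away from this node"
    if used_pct > low:
        # Above the low watermark: no new shards are placed on the node.
        return "no new shards allocated to this node"
    return "allocate normally"


for used in (50.0, 87.0, 92.4):
    print(used, "->", disk_allocation_decision(used))
print(92.4, "->", disk_allocation_decision(92.4, threshold_enabled=False))
```

The 92.4% case matches the node in the warning log. With the threshold disabled, as in my fix, the same disk usage no longer blocks allocation.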