ES Cluster Storage Expansion and Online Configuration Adjustment

2021-11-16  行者深蓝

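Environment

The ELK logging stack runs in a K8S cluster, and storage uses volumes dynamically provisioned through a StorageClass.

Symptoms

  1. Kibana cannot query the latest logs, but historical logs can still be viewed.

Troubleshooting reference

In this article ELK is deployed in the cluster's default namespace. If it is deployed in a custom namespace, replace default in the commands accordingly.

The Logstash log shows ES rejecting writes with a 403 cluster_block_exception: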

[2021-11-16T09:55:31,753][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"index [uk8s-vidxqjoo-kube-system-2021.11.16] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2021-11-16T09:55:31,753][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}
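
To see which indices are affected, the blocked index names can be pulled out of the Logstash log. A minimal sketch, assuming the "index [NAME] blocked by" wording from the log lines above; the function name blocked_indices and the deployment name logstash are hypothetical:

```shell
#!/bin/sh
# Print the unique index names that Logstash reports as blocked.
# Reads log lines on stdin; relies on the "index [NAME] blocked by" wording
# seen in the log excerpt above.
blocked_indices() {
  sed -n 's/.*index \[\([^]]*\)\] blocked by.*/\1/p' | sort -u
}

# Hypothetical usage; the deployment name "logstash" is an assumption:
# kubectl logs -n default deploy/logstash | blocked_indices
```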
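
Check disk usage of the ES data directory on each node:

for pod in es-master-0 es-master-1 es-master-2
do
  # -n must come before "--"; anything after "--" is passed to the container command
  kubectl exec -n default "$pod" -- sh -c 'df -h | grep /usr/share/elasticsearch/data'
done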

Disk usage is as high as 96%:

/dev/vdb         20G   19G  933M  96% /usr/share/elasticsearch/data
/dev/vdb         20G   19G  939M  96% /usr/share/elasticsearch/data
/dev/vdc         20G   19G  933M  96% /usr/share/elasticsearch/data
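
With more nodes it is easier to filter the df output than to read it by eye. A rough sketch, assuming the standard df -h column layout shown above (Use% in column 5, mount point in column 6); the function name disks_over is hypothetical:

```shell
#!/bin/sh
# Read `df -h` lines on stdin and print "MOUNT PCT" for every filesystem
# whose Use% is at or above the threshold passed as $1.
disks_over() {
  awk -v limit="$1" '
    $5 ~ /^[0-9]+%$/ {
      pct = $5
      sub(/%/, "", pct)
      if (pct + 0 >= limit + 0) print $6, pct
    }'
}

# Hypothetical usage:
# kubectl exec -n default es-master-0 -- df -h | disks_over 95
```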

Query the index settings:

# Replace es-svc with the name of your ES service
ES_CLUSTER_IP=$(kubectl get svc -n default es-svc | awk 'NR>1 {print $3}')
curl "http://${ES_CLUSTER_IP}:9200/_all/_settings?pretty"

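The response contains "read_only_allow_delete": "true", which pinpoints the cause. The disk is not completely full, but ES's protection mechanism has been triggered: once a node's disk usage exceeds the flood-stage watermark (95% by default), ES marks every index with a shard on that node as read-only (deletes are still allowed).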
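
To make the thresholds concrete, here is a small sketch that classifies a usage percentage against the three ES disk watermarks. The defaults used (low 85, high 90, flood_stage 95) are the stock ES values; the function name watermark_state is hypothetical, and treating "equal to the threshold" as exceeded is a simplification:

```shell
#!/bin/sh
# Classify a disk usage percentage against the ES disk watermarks.
# Arguments: used [low] [high] [flood_stage], all whole percentages.
# Defaults are the stock ES values: low 85, high 90, flood_stage 95.
watermark_state() {
  used=$1 low=${2:-85} high=${3:-90} flood=${4:-95}
  if [ "$used" -ge "$flood" ]; then echo flood_stage
  elif [ "$used" -ge "$high" ]; then echo high
  elif [ "$used" -ge "$low" ]; then echo low
  else echo ok
  fi
}

# Example: the 96% seen on the data disks above
# watermark_state 96
```

At 96% the nodes in this incident are past the default flood stage, which is why writes are blocked even though about 930M is still free.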

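Remediation reference

1. Expand the ES PVC

The ELK logging stack is deployed in the cluster's default namespace by default. If it is deployed in a custom namespace, replace default in the commands accordingly.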
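
One way to expand a PVC online is to patch its storage request, which only works if the StorageClass has allowVolumeExpansion: true. A sketch under that assumption; the function names, the PVC names and the 50Gi target are all hypothetical (list the real PVC names with kubectl get pvc -n default):

```shell
#!/bin/sh
# Build the JSON patch that requests a new PVC size, e.g. "50Gi".
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Patch one PVC. Arguments: namespace, PVC name, new size.
resize_pvc() {
  kubectl -n "$1" patch pvc "$2" -p "$(pvc_resize_patch "$3")"
}

# Hypothetical usage; PVC names and target size are assumptions:
# for pvc in data-es-master-0 data-es-master-1 data-es-master-2; do
#   resize_pvc default "$pvc" 50Gi
# done
```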

Post-expansion checks

#!/bin/sh
# Remove the read-only block from all indices.
# Replace multi-master with the name of your ES service.
ES_CLUSTER_IP=$(kubectl get svc -n default multi-master | awk 'NR>1 {print $3}')
curl -H "Content-Type: application/json" -XPUT "http://${ES_CLUSTER_IP}:9200/_all/_settings" -d '{ "index.blocks.read_only_allow_delete": false }'

#!/bin/sh
# Check disk allocation, cluster health and index settings.
ES_CLUSTER_IP=$(kubectl get svc -n default multi-master | awk 'NR>1 {print $3}')
curl "http://${ES_CLUSTER_IP}:9200/_cat/allocation?v"
curl "http://${ES_CLUSTER_IP}:9200/_cat/health"
curl -s "http://${ES_CLUSTER_IP}:9200/_all/_settings" | jq .
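
For scripted checks, the cluster status can be extracted from the _cat/health output instead of being read by eye. A small sketch, assuming the usual _cat/health layout where the status is the fourth column; the function name health_status is hypothetical:

```shell
#!/bin/sh
# Print the cluster status (green / yellow / red) from `_cat/health` output
# read on stdin. A header row (as printed with ?v) is skipped because its
# fourth field is the literal word "status".
health_status() {
  awk '$4 ~ /^(green|yellow|red)$/ { print $4; exit }'
}

# Hypothetical usage:
# curl -s "http://${ES_CLUSTER_IP}:9200/_cat/health" | health_status
```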

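2. Adjust the ES configuration

If the ES PVC is very large, so that plenty of free space remains even at the default 90% threshold, you can raise the ES watermark thresholds, remove the index read-only block, and bring the ES cluster back to a normal state.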

#!/bin/sh
# Replace multi-master with the name of your ES service.
ES_CLUSTER_IP=$(kubectl get svc -n default multi-master | awk 'NR>1 {print $3}')

# Raise the disk watermarks cluster-wide (persistent settings survive a restart).
curl -H "Content-Type: application/json" -XPUT "http://${ES_CLUSTER_IP}:9200/_cluster/settings" -d '{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.info.update.interval": "1m"
  }
}'
# Remove the read-only block from all indices.
curl -H "Content-Type: application/json" -XPUT "http://${ES_CLUSTER_IP}:9200/_all/_settings" -d '{
  "index.blocks.read_only_allow_delete": false
}'
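
When these values are changed often, generating the payload helps avoid mis-ordered thresholds. A rough sketch; the function name watermark_payload is hypothetical, and the strict low < high < flood_stage check is a conservative choice that may be stricter than what ES itself enforces:

```shell
#!/bin/sh
# Print a _cluster/settings payload for the three disk watermarks.
# Arguments: low high flood_stage, whole percentages without the % sign.
# Returns 1 unless low < high < flood_stage <= 100.
watermark_payload() {
  if [ "$1" -ge "$2" ] || [ "$2" -ge "$3" ] || [ "$3" -gt 100 ]; then
    echo "expected low < high < flood_stage <= 100" >&2
    return 1
  fi
  printf '{"persistent":{"cluster.routing.allocation.disk.watermark.low":"%s%%","cluster.routing.allocation.disk.watermark.high":"%s%%","cluster.routing.allocation.disk.watermark.flood_stage":"%s%%"}}\n' "$1" "$2" "$3"
}

# Hypothetical usage with the values from the script above:
# curl -H "Content-Type: application/json" -XPUT \
#   "http://${ES_CLUSTER_IP}:9200/_cluster/settings" -d "$(watermark_payload 90 95 97)"
```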

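ES parameter reference:

  1. cluster.routing.allocation.disk.watermark.low: default 85%. Once a node's disk usage exceeds this value, ES stops allocating new shards to that node.
  2. cluster.routing.allocation.disk.watermark.high: default 90%. Once exceeded, ES tries to relocate shards away from that node.
  3. cluster.routing.allocation.disk.watermark.flood_stage: default 95%. Once exceeded, ES sets index.blocks.read_only_allow_delete on every index that has a shard on that node.
  4. cluster.info.update.interval: default 30s. How often ES checks the disk usage of each node.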
