cat health (checking cluster health)

2018-03-29, by 菜花_Q

Official documentation: cat health | Elasticsearch Reference [5.5] | Elastic

cat health

health is a terse, one-line representation of the same information returned by /_cluster/health.

GET /_cat/health?v

Response:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475871424 16:17:04  elasticsearch green           1         1      5   5    0    0        0             0                  -                100.0%
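When the header row is requested with ?v, each column name can be paired with its value, which is easier to read than counting positions. The following is a minimal sketch, not part of the official documentation; the function name health_fields is hypothetical, and the host and port in the usage comment are assumptions.

```shell
# Pair each column name from the header row with its value from the data row.
# Reads the output of GET /_cat/health?v on stdin; blank lines are ignored.
health_fields() {
  awk '!seen && NF > 0 { for (i = 1; i <= NF; i++) name[i] = $i; seen = 1; next }
       NF > 0          { for (i = 1; i <= NF; i++) print name[i] "=" $i }'
}

# Hypothetical usage against a local node (host and port are assumptions):
#   curl -s 'localhost:9200/_cat/health?v' | health_fields
```

For the response above this would print one pair per line, such as status=green and unassign=0.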

It has one option, ts, to disable the timestamping:

GET /_cat/health?v&ts=false

which looks like:

cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
elasticsearch green           1         1      5   5    0    0        0             0                  -                100.0%
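Disabling the timestamps shifts every column two positions to the left, so a script that reads the status by position has to know which form it is looking at. The sketch below handles both forms for a header-less line; the function name health_status is hypothetical, and the detection rule (a numeric first field followed by an HH:MM:SS second field) is an assumption based on the two sample outputs shown here.

```shell
# Print the status column from a header-less health line, with or without
# the leading epoch and timestamp columns.
health_status() {
  awk 'NF > 0 {
         # With timestamps: epoch, HH:MM:SS, cluster, status, ...
         if ($1 ~ /^[0-9]+$/ && $2 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) print $4
         # With ts=false: cluster, status, ...
         else print $2
       }'
}

# Hypothetical usage (host and port are assumptions):
#   curl -s 'localhost:9200/_cat/health?ts=false' | health_status
```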

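A common use of this command is to verify that health is consistent across nodes (this uses the pssh command; see the pssh usage guide for details):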

% pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health

[1] 20:20:52 [SUCCESS] es3.vm

1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0

[2] 20:20:52 [SUCCESS] es1.vm

1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0

[3] 20:20:52 [SUCCESS] es2.vm

1384309218 18:20:18 foo green 3 3 3 3 0 0 0 0
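Comparing the per-node lines by eye works for three hosts but not for thirty. As a rough sketch, the pssh output can be piped through a small filter that ignores the pssh banner lines and the two time columns (which may differ by a second between nodes) and checks that everything else is identical. The function name health_consistent is hypothetical.

```shell
# Read `pssh -i` output on stdin. Skip the "[n] HH:MM:SS [SUCCESS] host"
# banner lines and blank lines, blank out the epoch and timestamp columns,
# and count how many distinct health views remain.
# Exit status: 0 if every node reports the same view, 1 otherwise
# (including the case where no health lines were seen at all).
health_consistent() {
  awk '/^\[/ || NF == 0 { next }
       { $1 = ""; $2 = ""; seen[$0] = 1 }
       END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Hypothetical usage, reusing the command above:
#   pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health \
#     | health_consistent && echo "all nodes agree"
```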

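A less obvious use is to track the recovery of a large cluster over time. With enough shards, starting a cluster, or even recovering after losing a node, can take time (depending on your network and disks). One way to track its progress is to run this command in a delayed loop: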

% while true; do curl localhost:9200/_cat/health; sleep 120; done

1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0

1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0

1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0

1384309806 18:30:06 foo green 3 3 1832 916 4 0 0

^C

In this scenario, we can tell that recovery took roughly four minutes.

If this had gone on for hours, we would be able to watch the number of unassigned shards drop precipitously.

If that number remained static, we would know that there is a problem.
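The check described above (is the unassigned count still falling?) can be automated on the saved loop output. This is only a sketch under stated assumptions: the function name unassigned_trend is hypothetical, timestamps are enabled, no header row is present, and unassign is the 11th field as in the sample lines of this article.

```shell
# Read header-less health lines (timestamps enabled) on stdin and report,
# for each sample after the first, the seconds elapsed since the previous
# sample and how the unassign column (field 11) changed. A sample whose
# unassign count is nonzero and unchanged is marked STALLED.
# Lines with fewer than 11 fields (blank lines, "^C") are skipped.
unassigned_trend() {
  awk 'NF < 11 { next }
       seen {
         note = ($11 == prev_un && $11 > 0) ? " STALLED" : ""
         printf "+%ds unassign %d -> %d%s\n", $1 - prev_t, prev_un, $11, note
       }
       { prev_t = $1; prev_un = $11; seen = 1 }'
}

# Hypothetical usage: save the loop output to a file, then run
#   unassigned_trend < health.log
```

A STALLED marker on several consecutive samples is the "number remained static" case and is worth investigating.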

Why the timestamp?

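You typically use the health command when a cluster is malfunctioning. During this period, it is extremely important to correlate activity across log files, alerting systems, and so on.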

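There are two outputs. The HH:MM:SS output is simply for quick human consumption. The epoch value retains more information, including the date, which matters if your recovery spans several days.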
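To line a health sample up with log files, the epoch column can be expanded into a full date and time. A minimal sketch, assuming GNU date is available (BSD/macOS date would need date -u -r "$1" instead); the function name epoch_to_utc is hypothetical, and the output is in UTC rather than the node's local time.

```shell
# Convert the epoch column of a health line to a full UTC date and time.
# Assumes GNU date; this is an illustration, not part of Elasticsearch.
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Example: the first sample of the recovery loop above
#   epoch_to_utc 1384309446
```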
