A global outage caused by a poorly configured health check on the config center

2019-03-29  Shaman

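A while ago we had an incident. The git data source that the config center service depends on became unreachable. The health check configured in the K8s deployment had a short timeout (had the timeout been set to 10s, this failure would not have been triggered), so the config center's health check failed. The gateway by default has a hard dependency on the config center, so the gateway's own health check endpoint failed as well. From the load balancer's point of view the gateway was therefore unavailable too, and the whole service went down.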
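
For illustration, a probe section using the 10s timeout mentioned above might look like the following sketch. The endpoint path, the port, and the other numbers are assumptions, not values taken from the incident. Kubernetes uses a `timeoutSeconds` of 1 when the field is omitted.

```yaml
# Fragment of a container spec in a K8s Deployment (hypothetical values).
readinessProbe:
  httpGet:
    path: /actuator/health   # assumed Spring Boot 2.x Actuator path
    port: 8888               # assumed config center port
  initialDelaySeconds: 30    # assumed
  periodSeconds: 10          # assumed; also the Kubernetes default
  timeoutSeconds: 10         # the 10s timeout mentioned above
  failureThreshold: 3        # assumed; also the Kubernetes default
```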

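To make the service highly available, we will make the following two improvements:

  1. Remove the gateway's hard dependency on the config center
  2. Remove the config center's hard dependency on the git service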
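
One possible way to approach the second item, sketched here as an assumption rather than as the fix that was actually deployed: Spring Cloud Config Server ships its own health indicator that checks the backing repository, and it can be switched off on the server side so that a git outage no longer marks the config center itself as down.

```yaml
# application.yml of the config center (Spring Cloud Config Server)
spring:
  cloud:
    config:
      server:
        health:
          enabled: false   # do not report DOWN when the git repository is unreachable
```

This only affects what the health endpoint reports. Whether configuration requests can still be served during a git outage depends on the server having a usable local clone.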

Disable the config client health indicator

https://github.com/spring-cloud/spring-cloud-config/issues/435

The Config Client supplies a Spring Boot Health Indicator that attempts to load configuration from Config Server. The health indicator can be disabled by setting health.config.enabled=false. The response is also cached for performance reasons. The default cache time to live is 5 minutes. To change that value set the health.config.time-to-live property (in milliseconds).

management.health.hystrix.enabled: false

health.config.enabled: false

Of the two properties above, the first is a Spring Boot setting (https://docs.spring.io/spring-boot/docs/current/reference/html/common-application-properties.html);
the second is a Spring Cloud setting.
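
If the indicator should stay enabled, the documentation quoted above also allows lengthening its cache instead of disabling it. A minimal sketch, with the 10-minute value chosen arbitrarily for illustration:

```yaml
# Gateway (config client) configuration
health:
  config:
    enabled: true
    time-to-live: 600000   # milliseconds; 10 minutes instead of the default 5
```

A longer cache only delays the point at which a config center outage shows up in the gateway's health, so it reduces the coupling rather than removing it.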

Reflections
