Graceful Shutdown of Spring Cloud Microservices, with Source Code Analysis

2020-01-14  幸福进化_琢玉

(Original article: https://www.cnblogs.com/trust-freedom/p/10744683.html)

Versions:
SpringBoot 1.5.4.RELEASE
SpringCloud Dalston.RELEASE

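This article discusses microservices that register with a Eureka registry and are accessed through a load-balancing Zuul gateway, and how to stop such a service without users noticing.

Approach 1: kill -9 <java process id> [not recommended]

kill -9 kills the process forcibly. First, any task the microservice is executing is interrupted. Second, the service is not taken offline through the Eureka registry, so Zuul, as a Eureka Client, still holds routing information for it and keeps calling it. The HTTP request returns 500, and the underlying exception is Connection refused.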

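With default settings, the worst-case wait in this situation is:

90s (the microservice's lease on Eureka Server expires)

30s (Eureka Server refreshes its service list into the read-only cache ReadOnlyMap, which Eureka Clients read by default)

30s (Zuul, as a Eureka Client, pulls the service list every 30 seconds by default)

30s (Ribbon's default interval for dynamically refreshing its ServerList)

= 180s, i.e. 3 minutes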
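The sum above can be written out as a small sketch. The class and constant names are my own, and the property names in the comments are the settings I believe control these defaults; they do not come from the original analysis, so check them against your versions.

```java
import java.time.Duration;

// Sketch only: adds up the default intervals listed above.
final class WorstCaseStaleWindow {
    // Lease expiry on Eureka Server (assumed: eureka.instance.lease-expiration-duration-in-seconds)
    static final Duration LEASE_EXPIRATION = Duration.ofSeconds(90);
    // Server read-only cache refresh (assumed: eureka.server.response-cache-update-interval-ms)
    static final Duration SERVER_CACHE_REFRESH = Duration.ofSeconds(30);
    // Zuul's registry fetch as a Eureka Client (assumed: eureka.client.registry-fetch-interval-seconds)
    static final Duration CLIENT_REGISTRY_FETCH = Duration.ofSeconds(30);
    // Ribbon ServerList refresh (assumed: ribbon.ServerListRefreshInterval)
    static final Duration RIBBON_LIST_REFRESH = Duration.ofSeconds(30);

    // Worst case: every stage has to wait for its full interval, one after another.
    static Duration worstCase() {
        return LEASE_EXPIRATION
                .plus(SERVER_CACHE_REFRESH)
                .plus(CLIENT_REGISTRY_FETCH)
                .plus(RIBBON_LIST_REFRESH);
    }

    public static void main(String[] args) {
        System.out.println("Worst case: " + worstCase().getSeconds() + "s"); // prints "Worst case: 180s"
    }
}
```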

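Summary:

This approach both prevents in-flight tasks from completing and leaves the service registered in Eureka Server, with no time given for Eureka Clients to refresh their service lists. Calls through Zuul to the stopped service therefore fail with 500 errors. Not recommended.

Approach 2: kill -15 <java process id>, or calling the /shutdown endpoint directly [not recommended]

What kill and /shutdown mean

First, kill is equivalent to kill -15. According to man kill: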

The command kill sends the specified signal to the specified process or process group. If no signal is specified, the TERM signal is sent.

That is, kill with no signal specified sends TERM (termination).

kill -l lists the mapping between signal numbers and signal names; kill -15 is SIGTERM, the TERM signal.


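When the JVM process receives the TERM signal, it runs its registered shutdown hooks, and a Spring Boot microservice registers a shutdown hook at startup.

Calling the /shutdown endpoint directly is essentially the same as going through the shutdown hook. So whether you use kill, kill -15, or the /shutdown endpoint, the logic behind the shutdown hook registered with the JVM ends up being executed.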
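To see the JVM side of this in isolation, here is a minimal sketch without Spring. The class name, the flag, and the 60-second sleep are illustrative assumptions; only Runtime#addShutdownHook is the real API under discussion.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: a plain JVM shutdown hook, independent of Spring.
final class ShutdownHookDemo {
    // Flag flipped by the hook so its effect can be observed.
    static final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // Builds the hook thread; the JVM starts it on SIGTERM or on normal exit.
    static Thread newHook() {
        return new Thread(() -> {
            cleanedUp.set(true);
            System.out.println("shutdown hook: releasing resources");
        }, "demo-shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(newHook());
        System.out.println("running; send SIGTERM with kill <pid>, or wait 60s for a normal exit");
        Thread.sleep(60_000L);
    }
}
```

Stopping this process with kill or kill -15 prints the hook's message. With kill -9 the JVM gets no chance to run the hook, which is why approach 1 skips the deregistration logic entirely.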

Note:

Enabling the /shutdown endpoint requires the following configuration:

endpoints.shutdown.enabled = true
endpoints.shutdown.sensitive = false

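Everything comes down to one question: what does the shutdown hook actually do?

The shutdown hook registered by Spring

Searching the project for uses of Runtime.getRuntime().addShutdownHook(Thread shutdownHook) shows that Ribbon registers some shutdown hooks, but those are not the focus here. What matters is that Spring's abstract application context class, AbstractApplicationContext, registers a shutdown hook for the whole Spring container. The logic executed when that hook runs is in AbstractApplicationContext#doClose().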

//## org.springframework.context.support.AbstractApplicationContext#registerShutdownHook 
/**
 * Register a shutdown hook with the JVM runtime, closing this context
 * on JVM shutdown unless it has already been closed at that time.
 * <p>Delegates to {@code doClose()} for the actual closing procedure.
 * @see Runtime#addShutdownHook
 * @see #close()
 * @see #doClose()
*/
@Override
public void registerShutdownHook() {
    if (this.shutdownHook == null) {
        // No shutdown hook registered yet.
        // Register the shutdownHook; the thread actually calls doClose()
        this.shutdownHook = new Thread() {
            @Override
            public void run() {
                synchronized (startupShutdownMonitor) {
                    doClose();
                }
            }
        };
        Runtime.getRuntime().addShutdownHook(this.shutdownHook);
    }
}

//## org.springframework.context.support.AbstractApplicationContext#doClose 
/**
 * Actually performs context closing: publishes a ContextClosedEvent and
 * destroys the singletons in the bean factory of this application context.
 * <p>Called by both {@code close()} and a JVM shutdown hook, if any.
 * @see org.springframework.context.event.ContextClosedEvent
 * @see #destroyBeans()
 * @see #close()
 * @see #registerShutdownHook()
*/
protected void doClose() {
    if (this.active.get() && this.closed.compareAndSet(false, true)) {
        if (logger.isInfoEnabled()) {
            logger.info("Closing " + this);
        }

        // Unregister the registered MBean
        LiveBeansView.unregisterApplicationContext(this);

        try {
            // Publish shutdown event.
            // Publish ContextClosedEvent; listeners for this event handle the corresponding logic
            publishEvent(new ContextClosedEvent(this));
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
        }

        // Stop all Lifecycle beans, to avoid delays during individual destruction.
        // Call stop() on all Lifecycle beans
        try {
            getLifecycleProcessor().onClose();
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
        }

        // Destroy all cached singletons in the context's BeanFactory.
        // Destroy all singleton beans
        destroyBeans();

        // Close the state of this context itself.
        closeBeanFactory();

        // Let subclasses do some final clean-up if they wish...
        // Call the subclass's onClose(), e.g. EmbeddedWebApplicationContext#onClose()
        onClose();

        this.active.set(false);
    }
}

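The key points of AbstractApplicationContext#doClose() are publishing the ContextClosedEvent and calling stop() on the Lifecycle beans.

There are many listeners for ContextClosedEvent and many beans that implement the Lifecycle interface, but only one concerns us here: EurekaAutoServiceRegistration, which both listens for ContextClosedEvent and implements Lifecycle.

stop() in EurekaAutoServiceRegistration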

//## org.springframework.cloud.netflix.eureka.serviceregistry.EurekaAutoServiceRegistration
public class EurekaAutoServiceRegistration implements AutoServiceRegistration, SmartLifecycle, Ordered {

    // stop() from the Lifecycle interface
    @Override
    public void stop() {
        this.serviceRegistry.deregister(this.registration);
        this.running.set(false);  // Set the Lifecycle running flag to false
    }

    // ContextClosedEvent listener
    @EventListener(ContextClosedEvent.class)
    public void onApplicationEvent(ContextClosedEvent event) {
        // register in case meta data changed
        stop();
    }

}

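As shown above, both the ContextClosedEvent listener and the Lifecycle implementation in EurekaAutoServiceRegistration call stop(). Although stop() is reached from both paths, various state checks prevent the work from being executed twice.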
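The general idea of a state check that makes a second call harmless can be sketched as follows. This is not the guard Spring Cloud actually uses (in the excerpt above, stop() calls deregister() unconditionally and the checks live elsewhere); the class, the counter, and the compareAndSet guard are hypothetical and only illustrate the pattern.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: an idempotent stop() guarded by a running flag.
final class IdempotentStopSketch {
    final AtomicBoolean running = new AtomicBoolean(true);
    final AtomicInteger deregisterCalls = new AtomicInteger();

    // Only the first caller flips running from true to false and does the work.
    void stop() {
        if (running.compareAndSet(true, false)) {
            deregister();
        }
    }

    // Stand-in for the real deregistration; here it just counts invocations.
    void deregister() {
        deregisterCalls.incrementAndGet();
    }

    public static void main(String[] args) {
        IdempotentStopSketch registration = new IdempotentStopSketch();
        registration.stop(); // e.g. reached from the ContextClosedEvent listener
        registration.stop(); // e.g. reached from the Lifecycle callback
        System.out.println("deregister calls: " + registration.deregisterCalls.get()); // prints "deregister calls: 1"
    }
}
```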

Next, let's look at EurekaServiceRegistry#deregister() in detail.

EurekaServiceRegistry#deregister(): deregistration

//## org.springframework.cloud.netflix.eureka.serviceregistry.EurekaServiceRegistry#deregister
@Override
public void deregister(EurekaRegistration reg) {
    if (reg.getApplicationInfoManager().getInfo() != null) {

        if (log.isInfoEnabled()) {
            log.info("Unregistering application " + reg.getInstanceConfig().getAppname()
                    + " with eureka with status DOWN");
        }

        // Change the instance status; this immediately triggers a status replication request
        reg.getApplicationInfoManager().setInstanceStatus(InstanceInfo.InstanceStatus.DOWN);

        //TODO: on deregister or on context shutdown
        // Shut down the EurekaClient
        reg.getEurekaClient().shutdown();
    }
}

Two steps are involved: updating the instance status to DOWN, and shutting down the EurekaClient.

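Summary

kill, kill -15, and the /shutdown endpoint all invoke the shutdown hook and trigger deregistration of the Eureka instance. That part is fine: the first step of a graceful shutdown is to deregister the instance from the Eureka registry. The real problem is that shutdown not only deregisters the Eureka instance but also stops the service immediately. At that moment both Eureka Server and Zuul (as a Eureka Client) still hold stale caches that have not been refreshed, so the service list still contains the deregistered instance. Calls through Zuul then fail with 500 errors, caused by a Connection refused exception. Hence this approach is not recommended.

In addition, deregistration consists of two operations, the status update to DOWN and the actual deregistration, which run on two separate threads. Depending on which thread finishes first, the state shown on Eureka Server may differ, but the end result is the same: once the caches have had time to refresh, the instance is no longer called.

Approach 3: the /pause endpoint [usable, but flawed]

The /pause endpoint

First, enabling the /pause endpoint requires the following configuration: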

endpoints.pause.enabled = true
endpoints.pause.sensitive = false

PauseEndpoint is an inner class of RestartEndpoint:

//## Restart endpoint
@ConfigurationProperties("endpoints.restart")
@ManagedResource
public class RestartEndpoint extends AbstractEndpoint<Boolean>
        implements ApplicationListener<ApplicationPreparedEvent> {

    // Pause endpoint
    @ConfigurationProperties("endpoints")
    public class PauseEndpoint extends AbstractEndpoint<Boolean> {

        public PauseEndpoint() {
            super("pause", true, true);
        }

        @Override
        public Boolean invoke() {
            if (isRunning()) {
                pause();
                return true;
            }
            return false;
        }
    }

    // Pause operation
    @ManagedOperation
    public synchronized void pause() {
        if (this.context != null) {
            this.context.stop();
        }
    }
}

As shown above, the /pause endpoint ultimately calls stop() on the Spring application context.

AbstractApplicationContext#stop()

//## org.springframework.context.support.AbstractApplicationContext#stop
@Override
public void stop() {
    // 1. Call stop() on every bean that implements the Lifecycle interface
    getLifecycleProcessor().stop();

    // 2. Publish the ContextStoppedEvent
    publishEvent(new ContextStoppedEvent(this));
}

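Looking through the source, I found no ContextStoppedEvent listener of interest, so all the stop logic lives in the stop() methods of the Lifecycle implementations.

getLifecycleProcessor().stop() and the getLifecycleProcessor().onClose() called during shutdown in approach 2 share the same internal logic: both call DefaultLifecycleProcessor#stopBeans(), which in turn calls stop() on the Lifecycle implementations, as follows: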

//## DefaultLifecycleProcessor
@Override
public void stop() {
    stopBeans();
    this.running = false;
}

@Override
public void onClose() {
    stopBeans();
    this.running = false;
}

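So the /pause endpoint and shutdown share part of their logic. Both rely on EurekaServiceRegistry#deregister() for deregistration, which performs the same two steps as in approach 2: it sets the instance status to DOWN and then shuts down the EurekaClient.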

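Summary

The /pause endpoint can be used to take a service offline from Eureka Server. Unlike shutdown, it does not stop the whole service and make it unavailable; it only deregisters from Eureka Server. On Eureka Server the result appears either as the instance being deregistered or as its status being DOWN. The Eureka client's scheduled threads are also stopped, so the instance will not be re-registered by a timer thread. You can therefore sleep for a while, wait until Eureka Clients such as Zuul have picked up the change, and then stop the microservice, which achieves a graceful shutdown. (To stop the microservice, you can use the /shutdown endpoint or simply a brute-force kill -9.)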
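The pause, wait, then stop sequence can be sketched as a small Java program. Everything specific here is an assumption rather than something from the analysis above: the base URL http://localhost:8080, the class and interface names, the expectation that /pause answers with HTTP 200, and the 90-second wait (the three 30-second cache refreshes from approach 1, without the lease expiry). It also assumes both endpoints are enabled and reachable without authentication.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch only: call /pause, wait for caches to refresh, then call /shutdown.
final class GracefulStop {
    // Abstraction over "POST to a URL and return the HTTP status code".
    interface Poster {
        int post(String url);
    }

    // Sends an empty POST with HttpURLConnection and returns the status code.
    static int httpPost(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setConnectTimeout(3_000);
            conn.setReadTimeout(10_000);
            conn.getOutputStream().close(); // empty request body
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deregister via /pause, wait, then stop the process via /shutdown.
    static void stop(String baseUrl, long waitMillis, Poster poster) {
        int code = poster.post(baseUrl + "/pause");
        if (code != 200) {
            throw new IllegalStateException("/pause returned HTTP " + code);
        }
        try {
            Thread.sleep(waitMillis); // give Eureka Server, Zuul and Ribbon time to refresh
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        poster.post(baseUrl + "/shutdown");
    }

    public static void main(String[] args) {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080"; // hypothetical address
        stop(baseUrl, 90_000L, GracefulStop::httpPost);
    }
}
```

Passing the HTTP call in as a Poster keeps the ordering logic separate from the network code, so the sequence can be exercised without a running service.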

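Note:

In the version I tested, once a service has been taken offline with /pause, it cannot be brought back online with /resume. If you want to re-register the service during a release, the only option is to restart the microservice. Calling stop() on the entire Spring container just to take the service offline from Eureka Server is also rather heavy-handed.

The reason /resume cannot bring the service back online is as follows. The endpoint does call AbstractApplicationContext#start() --> EurekaAutoServiceRegistration#start() --> EurekaServiceRegistry#register(), but all of the Eureka Client's scheduled task threads, such as the status replication and heartbeat threads, have already been stopped. Although re-registration calls maybeInitializeClient(eurekaRegistration) to try to restart the EurekaClient, it does not succeed (probably a bug in this version), so the UP status is never sent to Eureka Server.

In short: the service can be taken offline but cannot be brought back online.

Approach 4: the /service-registry endpoint [usable, but with pitfalls]

The /service-registry endpoint

First, in the version I use, the /service-registry endpoint is enabled by default but is sensitive, meaning it requires authentication.

I looked for a way to set sensitive to false for /service-registry alone, but found none in the version I use. The /service-registry endpoint is a ServiceRegistryEndpoint auto-configured by ServiceRegistryAutoConfiguration, and the isSensitive() method of this MvcEndpoint is hard-coded to return true, with no configuration option or extension point. The security auto-configuration class ManagementWebSecurityAutoConfiguration then requires authentication for every endpoint with sensitive==true, via Spring Security's httpSecurity.authorizeRequests().xxx.authenticated(). So far, the only way I have found to allow unauthenticated access is management.security.enabled=false, which disables authentication for all endpoints.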

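# Access the /service-registry endpoint without authentication
management.security.enabled=false

Updating the remote instance status

The /service-registry endpoint is implemented by ServiceRegistryEndpoint, which exposes two request mappings under /service-registry (path instance-status in the code below), one for GET and one for POST. GET returns the instance's local status and overriddenStatus; POST calls Eureka Server to change the current instance's status.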

//## org.springframework.cloud.client.serviceregistry.endpoint.ServiceRegistryEndpoint
@ManagedResource(description = "Can be used to display and set the service instance status using the service registry")
@SuppressWarnings("unchecked")
public class ServiceRegistryEndpoint implements MvcEndpoint {
    private final ServiceRegistry serviceRegistry;

    private Registration registration;

    public ServiceRegistryEndpoint(ServiceRegistry<?> serviceRegistry) {
        this.serviceRegistry = serviceRegistry;
    }

    public void setRegistration(Registration registration) {
        this.registration = registration;
    }

    @RequestMapping(path = "instance-status", method = RequestMethod.POST)
    @ResponseBody
    @ManagedOperation
    public ResponseEntity<?> setStatus(@RequestBody String status) {
        Assert.notNull(status, "status may not by null");

        if (this.registration == null) {
            return ResponseEntity.status(HttpStatus.NOT_FOUND).body("no registration found");
        }

        this.serviceRegistry.setStatus(this.registration, status);
        return ResponseEntity.ok().build();
    }

    @RequestMapping(path = "instance-status", method = RequestMethod.GET)
    @ResponseBody
    @ManagedAttribute
    public ResponseEntity getStatus() {
        if (this.registration == null) {
            return ResponseEntity.status(HttpStatus.NOT_FOUND).body("no registration found");
        }

        return ResponseEntity.ok().body(this.serviceRegistry.getStatus(this.registration));
    }

    @Override
    public String getPath() {
        return "/service-registry";
    }

    @Override
    public boolean isSensitive() {
        return true;
    }

    @Override
    public Class<? extends Endpoint<?>> getEndpointType() {
        return null;
    }
}

The POST request to /service-registry is the one we care about. As shown above, it calls EurekaServiceRegistry.setStatus() to update the instance status:

public class EurekaServiceRegistry implements ServiceRegistry<EurekaRegistration> {

    // Update the status
    @Override
    public void setStatus(EurekaRegistration registration, String status) {
        InstanceInfo info = registration.getApplicationInfoManager().getInfo();

        // If the new status is CANCEL_OVERRIDE, call EurekaClient.cancelOverrideStatus()
        //TODO: howto deal with delete properly?
        if ("CANCEL_OVERRIDE".equalsIgnoreCase(status)) {
            registration.getEurekaClient().cancelOverrideStatus(info);
            return;
        }

        // Call EurekaClient.setStatus()
        //TODO: howto deal with status types across discovery systems?
        InstanceInfo.InstanceStatus newStatus = InstanceInfo.InstanceStatus.toEnum(status);
        registration.getEurekaClient().setStatus(newStatus, info);
    }

}

EurekaServiceRegistry.setStatus() can send two kinds of requests to Eureka Server, backed by EurekaClient.setStatus() and EurekaClient.cancelOverrideStatus() respectively.
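A caller of this endpoint might look like the sketch below. It assumes the service listens on http://localhost:8080, that management.security.enabled=false is set, and that the endpoint accepts a plain-text body; the class name and the client-side whitelist (limited to the status values mentioned in this article) are my own additions, not part of Spring Cloud.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch only: POST a status string to /service-registry/instance-status.
final class InstanceStatusRequest {
    // Status values mentioned in this article; the whitelist itself is an assumption.
    static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("UP", "DOWN", "OUT_OF_SERVICE", "CANCEL_OVERRIDE"));

    // Trims and upper-cases the status, rejecting anything outside the whitelist.
    static String normalize(String status) {
        String value = status == null ? "" : status.trim().toUpperCase(Locale.ROOT);
        if (!ALLOWED.contains(value)) {
            throw new IllegalArgumentException("unsupported status: " + status);
        }
        return value;
    }

    // Sends the status as the request body and returns the HTTP status code.
    static int post(String baseUrl, String status) {
        String body = normalize(status);
        try {
            URL url = new URL(baseUrl + "/service-registry/instance-status");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/plain");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical local address; take the instance out of service before a release.
        System.out.println("HTTP " + post("http://localhost:8080", "OUT_OF_SERVICE"));
    }
}
```

Sending CANCEL_OVERRIDE through the same call would take the cancelOverrideStatus() branch shown in the source above.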

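Summary

Approach 5: calling the Eureka Server REST API directly [usable, but the URLs are complicated]

Everything discussed so far is really a client-side wrapper around the Eureka Server REST API. A Eureka Client service that includes actuator gains a set of endpoints, and some of those endpoints take the instance offline by calling the REST API that Eureka Server exposes.

The Eureka REST API includes: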

| Operation | HTTP action | Description |
| --- | --- | --- |
| Register new application instance | POST /eureka/apps/appID | Input: JSON/XML payload. HTTP Code: 204 on success |
| De-register application instance | DELETE /eureka/apps/appID/instanceID | HTTP Code: 200 on success |
| Send application instance heartbeat | PUT /eureka/apps/appID/instanceID | HTTP Code: 200 on success; 404 if instanceID doesn't exist |
| Query for all instances | GET /eureka/apps | HTTP Code: 200 on success. Output: JSON/XML |
| Query for all appID instances | GET /eureka/apps/appID | HTTP Code: 200 on success. Output: JSON/XML |
| Query for a specific appID/instanceID | GET /eureka/apps/appID/instanceID | HTTP Code: 200 on success. Output: JSON/XML |
| Query for a specific instanceID | GET /eureka/instances/instanceID | HTTP Code: 200 on success. Output: JSON/XML |
| Take instance out of service | PUT /eureka/apps/appID/instanceID/status?value=OUT_OF_SERVICE | HTTP Code: 200 on success; 500 on failure |
| Move instance back into service (remove override) | DELETE /eureka/apps/appID/instanceID/status?value=UP (the value=UP is optional; it is used as a suggestion for the fallback status due to removal of the override) | HTTP Code: 200 on success; 500 on failure |
| Update metadata | PUT /eureka/apps/appID/instanceID/metadata?key=value | HTTP Code: 200 on success; 500 on failure |
| Query for all instances under a particular vip address | GET /eureka/vips/vipAddress | HTTP Code: 200 on success. Output: JSON/XML; 404 if the vipAddress does not exist |
| Query for all instances under a particular secure vip address | GET /eureka/svips/svipAddress | HTTP Code: 200 on success. Output: JSON/XML; 404 if the svipAddress does not exist |

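Most of the non-query operations were already covered in the analysis of the Eureka Client endpoints above. Calling the Eureka Server REST API is the most direct approach. However, releases are now mostly done with deployment tools such as Jenkins, where every operation runs in a script. Good as the Eureka Server API is, its URLs all contain appID and instanceID, which makes it somewhat difficult to assemble the URL in a generic script. And unlike calling an endpoint on the local service, where localhost or 127.0.0.1 is enough, you have to specify the Eureka Server address, so the overall setup is a bit more complicated. In a company with well-standardized practices, though, it is still a good choice.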
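To illustrate what assembling these URLs involves, here is a minimal sketch based on the status rows of the table above. The class and method names, the Eureka Server address, and the sample appID and instanceID in main are made up for illustration; no URL encoding is applied, which assumes the IDs contain only characters that are valid in a URL path.

```java
// Sketch only: builds the status URLs from the REST API table above.
final class EurekaStatusUrl {
    // Removes a trailing slash so the pieces can be joined safely.
    static String trimSlash(String base) {
        return base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    }

    // URL for: PUT .../status?value=OUT_OF_SERVICE (take instance out of service)
    static String outOfService(String serverBase, String appId, String instanceId) {
        return trimSlash(serverBase) + "/eureka/apps/" + appId + "/" + instanceId
                + "/status?value=OUT_OF_SERVICE";
    }

    // URL for: DELETE .../status?value=UP (remove the override, move back into service)
    static String backInService(String serverBase, String appId, String instanceId) {
        return trimSlash(serverBase) + "/eureka/apps/" + appId + "/" + instanceId
                + "/status?value=UP";
    }

    public static void main(String[] args) {
        // Hypothetical values; a deployment script would have to look these up.
        String server = "http://eureka.example:8761";
        String appId = "ORDER-SERVICE";
        String instanceId = "host1:order-service:8080";
        System.out.println("PUT    " + outOfService(server, appId, instanceId));
        System.out.println("DELETE " + backInService(server, appId, instanceId));
    }
}
```

The hard part for a generic script is not the string concatenation but knowing the exact appID and instanceID under which each instance registered, which is the difficulty described above.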

