Restarting gunicorn to deploy updates without downtime

2021-01-21  梟遙書眚

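The most painful part of every release is the gap while the service restarts. Without load balancing, or with load balancing that is poorly set up, the restart causes a brief outage and a poor experience for users. gunicorn uses a prefork master-worker model and manages the processes it forks, which lets you add or remove worker processes dynamically. This post goes straight to how to update a service with gunicorn without stopping it. The official documentation is here: https://docs.gunicorn.org/en/stable/signals.html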

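Signals

gunicorn manages its processes by handling signals. Start with the signals it accepts, which are listed in the official documentation linked above.

Of those signals, this post covers only three: HUP, USR2, and TERM.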

HUP

According to the documentation, sending HUP restarts the workers. The log from my test looks like this:

[2021-01-21 17:25:14 +0800] [20388] [INFO] Handling signal: hup
[2021-01-21 17:25:14 +0800] [20388] [INFO] Hang up: Master
[2021-01-21 17:25:14 +0800] [29249] [INFO] Booting worker with pid: 29249
[2021-01-21 17:25:14 +0800] [29248] [INFO] Booting worker with pid: 29248
[2021-01-21 17:25:14 +0800] [29250] [INFO] Booting worker with pid: 29250
[2021-01-21 17:25:14 +0800] [28643] [INFO] Shutting down
[2021-01-21 17:25:14 +0800] [28643] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2021-01-21 17:25:14 +0800] [28640] [INFO] Shutting down
[2021-01-21 17:25:14 +0800] [28640] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2021-01-21 17:25:14 +0800] [28642] [INFO] Shutting down
[2021-01-21 17:25:14 +0800] [28642] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2021-01-21 17:25:14 +0800] [28643] [INFO] Finished server process [28643]
[2021-01-21 17:25:14 +0800] [28643] [INFO] Worker exiting (pid: 28643)
[2021-01-21 17:25:14 +0800] [28640] [INFO] Finished server process [28640]
[2021-01-21 17:25:14 +0800] [28640] [INFO] Worker exiting (pid: 28640)
[2021-01-21 17:25:14 +0800] [28642] [INFO] Finished server process [28642]
[2021-01-21 17:25:14 +0800] [28642] [INFO] Worker exiting (pid: 28642)
[2021-01-21 17:25:15 +0800] [29248] [INFO] Started server process [29248]
[2021-01-21 17:25:15 +0800] [29248] [INFO] Waiting for application startup.
[2021-01-21 17:25:15 +0800] [29248] [INFO] ASGI 'lifespan' protocol appears unsupported.
[2021-01-21 17:25:15 +0800] [29248] [INFO] Application startup complete.
[2021-01-21 17:25:15 +0800] [29249] [INFO] Started server process [29249]
[2021-01-21 17:25:15 +0800] [29249] [INFO] Waiting for application startup.
[2021-01-21 17:25:15 +0800] [29249] [INFO] ASGI 'lifespan' protocol appears unsupported.
[2021-01-21 17:25:15 +0800] [29249] [INFO] Application startup complete.
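
The same signal can be sent from a deploy script instead of by hand. The following is a minimal sketch, assuming gunicorn was started with a pidfile (its `--pid` option). The path `/var/run/gunicorn.pid` and the helper names are my own and not part of gunicorn, and the `kill` parameter only exists so the function can be tried without a running server.

```python
import os
import signal
from pathlib import Path


def read_master_pid(pidfile: str) -> int:
    """Return the master PID stored in a pidfile (the path is an assumption)."""
    return int(Path(pidfile).read_text().strip())


def reload_workers(pidfile: str, kill=os.kill) -> int:
    """Send SIGHUP to the master process named in `pidfile`."""
    pid = read_master_pid(pidfile)
    kill(pid, signal.SIGHUP)  # same effect as `kill -HUP <pid>` in a shell
    return pid


if __name__ == "__main__":
    # Hypothetical pidfile location; adjust it to your own deployment.
    reload_workers("/var/run/gunicorn.pid")
```

Reading the PID from a file avoids grepping `ps` output for the master process on every deploy.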

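In the log, the new workers only report "Application startup complete" after the old workers have exited, so it looks as if the old processes are stopped first and the new ones started afterwards. The gunicorn source, however, shows that it starts the new workers first and then kills the oldest ones by comparing the number of running workers with the configured number: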

# Simplified HUP handler
# spawn new workers
for _ in range(self.cfg.workers):
    self.spawn_worker()  # the new workers are started here
# manage workers
self.manage_workers()  # old workers are killed here, based on the age value each worker gets when it starts

# The manage_workers method
def manage_workers(self):
    """\
    Maintain the number of workers by spawning or killing
    as required.
    """
    if len(self.WORKERS) < self.num_workers:
        self.spawn_workers()

    workers = self.WORKERS.items()
    workers = sorted(workers, key=lambda w: w[1].age)
    while len(workers) > self.num_workers:
        (pid, _) = workers.pop(0)
        self.kill_worker(pid, signal.SIGTERM)

    active_worker_count = len(workers)
    if self._last_logged_active_worker_count != active_worker_count:
        self._last_logged_active_worker_count = active_worker_count
        self.log.debug("{0} workers".format(active_worker_count),
                       extra={"metric": "gunicorn.workers",
                              "value": active_worker_count,
                              "mtype": "gauge"})

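To make the spawn-then-trim order easier to follow, here is a toy model of the excerpt above. It is only a sketch: `ToyArbiter` and its fields are invented for illustration, no real processes are involved, and a "killed" worker is removed from the table immediately, whereas gunicorn sends SIGTERM and the worker exits on its own afterwards.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyArbiter:
    """Toy stand-in for gunicorn's arbiter that tracks worker ages only."""
    num_workers: int
    workers: dict = field(default_factory=dict)  # pid -> age
    killed: list = field(default_factory=list)   # pids that would get SIGTERM
    _age: count = field(default_factory=lambda: count(1))
    _pid: count = field(default_factory=lambda: count(100))

    def spawn_worker(self) -> int:
        pid = next(self._pid)
        self.workers[pid] = next(self._age)  # later workers get a larger age
        return pid

    def manage_workers(self) -> None:
        # Oldest first, mirroring sorted(..., key=age) in the excerpt.
        by_age = sorted(self.workers.items(), key=lambda item: item[1])
        while len(by_age) > self.num_workers:
            pid, _ = by_age.pop(0)
            del self.workers[pid]
            self.killed.append(pid)

    def handle_hup(self) -> None:
        for _ in range(self.num_workers):
            self.spawn_worker()
        self.manage_workers()


arbiter = ToyArbiter(num_workers=3)
for _ in range(3):
    arbiter.spawn_worker()      # the three workers running before the reload
arbiter.handle_hup()
print(sorted(arbiter.workers))  # [103, 104, 105]
print(arbiter.killed)           # [100, 101, 102]
```

The three oldest workers are the ones trimmed, which matches the source: the new workers are forked before the old ones are told to exit.
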
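My own test confirmed that there is a problem: requests sent at the moment of the restart raise errors. (I run Django 3.1 served by uvicorn. uvicorn has no process management of its own, so I start it through gunicorn, which is also what the uvicorn documentation recommends.)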
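
A test like this can be reproduced with a small probe that keeps requesting one URL while the signal is sent from another terminal. This is a rough sketch, not the script I used: the function name, the URL, and the default counts are all assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url: str, attempts: int = 50, interval: float = 0.05,
          timeout: float = 2.0) -> dict:
    """Request `url` repeatedly and count successes and failures."""
    # An empty ProxyHandler makes the probe talk to the server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    result = {"ok": 0, "failed": 0}
    for _ in range(attempts):
        try:
            with opener.open(url, timeout=timeout) as response:
                response.read()
            result["ok"] += 1
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Refused or reset connections, timeouts, and HTTP error statuses.
            result["failed"] += 1
        time.sleep(interval)
    return result


if __name__ == "__main__":
    # Hypothetical address; start this, then send HUP to the master.
    print(probe("http://127.0.0.1:8000/"))
```

A non-zero `failed` count during the reload window is the symptom described above.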

USR2

It executes a new binary whose PID file is postfixed with .2 (e.g. /var/run/gunicorn.pid.2), which in turn starts a new master process and new worker processes
Roughly: after USR2 is sent, a new master process and new worker processes are started.

First, look at the current processes (for readability I removed the last column of the ps output):

[root@Luckybamboo report-web]# ps -ef | grep uvicorn.workers
root      9146     1  0 17:30 pts/7    00:00:00 gunicorn
root      9168  9146  1 17:30 pts/7    00:00:00 gunicorn
root      9169  9146  1 17:30 pts/7    00:00:00 gunicorn
root      9170  9146  1 17:30 pts/7    00:00:00 gunicorn

The current master process is 9146, and its worker processes are 9168, 9169, and 9170.

After sending the signal, the process list changes to:

[root@Luckybamboo report-web]# kill -USR2 9146
[root@Luckybamboo report-web]# ps -ef | grep uvicorn.workers
root      9146     1  0 17:30 pts/7    00:00:00 gunicorn 
root      9168  9146  1 17:30 pts/7    00:00:00 gunicorn 
root      9169  9146  1 17:30 pts/7    00:00:00 gunicorn 
root      9170  9146  1 17:30 pts/7    00:00:00 gunicorn 
root     11562  9146  9 17:32 pts/7    00:00:00 gunicorn 
root     11564 11562 30 17:32 pts/7    00:00:00 gunicorn 
root     11565 11562 64 17:32 pts/7    00:00:00 gunicorn 
root     11566 11562 60 17:32 pts/7    00:00:00 gunicorn 

Now there is a new master process, 11562, with new worker processes 11564, 11565, and 11566.
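
The parent/child relationships in that listing can also be read off mechanically. The helper below is a small sketch (`group_by_parent` is my own name) that groups PIDs by the PPID column of `ps -ef`, using the listing above as input.

```python
from collections import defaultdict


def group_by_parent(ps_output: str) -> dict:
    """Map each parent PID to the PIDs of its children in `ps -ef` output."""
    children = defaultdict(list)
    for line in ps_output.strip().splitlines():
        fields = line.split()
        pid, ppid = int(fields[1]), int(fields[2])  # columns: UID PID PPID ...
        children[ppid].append(pid)
    return dict(children)


PS_OUTPUT = """
root      9146     1  0 17:30 pts/7    00:00:00 gunicorn
root      9168  9146  1 17:30 pts/7    00:00:00 gunicorn
root      9169  9146  1 17:30 pts/7    00:00:00 gunicorn
root      9170  9146  1 17:30 pts/7    00:00:00 gunicorn
root     11562  9146  9 17:32 pts/7    00:00:00 gunicorn
root     11564 11562 30 17:32 pts/7    00:00:00 gunicorn
root     11565 11562 64 17:32 pts/7    00:00:00 gunicorn
root     11566 11562 60 17:32 pts/7    00:00:00 gunicorn
"""

tree = group_by_parent(PS_OUTPUT)
print(tree[9146])   # [9168, 9169, 9170, 11562]
print(tree[11562])  # [11564, 11565, 11566]
```

Note that the new master 11562 is itself a child of the old master 9146 until the old master exits.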

At this point you can send TERM to stop the old master 9146 and keep only the new processes:

[root@Luckybamboo report-web]# kill -TERM 9146
[root@Luckybamboo report-web]# ps -ef | grep uvicorn.workers
root     11562     1  0 17:32 pts/7    00:00:00 gunicorn 
root     11564 11562  2 17:32 pts/7    00:00:00 gunicorn 
root     11565 11562  2 17:32 pts/7    00:00:00 gunicorn 
root     11566 11562  2 17:32 pts/7    00:00:00 gunicorn

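Now only the new processes remain. What I wanted was for the old processes to stop handling new requests once the new processes are up. A quick test suggested that this is what happens, but I tested very little and found no such logic in the source. This signal is also meant for upgrading gunicorn itself on the fly, so it is safer to treat the old processes as ordinary live processes. The documentation also says that if you decide not to use the new processes you can kill them, and that you can keep sending signals to the old processes. I hope someone can fill in how to get the behavior I was hoping for.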
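
The USR2 and TERM steps above can be wrapped in one function. This is a sketch under assumptions: `upgrade_master`, `settle_seconds`, and `drain_old_workers` are names I made up, the fixed sleep is a crude stand-in for checking that the new workers are really up, and the `kill` and `sleep` parameters only exist to make the function testable. The optional WINCH step comes from the same signals page, which describes WINCH as gracefully shutting down the worker processes when gunicorn is daemonized. I have not verified that it gives exactly the behavior I was hoping for.

```python
import os
import signal
import time


def upgrade_master(old_master_pid: int, settle_seconds: float = 5.0,
                   drain_old_workers: bool = False,
                   kill=os.kill, sleep=time.sleep) -> list:
    """Replay the USR2 -> TERM steps against the old master process."""
    sent = []

    def send(sig):
        kill(old_master_pid, sig)
        sent.append(sig)

    send(signal.SIGUSR2)       # old master starts a new master with its own workers
    sleep(settle_seconds)      # assumption: long enough for the new workers to boot
    if drain_old_workers:
        send(signal.SIGWINCH)  # documented for daemonized gunicorn only
        sleep(settle_seconds)
    send(signal.SIGTERM)       # gracefully stop the old master
    return sent


if __name__ == "__main__":
    # 9146 is the old master PID from the listing above.
    upgrade_master(9146)
```

Keeping the signals in one ordered list makes it harder to send TERM before the new master has had time to start.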
