Flask Server Deployment: Using Docker+Gunicorn+gev

2021-01-17 xiaogp

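Summary: Flask, gunicorn, nginx, docker, gevent, WSGI

A roundup of the code involved in deploying Flask, together with how each component is used and some of the principles behind it.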

[Image: Flask部署.png (Flask deployment)]

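Why Gunicorn is needed

During development, Flask's run command can start a web service directly. That service is actually the WSGI server provided by Werkzeug; in other words Flask ships with a built-in WSGI server, which is only suitable for development and debugging. Production needs a more robust, higher-performance WSGI server. A WSGI server is also called a standalone WSGI container, and the mainstream ones are Gunicorn and uWSGI.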

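What is a WSGI server

WSGI is short for Web Server Gateway Interface. In Python web development, the server side is split into two parts:

Server program: receives and organizes the requests sent by clients, for example Gunicorn or uWSGI, usually with a web server such as Nginx in front
Application program: handles the requests passed on by the server program, for example Flask, Django, Tornado

The server program and the application have to cooperate to serve users, but different applications (different frameworks) expose different functions and features. A standard that both sides support lets them work together, and that standard is WSGI. It is the standard for Python web development, similar to a protocol: a specification that decouples web server programs from applications, so that servers and applications can be combined freely to build a web application. It is a contract between the two that defines the interface each side uses, so that they can cooperate.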
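
As a rough illustration of that contract, the sketch below shows about the smallest possible WSGI application: a callable that takes environ and start_response and returns an iterable of bytes. The names application and call_app are made up for this example. call_app plays the part of the server with a helper from the standard-library wsgiref package, which is an assumption for demonstration only and not how Gunicorn is implemented.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Application side: receive the request dict, report status and headers, return the body."""
    body = b"Hello, WSGI\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Server side (toy version): build an environ, call the app, collect the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body)
```

Flask's app object is the same kind of callable, which is why Gunicorn can later be pointed at app:app.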

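Why Nginx is needed

Nginx is a web server; other popular web servers include Apache and Tengine. A web server is mainly responsible for exchanging data with clients and handling requests and responses. WSGI servers such as Gunicorn include a built-in web server, but it is not robust enough, so the more common deployment puts a regular web server in front as a reverse proxy for the WSGI server. Adding a layer of Nginx in front of Gunicorn has the following benefits:

Load balancing: needed when there are multiple applications on multiple machines
Static file handling: once configured, Nginx can serve static file requests directly without going through the Python server. Gunicorn and Flask handle static resources less efficiently than Nginx, and Nginx can set caching for static files
Security: exposing Gunicorn directly to the public internet is dangerous; putting Nginx in front is considerably safer
Absorbing concurrency spikes: an extra Nginx layer in front can buffer bursts of concurrent requests, holding the connections while the backend works through them
Broader HTTP support: gunicorn's HTTP parsing may have bugs, and Nginx handles HTTP more robustly
Other extra features: for example IP filtering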

Starting Flask with Gunicorn as the container

Install gunicorn with pip

pip install gunicorn

To run gunicorn in gevent mode, gevent must be installed as well, version 20.9.0 or later

pip install gevent==20.9.0

Write the gunicorn configuration file

root@ubuntu:~/myproject/pira_score_web_application# cat gun.conf.py 
# gun.conf
bind = '0.0.0.0:5000'
workers = 5 
backlog = 2048
worker_class = "gevent"
debug = False
proc_name = 'gunicorn.proc'
pidfile = './gunicorn.pid'
#accesslog = '/var/log/gunicorn/pira_score_web/detail.log'
#access_log_format = '%(h)s %(l)s %(u)s %(t)s'
#loglevel = 'info'

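Gunicorn configuration options in detail

-c, --config: path of the Gunicorn configuration file to load at startup
-b, --bind: the socket Gunicorn binds to.
--backlog: maximum number of pending connections, i.e. the number of clients waiting to be served. Must be a positive integer, generally in the range 64~2048 and usually set to 2048; beyond this number, clients get errors when they try to connect
-w, --workers: number of worker processes handling requests, a positive integer, default 1. The recommended number of workers is CPU count * 2 + 1
-k, --worker-class: the worker type to use, default sync; other types such as gevent and tornado are available but need to be installed with pip
--threads: number of worker threads handling requests; each worker runs with the specified number of threads. A positive integer, default 1.
--reload: restart workers when the code changes, default False.
-D, --daemon: run the Gunicorn process in the background as a daemon, default False.
-p, --pid, pidfile: file name of the pid file; if not set, no pid file is created.
proc_name: sets the process name.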
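
Because a Gunicorn configuration file is ordinary Python, the "CPU count * 2 + 1" recommendation can be computed instead of hard-coded. The following is a minimal sketch under that assumption; recommended_workers is a hypothetical helper name, and the remaining values simply repeat the ones from gun.conf.py above.

```python
import multiprocessing


def recommended_workers(cpu_count=None):
    """Return CPU count * 2 + 1, the rule of thumb quoted above."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return cpu_count * 2 + 1


# Settings that Gunicorn reads from the module's top-level names.
bind = "0.0.0.0:5000"
workers = recommended_workers()  # e.g. 5 on a 2-core machine
worker_class = "gevent"
backlog = 2048

print(workers)
```

On a machine with 2 cores this gives 5 workers, the same number the hand-written configuration uses.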

Write the gunicorn startup script

root@ubuntu:~/myproject/pira_score_web_application# cat run.sh 
#! /bin/bash
cd /home/gp/myproject/pira_score_web_application
gunicorn -c gun.conf.py -D app:app

Check the pid of the background gunicorn process

root@ubuntu:~/myproject/pira_score_web_application# cat gunicorn.pid 
30322
root@ubuntu:~/myproject/pira_score_web_application# ps -ef|grep `cat gunicorn.pid`
root     30322  1104  0 10:47 ?        00:00:00 /opt/anaconda3/bin/python /opt/anaconda3/bin/gunicorn -c gun.conf.py -D app:app
root     30325 30322  0 10:47 ?        00:00:00 /opt/anaconda3/bin/python /opt/anaconda3/bin/gunicorn -c gun.conf.py -D app:app
root     30326 30322  0 10:47 ?        00:00:00 /opt/anaconda3/bin/python /opt/anaconda3/bin/gunicorn -c gun.conf.py -D app:app
root     30327 30322  0 10:47 ?        00:00:00 /opt/anaconda3/bin/python /opt/anaconda3/bin/gunicorn -c gun.conf.py -D app:app
root     30328 30322  0 10:47 ?        00:00:00 /opt/anaconda3/bin/python /opt/anaconda3/bin/gunicorn -c gun.conf.py -D app:app
root     30329 30322  0 10:47 ?        00:00:00 /opt/anaconda3/bin/python /opt/anaconda3/bin/gunicorn -c gun.conf.py -D app:app
root     31611 21931  0 11:01 pts/3    00:00:00 grep --color=auto 30322

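The background has one parent process and 5 child processes, matching the 5 workers. The parent pid can be used to shut down all gunicorn processes directly (kill -9 is a forced kill; a plain kill sends SIGTERM, which Gunicorn treats as a graceful shutdown)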

root@ubuntu:~/myproject/pira_score_web_application# kill -9 `cat gunicorn.pid`
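
The same shutdown can be scripted. The sketch below reads the pid file and sends SIGTERM to the parent process, which Gunicorn documents as a graceful shutdown. The function names read_pid and stop_master are invented for this example, and the send parameter exists only so the logic can be exercised without a running server.

```python
import os
import signal


def read_pid(pidfile):
    """Return the integer pid stored in a Gunicorn pid file."""
    with open(pidfile) as f:
        return int(f.read().strip())


def stop_master(pidfile, sig=signal.SIGTERM, send=os.kill):
    """Send a signal (SIGTERM by default) to the master process named in pidfile."""
    pid = read_pid(pidfile)
    send(pid, sig)
    return pid


# Usage, assuming the pid file path from gun.conf.py above:
# stop_master("./gunicorn.pid")
```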

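gevent and coroutines

gevent: a coroutine-based Python networking library. When it hits blocking IO the program switches automatically, which lets developers write asynchronous IO code in a synchronous style.
Coroutine: concurrency within a single thread, also called a micro-thread. It is a concurrent programming model, and the essence of coroutine concurrency is switching plus saving state.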
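
One way to picture "switching plus saving state" is with plain Python generators: each yield hands control back to a scheduler, and the generator keeps its local variables until it is resumed. This is a teaching sketch only. gevent itself is built on greenlets and an event loop rather than on this scheduler, and the names task and run_round_robin are invented here.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: do one step, then yield control."""
    for i in range(1, steps + 1):
        log.append(f"{name} step {i}")
        yield  # switch away; the loop counter i is saved until the task resumes


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)  # run the task up to its next yield
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(current)


log = []
run_round_robin([task("eat", 2, log), task("play", 3, log)])
print(log)
```

The two tasks interleave inside a single thread: eat step 1, play step 1, eat step 2, play step 2, play step 3.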

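The test runs two blocking IO tasks with gevent, blocking for 3 seconds and 4 seconds respectively. gevent.spawn defines a coroutine task from a function and its arguments and submits it asynchronously; join waits for the coroutine to finish and exit, and joinall does the same for a list of tasks.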

from gevent import monkey
import gevent
import time

monkey.patch_all()  # patch the standard library in one call so blocking calls such as time.sleep() yield to other coroutines


def eat(name):
    print("%s is eating 1" % name)
    # gevent.sleep(3)  # after patching, gevent.sleep() and time.sleep() behave the same
    time.sleep(3)
    print("%s is eating 2" % name)


def play(name):
    print("%s play 1" % name)
    # gevent.sleep(4)
    time.sleep(4)
    print("%s play 2" % name)


start = time.time()

g1 = gevent.spawn(eat, "aaa")  # submit a task: the first argument to spawn() is the function, followed by its positional or keyword arguments
g2 = gevent.spawn(play, "bbb")  # gevent.spawn() submits the task asynchronously

g1.join()
g2.join()  # wait for both tasks to finish; coroutines run inside one thread, so without join() the "main thread" would exit before g1 and g2 ever ran
# g1.join() and g2.join() can be merged into:
# gevent.joinall([g1, g2])

stop = time.time()
print(stop - start)  # 4.002309322357178

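The run takes 4 seconds, the longest single block; run serially it would take 7 seconds.
How gevent executes this:
(1) Task 1 starts: g1 runs first and executes its first print, then hits blocking IO (time.sleep(3), made cooperative by the monkey patch), and control switches immediately to the play task submitted as g2
(2) Task 1 blocks, switch to task 2: play executes its first print and then also hits blocking IO (time.sleep(4)), so control leaves play as well
(3) Waiting: eat in g1 is still blocked at this point, so both tasks are blocked and gevent's event loop waits for whichever becomes ready first
(4) Each coroutine resumes when ready: once eat in g1 is ready again, it executes eat's second print; after eat finishes, play in g2 is still blocked, and when its block ends it executes play's second print
gevent monitors the blocking IO of multiple tasks and switches away whenever one of them blocks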
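
For readers without gevent installed, the same "longest block, not the sum" behaviour can be reproduced with the standard-library asyncio module. This swaps in a different technique (explicit async/await instead of gevent's monkey patching), so treat it as an analogy rather than a description of gevent. The sleeps are shortened to 0.2 and 0.3 seconds, and the name blocking_task is hypothetical.

```python
import asyncio
import time


async def blocking_task(name, seconds, log):
    """Simulate blocking IO; await hands control to the event loop while waiting."""
    log.append(f"{name} start")
    await asyncio.sleep(seconds)
    log.append(f"{name} end")


async def main(log):
    # Run both tasks concurrently and wait for both, like gevent.joinall().
    await asyncio.gather(
        blocking_task("eat", 0.2, log),
        blocking_task("play", 0.3, log),
    )


log = []
start = time.perf_counter()
asyncio.run(main(log))
elapsed = time.perf_counter() - start

print(log)
print(f"elapsed: {elapsed:.2f}s (serial would be about 0.50s)")
```

Both tasks start before either one ends, and the total time stays close to the longer sleep rather than the sum of the two.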

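Load test comparison with and without Gunicorn

The load-testing tool siege is used to simulate concurrency. First edit the configuration file siege.conf in the ~/.siege directory to set the maximum concurrency to 1000. The first comparison uses 400 concurrent users, 2 repetitions, and no delay between requests
Flask's built-in WSGI server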

root@ubuntu:~/.siege# siege -c 400 -r 2 -b "http://192.168.67.72:5000/北京优胜辉煌教育科技有限公司.html"
** SIEGE 4.0.4
** Preparing 400 concurrent users for battle.
The server is now under siege...
Transactions:               4800 hits
Availability:             100.00 %
Elapsed time:              30.62 secs
Data transferred:         744.21 MB
Response time:              1.77 secs
Transaction rate:         156.76 trans/sec
Throughput:            24.30 MB/sec
Concurrency:              276.91
Successful transactions:        4800
Failed transactions:               0
Longest transaction:           29.01
Shortest transaction:           0.04

Gunicorn as the WSGI server
Gunicorn starts Flask with 5 processes, using the gevent worker class

root@ubuntu:~/.siege# siege -c 400 -r 2 -b "http://192.168.61.100:5000/北京优胜辉煌教育科技有限公司.html"
** SIEGE 4.0.4
** Preparing 400 concurrent users for battle.
The server is now under siege...
Transactions:               4800 hits
Availability:             100.00 %
Elapsed time:              13.72 secs
Data transferred:         744.21 MB
Response time:              0.83 secs
Transaction rate:         349.85 trans/sec
Throughput:            54.24 MB/sec
Concurrency:              291.27
Successful transactions:        4800
Failed transactions:               0
Longest transaction:           12.04
Shortest transaction:           0.00

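Transactions: total number of transactions. Requesting this URL plus its static files takes 6 requests, repeated twice, so 400 × 12 = 4800 in total
Elapsed time: total time taken. Flask 30.62 seconds, Gunicorn 13.72 seconds
Response time: average response time. Flask 1.77 seconds, Gunicorn 0.83 seconds
Transaction rate: TPS, transactions per second. Flask 156.76, Gunicorn 349.85

In this test Gunicorn's processing speed and throughput are more than 2 times those of Flask's built-in server. In addition, once simulated concurrency reaches 500 or more, requests to the Flask server return errors while Gunicorn keeps running normally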
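
The "more than 2 times" figure can be checked directly from the two siege reports. The short calculation below only reuses numbers printed above; the variable names are arbitrary.

```python
# Figures copied from the two siege reports above.
flask_builtin = {"elapsed_s": 30.62, "response_s": 1.77, "rate_tps": 156.76}
gunicorn_gevent = {"elapsed_s": 13.72, "response_s": 0.83, "rate_tps": 349.85}

# 400 users, 6 requests per page load (page plus static files), 2 repetitions.
users, requests_per_page, repetitions = 400, 6, 2
total_transactions = users * requests_per_page * repetitions

elapsed_ratio = flask_builtin["elapsed_s"] / gunicorn_gevent["elapsed_s"]
rate_ratio = gunicorn_gevent["rate_tps"] / flask_builtin["rate_tps"]
response_ratio = flask_builtin["response_s"] / gunicorn_gevent["response_s"]

print(total_transactions)        # 4800
print(round(elapsed_ratio, 2))   # 2.23
print(round(rate_ratio, 2))      # 2.23
print(round(response_ratio, 2))  # 2.13
```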

Containerized deployment with Docker

Create requirements.txt in the project directory, listing the Python packages and versions the project needs

# requirements.txt
flask==1.1.1
Flask-SQLAlchemy==2.4.4
gevent==20.9.0
gunicorn==20.0.4
numpy==1.19.5
pymysql==1.0.0
SQLAlchemy==1.3.13
python-dotenv==0.15.0
Flask-Caching==1.9.0

Create a Dockerfile in the project root directory that sets the package index mirror and installs the requirements.txt packages into the base image

FROM python:3.7
ENV PIPURL "https://pypi.tuna.tsinghua.edu.cn/simple"

ADD ./requirements.txt /home/
WORKDIR /home
RUN pip install --no-cache-dir -i ${PIPURL} -r requirements.txt
CMD gunicorn -c gun.conf.py app:app

Build the image

root@ubuntu:~/myproject/pira_score_web_application# docker build . -t=pira_score_web:latest

Mount all files in the Flask project root directory into the container and start it

root@ubuntu:~/myproject/pira_score_web_application# docker run --rm -d -v `pwd`:/home -p 5001:5000 pira_score_web:latest

Configuring Nginx as a reverse proxy

Install nginx on Ubuntu

sudo apt-get install nginx

Check nginx's running status

root@ubuntu:~# service nginx status
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-01-18 10:13:15 CST; 2s ago
     Docs: man:nginx(8)
  Process: 1141 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status=0/SUCCESS)
  Process: 1231 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 1218 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 1232 (nginx)
    Tasks: 5 (limit: 4915)
   CGroup: /system.slice/nginx.service
           ├─1232 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           ├─1237 nginx: worker process
           ├─1238 nginx: worker process
           ├─1239 nginx: worker process
           └─1240 nginx: worker process

Start, stop, and restart the nginx service

root@ubuntu:~# service nginx start
root@ubuntu:~# service nginx stop
root@ubuntu:~# service nginx restart


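Create an Nginx configuration file for the Flask application. It is usually created as a separate file under /etc/nginx/sites-enabled or /etc/nginx/conf.d rather than written directly into the global configuration file /etc/nginx/nginx.conf. nginx.conf already includes these two paths, so configuration placed there is inserted into the global configuration.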

include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;

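Basic structure of the Nginx configuration file

The Nginx configuration file is nginx.conf, structured as follows

...              # global block


events {         # events block
   ...
}

http      # http block
{
    ...   # http global section
    server        # server block
    {
        ...       # server global section
        location [PATTERN]   # location block
        {
            ...
        }
        location [PATTERN]
        {
            ...
        }
    }
    server
    {
      ...
    }
    ...     # http global section
}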

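Global block: directives that affect nginx as a whole. Typically the user and group the nginx server runs as, the path of the nginx pid file, log paths, included configuration files, the number of worker processes allowed, and so on. The defaults are as follows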

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

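events block: settings that affect the nginx server or its network connections with users. They include the maximum number of connections per process, which event-driven model handles connection requests, whether a worker may accept multiple network connections at once, whether accepting connections is serialized across workers, and so on. The defaults are as follows.

events {
    worker_connections 768;
    # multi_accept on;
}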

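http block: can nest multiple server blocks and holds most functionality and third-party module configuration, such as proxying, caching, and log definitions. Examples are file includes, mime-type definitions, custom log formats, whether to use sendfile to transfer files, connection timeouts, and the number of requests per connection. At the bottom of this block, include pulls in the server blocks defined in other files.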

include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;

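server block: parameters of a virtual host. One http block can contain multiple server blocks, which can be defined in files under sites-enabled and conf.d.
location block: configures request routing and how each kind of page is handled.
[PATTERN]: the path matching rule of the location. Requests are handled differently depending on their path; the rules are mainly prefix matches and regular-expression matches

Prefix matching
Exact match: location = uri {...}
Priority prefix match: location ^~ uri {...}
Plain prefix match: location uri {...}
Regular-expression matching
Case-sensitive: location ~ uri {...}
Case-insensitive: location ~* uri {...}

Matching order and priority of location

1. Exact matches (=) are checked first
2. Next comes ^~: if the longest matching prefix is marked ^~, it is used and regular expressions are skipped
3. Then regular expressions are tried in the order they appear in the configuration file, and the first match wins
4. Finally the plain prefix match applies, down to the catch-all /; the longest matching prefix is remembered as the rule that is hit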
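
The four steps above can be modelled in a few lines of Python. This is a simplified simulation written for this article, not Nginx code: it ignores named and nested locations, and the function name match_location and the sample location list are assumptions made for illustration.

```python
import re


def match_location(uri, locations):
    """Pick a location for uri.

    locations is a list of (modifier, pattern) tuples in configuration-file
    order, where modifier is one of "", "=", "^~", "~", "~*".
    """
    # Step 1: an exact match wins immediately.
    for modifier, pattern in locations:
        if modifier == "=" and uri == pattern:
            return modifier, pattern

    # Remember the longest matching prefix (plain or ^~).
    best = None
    for modifier, pattern in locations:
        if modifier in ("", "^~") and uri.startswith(pattern):
            if best is None or len(pattern) > len(best[1]):
                best = (modifier, pattern)

    # Step 2: a ^~ prefix stops the search before regular expressions.
    if best is not None and best[0] == "^~":
        return best

    # Step 3: regular expressions in file order, first match wins.
    for modifier, pattern in locations:
        if modifier == "~" and re.search(pattern, uri):
            return modifier, pattern
        if modifier == "~*" and re.search(pattern, uri, re.IGNORECASE):
            return modifier, pattern

    # Step 4: fall back to the longest plain prefix (None if nothing matched).
    return best


locations = [
    ("", "/"),
    ("", "/PiraScore/static"),
    ("=", "/health"),
    ("^~", "/assets/"),
    ("~*", r"\.(png|jpg)$"),
]

for uri in ["/health", "/assets/logo.png", "/PiraScore/static/logo.PNG",
            "/PiraScore/static/app.css", "/PiraScore/index"]:
    print(uri, "->", match_location(uri, locations))
```

Note the third case: a matching regular expression takes priority over a plain prefix, even a longer one, unless that prefix is marked ^~.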

server configuration

server {
    listen 80;
    server_name 127.0.0.1;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_redirect    off;

        proxy_set_header    Host                $host;
        proxy_set_header    X-Real_IP           $remote_addr;
        proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for ;
        proxy_set_header    X-Forwarded-Proto   $scheme;
        #deny 127.0.0.1;  # IP address to deny
    }

    location /PiraScore/static {
        alias /home/gp/myproject/pira_score_web_application/static/;
        expires 30d;
        add_header wall  "use nginx cache";
    }
}

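server parameters in detail

listen: the port to listen on
server_name: the host name this server block answers to
access_log: path of the access log
error_log: path of the error log
proxy_pass: the target that requests are forwarded to
proxy_redirect: off cancels all proxy_redirect directives
proxy_set_header: sets the header information received by the proxied server

Syntax: proxy_set_header field value;
field: the item to change, which can be thought of as a variable name, for example host
value: the value of the variable

Host, $host: sets Host in the headers; if not set, Host defaults to the domain name or IP that follows proxy_pass
X-Real_IP, $remote_addr: sets the remote client IP that the proxied server receives; if not set, the headers do not pass the real client IP through
X-Forwarded-For, $proxy_add_x_forwarded_for: appends the client IP to any X-Forwarded-For chain already on the request, so the proxied server can see the remote client IP; if not set, the headers do not pass the real client IP through
X-Forwarded-Proto: identifies whether the protocol the user actually used is http or https
The proxy_set_header lines are the standard configuration for a reverse proxy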

proxy_set_header    Host                $host;
proxy_set_header    X-Real_IP           $remote_addr;
proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for ;  
proxy_set_header    X-Forwarded-Proto   $scheme;
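
To make the X-Forwarded-For line more concrete, the sketch below mimics what the Nginx documentation describes for $proxy_add_x_forwarded_for: the client address is appended to any X-Forwarded-For value already on the request, separated by a comma, or used on its own when the header is absent. The function names and the example addresses are hypothetical.

```python
def proxy_add_x_forwarded_for(incoming_header, remote_addr):
    """Mimic $proxy_add_x_forwarded_for: append remote_addr to an existing chain."""
    if incoming_header:
        return f"{incoming_header}, {remote_addr}"
    return remote_addr


def original_client(x_forwarded_for):
    """Return the leftmost entry; trust it only if every proxy in the chain is trusted."""
    return x_forwarded_for.split(",")[0].strip()


# A client connecting directly to Nginx: no header on the incoming request.
direct = proxy_add_x_forwarded_for(None, "203.0.113.7")
# A request that already passed through another proxy at 10.0.0.2.
chained = proxy_add_x_forwarded_for("203.0.113.7", "10.0.0.2")

print(direct)                    # 203.0.113.7
print(chained)                   # 203.0.113.7, 10.0.0.2
print(original_client(chained))  # 203.0.113.7
```

On the Flask side the forwarded value arrives as an ordinary request header, so the application can read it the same way as any other header.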

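deny: sets an IP address to reject; once set, that address gets 403 Forbidden
location: the first rule is /, the catch-all match; any URL that does not hit the second location is handled by this rule
/PiraScore/static: the second rule is /PiraScore/static, a prefix match for the static file URLs
alias: maps the URL to a location on the file system; resources are looked up in the directory given here. The alias path must end with /, otherwise the files may not be found
expires: 30d sets the cache lifetime to 30 days
add_header: sets custom information in the response headers; the name can be chosen freely, for example wall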
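
Why the trailing slash on alias matters becomes clearer with a small model. The sketch assumes, following the Nginx documentation for alias, that the matched location prefix is replaced literally by the alias value; resolve_alias is a made-up helper, and the second call uses a hypothetical variant of the configuration to show a failure case. The last line shows what expires 30d amounts to in seconds.

```python
def resolve_alias(uri, location_prefix, alias_path):
    """Replace the matched location prefix with the alias path, as a plain string operation."""
    if not uri.startswith(location_prefix):
        raise ValueError("uri does not match this location")
    return alias_path + uri[len(location_prefix):]


uri = "/PiraScore/static/css/app.css"

# The configuration from this article: location without, alias with a trailing slash.
ok = resolve_alias(uri, "/PiraScore/static",
                   "/home/gp/myproject/pira_score_web_application/static/")

# Hypothetical variant: location with a trailing slash, alias without one.
broken = resolve_alias(uri, "/PiraScore/static/",
                       "/home/gp/myproject/pira_score_web_application/static")

print(ok)      # .../static//css/app.css (a doubled slash still resolves on Linux)
print(broken)  # .../staticcss/app.css (wrong path, file not found)

# expires 30d expressed in seconds.
max_age_seconds = 30 * 24 * 60 * 60
print(max_age_seconds)  # 2592000
```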

Starting nginx

Check the Nginx configuration for syntax errors with the following command

root@ubuntu:/etc/nginx/sites-enabled# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Restart Nginx

root@ubuntu:/etc/nginx/sites-enabled# service nginx restart

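Visiting the URL http://127.0.0.1:80/PiraScore succeeds. Then visit a static file URL to check that the static location handled it with caching: look in the response headers for the custom header, and it is there!

[Image: 静态文件访问地址的响应头.png (response headers for the static file URL)]
This can also be checked in the logs: repeat requests for static file URLs should not appear in the log, because the static files are served from the cache set up in Nginx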