
Django multiprocessing with MySQL: error 2013, Lost connection

2019-02-17  亮哥007

I was recently optimizing the performance of some Django database operations. Because of Python's GIL I introduced multiprocessing, and during the optimization I ran into a very strange lost-database-connection problem that terminated the program:

errcode is 2013, errmsg is 'Lost connection to MySQL server during query'

After a long investigation, it became clear that the error is tied to the process fork mechanism: a forked child gets a complete copy of the parent's address space.
The original program logic was:
1. The main process queries the set of pending tasks.
2. It then creates a process pool and hands the task set to the pool.
In other words:
1. The main process establishes a database connection.
2. Forking the children also forks that database connection (you can think of the connection resource as a shared, reused reference; I'm not sure that is the precise description).
3. When the main process finishes its work and releases the connection, the children lose the connection the moment they try to run a query.
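The shared-descriptor behavior described above can be sketched without Django or MySQL at all. In the sketch below (Python 3, Unix-only because of `os.fork`), a `socketpair` stands in for the MySQL connection; none of these names come from the original program:

```python
import os
import socket
import time

# A socketpair stands in for the TCP connection between Django and MySQL.
client, server = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Child: fork duplicated the parent's file descriptors, so this
    # `client` refers to the very same kernel-level connection.
    time.sleep(0.5)                  # let the parent "release" it first
    try:
        client.send(b"SELECT 1")     # the child's query on the inherited socket
        os._exit(0)                  # connection still worked
    except OSError:
        os._exit(42)                 # the 2013-style "lost connection" failure
else:
    # Parent: done with its own work, it tears the connection down,
    # much like Django closing the connection after the main process's query.
    client.shutdown(socket.SHUT_RDWR)
    _, status = os.waitpid(pid, 0)
    child_lost_connection = (os.WEXITSTATUS(status) == 42)
    print("child lost connection:", child_lost_connection)
```

Because `shutdown` acts on the shared connection itself rather than on one process's descriptor, the child's inherited socket is dead even though its own file descriptor is still open, which mirrors step 3 above.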

Solutions

1. Release the database connection in the main process before creating the pool (testing showed this did not solve the problem).
2. Keep all database operations out of the main process and let the children claim tasks themselves.
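Option 2 generalizes to a simple rule: open connections after the fork, inside each worker. A minimal runnable sketch of that rule, using `sqlite3` and a `Pool` initializer purely as stand-ins for MySQL and Django (the names `init_worker` and `run_task` are hypothetical, not from the original program):

```python
import sqlite3
from multiprocessing import Pool

# Per-worker connection; created after the fork, so nothing DB-related
# is inherited from the parent process.
_conn = None

def init_worker():
    global _conn
    _conn = sqlite3.connect(":memory:")  # opened inside the child
    _conn.execute("CREATE TABLE t (v INTEGER)")

def run_task(v):
    # The query runs on this worker's own connection, so there is no
    # shared socket that another process could tear down.
    _conn.execute("INSERT INTO t VALUES (?)", (v,))
    return _conn.execute("SELECT v FROM t WHERE v = ?", (v,)).fetchone()[0]

pool = Pool(2, initializer=init_worker)
results = pool.map(run_task, range(5))   # -> [0, 1, 2, 3, 4]
pool.close()
pool.join()
print(results)
```

This assumes the fork start method (the Linux default); `pool.map` returns results in input order regardless of which worker handled each task.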

Demo code

Reproducing the problem

Query the database in the main process, then create a process pool and query again inside the children:

import time
from multiprocessing import Pool

def process_lost(key):
    # Child-process task: repeat the query a few times on the
    # connection inherited from the parent.
    i = 0
    while True:
        if i > 5:
            return
        time.sleep(2)
        keys = ReviewRetraceKey.objects.filter(job_id__in=[6631860103316242436]).order_by('-create_time')
        print keys
        i += 1

def lost_connection():
    '''
    Lost connection to MySQL server during query
    :return:
    '''
    # The parent queries first, opening the DB connection that the
    # forked children will inherit.
    keys = ReviewRetraceKey.objects.filter(job_id__in=[6631860103316242436]).order_by('-create_time')
    data_list = list()
    for i in range(5):
        data_list.extend(keys)
    print len(data_list)
    pool = Pool(5)
    pool.map(process_lost, data_list)
    pool.close()
    pool.join()

lost_connection()

The run output shows the lost-connection error:

Traceback (most recent call last):
  File "test_data_model.py", line 77, in <module>
    lost_connection()
  File "test_data_model.py", line 38, in lost_connection
    pool.map(process_lost, data_list)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
_mysql_exceptions.OperationalError: errcode is 2013, errmsg is 'Lost connection to MySQL server during query'

The fix

The main process only creates the pool; the children claim and handle tasks themselves:

import time
from multiprocessing import Lock, Manager, Pool

lock = Lock()

manager = Manager()
map = manager.dict()  # shared dict recording which process claimed which job

def process_un_lost(inx):
    job = None
    try:
        lock.acquire()
        for j in range(5):
            # Simulate claiming a job; the lock serializes the competing workers
            if j in map:
                print j, 'in map continue', inx
            else:
                map[j] = inx
                job = j
                break
    finally:
        lock.release()
    print 'process = ', inx, 'job = ', job
    # The first DB access now happens inside the child, so each
    # child opens its own connection.
    i = 0
    while True:
        if i > 5:
            return
        time.sleep(2)
        keys = ReviewRetraceKey.objects.filter(job_id__in=[6631860103316242436]).order_by('-create_time')
        print inx, keys
        i += 1

def no_lost_connection():
    pool = Pool(5)
    pool.map(process_un_lost, [i for i in range(6, 11)])
    pool.close()
    pool.join()

no_lost_connection()
print map

The program runs normally:

{0: 6, 1: 7, 2: 8, 3: 9, 4: 10}

Summary

1. I didn't know much about low-level mechanisms like process fork.
2. CPython's GIL makes concurrent programming in Python considerably more complex than in Java or Go.
3. ORM frameworks simplify the code, but in some scenarios the underlying primitives are easier to keep under control.
