Asynchronous programming with asyncio in Python 3.7
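Asynchronous programming with asyncio in Python is built on three main components:
- Event loop: every task that needs to run asynchronously is registered with the event loop, which schedules execution among those tasks
- Coroutine: a function that carries out one specific asynchronous task. The await keyword in the function body hands control back to the event loop
- Future: represents the result of a task that may or may not have finished yet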
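In an asynchronous program, all code runs inside the event loop, which can interleave several coroutines on a single thread. Each coroutine runs until it reaches the await keyword; at that point it yields control to the event loop so that other coroutines get a chance to run.
Note that await cannot be used inside an ordinary synchronous function; it is only valid inside a function defined with async def. Ordinary synchronous calls are allowed inside a coroutine, but a blocking call there stalls the whole event loop.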
1. Hello World
The following is a simple Hello World program that uses the async keyword:
import asyncio

async def hello(first_print, second_print):
    print(first_print)
    await asyncio.sleep(1)
    print(second_print)

asyncio.run(hello("Welcome", "Good-bye"))
# => Welcome
# => Good-bye
This code behaves much like synchronous code: it prints Welcome, waits one second, and then prints Good-bye.
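Before going further, here are the basic concepts that appear in the code above:
- In Python, any function defined with async def (such as hello() above) is a coroutine function. The object returned by calling a coroutine function is a coroutine object.
- The asyncio.run function is the main entry point for asynchronous code and should ideally be called only once. It runs the coroutine object passed to it and manages the asyncio event loop.
- The await keyword suspends the current coroutine and hands control back to the event loop until the awaited operation completes.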
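The difference between a coroutine function and a coroutine object can be checked directly. The sketch below is only an illustration: the demo() helper and the "done" return value are assumptions added here, and the sleep is shortened so the example finishes quickly.

```python
import asyncio
import inspect


async def hello(first_print, second_print):
    print(first_print)
    await asyncio.sleep(0.1)
    print(second_print)
    return "done"  # hypothetical return value, added for illustration


def demo():
    # Calling the coroutine function runs none of its body yet;
    # it only builds a coroutine object.
    coro = hello("Welcome", "Good-bye")
    is_function = inspect.iscoroutinefunction(hello)
    is_object = inspect.iscoroutine(coro)
    # asyncio.run() creates an event loop, drives the coroutine object
    # to completion, and closes the loop afterwards.
    result = asyncio.run(coro)
    return is_function, is_object, result


print(demo())
# => Welcome
# => Good-bye
# => (True, True, 'done')
```

Nothing is printed when hello(...) is called; the two messages appear only once asyncio.run() starts driving the coroutine object.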
A more realistic asynchronous program looks like this:
import asyncio
import time

async def say_something(delay, words):
    print(f"Started: {words}")
    await asyncio.sleep(delay)
    print(f"Finished: {words}")

async def main():
    print(f"Starting Tasks: {time.strftime('%X')}")
    task1 = asyncio.create_task(say_something(1, "First task"))
    task2 = asyncio.create_task(say_something(2, "Second task"))
    await task1
    await task2
    print(f"Finished Tasks: {time.strftime('%X')}")

asyncio.run(main())
# => Starting Tasks: 20:32:28
# => Started: First task
# => Started: Second task
# => Finished: First task
# => Finished: Second task
# => Finished Tasks: 20:32:30
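Under synchronous logic, task1 would start, wait one second, and finish; then task2 would start, wait two seconds, and finish. The total would be more than 3 seconds.
In the asynchronous program, task1 and task2 start at almost the same moment, each waits for its own delay, and they finish one after the other. The total is about 2 seconds. In detail: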
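- The say_something coroutine wrapped by task1 starts running
- When say_something reaches the await keyword (await asyncio.sleep(delay)), the coroutine suspends for 1 second and hands control back to the event loop
- task2 receives control from the event loop and starts running; when it reaches await it also suspends, this time for 2 seconds, and hands control back to the event loop
- When task1's delay has elapsed, the event loop hands control back to task1, which resumes and runs to completion
- With task1 finished, task2 regains control once its own delay has elapsed and runs to completion. Both tasks are now done.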
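The order of the steps above can be verified by recording events in a list instead of printing them. This is a minimal sketch rather than the original program: the events parameter and the "main: tasks created" marker are assumptions added for illustration, and the delays are shortened to 0.1 and 0.2 seconds.

```python
import asyncio


async def say_something(delay, words, events):
    events.append(f"Started: {words}")
    await asyncio.sleep(delay)
    events.append(f"Finished: {words}")


async def main():
    events = []
    task1 = asyncio.create_task(say_something(0.1, "First task", events))
    task2 = asyncio.create_task(say_something(0.2, "Second task", events))
    # create_task() only schedules the coroutines; neither has started yet,
    # because main() has not handed control back to the event loop.
    events.append("main: tasks created")
    await task1
    await task2
    return events


for line in asyncio.run(main()):
    print(line)
# => main: tasks created
# => Started: First task
# => Started: Second task
# => Finished: First task
# => Finished: Second task
```

The first line of output shows that a task begins running only when the coroutine that created it reaches an await and gives the event loop a chance to schedule it.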
2. Awaitable objects
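The await keyword hands control back to the event loop and suspends the current coroutine. The following rules apply:
- It can only be used inside a function defined with async def; using it in an ordinary function is a SyntaxError
- Awaiting a coroutine suspends the caller until that coroutine has finished and returned its result
- In await func(), the value of func() must be an awaitable object: either a coroutine object or an object that implements the __await__() method, which must return an iterator
Awaitable objects include coroutines, Tasks, and Futures.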
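The first and third rules can be tried out in a few lines. The sketch below is illustrative only: the Delay class and the helper function name are hypothetical, not part of asyncio. Delay implements __await__() by delegating to a coroutine object, whose own __await__() returns an iterator.

```python
import asyncio


class Delay:
    """Hypothetical awaitable: waits for some seconds, then produces a value."""

    def __init__(self, seconds, value):
        self.seconds = seconds
        self.value = value

    async def _wait(self):
        await asyncio.sleep(self.seconds)
        return self.value

    def __await__(self):
        # A coroutine object's __await__() returns an iterator,
        # which is what the await machinery requires.
        return self._wait().__await__()


def await_in_plain_function_is_rejected():
    # Compiling an ordinary function that contains await fails
    # before any code runs.
    source = "def f():\n    await g()\n"
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return True
    return False


async def main():
    return await Delay(0.1, "ready")


print(asyncio.run(main()))
# => ready
print(await_in_plain_function_is_rejected())
# => True
```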
Coroutines
The following code illustrates the second rule above, a coroutine that is awaited directly:
import asyncio

async def mult(first, second):
    print(f"Calculating multiply of {first} and {second}")
    await asyncio.sleep(1)
    num_mul = first * second
    print(f"Multiply is {num_mul}")
    return num_mul

# Note: this name shadows the built-in sum() within this module.
async def sum(first, second):
    print(f"Calculating sum of {first} and {second}")
    await asyncio.sleep(1)
    num_sum = first + second
    print(f"Sum is {num_sum}")
    return num_sum

async def main(first, second):
    # Each await finishes before the next line runs.
    await sum(first, second)
    await mult(first, second)

asyncio.run(main(7, 8))
# => Calculating sum of 7 and 8
# => Sum is 15
# => Calculating multiply of 7 and 8
# => Multiply is 56
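In the code above, the coroutine objects returned by sum and mult and passed to await are the awaitable objects. The output shows that sum runs to completion and prints its result before mult even starts.
In other words, a directly awaited coroutine must finish before the next await expression can run, which does not look asynchronous at all.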
Tasks
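Tasks are the key to running coroutines concurrently.
When a coroutine object is passed to a function such as asyncio.create_task(), it is automatically scheduled to run on the event loop. In asyncio terms, a coroutine that is scheduled and driven by the event loop is called a task.
In most cases, writing concurrent asynchronous code means using create_task() to schedule coroutines on the event loop.
Consider the following code: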
import asyncio

async def mul(first, second):
    print(f"Calculating multiply of {first} and {second}")
    await asyncio.sleep(1)
    num_mul = first * second
    print(f"Multiply is {num_mul}")
    return num_mul

# Note: this name shadows the built-in sum() within this module.
async def sum(first, second):
    print(f"Calculating sum of {first} and {second}")
    await asyncio.sleep(1)
    num_sum = first + second
    print(f"Sum is {num_sum}")
    return num_sum

async def main(first, second):
    # Both coroutines are scheduled before either one is awaited.
    sum_task = asyncio.create_task(sum(first, second))
    mul_task = asyncio.create_task(mul(first, second))
    await sum_task
    await mul_task

asyncio.run(main(7, 8))
# => Calculating sum of 7 and 8
# => Calculating multiply of 7 and 8
# => Sum is 15
# => Multiply is 56
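Compared with the previous example, the output shows that sum_task and mul_task run the way an asynchronous program is expected to.
When sum_task reaches await asyncio.sleep(1), it does not make the whole program wait for its result. It suspends and hands control, through the event loop, to mul_task. Both tasks start in turn and go to sleep, and each prints its result once its own delay has elapsed.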
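The difference between awaiting coroutines directly and wrapping them in tasks shows up clearly in elapsed time. The following is a rough sketch for comparison only; work(), sequential(), and concurrent() are hypothetical helpers, and the 0.2-second delay is an arbitrary choice.

```python
import asyncio
import time


async def work(delay):
    await asyncio.sleep(delay)
    return delay


async def sequential(delay):
    # Each coroutine is awaited directly, so the waits add up.
    start = time.perf_counter()
    await work(delay)
    await work(delay)
    return time.perf_counter() - start


async def concurrent(delay):
    # Both coroutines are scheduled as tasks first, so the waits overlap.
    start = time.perf_counter()
    first = asyncio.create_task(work(delay))
    second = asyncio.create_task(work(delay))
    await first
    await second
    return time.perf_counter() - start


async def main():
    return await sequential(0.2), await concurrent(0.2)


seq, con = asyncio.run(main())
print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

On a typical machine the sequential version takes roughly twice as long as the concurrent one (about 0.4 versus 0.2 seconds); exact figures vary from run to run.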
Besides create_task(), the asyncio.gather() function can also be used to run coroutines concurrently; it wraps each coroutine passed to it in a task:
import asyncio
import time

async def greetings():
    print("Welcome")
    await asyncio.sleep(1)
    print("Good-bye")

async def main():
    await asyncio.gather(greetings(), greetings())

def say_greet():
    start = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - start
    print(f"Total time elapsed: {elapsed}")

say_greet()
# => Welcome
# => Welcome
# => Good-bye
# => Good-bye
# => Total time elapsed: 1.0213364
The two tasks finish in slightly more than 1 second in total, not 2 seconds.
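asyncio.gather() also collects return values. The sketch below assumes a hypothetical square() coroutine and shows that the results come back in the order the awaitables were passed in, not in the order they finished.

```python
import asyncio


async def square(number, delay):
    await asyncio.sleep(delay)
    return number * number


async def main():
    # The slowest coroutine is listed first, yet the results
    # still follow the argument order.
    return await asyncio.gather(
        square(2, 0.3),
        square(3, 0.1),
        square(4, 0.2),
    )


print(asyncio.run(main()))
# => [4, 9, 16]
```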
Futures
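A Future represents the eventual result of an asynchronous operation, which may or may not have completed yet. Application code usually does not need to manage Future objects explicitly; asyncio generally handles them behind the scenes.
A newly created Future means that the asynchronous operation associated with it has not finished yet but will deliver a result at some point in the future.
asyncio provides asyncio.wait_for(aw, timeout) for placing a timeout on an awaitable. If the operation has not finished when the timeout expires, the task is cancelled and asyncio.TimeoutError is raised.
timeout is a required argument. Passing None makes the call wait, with no time limit, until the operation associated with the Future returns a result.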
import asyncio

async def long_time_taking_method():
    await asyncio.sleep(4000)
    print("Completed the work")

async def main():
    try:
        await asyncio.wait_for(long_time_taking_method(),
                               timeout=2)
    except asyncio.TimeoutError:
        print("Timeout occurred")

asyncio.run(main())
# => Timeout occurred
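Although Future objects are normally managed by asyncio itself, creating one by hand makes the "result that arrives later" idea concrete. This is a minimal sketch for illustration; set_after() is a hypothetical helper, and application code rarely needs this low-level pattern.

```python
import asyncio


async def set_after(future, delay, value):
    # Simulate an operation that delivers its result later.
    await asyncio.sleep(delay)
    future.set_result(value)


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    done_before = future.done()  # no result has been set yet
    task = asyncio.create_task(set_after(future, 0.1, "result is ready"))
    # Awaiting the Future suspends main() until set_result() is called.
    value = await future
    await task
    return done_before, future.done(), value


print(asyncio.run(main()))
# => (False, True, 'result is ready')
```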
3. Async code examples
Running shell commands asynchronously by creating subprocesses:
import asyncio

async def run(cmd):
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    stdout, stderr = await proc.communicate()
    print(f'[{cmd!r} exited with {proc.returncode}]')
    if stdout:
        print(f'[stdout]\n{stdout.decode()}')
    if stderr:
        print(f'[stderr]\n{stderr.decode()}')

async def main():
    await asyncio.gather(
        run('sleep 2; echo "world"'),
        run('sleep 1; echo "hello"'),
        run('ls /zzz'))

asyncio.run(main())
# => ['ls /zzz' exited with 2]
# => [stderr]
# => ls: cannot access '/zzz': No such file or directory
# => ['sleep 1; echo "hello"' exited with 0]
# => [stdout]
# => hello
# => ['sleep 2; echo "world"' exited with 0]
# => [stdout]
# => world
Using a Queue to distribute a workload among several concurrently running Tasks:
import asyncio
import random
import time

async def worker(name, queue):
    while True:
        # Get a "work item" out of the queue.
        sleep_for = await queue.get()
        # Sleep for the "sleep_for" seconds.
        await asyncio.sleep(sleep_for)
        # Notify the queue that the "work item" has been processed.
        queue.task_done()
        print(f'{name} has slept for {sleep_for:.2f} seconds')

async def main():
    # Create a queue that we will use to store our "workload".
    queue = asyncio.Queue()
    # Generate random timings and put them into the queue.
    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)
    # Create three worker tasks to process the queue concurrently.
    tasks = []
    for i in range(3):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)
    # Wait until the queue is fully processed.
    started_at = time.monotonic()
    await queue.join()
    total_slept_for = time.monotonic() - started_at
    # Cancel our worker tasks.
    for task in tasks:
        task.cancel()
    # Wait until all worker tasks are cancelled.
    await asyncio.gather(*tasks, return_exceptions=True)
    print('====')
    print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')

asyncio.run(main())
# => worker-2 has slept for 0.12 seconds
# => worker-1 has slept for 0.28 seconds
# => worker-1 has slept for 0.12 seconds
# => worker-0 has slept for 0.46 seconds
# => worker-0 has slept for 0.49 seconds
# => worker-2 has slept for 0.90 seconds
# => worker-1 has slept for 0.62 seconds
# => worker-1 has slept for 0.67 seconds
# => worker-0 has slept for 0.85 seconds
# => worker-2 has slept for 0.94 seconds
# => worker-1 has slept for 0.45 seconds
# => worker-2 has slept for 0.19 seconds
# => worker-0 has slept for 0.99 seconds
# => worker-2 has slept for 0.86 seconds
# => worker-1 has slept for 0.97 seconds
# => worker-0 has slept for 0.74 seconds
# => worker-1 has slept for 0.58 seconds
# => worker-2 has slept for 0.73 seconds
# => worker-1 has slept for 0.27 seconds
# => worker-0 has slept for 0.57 seconds
# => ====
# => 3 workers slept in parallel for 4.10 seconds
# => total expected sleep time: 11.80 seconds