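Because Python has the GIL (global interpreter lock), multithreaded Python code can normally use only one processor at a time, but NumPy is an exception. This article covers parallel computation with NumPy; my understanding of it is summarized below:
NumPy's own array operations can bypass the GIL
NumPy's internals are written in C, and its array operations do not run through the Python interpreter loop. They can therefore release the GIL, which lets several threads run array operations on multiple cores at once. NumPy also uses BLAS (the Basic Linear Algebra Subroutines) internally, which can speed up the computation further.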
While a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL. Thus if you tell one thread to do (here A, B and C are all NumPy arrays):
>>> A = B + C
>>> print(A)
During the print operation (and any string formatting it involves), no other thread can execute. But during the A = B + C, another thread can run. If you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C, so you can actually get a speedup from using multiple threads.
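The idea above can be sketched with the standard-library thread pool. This is only an illustration under some assumptions: `threaded_exp` is a hypothetical helper name, the input is treated as a float64 array, and whether it is actually faster than a single `np.exp` call depends on the array size, the number of cores and memory bandwidth.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_exp(values, n_threads=2):
    """Compute np.exp(values) by giving each thread one slice of the array."""
    values = np.asarray(values, dtype=np.float64)
    out = np.empty_like(values)

    # For ndarrays, array_split returns views, so writing into an
    # output chunk writes directly into `out`.
    in_chunks = np.array_split(values, n_threads)
    out_chunks = np.array_split(out, n_threads)

    def work(pair):
        src, dst = pair
        # One array operation per chunk; the loop over elements runs in C.
        np.exp(src, out=dst)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces all tasks to finish and re-raises any exception.
        list(pool.map(work, zip(in_chunks, out_chunks)))
    return out
```

Each thread spends almost all of its time inside one array operation, which is exactly the situation the quoted text describes.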
NumPy arrays can also be shared between processes; exactly how to share them is left for later.
It is possible to share memory between processes, including numpy arrays
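One possible way to do this, not necessarily the one the quoted article has in mind, is `multiprocessing.shared_memory` from the standard library (Python 3.8+). The helper names below are hypothetical. To keep the sketch short, both handles are opened in the same process; a worker process would attach by name in the same way.

```python
from multiprocessing import shared_memory

import numpy as np


def create_shared_array(shape, dtype=np.float64):
    """Allocate a shared block and wrap it in a NumPy array (no copy)."""
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, arr


def attach_shared_array(name, shape, dtype=np.float64):
    """Attach to an existing block by name, as another process would."""
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, arr


def demo():
    owner_shm, owner = create_shared_array((4,))
    owner[:] = 0.0

    other_shm, other = attach_shared_array(owner_shm.name, (4,))
    other += 1.0  # write through the second handle

    result = owner.copy()  # the first handle sees the change

    # Drop the array views before closing, otherwise close() complains
    # that the buffer is still exported.
    del owner, other
    other_shm.close()
    owner_shm.close()
    owner_shm.unlink()  # free the block once, from the creating side
    return result
```

Only the block name, shape and dtype need to be passed to the other process; the array data itself is never pickled or copied.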
Here is a very basic comparison which illustrates the effect of the GIL (on a dual core machine).
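# timings.py
import math
import numpy as np

def f(x):
    # Pure-Python work: holds the GIL throughout.
    print(x)
    y = [1] * 10000000
    [math.exp(i) for i in y]

def g(x):
    # NumPy work: array operations release the GIL.
    print(x)
    y = np.ones(10000000)
    np.exp(y)  # presumably the array counterpart of the loop in f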
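# Run in IPython: `time` below is the %time magic.
# `processing` later became the standard-library multiprocessing module.
from handythread import foreach
from processing import Pool
from timings import f, g

def fornorm(f, l):
    # Plain serial loop, used as the baseline.
    for i in l:
        f(i)

time fornorm(g, range(100))
time fornorm(f, range(10))
time foreach(g, range(100), threads=2)
time foreach(f, range(10), threads=2)
p = Pool(2)
# Presumably timed with p.map, matching the "2 processes" row below.
time p.map(g, range(100))
time p.map(f, range(10))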
            | 100 * g() | 10 * f()
normal      | 43.5s     | 48s
2 threads   | 31s       | 71.5s
2 processes | 27s       | 31.23s
For function f(), which does not release the GIL, threading actually performs worse than serial code, presumably due to the overhead of context switching. However, using 2 processes does provide a significant speedup. For function g(), which uses numpy and releases the GIL, both threads and processes provide a significant speedup, although multiple processes are slightly faster.
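For readers who want to repeat the serial versus threads comparison without handythread, here is a rough sketch using only the standard library and NumPy. It is not the code behind the timings above: the helper names are made up for illustration, the problem size `N` is deliberately much smaller, and the numbers will vary by machine.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N = 200_000  # much smaller than the 10,000,000 used in the original timings


def f(x):
    # Pure-Python loop: holds the GIL for the whole computation.
    y = [1] * N
    return sum(math.exp(i) for i in y)


def g(x):
    # NumPy array operation: the loop over elements runs in C.
    y = np.ones(N)
    return float(np.exp(y).sum())


def time_serial(func, items):
    start = time.perf_counter()
    for i in items:
        func(i)
    return time.perf_counter() - start


def time_threads(func, items, threads=2):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(func, items))  # wait for every task to finish
    return time.perf_counter() - start


def compare(repeats=4):
    """Return elapsed seconds for f and g, run serially and with 2 threads."""
    items = range(repeats)
    return {
        name: {
            "serial": time_serial(func, items),
            "threads": time_threads(func, items),
        }
        for name, func in (("f", f), ("g", g))
    }
```

Calling `compare()` returns a nested dict of elapsed seconds. A process-based row could be added with `concurrent.futures.ProcessPoolExecutor`, run under an `if __name__ == "__main__":` guard.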