Python Program Monitoring and Memory Leak Detection

2019-12-13  俊采星驰_87e0

This article covers the following profiling tools.

cProfile: reports how many times each function was called, plus its total and per-call time.
line_profiler: reports, for each line of code, the number of executions, total time and average time.
memory_profiler: reports, for each line of code, the total memory in use and the memory increment (it slows execution down noticeably).
tracemalloc: reports, for each line of code, the total allocated memory, number of blocks and average block size; it can also diff two snapshots and trace an allocation back to the code that made it.

1.cProfile

Python ships with several profiling modules: profile, cProfile and hotshot. They are used in essentially the same way; the difference is mainly whether the module is written in pure Python or in C. This article uses cProfile.

import time


def func1():
    sum = 0
    for i in range(1000000):
        sum += i


def func2():
    time.sleep(10)


func1()
func2()
Run:
python -m cProfile test.py
Output:
  6 function calls in 10.047 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.047   10.047 test.py:1(<module>)
        1    0.000    0.000   10.000   10.000 test.py:10(func2)
        1    0.047    0.047    0.047    0.047 test.py:4(func1)
        1    0.000    0.000   10.047   10.047 {built-in method builtins.exec}
        1   10.000   10.000   10.000   10.000 {built-in method time.sleep}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The columns mean the following:

ncalls: how many times the function was called;
tottime: total time spent in the function itself, excluding time spent in the functions it calls;
percall (the first one): tottime / ncalls;
cumtime: cumulative time spent in the function and all of its sub-calls, i.e. from the moment it is entered until it returns;
percall (the second one): average time per call, cumtime / ncalls;
filename:lineno(function): where each profiled function is defined.
Note that cProfile works at the level of whole functions; it cannot tell you what happens on each individual line inside a function.

-s cumulative
The -s cumulative switch tells cProfile to sort the report by the cumulative time spent in each function, which makes the slowest parts of the code easy to spot.
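
For the test.py example above the command would be:

python -m cProfile -s cumulative test.py

You can also save the raw statistics with -o and inspect them later with the standard pstats module. A minimal sketch, assuming the stats were saved to out.prof (the file name is arbitrary):

python -m cProfile -o out.prof test.py

import pstats

p = pstats.Stats('out.prof')                  # load the saved profile data
p.sort_stats('cumulative').print_stats(10)    # print the 10 entries with the largest cumulative time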

2.line_profiler

Its main purpose is to record, for each line of code, how long it took and how many times it executed.

Installation:

On Windows, line_profiler cannot always be installed directly with pip; if it fails, download a matching prebuilt wheel from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and install that instead.
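
On other platforms it normally installs straight from PyPI:

pip install line_profiler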

line_profiler can be invoked in two ways.
The first is to call it directly from your code:

from line_profiler import LineProfiler
import random


def do_stuff(numbers):
    s = sum(numbers)
    l = [numbers[i] / 43 for i in range(len(numbers))]
    m = ['hello' + str(numbers[i]) for i in range(len(numbers))]


numbers = [random.randint(1, 100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()
Output:
Timer unit: 1e-06 s
 
Total time: 0.000649 s
File: <ipython-input-2-2e060b054fea>
Function: do_stuff at line 4
 
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def do_stuff(numbers):
     5         1           10     10.0      1.5      s = sum(numbers)
     6         1          186    186.0     28.7      l = [numbers[i]/43 for i in range(len(numbers))]
     7         1          453    453.0     69.8      m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

The columns mean the following:

Line #: line number in the file.
Hits: number of times the line was executed.
Time: total time spent executing the line, in timer units. The "Timer unit:" line in the header gives the conversion factor to seconds; it can differ between systems.
Per Hit: average time spent executing the line once, in timer units.
% Time: percentage of the function's total recorded time spent on this line.
Line Contents: the actual source code. Note that it is read from disk when the formatted results are viewed, not while the code runs; if the file has been edited in the meantime the lines will not match, and the formatter may not even be able to find the function to display.

When the profiled function calls another function:
from line_profiler import LineProfiler
import random
 
def do_other_stuff(numbers):
    s = sum(numbers)
 
def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]
 
numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

Done this way, the report only shows the total time spent in the called function, not its individual lines. The output is:

Timer unit: 1e-06 s
 
Total time: 0.000773 s
File: <ipython-input-3-ec0394d0a501>
Function: do_stuff at line 7
 
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def do_stuff(numbers):
     8         1           11     11.0      1.4      do_other_stuff(numbers)
     9         1          236    236.0     30.5      l = [numbers[i]/43 for i in range(len(numbers))]
    10         1          526    526.0     68.0      m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

To get per-line timings for both the outer function and the function it calls, register the inner function with add_function:


from line_profiler import LineProfiler
import random
 
def do_other_stuff(numbers):
    s = sum(numbers)
 
def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]
 
numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp.add_function(do_other_stuff)   # add additional function to profile
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

Timer unit: 1e-06 s
 
Total time: 9e-06 s
File: <ipython-input-4-dae73707787c>
Function: do_other_stuff at line 4
 
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def do_other_stuff(numbers):
     5         1            9      9.0    100.0      s = sum(numbers)
 
Total time: 0.000694 s
File: <ipython-input-4-dae73707787c>
Function: do_stuff at line 7
 
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def do_stuff(numbers):
     8         1           12     12.0      1.7      do_other_stuff(numbers)
     9         1          208    208.0     30.0      l = [numbers[i]/43 for i in range(len(numbers))]
    10         1          474    474.0     68.3      m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

You can also enable profiling with the @profile decorator:
import random

@profile
def do_other_stuff(numbers):
    s = sum(numbers)

@profile
def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i] / 43 for i in range(len(numbers))]
    m = ['hello' + str(numbers[i]) for i in range(len(numbers))]


numbers = [random.randint(1, 100) for i in range(1000)]
do_stuff(numbers)

Note that you do not need to import anything for the @profile decorator; kernprof injects it when it runs the script. Just launch it from the command line:

kernprof -l script_to_profile.py
This runs the script and saves the timing data to script_to_profile.py.lprof; view the report with
python -m line_profiler script_to_profile.py.lprof
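
Alternatively, kernprof's -v (view) option prints the report right after the run, combining both steps:

kernprof -l -v script_to_profile.py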

The output looks like this:

Timer unit: 2.84445e-07 s

Total time: 7.68e-06 s
File: test.py
Function: do_other_stuff at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def do_other_stuff(numbers):
     5         1         27.0     27.0    100.0      s = sum(numbers)

Total time: 0.000570311 s
File: test.py
Function: do_stuff at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           @profile
     8                                           def do_stuff(numbers):
     9         1         59.0     59.0      2.9      do_other_stuff(numbers)
    10         1        639.0    639.0     31.9      l = [numbers[i] / 43 for i in range(len(numbers))]
    11         1       1307.0   1307.0     65.2      m = ['hello' + str(numbers[i]) for i in range(len(numbers))]

3.memory_profiler

The memory_profiler module measures memory usage line by line. Be aware that it makes the profiled code run noticeably slower.

Install it with:
pip install memory_profiler

It is also recommended to install the psutil package, which makes memory_profiler run a bit faster:

pip install psutil
Example:
import time


@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    time.sleep(10)
    del b
    del a
    print("+++++++++")


if __name__ == '__main__':
    my_func()
Run it with:
python -m memory_profiler test.py
Output:
+++++++++
Filename: test.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   40.684 MiB   40.684 MiB   @profile
     5                             def my_func():
     6   48.316 MiB    7.633 MiB       a = [1] * (10 ** 6)
     7  200.906 MiB  152.590 MiB       b = [2] * (2 * 10 ** 7)
     8  200.914 MiB    0.008 MiB       time.sleep(10)
     9   48.324 MiB    0.000 MiB       del b
    10   40.691 MiB    0.000 MiB       del a
    11   40.699 MiB    0.008 MiB       print("+++++++++")

Line #: line number
Mem usage: memory usage after executing the line
Increment: memory increment caused by the line
Line Contents: the actual source code
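
Instead of the -m switch you can also import the decorator explicitly; the script can then be run with plain python, and the report is printed when the decorated function returns. A minimal sketch of the same example:

from memory_profiler import profile
import time


@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    time.sleep(10)
    del b
    del a


if __name__ == '__main__':
    my_func()   # run with: python test.py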

Programmatic API

memory_profiler also exposes a number of functions that can be used from third-party code.

memory_usage(proc=-1, interval=.1, timeout=None)

returns the memory usage sampled over a time interval. The first argument, proc, says what should be monitored. It can be the PID of a process (not necessarily a Python program), a string of Python code to be evaluated, or a tuple (f, args, kw) containing a function and its arguments, which will be evaluated as f(*args, **kw). For example:

>>> from memory_profiler import memory_usage
>>> mem_usage = memory_usage(-1, interval=.2, timeout=1)
>>> print(mem_usage)
    [7.296875, 7.296875, 7.296875, 7.296875, 7.296875]

Here memory_profiler is asked for the memory consumption of the current process over 1 second, sampling every 0.2 seconds. The -1 passed as the PID is a special value (real PIDs are positive) meaning the current process, so this measures the memory usage of the running Python interpreter itself: about 7 MB for a plain Python interpreter. The same call in an IPython console gives about 29 MB, and in an IPython notebook it climbs to about 44 MB.

To measure the memory consumption of a Python function, pass the function and its arguments as a tuple (f, args, kw). For example:

>>> # define a simple function
>>> def f(a, n=100):
...     import time
...     time.sleep(2)
...     b = [a] * n
...     time.sleep(1)
...     return b
...
>>> from memory_profiler import memory_usage
>>> memory_usage((f, (1,), {'n' : int(1e6)}))

This executes f(1, n=int(1e6)) and returns the memory consumption sampled while it runs.
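
Since memory_usage returns a list of samples, taking the max of that list gives the peak memory used while the function runs. A minimal sketch reusing the same f as above (the 0.1 s interval is just an example value):

from memory_profiler import memory_usage


def f(a, n=100):
    import time
    time.sleep(2)
    b = [a] * n
    time.sleep(1)
    return b


# sample every 0.1 s while f(1, n=int(1e6)) runs, then keep the largest sample
samples = memory_usage((f, (1,), {'n': int(1e6)}), interval=0.1)
print("peak memory: %.1f MiB" % max(samples))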

4.tracemalloc

tracemalloc is part of the Python 3 standard library (since 3.4).

import tracemalloc  # standard library since Python 3.4

tracemalloc.start()  # start tracing memory allocations

d = [dict(zip('xy',(5, 6))) for i in range(1000000)]
t = [tuple(zip('xy',(5, 6))) for i in range(1000000)]

snapshot = tracemalloc.take_snapshot()     # snapshot of the current allocations
top_stats = snapshot.statistics('lineno')  # group the snapshot statistics by line

for stat in top_stats:
    print(stat)

Output:

F:/lyh/git/test.py:5: size=237 MiB, count=2000002, average=124 B
F:/lyh/git/test.py:6: size=191 MiB, count=3000002, average=67 B
F:/lyh/git/test.py:8: size=448 B, count=1, average=448 B
C:\Users\yuanbao\AppData\Local\Programs\Python\Python36\lib\tracemalloc.py:522: size=64 B, count=1, average=64 B

For each filename and line number, the report shows statistics of the memory blocks allocated there: total size, block count and average block size.
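
The tracemalloc.py entries in the output are allocations made by tracemalloc itself; they can be filtered out before computing the statistics. A minimal sketch using Snapshot.filter_traces (the filter patterns below are just the usual noise sources, adjust as needed):

import tracemalloc

tracemalloc.start()
d = [dict(zip('xy',(5, 6))) for i in range(100000)]

snapshot = tracemalloc.take_snapshot()
# drop allocations made by the import machinery and by tracemalloc itself
snapshot = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
))
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)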

Compute the difference between two snapshots to detect memory leaks:
import tracemalloc  # standard library since Python 3.4

tracemalloc.start()  # start tracing memory allocations

d = [dict(zip('xy',(5, 6))) for i in range(1000000)]
snapshot1 = tracemalloc.take_snapshot()
t = [tuple(zip('xy',(5, 6))) for i in range(1000000)]

# ... call the function leaking memory ...
snapshot2 = tracemalloc.take_snapshot()

top_stats = snapshot2.compare_to(snapshot1, 'lineno')

print("[ Top 10 differences ]")
for stat in top_stats[:10]:
    print(stat)
Output:
[ Top 10 differences ]
F:/lyh/git/test.py:7: size=191 MiB (+191 MiB), count=3000004 (+3000004), average=67 B
C:\Users\yuanbao\AppData\Local\Programs\Python\Python36\lib\tracemalloc.py:387: size=664 B (+664 B), count=5 (+5), average=133 B
C:\Users\yuanbao\AppData\Local\Programs\Python\Python36\lib\tracemalloc.py:524: size=568 B (+568 B), count=3 (+3), average=189 B
C:\Users\yuanbao\AppData\Local\Programs\Python\Python36\lib\tracemalloc.py:281: size=112 B (+112 B), count=2 (+2), average=56 B
C:\Users\yuanbao\AppData\Local\Programs\Python\Python36\lib\tracemalloc.py:522: size=128 B (+64 B), count=2 (+1), average=64 B
F:/lyh/git/test.py:5: size=237 MiB (+0 B), count=2000002 (+0), average=124 B
F:/lyh/git/test.py:6: size=448 B (+0 B), count=1 (+0), average=448 B
To display the traceback of the biggest memory block:
import tracemalloc  # standard library since Python 3.4

tracemalloc.start(25)  # start tracing, storing up to 25 frames per traceback

d = [dict(zip('xy',(5, 6))) for i in range(1000000)]
# snapshot1 = tracemalloc.take_snapshot()
t = [tuple(zip('xy',(5, 6))) for i in range(1000000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# pick the biggest memory block
stat = top_stats[0]
print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
for line in stat.traceback.format():
    print(line)

Output:
2000001 memory blocks: 242868.5 KiB
  File "F:/lyh/git/test.py", line 5
    d = [dict(zip('xy',(5, 6))) for i in range(1000000)]
  File "F:/lyh/git/test.py", line 5
    d = [dict(zip('xy',(5, 6))) for i in range(1000000)]

References: https://blog.csdn.net/guofangxiandaihua/article/details/77825524
https://www.docs4dev.com/docs/zh/python/3.7.2rc1/all/library-tracemalloc.html
