lxml parses web pages faster than BeautifulSoup

2016-08-17  bbjoe

My code:

# -*- coding: utf-8 -*-
import requests
from time import ctime
from lxml import etree
from bs4 import BeautifulSoup

url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
tries = 300
web_data = requests.get(url).text

# step 1: parse the downloaded page 300 times with lxml
print('lxml start at:', ctime())
for _ in range(tries):
    lxml_page = etree.HTML(web_data)
print('lxml done at:', ctime())

# step 2: parse the same page 300 times with BeautifulSoup (lxml backend)
print('soup start at:', ctime())
for _ in range(tries):
    soup_page = BeautifulSoup(web_data, 'lxml')
print('soup done at:', ctime())

I ran the two steps separately: first I commented out step 2 and ran step 1, then commented out step 1 and ran step 2. I'm a beginner, so go easy on me.

Results:

Parsing the same blog page 300 times, BeautifulSoup took about 8 seconds while lxml took about 1 second.
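
For finer resolution than ctime gives, here is a minimal sketch using the standard-library timeit module; it reuses the same URL and the same two parse calls as above, and the numbers will of course vary by machine and network. This is my own variant, not the original measurement.

# A timeit-based variant of the benchmark above (a sketch, not the original measurement).
# Both parsers work on the same downloaded HTML string, so network time is excluded.
import timeit

import requests
from lxml import etree
from bs4 import BeautifulSoup

url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
web_data = requests.get(url).text

lxml_time = timeit.timeit(lambda: etree.HTML(web_data), number=300)
soup_time = timeit.timeit(lambda: BeautifulSoup(web_data, 'lxml'), number=300)

print('lxml:          %.2f s for 300 parses' % lxml_time)
print('BeautifulSoup: %.2f s for 300 parses' % soup_time)

Passing 'lxml' to BeautifulSoup means both sides use the same underlying C parser, so the gap comes mainly from the extra Python object tree that BeautifulSoup builds on top of it.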

[Screenshots of the two timing runs: BeautifulSoup.png, lxml.png]