lxml parses web pages faster than BeautifulSoup
2016-08-17
bbjoe
My code:
# -*- coding: utf-8 -*-
import requests
from time import ctime
from lxml import etree
from bs4 import BeautifulSoup

url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
tries = 300
web_data = requests.get(url).text

# step 1: parse the same page 300 times with lxml
print('lxml start at:', ctime())
while tries > 0:
    lxml_page = etree.HTML(web_data)
    tries = tries - 1
print('lxml done at:', ctime())

# step 2: parse the same page 300 times with BeautifulSoup (lxml backend)
tries = 300  # reset the counter, otherwise this loop never runs
print('soup start at:', ctime())
while tries > 0:
    soup_page = BeautifulSoup(web_data, 'lxml')
    tries = tries - 1
print('soup done at:', ctime())
I ran the two steps separately: first comment out step 2 and run step 1, then comment out step 1 and run step 2. I'm new to this, so go easy on me.
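For reference, here is a minimal sketch of my own (assuming the same URL and 300 iterations) that times both parsers in a single run with the standard library's timeit, so nothing has to be commented out; the timings it prints are not the ones from the original screenshots:

# -*- coding: utf-8 -*-
import timeit
import requests
from lxml import etree
from bs4 import BeautifulSoup

url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
web_data = requests.get(url).text

# Each lambda parses the full page once; timeit repeats it 300 times.
lxml_seconds = timeit.timeit(lambda: etree.HTML(web_data), number=300)
soup_seconds = timeit.timeit(lambda: BeautifulSoup(web_data, 'lxml'), number=300)

print('lxml: %.2f s' % lxml_seconds)
print('bs4 : %.2f s' % soup_seconds)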
Results:
Parsing one blog page 300 times: BeautifulSoup took about 8 seconds, lxml took about 1 second.
(Screenshots of the console output: BeautifulSoup.png, lxml.png)
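Once parsed, the two objects are queried differently; a short illustrative sketch (the //title XPath and the 'title' CSS selector are my own example, not from the original post):

# lxml_page and soup_page come from the benchmark code above
title_from_lxml = lxml_page.xpath('//title/text()')         # list of matching text nodes
title_from_soup = soup_page.select_one('title').get_text()  # text of the first <title> tag
print(title_from_lxml, title_from_soup)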