lxml parses web pages faster than BeautifulSoup
2016-08-17
bbjoe
My code:
# -*- coding: utf-8 -*-
import requests
from time import ctime
from lxml import etree
from bs4 import BeautifulSoup

url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
tries = 300
web_data = requests.get(url).text

# step 1: parse the same page 300 times with lxml
print('lxml start at:', ctime())
while tries > 0:
    lxml_page = etree.HTML(web_data)
    tries = tries - 1
print('lxml done at:', ctime())

# step 2: parse the same page 300 times with BeautifulSoup (lxml backend)
tries = 300  # reset the counter, otherwise this loop never runs
print('soup start at:', ctime())
while tries > 0:
    soup_page = BeautifulSoup(web_data, 'lxml')
    tries = tries - 1
print('soup done at:', ctime())
I ran the two steps separately: first comment out step 2 and run step 1, then comment out step 1 and run step 2. I'm new to this, so go easy on me.
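For reference, here is a minimal sketch of my own (assuming the same URL and 300 iterations) that times both parsers in a single run with the standard library's timeit, so nothing has to be commented out; the timings it prints are not the ones from the original screenshots:

# -*- coding: utf-8 -*-
import timeit
import requests
from lxml import etree
from bs4 import BeautifulSoup

url = 'http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html'
web_data = requests.get(url).text

# Each lambda parses the full page once; timeit repeats it 300 times.
lxml_seconds = timeit.timeit(lambda: etree.HTML(web_data), number=300)
soup_seconds = timeit.timeit(lambda: BeautifulSoup(web_data, 'lxml'), number=300)

print('lxml: %.2f s' % lxml_seconds)
print('bs4 : %.2f s' % soup_seconds)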
Results:
Parsing one blog page 300 times: BeautifulSoup took about 8 seconds, lxml took about 1 second.
(Screenshots of the console output: BeautifulSoup.png, lxml.png)
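Once parsed, the two objects are queried differently; a short illustrative sketch (the //title XPath and the 'title' CSS selector are my own example, not from the original post):

# lxml_page and soup_page come from the benchmark code above
title_from_lxml = lxml_page.xpath('//title/text()')         # list of matching text nodes
title_from_soup = soup_page.select_one('title').get_text()  # text of the first <title> tag
print(title_from_lxml, title_from_soup)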