How to Use lxml

2018-08-07, by 戌时说

Parsing HTML with lxml

1. Parsing an HTML string: use lxml.etree.HTML. Example:

from lxml import etree

htmlElement = etree.HTML(text)  # text is a str (or bytes) holding the HTML source
print(etree.tostring(htmlElement, encoding='utf-8').decode('utf-8'))
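As a self-contained illustration of the call above, the following sketch feeds etree.HTML a deliberately incomplete fragment. The sample string is made up for this example and is not from the original article. The HTML parser is lenient, so it should wrap the fragment in html and body elements and close the dangling li tag rather than raise an error.

```python
from lxml import etree

# Hypothetical sample input: no <html>/<body> wrapper and an unclosed <li>.
text = "<ul><li>first</li><li>second</ul>"

# etree.HTML returns the root element of the repaired document tree.
html_element = etree.HTML(text)

# Serialize back to a string to inspect what the parser built.
result = etree.tostring(html_element, encoding="utf-8").decode("utf-8")
print(result)
```

Printing the serialized tree is a quick way to check how lxml repaired the markup before running any queries against it.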

2. Parsing an HTML file: use lxml.etree.parse. Example:

htmlElement = etree.parse('tencent.html')  # returns an ElementTree for the file
print(etree.tostring(htmlElement, encoding='utf-8').decode('utf-8'))

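etree.parse uses the XML parser by default, so it raises a parse error when it meets HTML that is not well-formed XML. In that case, create an HTML parser yourself and pass it in: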

parser = etree.HTMLParser(encoding='utf-8')  # parser is the HTML parser we create ourselves
htmlElement = etree.parse('tencent.html', parser=parser)  # the file path comes first; the encoding is set on the parser
print(etree.tostring(htmlElement, encoding='utf-8').decode('utf-8'))
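To see the difference between the two parsers without needing tencent.html, here is a minimal sketch that writes a small malformed page to a temporary file and parses it both ways. The sample markup, the temporary file, and the xml_ok flag are assumptions made for this example only.

```python
import os
import tempfile

from lxml import etree

# Hypothetical malformed HTML: <p> and <br> are never closed.
broken_html = "<html><body><p>hello<br>world</body></html>"

# Write the sample to a temporary file so etree.parse has a path to read.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as f:
    f.write(broken_html)
    path = f.name

try:
    # Default XML parser: expected to reject the mismatched tags.
    try:
        etree.parse(path)
        xml_ok = True
    except etree.XMLSyntaxError as exc:
        xml_ok = False
        print("XML parser failed:", exc)

    # Explicit HTML parser: tolerant of unclosed tags.
    parser = etree.HTMLParser(encoding="utf-8")
    tree = etree.parse(path, parser=parser)
    output = etree.tostring(tree, encoding="utf-8").decode("utf-8")
    print(output)
finally:
    os.remove(path)
```

Catching etree.XMLSyntaxError like this is also one way to fall back to the HTML parser only when the strict XML parse fails.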
