python爬虫报错 'utf-8' codec can't d
2018-08-16 本文已影响0人
AoeKeller
不废话,用urllib.open,返回的信用是二进制文件,然后decode的时候,报错
'utf-8' codec can't decode byte 0x8b
检查了原网页确实是utf-8编码,
我的代码如下
def myDownLoad(self, url):
webheader = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng, */*',
'Accept-Language': 'zh-CN',
'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1',
'DNT': '1',
'Accept-Encoding': 'gzip, deflate',
'Connection': 'Keep-Alive',
'Host': 'tuchong.com',
'Cookie': 'PHPSESSID=rqp9t3p1p0n4qkr3kt0f24vd44; webp_enabled=1; _ga=GA1.2.518965677.1534150590; log_web_id=5010412077; email=576063964%40qq.com; token=234ef3d22ce69f5f; _gid=GA1.2.1159319752.1534333971; _gat=1',
'Referer':'https://tuchong.com/1890400/'
}
try:
context = ssl._create_unverified_context()
request_data = request.Request(url, headers=webheader)
response = urlopen(request_data, context=context).read()
# print(response.decode('utf8'))
return response.decode('utf8')
except Exception as e:
print(e)
return False
然后报错
'utf-8' codec can't decode byte 0x8b
后来才发现,是压缩的问题,去掉这个就好了
'Accept-Encoding': 'gzip, deflate',