Python crawler error: 'utf-8' codec can't d

2018-08-16  AoeKeller

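Straight to the point: I fetched a page with urlopen from urllib.request. The response body comes back as raw bytes, and calling decode on it raised this error:

    'utf-8' codec can't decode byte 0x8b

I checked, and the original page really is UTF-8 encoded.
My code was as follows: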

    import ssl
    from urllib import request
    from urllib.request import urlopen

    # Method of a crawler class (the class itself is not shown in this post)
    def myDownLoad(self, url):
        webheader = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng, */*',
            'Accept-Language': 'zh-CN',
            'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1',
            'DNT': '1',
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'Keep-Alive',
            'Host': 'tuchong.com',
            'Cookie': 'PHPSESSID=rqp9t3p1p0n4qkr3kt0f24vd44; webp_enabled=1; _ga=GA1.2.518965677.1534150590; log_web_id=5010412077; email=576063964%40qq.com; token=234ef3d22ce69f5f; _gid=GA1.2.1159319752.1534333971; _gat=1',
            'Referer': 'https://tuchong.com/1890400/'
        }
        try:
            # Skip TLS certificate verification for this request
            context = ssl._create_unverified_context()
            request_data = request.Request(url, headers=webheader)
            # read() returns the response body as bytes
            response = urlopen(request_data, context=context).read()
            # print(response.decode('utf8'))
            return response.decode('utf8')
        except Exception as e:
            print(e)
            return False

Running it produced this error:

    'utf-8' codec can't decode byte 0x8b

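Only later did I realize that compression was the problem. Because the request advertises gzip support, the server returns a gzip-compressed body, and 0x8b is the second byte of the gzip magic number (0x1f 0x8b), which is not valid UTF-8. Removing this header fixed it:

    'Accept-Encoding': 'gzip, deflate',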
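
Removing the header works because the server then sends the body uncompressed. If you would rather keep compression, another option is to decompress the body before decoding it. The following is only a minimal sketch of that idea, not the code from this post: the helper name `decode_body` is made up for illustration, and it assumes the server reports the compression it used in the `Content-Encoding` response header.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=""):
    """Undo gzip/deflate compression (if any), then decode the bytes as UTF-8.

    `decode_body` is a hypothetical helper; `content_encoding` is expected to be
    the value of the response's Content-Encoding header (may be empty).
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # zlib-wrapped deflate stream
            raw = zlib.decompress(raw)
        except zlib.error:
            # fall back to a raw deflate stream without the zlib header
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    return raw.decode("utf-8")


# Offline demonstration: a gzip-compressed body starts with bytes 0x1f 0x8b
compressed = gzip.compress("<html>hello</html>".encode("utf-8"))
print(compressed[:2].hex())             # 1f8b
print(decode_body(compressed, "gzip"))  # <html>hello</html>
```

In `myDownLoad`, this would presumably mean keeping the object returned by `urlopen` instead of calling `.read()` right away, then passing `resp.read()` and `resp.headers.get('Content-Encoding', '')` to the helper.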