Crawler: response.text comes back garbled

2023-06-29, by 门前的那颗樱桃树

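Printing response.text produces garbled characters.
Printing response.encoding shows utf-8, so the declared text encoding is not the problem.
When writing a crawler in Python, some sites run anti-crawler checks, so we add request headers to make the request look like an ordinary browser visit. For example, in a Scrapy project we set DEFAULT_REQUEST_HEADERS in settings.py, and in it we set Accept-Encoding to "gzip, deflate, br". That header tells the server it may compress the response with Brotli (br). If the server does so and the Python environment running the crawler (for example, the interpreter configured in PyCharm) has no Brotli library installed, the body cannot be decompressed, and the still-compressed bytes show up as garbled text.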

DEFAULT_REQUEST_HEADERS = {
   "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
   # "br" here can lead to garbled output unless the interpreter has Brotli installed: pip install Brotli
   "Accept-Encoding": "gzip, deflate, br",
   "Accept-Language": "zh-CN,zh;q=0.9",
   "Cache-Control": "max-age=0",
   "Cookie": "resolution=1080*1920; Hm_lvt_c826b0776d05b85d834c5936296dc1d5=1686822404; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22188228516787b0-0579a4d65a09f9-1c525634-2073600-188228516791df7%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfY29va2llX2lkIjoiMTg4MjI4NTE2Nzg3YjAtMDU3OWE0ZDY1YTA5ZjktMWM1MjU2MzQtMjA3MzYwMC0xODgyMjg1MTY3OTFkZjcifQ%3D%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%22%2C%22value%22%3A%22%22%7D%2C%22%24device_id%22%3A%22188228516787b0-0579a4d65a09f9-1c525634-2073600-188228516791df7%22%7D; kk_s_t=1687169079723",
   "If-None-Match": "27733-wHpibHGyRBeG+tUml+dq3EKDpIc",
   "Sec-Ch-Ua-Mobile": "?0",
   "Sec-Ch-Ua-Platform": "macOS",
   "Sec-Fetch-Dest": "document",
   "Sec-Fetch-Mode": "navigate",
   "Sec-Fetch-Site": "none",
   "Sec-Fetch-User": "?1",
   "Upgrade-Insecure-Requests": "1",
   "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
}
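
To illustrate why an undecompressed body looks garbled, here is a minimal sketch. It uses gzip from the standard library as a stand-in for Brotli, since Brotli itself may not be installed; the sample HTML string is made up for the demonstration. The same thing happens whenever compressed bytes are decoded as text without being decompressed first.

```python
import gzip

# Hypothetical page content, used only for this demonstration.
html = "<html><body>你好，爬虫</body></html>"

# What the server would send when it compresses the body.
compressed = gzip.compress(html.encode("utf-8"))

# Decoding the still-compressed bytes as UTF-8 text yields garbage.
garbled = compressed.decode("utf-8", errors="replace")

# Decompressing first and then decoding recovers the page.
recovered = gzip.decompress(compressed).decode("utf-8")

print(garbled == html)    # False
print(recovered == html)  # True
```

Note that response.encoding still reports utf-8 in this situation, because the charset and the compression (Content-Encoding) are two separate layers.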

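Summary (either fix works):

1. Remove `br` from `Accept-Encoding`.

2. Install the `Brotli` library (`pip install Brotli`) so the response can be decompressed.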
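
The two fixes can be combined by advertising `br` only when a Brotli decoder is importable. The sketch below is one possible approach, not the only one: the helper name `build_accept_encoding` is hypothetical, and it assumes the decoder is importable as the `brotli` module (which is what `pip install Brotli` provides). Whether your HTTP stack actually uses that module depends on its version.

```python
from importlib.util import find_spec


def build_accept_encoding() -> str:
    """Return an Accept-Encoding value that lists br only if it can be decoded."""
    encodings = ["gzip", "deflate"]
    # find_spec returns None when the top-level module is not installed.
    if find_spec("brotli") is not None:
        encodings.append("br")
    return ", ".join(encodings)


# Trimmed-down version of the settings dict, for illustration only.
DEFAULT_REQUEST_HEADERS = {
    "Accept-Encoding": build_accept_encoding(),
    "Accept-Language": "zh-CN,zh;q=0.9",
}

print(DEFAULT_REQUEST_HEADERS["Accept-Encoding"])
```

With this in place, the same settings file behaves correctly on machines with and without Brotli installed.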
