A Very Detailed Introduction to Python Web Scraping, from requests to scrapy

A Small Python Scraping Tool: Quickly Getting Request Headers

2019-01-30  渔父歌

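When writing scraper scripts we often need to grab request headers, but every time we paste them from the browser into code, fixing up the format takes some effort.

So I wrote a small conversion script that turns a request-header string copied from the browser into a dictionary and prints it.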

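import re


def headers_to_dict(headers_str, out_put=True):
    """Convert a request-header string copied from the browser into a dict."""
    headers_dict = {}
    for line in headers_str.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of raising IndexError
        # A key may start with ':' (HTTP/2 pseudo-headers such as ':authority'),
        # so allow one optional leading colon, then read up to the next colon.
        # The value may be empty and may itself contain colons (e.g. URLs).
        match = re.match(r'^(:?[^:\s]+):\s*(.*)$', line)
        if match is None:
            continue  # skip lines that are not in "key: value" form
        key, value = match.groups()
        headers_dict[key] = value
        if out_put:
            # repr() quotes and escapes the strings so the line can be pasted into code
            print(f"{key!r}: {value!r},")
    return headers_dict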

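Usage notes:

headers_str is the request-header text copied from the browser, one header per line. When out_put is True (the default), each key-value pair is also printed in a form that can be pasted straight into a dict literal. The function returns the resulting dictionary.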

Usage example:

headers_to_dict(''':authority: www.jianshu.com
:method: GET
:path: /p/b671f67a5960
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
cache-control: max-age=0
if-none-match: W/"0d1384f05bc47dfa8d8d26187e1b3f4f"
referer: https://www.jianshu.com/writer
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36''')

# Output
"""
':authority': 'www.jianshu.com',
':method': 'GET',
':path': '/p/b671f67a5960',
':scheme': 'https',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'cache-control': 'max-age=0',
'if-none-match': 'W/"0d1384f05bc47dfa8d8d26187e1b3f4f"',
'referer': 'https://www.jianshu.com/writer',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
"""

# Return value
"""
{
    ':authority': 'www.jianshu.com', 
    ':method': 'GET', 
    ':path': '/p/b671f67a5960', 
    ':scheme': 'https', 
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 
    'accept-encoding': 'gzip, deflate, br', 
    'accept-language': 'zh-CN,zh;q=0.9', 
    'cache-control': 'max-age=0',
    'if-none-match': 'W/"0d1384f05bc47dfa8d8d26187e1b3f4f"', 
    'referer': 'https://www.jianshu.com/writer', 
    'upgrade-insecure-requests': '1', 
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
"""
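
One caveat when reusing the dictionary above: the keys that start with a colon (:authority, :method, :path, :scheme) are HTTP/2 pseudo-headers that the browser's developer tools show alongside the real headers. They are not ordinary header fields, and an HTTP client library may reject them. Below is a minimal sketch of filtering them out before use. The helper name drop_pseudo_headers is hypothetical and not part of the original script, and the sample dictionary is only a shortened stand-in for the return value shown above.

```python
def drop_pseudo_headers(headers):
    """Return a copy of headers without HTTP/2 pseudo-headers.

    Hypothetical helper (not part of the original script): it assumes
    that every key starting with ':' is a pseudo-header such as
    ':authority' or ':method'.
    """
    return {key: value for key, value in headers.items()
            if not key.startswith(':')}


# Sample input shaped like the return value of headers_to_dict.
raw_headers = {
    ':authority': 'www.jianshu.com',
    ':method': 'GET',
    'accept-language': 'zh-CN,zh;q=0.9',
    'referer': 'https://www.jianshu.com/writer',
}
clean_headers = drop_pseudo_headers(raw_headers)
print(clean_headers)
```

The filtered dictionary can then be passed as the headers argument of a call such as requests.get(url, headers=clean_headers).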
