Python Request Modules
2020-07-15
山高路陡
Request modules for web crawlers
GET and POST requests with urllib.request
GET requests: the query parameters appear in the URL
- headers = {}
- res = urllib.request.Request(url, headers=headers)
- urllib.request.urlopen(res)
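The GET steps above can be sketched roughly as follows. The helper name build_get_request, the httpbin.org URL, and the query values are assumptions made for illustration; they are not from the original notes. The request object is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_get_request(base_url, params, headers):
    """Return a Request whose URL carries the URL-encoded query parameters."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{base_url}?{query}", headers=headers)


req = build_get_request(
    "https://httpbin.org/get",  # hypothetical endpoint, swap in your own
    {"wd": "python", "pn": 10},
    {"User-Agent": "Mozilla/5.0"},
)
print(req.full_url)      # the query string is visible in the URL
print(req.get_method())  # no data argument, so the method is GET

# Sending it requires network access:
# html = urllib.request.urlopen(req).read().decode("utf-8")
```

Building the query with urllib.parse.urlencode also takes care of percent-encoding non-ASCII values such as Chinese search terms.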
POST requests
- Pass the data argument to Request
- headers = {}
- data = {}: the form data must be submitted as bytes, i.e. urllib.parse.urlencode(data).encode('utf-8')
- res = urllib.request.Request(url, data=data, headers=headers)
- urllib.request.urlopen(res)
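A minimal sketch of the POST steps, assuming a hypothetical helper build_post_request and an illustrative httpbin.org endpoint and form fields. The point it shows is that urlencode returns a str, which still has to be encoded to bytes before it is passed as data.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form, headers):
    """Return a POST Request with the form URL-encoded and converted to bytes."""
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers)


post_req = build_post_request(
    "https://httpbin.org/post",  # hypothetical endpoint
    {"kw": "hello", "page": 1},
    {"User-Agent": "Mozilla/5.0"},
)
print(post_req.get_method())  # data is present, so the method becomes POST
print(post_req.data)          # the bytes that would be sent as the body

# Sending it requires network access:
# html = urllib.request.urlopen(post_req).read().decode("utf-8")
```

Passing the dict itself, or the un-encoded str, as data makes urlopen raise a TypeError, which is why the encode step matters.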
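Building an opener from Handlers, used for IP proxies, site logins, and getting, saving, and loading cookies
- opener = urllib.request.build_opener(*handlers)  # *handlers means several Handlers can be passed in
- res = urllib.request.Request(url, data=data, headers=headers)
- opener.open(res)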
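As a rough illustration of passing several Handlers to build_opener, the sketch below combines a cookie processor with an optional proxy handler. The function name make_opener and the proxy address in the usage note are assumptions for illustration only.

```python
import http.cookiejar
import urllib.request


def make_opener(proxies=None):
    """Build an opener that stores cookies and optionally routes through proxies."""
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxies:
        handlers.append(urllib.request.ProxyHandler(proxies))
    # *handlers unpacks the list, so any number of Handlers can be combined.
    return urllib.request.build_opener(*handlers), jar


opener, jar = make_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0")]
print(type(opener).__name__, len(jar))

# opener.open(url) would send the request; cookies set by the server
# would then be collected in jar.
```

To route traffic through a proxy, one would call something like make_opener({"http": "http://127.0.0.1:8080"}), where the address stands in for a real proxy.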
ProxyHandler (proxies)
import urllib.request

url = 'https://www.baidu.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
res = urllib.request.Request(url, headers=headers)
# 'ip:port' is a placeholder for a real proxy address.
# The URL above is https, so the 'https' entry is the one that applies to it.
proxy_handler = urllib.request.ProxyHandler({'http': 'ip:port', 'https': 'ip:port'})
opener = urllib.request.build_opener(proxy_handler)
html = opener.open(res)
Handling cookies
import http.cookiejar
import urllib.request

url = 'https://www.baidu.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}
res = urllib.request.Request(url, headers=headers)

# Create a CookieJar() object to hold the cookies
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
result = opener.open(res)
for item in cookie:
    print(item)

# Saving cookies (Mozilla format)
filename = 'cookies.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
result = opener.open(res)
cookie.save(ignore_discard=True, ignore_expires=True)

# Saving in LWP format (this overwrites the Mozilla-format file above)
filename = 'cookies.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
result = opener.open(res)  # the jar must receive cookies before it is saved
cookie.save(ignore_discard=True, ignore_expires=True)

# Loading cookies (the jar class must match the format the file was saved in)
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookies.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
The requests module
- Install: pip install requests
- Common requests methods
- response = requests.get(url, params=None, **kwargs)
- params adds query parameters to the URL; it takes a dict, params={}
- **kwargs -> further arguments, e.g. headers=headers
- response.text -> str, decoded with an automatically guessed encoding, so the text may come out garbled
- To avoid garbled text, set the encoding before reading the text: response.encoding = 'utf-8', then response.text
- response.content -> bytes, the raw byte stream
- response.content.decode('utf-8') -> decode with a specific encoding
- POST requests with requests
- requests.post(url, data=None, json=None, **kwargs)
- data is the form data, given as a dict
- **kwargs -> e.g. headers=headers
- Setting a proxy in requests
- proxy = {'http': 'http://ip:port', 'https': 'http://ip:port'}
- res = requests.get(url, proxies=proxy)
- Handling SSL certificate errors
- Pass verify=False in the request call to skip certificate verification
- requests.get(url, verify=False)
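To see what params and data actually do without touching the network, one option is to build prepared requests and inspect them. This is only a sketch: the httpbin.org URLs, header value, and field names are assumptions for illustration, and requests.get / requests.post perform the same preparation internally before sending.

```python
import requests

headers = {"User-Agent": "Mozilla/5.0"}

# GET: the params dict is URL-encoded into the query string.
get_req = requests.Request(
    "GET", "https://httpbin.org/get", params={"wd": "python"}, headers=headers
).prepare()
print(get_req.url)

# POST: a dict passed as data is form-encoded into the request body.
post_req = requests.Request(
    "POST", "https://httpbin.org/post", data={"kw": "hello"}, headers=headers
).prepare()
print(post_req.body)
print(post_req.headers["Content-Type"])

# Sending requires network access; proxies and verify are passed at send time:
# with requests.Session() as session:
#     response = session.send(get_req, timeout=10, verify=True)
#     response.encoding = "utf-8"  # set the encoding before reading .text
#     print(response.text)
```

In everyday use the one-line forms requests.get(url, params=..., headers=...) and requests.post(url, data=..., headers=...) are enough; the prepared form is mainly handy for checking what will be sent.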