The urllib Library

2017-12-20 苦瓜1512

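urllib is Python's built-in HTTP request library. It is split into the following modules:

urllib.request: the request module
urllib.error: the URL exception handling module
urllib.parse: the URL parsing module
urllib.robotparser: the robots.txt parsing module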

1. `urllib.request`

1.1 `urllib.request.urlopen()`

urllib.request.urlopen(url, data=None, [timeout,]*, ...)

url: the URL to open
data=None: extra data to send, for example the body of a POST request
timeout: the timeout, in seconds

1.1.1 `url`

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))

response.read() returns bytes, which must be decoded into a string with the appropriate encoding.
This is a GET request.
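
The encoding is not always UTF-8. As a minimal sketch, assuming the server declares its charset in the Content-Type header, the helper below picks the charset from a header value and falls back to UTF-8. The name decode_body is hypothetical and not part of urllib. With a real response, the same lookup is available as response.headers.get_content_charset().

```python
from email.message import Message


def decode_body(raw, content_type, default='utf-8'):
    """Decode raw bytes using the charset named in a Content-Type value."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent
    charset = msg.get_content_charset() or default
    return raw.decode(charset, errors='replace')


# Sample bytes standing in for response.read(); no network access is needed
gbk_page = '你好'.encode('gbk')
greeting = decode_body(gbk_page, 'text/html; charset=GBK')
plain = decode_body(b'hello', 'text/html')
print(plain)
```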

1.1.2 `data`

import urllib.request
import urllib.parse

data = bytes(urllib.parse.urlencode({'world':'hello'}), encoding='utf-8')
respon = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(respon.read())

Passing the data argument turns this into a POST request.
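
The switch from GET to POST can be checked without sending anything. A small sketch using only the standard library: Request.get_method() reports the method urllib would use for a given request.

```python
import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'
data = urllib.parse.urlencode({'world': 'hello'}).encode('utf-8')

# Building a Request does not open a connection, so this runs offline
without_data = urllib.request.Request(url)
with_data = urllib.request.Request(url, data=data)

print(without_data.get_method())  # GET
print(with_data.get_method())     # POST
print(data)                       # b'world=hello'
```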

1.1.3 `timeout`

import socket
import urllib.request
import urllib.error

# A 1-second timeout is normally long enough for the request to succeed
try:
    respon = urllib.request.urlopen('http://www.baidu.com', timeout=1)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print('Time Out')

# A 0.01-second timeout will most likely expire and print 'Time Out'
try:
    respon = urllib.request.urlopen('http://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print('Time Out')
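
Depending on where the timeout happens, it may surface as a URLError whose reason is socket.timeout, or as a bare socket.timeout raised while the response is being read. A hedged sketch of a check that covers both cases follows; is_timeout and fetch are hypothetical helper names, not urllib functions.

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc represents a timeout in either form."""
    if isinstance(exc, socket.timeout):
        return True
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, socket.timeout))


def fetch(url, timeout):
    """Return the response body as bytes, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        if is_timeout(exc):
            return None
        raise
```

Errors other than timeouts are re-raised, so callers still see DNS failures or refused connections.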

1.2 Response

1.2.1 Response type

import urllib.request
respon = urllib.request.urlopen('https://www.python.org')
print(type(respon))
================================================================================================
>> <class 'http.client.HTTPResponse'>

1.2.2 Status code and response headers

import urllib.request
respon = urllib.request.urlopen('http://www.python.org')
print(respon.status)
print(respon.getheaders())
print(respon.getheader('Server'))

respon.status: the status code
respon.getheaders(): all response headers
respon.getheader('Server'): a specific response header

1.3 The `Request` object

1.3.1 Sending a request with a `Request` object

import urllib.request as rq
requ = rq.Request('http://www.baidu.com')
resp = rq.urlopen(requ)
print(resp.read().decode('utf-8'))

Create a Request object.
Pass the Request object to urllib.request.urlopen().

1.3.2 Sending a request with extra data using a `Request` object

from urllib import request, parse
url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/57.0'
}
# Named form rather than dict to avoid shadowing the built-in type
form = {
    'name': 'doggy'
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))

1.3.3 Adding headers with a `Request` object

from urllib import request, parse
url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/57.0'
}
form = {
    'name': 'doggy'
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, method='POST')
# add_header() takes a key and a value, so add the headers one at a time
for key, value in headers.items():
    req.add_header(key, value)
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
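
To confirm what a Request will send before opening it, the object can be inspected locally. A minimal sketch follows; the User-Agent value 'demo-agent/0.1' is a made-up placeholder. Note that Request stores header names capitalized, so 'User-Agent' is looked up as 'User-agent'.

```python
from urllib import request, parse

url = 'http://httpbin.org/post'
data = parse.urlencode({'name': 'doggy'}).encode('utf-8')

req = request.Request(url=url, data=data, method='POST')
req.add_header('User-Agent', 'demo-agent/0.1')  # placeholder value

# Nothing is sent until urlopen() is called, so these checks run offline
print(req.get_method())              # POST
print(req.has_header('User-agent'))  # True
print(req.get_header('User-agent'))  # demo-agent/0.1
print(req.data)                      # b'name=doggy'
```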

2. `urllib.error`

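The urllib.error module defines three exception classes:

urllib.error.URLError
- reason: the reason for the error
urllib.error.HTTPError
- code: the HTTP status code
- reason: the reason for the error
- headers: the HTTP response headers
- urllib.error.HTTPError is a subclass of urllib.error.URLError
urllib.error.ContentTooShortError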
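
Because HTTPError is a subclass of URLError, the order of except clauses matters: the more specific class has to be caught first. A minimal sketch, with the exceptions built by hand so that it runs without a network; classify is a hypothetical helper name and the URL is only illustrative.

```python
import urllib.error


def classify(exc):
    """Return a (kind, code, reason) tuple for a urllib exception."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # Must come first: HTTPError is also a URLError
        return ('http', e.code, str(e.reason))
    except urllib.error.URLError as e:
        return ('url', None, str(e.reason))


# Exceptions constructed manually instead of coming from urlopen()
not_found = urllib.error.HTTPError(
    'http://httpbin.org/status/404', 404, 'Not Found', {}, None)
unreachable = urllib.error.URLError('connection refused')

print(classify(not_found))    # ('http', 404, 'Not Found')
print(classify(unreachable))  # ('url', None, 'connection refused')
```

If the two except clauses were swapped, the URLError branch would catch HTTPError as well and the status code would never be reported.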

3. `urllib.parse`

3.1 `urllib.parse.urlencode()`

urllib.parse.urlencode() converts a dictionary into the query string of a GET request.

from urllib.parse import urlencode
params = {
    'name': 'yindf',
    'age':22
}
base_url = 'http://www.baidu.com?'
url = base_url + urlencode(params)
print(url)
================================================================================================
>> http://www.baidu.com?name=yindf&age=22
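
The reverse direction is handled by urlparse() and parse_qs(), also in urllib.parse. A short sketch that round-trips the URL built above and shows how urlencode() escapes spaces and reserved characters; the second dictionary is made up for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

params = {'name': 'yindf', 'age': 22}
url = 'http://www.baidu.com?' + urlencode(params)

# Split the URL and parse its query string back into a dict of lists
query = urlparse(url).query
print(query)            # name=yindf&age=22
print(parse_qs(query))  # {'name': ['yindf'], 'age': ['22']}

# Spaces and reserved characters are escaped automatically
escaped = urlencode({'q': 'hello world', 'tag': 'a&b'})
print(escaped)          # q=hello+world&tag=a%26b
```

Note that parse_qs() returns every value as a list of strings, so the integer 22 comes back as ['22'].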
