1122|urllib

2015-11-22 本文已影响0人喵在野

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Open the URL url, which can be either a string or a Request object.

data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed. data may also be an iterable object and in that case Content-Length value must be specified in the headers. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided.


Python 3 抓取网页资源的几种办法：

1、最简单
import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()

2、使用 Request
import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()

3、发送数据
复制代码
#! /usr/bin/env python3

import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {
'act' : 'login',
'login[email]' : 'yzhang@i9i8.com',
'login[password]' : '123456'
}

data = urllib.parse.urlencode(values)
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()

print(the_page.decode("utf8"))
复制代码

1122|urllib

猜你喜欢

热点阅读