1122|urllib

2015-11-22  本文已影响0人  喵在野

http://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001432688314740a0aed473a39f47b09c8c7274c9ab6aee000

https://docs.python.org/3/library/urllib.request.html?highlight=request.urlopen#urllib.request.urlopen

urllib.request的说明:
https://docs.python.org/3.5/library/urllib.request.html

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Open the URL url, which can be either a string or a Request object.

data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed. data may also be an iterable object and in that case Content-Length value must be specified in the headers. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided.


Python 3 抓取网页资源的几种办法:

1、最简单
import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()

2、使用 Request
import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()

3、发送数据
复制代码
#! /usr/bin/env python3

import urllib.parse
import urllib.request

url = 'http://localhost/login.php'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {
'act' : 'login',
'login[email]' : 'yzhang@i9i8.com',
'login[password]' : '123456'
}

data = urllib.parse.urlencode(values)
req = urllib.request.Request(url, data)
req.add_header('Referer', 'http://www.python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()

print(the_page.decode("utf8"))
复制代码
上一篇下一篇

猜你喜欢

热点阅读