Using urllib (2)

2018-06-03  shenyoujian

In Python 3.x, the urllib package contains four modules: urllib.request, urllib.error, urllib.parse, and urllib.robotparser.

1. urllib.parse.urlencode()

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from urllib import parse

word = {'ljs': "李建生"}

# urllib.parse.urlencode() converts the dict's key-value pairs into a URL-encoded query string that a web server can accept
encode = parse.urlencode(word)
print(encode)

# urllib.parse.unquote() converts a URL-encoded string back into the original string
print(parse.unquote("ljs=%E6%9D%8E%E5%BB%BA%E7%94%9F"))


#ljs=%E6%9D%8E%E5%BB%BA%E7%94%9F
#ljs=李建生
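
As a small extension of the example above, the sketch below encodes a dict with more than one key and then reverses the encoding. The extra `page` key is a made-up parameter used only for illustration; `parse_qs()`, `quote()` and `unquote()` are standard functions in `urllib.parse`.

```python
from urllib import parse

# 'page' is a hypothetical extra parameter, added only to show how pairs are joined
params = {'ljs': '李建生', 'page': 1}

# urlencode() percent-encodes each value (UTF-8 by default) and joins the pairs with '&'
query = parse.urlencode(params)
print(query)

# parse_qs() reverses the process; every key maps to a list of string values
decoded = parse.parse_qs(query)
print(decoded)

# quote() and unquote() work on a single string rather than a dict
quoted = parse.quote('李建生')
print(quoted)
print(parse.unquote(quoted))
```

Note that `parse_qs()` returns every value as a string inside a list, so the integer `1` comes back as `['1']`.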

GET requests

A GET request carries its parameters in the URL's query string. For example, a Baidu search for 王菊 produces:

https://www.baidu.com/s?wd=%E7%8E%8B%E8%8F%8A&rsv_spt=1&rsv_iqid=0xb5506d1e00018681&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=0&oq=%25E7%258E%258B%25E8%258F%258A&rsv_t=76c5iEe0sNG%2F7ghbj2%2B9%2FiRWb4Mz2vab6PycV4g33SQf2TehbSOJ105%2FDBFBm6qT4UR%2F&rsv_pq=ef586b1600016dd0

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from urllib import request, parse

url = 'http://www.baidu.com/s'
word = {'wd': '王菊'}                        # the search keyword parameter is 'wd', as in the URL above
word = parse.urlencode(word)                # convert to URL-encoded format

newurl = url + '?' + word

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}

req = request.Request(newurl, headers=headers)

res = request.urlopen(req)

# read() returns bytes; decode them to get readable text
print(res.read().decode('utf-8'))
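
The request-building steps can be wrapped in a small helper and checked without touching the network. This is only a sketch: `build_search_request` is a hypothetical name, the shortened User-Agent string is a placeholder, and the `wd` key is taken from the example search URL shown earlier.

```python
from urllib import parse, request


def build_search_request(keyword, base_url='http://www.baidu.com/s'):
    """Build (but do not send) a GET request for a keyword search.

    Hypothetical helper for illustration; 'wd' is the keyword parameter
    seen in the example Baidu URL.
    """
    query = parse.urlencode({'wd': keyword})
    headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder User-Agent
    return request.Request(base_url + '?' + query, headers=headers)


req = build_search_request('王菊')
print(req.full_url)      # the final URL, including the encoded query string
print(req.get_method())  # 'GET', because no data argument was passed
```

Passing `req` to `request.urlopen()` would then send the request, as in the script above.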

Batch-crawling Tieba pages

Enter a Baidu Tieba address in the browser, for example:
Page 1 of the LOL bar on Baidu Tieba: https://tieba.baidu.com/f?kw=lol&ie=utf-8&pn=0
Page 2: https://tieba.baidu.com/f?kw=lol&ie=utf-8&pn=50
Page 3: https://tieba.baidu.com/f?kw=lol&ie=utf-8&pn=100
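
In the three URLs above, only `pn` changes, and it grows by 50 per page, which suggests pn = (page - 1) * 50. A minimal sketch of generating the page URLs under that assumption follows; `tieba_page_urls` and its parameters are hypothetical names, and the step of 50 is inferred only from these three examples.

```python
from urllib import parse


def tieba_page_urls(kw, start_page, end_page, page_size=50):
    """Return the list of Tieba URLs for pages start_page..end_page (inclusive).

    Hypothetical helper; assumes pn = (page - 1) * page_size, as inferred
    from the three example URLs.
    """
    base = 'https://tieba.baidu.com/f'
    urls = []
    for page in range(start_page, end_page + 1):
        query = parse.urlencode({
            'kw': kw,
            'ie': 'utf-8',
            'pn': (page - 1) * page_size,
        })
        urls.append(base + '?' + query)
    return urls


urls = tieba_page_urls('lol', 1, 3)
for u in urls:
    print(u)
```

Each generated URL could then be fetched with `request.urlopen()` in the same way as the GET example.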
