
Crawler Basics Series, urllib: Building Random Request Headers (4)

2019-05-05  Author: 猛犸象和剑齿虎

Picking a request header at random

agent1="Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0"
agent2="Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
agent3="Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50"
agent4="Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5"
agent5="Opera/9.80 (Android 2.3.4; Linux; Opera Mobi/build-1107180945; U; en-GB) Presto/2.8.149 Version/11.10"

Using the random module to pick a header and put it into the request

# Anti-crawler checks often inspect the User-Agent, so keep several User-Agent strings on hand.
# One of them is chosen at random and passed to Request as a headers dict.
from urllib import request
import random
import re
# Header format: User-Agent: "Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0"
# Each variable holds only the header value; the header name is the dict key further down.
agent1="Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0"
agent2="Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
agent3="Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50"
agent4="Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5"
agent5="Opera/9.80 (Android 2.3.4; Linux; Opera Mobi/build-1107180945; U; en-GB) Presto/2.8.149 Version/11.10"
# Collect the User-Agent strings in a list
list1=[agent1,agent2,agent3,agent4,agent5]
# Pick one at random
agent=random.choice(list1)
print(agent)
# Build the request headers dict
header={"User-Agent":agent}

url=r"http://www.baidu.com/"
# urlopen(url) builds a default request object with no custom headers, so create a Request explicitly
mq=request.Request(url,headers=header)  # pass the headers in as a parameter
response=request.urlopen(mq).read().decode()  # decode() assumes the page is UTF-8
pat=r"<title>(.*?)</title>"
data=re.findall(pat,response)
print(data)
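
The script above chooses its User-Agent once, when it starts. If several pages are fetched in one run, it is usually more useful to choose again for every request. The following is a minimal sketch of that idea, not part of the original post: the names USER_AGENTS and build_request are hypothetical, and the pool reuses three of the strings listed earlier.

```python
import random
from urllib import request

# Assumed pool of User-Agent values (header values only, no "User-Agent:" prefix).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Opera/9.80 (Android 2.3.4; Linux; Opera Mobi/build-1107180945; U; en-GB) Presto/2.8.149 Version/11.10",
]


def build_request(url, agents=USER_AGENTS):
    """Return a Request for url that carries a randomly chosen User-Agent."""
    return request.Request(url, headers={"User-Agent": random.choice(agents)})


if __name__ == "__main__":
    for _ in range(3):
        req = build_request("http://www.baidu.com/")
        # Request stores header names capitalized, so the key is "User-agent".
        print(req.get_header("User-agent"))
```

Nothing is sent over the network here; each Request can be handed to request.urlopen exactly as in the script above.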

Output:

Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5
['百度一下']
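
re.findall returns a list, which is why the title is printed inside brackets. When only the first title is wanted, or when the page writes the tag in upper case or across several lines, a slightly more tolerant pattern can help. A small sketch under those assumptions, where extract_title and TITLE_RE are hypothetical names not used in the original:

```python
import re

# Tolerates upper-case tags, attributes on <title>, and line breaks inside it.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    sample = "<html><head><TITLE>\n百度一下\n</TITLE></head></html>"
    print(extract_title(sample))
```

Returning None instead of an empty list makes the "no title found" case explicit for the caller.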

