Scrapy Crawler Framework (8): Adding Request Headers
2019-06-01
猛犸象和剑齿虎
![](https://img.haomeiwen.com/i1920664/e6ab07ac726900dc.jpg)
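- Request headers let a crawler present itself as a browser and so get past a site's anti-crawler checks, which makes them important in any crawler.
- A quick recap of how they are set with urllib and requests:
- urllib: write the headers into a dictionary (`header`), then pass it as the `headers` argument when building a custom request: `req = request.Request(url, headers=header)`
- requests: pass the URL and the headers to the `get` method: `response = requests.get("http://www.baidu.com/s?", headers=headers)`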
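
The urllib variant of this recap can be checked without sending anything over the network. The sketch below only builds the request object and reads the header back. The helper name `build_request` is an assumption made for illustration, and the User-Agent string is the one used in this post's spider.

```python
from urllib import request

# Header dictionary, as in the recap above (User-Agent string taken from this post).
header = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE"
    )
}


def build_request(url, header):
    """Hypothetical helper: wrap a URL and a header dict in a urllib Request."""
    return request.Request(url, headers=header)


req = build_request("http://www.baidu.com/s?", header)

# urllib stores header names via str.capitalize(), so the key becomes "User-agent".
print(req.get_header("User-agent"))

# Actually sending it would be request.urlopen(req); that step is left out here.
```

With requests the same dictionary would simply be passed as `headers=` to `requests.get`, as in the recap.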
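Setting request headers is straightforward with those libraries, and it is just as straightforward in Scrapy. Only the form differs slightly: a `headers` dictionary is passed to the request object yielded from `start_requests`.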

    # -*- coding: utf-8 -*-
    import random

    import scrapy

    # User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE
    # http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule


    class YdspiderSpider(scrapy.Spider):
        name = 'ydspider'
        allowed_domains = ['fanyi.youdao.com']
        # start_urls = ['http://youdao.com/']

        def start_requests(self):
            url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
            # Pool of User-Agent strings. The three entries are identical here,
            # so replace them with distinct strings for random.choice to matter.
            user_agents = [
                'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE',
                'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE',
                'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE',
            ]
            user_agent = random.choice(user_agents)
            headers = {'User-Agent': user_agent}
            # Add a POST request to the queue, carrying the custom headers
            yield scrapy.FormRequest(
                url=url,
                headers=headers,
                formdata={
                    'i': '男人',
                    'from': 'AUTO',
                    'to': 'AUTO',
                    'smartresult': 'dict',
                    'client': 'fanyideskweb',
                    'salt': '15589655028559',
                    'sign': '6781389ab298673f7036bce9cd99815b',
                    'ts': '1558965502855',
                    'bv': 'ab57a166e6a56368c9f95952de6192b5',
                    'doctype': 'json',
                    'version': '2.1',
                    'keyfrom': 'fanyi.web',
                    'action': 'FY_BY_REALTlME'
                },
                callback=self.parse
            )

        def parse(self, response):
            # Print a separator, then the raw response body (bytes)
            print('-----------------------------------------------------------')
            print(response.body)

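
`print(response.body)` shows raw bytes, so any Chinese text in the reply appears as escape sequences. Below is a minimal sketch of turning such bytes into a Python dictionary, assuming the body is UTF-8 encoded JSON (the spider sends `doctype: json`). The helper name `decode_body` and the sample payload are made up for illustration and are not the actual reply from the service.

```python
import json


def decode_body(body: bytes) -> dict:
    """Hypothetical helper: decode UTF-8 bytes and parse them as JSON."""
    return json.loads(body.decode("utf-8"))


# Made-up payload standing in for response.body; not the real service reply.
sample_body = json.dumps({"src": "男人"}, ensure_ascii=False).encode("utf-8")

print(sample_body)               # raw bytes, non-ASCII characters shown as escapes
data = decode_body(sample_body)
print(data["src"])               # readable text after decoding
```

Inside `parse`, Scrapy text responses also expose the decoded string as `response.text`, so `json.loads(response.text)` is an equivalent starting point.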
Finally, run `scrapy crawl ydspider` in a command-line terminal. The output is the same as in the previous installment.
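
Setting the header inside `start_requests` covers only the requests built there. One common alternative is a downloader middleware that stamps a User-Agent on every outgoing request. The sketch below is an illustration under stated assumptions: the class name `RandomUserAgentMiddleware`, the `FakeRequest` stand-in, and the second and third User-Agent strings are not from this post. Only the `process_request(request, spider)` hook is Scrapy's own interface.

```python
import random

# Illustrative pool: the first string is from this post, the other two are
# example values added here so that the random choice is meaningful.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/12.1 Safari/605.1.15",
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: pick a User-Agent per request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None  # returning None lets Scrapy continue handling the request


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy installed."""

    def __init__(self):
        self.headers = {}


fake = FakeRequest()
RandomUserAgentMiddleware().process_request(fake, spider=None)
print(fake.headers["User-Agent"])
```

In a real project the class would live in the project's `middlewares.py` and be enabled through the `DOWNLOADER_MIDDLEWARES` setting in `settings.py`. If a single fixed value is enough, the `USER_AGENT` setting does the job without any middleware.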