
Scrapy Study Notes (1): Installing Scrapy in a Virtual Environment

2016-12-05  leeyis

System environment: CentOS 7

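This article assumes you have already installed virtualenv and activated a virtual environment named ENV1. If you have not, see the article "Creating a Python sandbox (virtual) environment with virtualenv".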
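
Before installing anything, it can help to confirm that the interpreter you are running really belongs to the activated environment. The snippet below is a minimal sketch, not part of the original notes; the helper name `in_virtualenv` is hypothetical. It relies on the fact that older virtualenv releases set `sys.real_prefix`, while `venv` and newer virtualenv releases make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter lives inside a virtual environment."""
    # Older virtualenv releases expose the original prefix as sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases point sys.prefix at the environment
    # and keep the system installation in sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("virtualenv active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If this reports that no environment is active, activate ENV1 again before continuing, otherwise pip would install into the system Python.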

1. Install the packages Scrapy depends on

# yum install python-devel

# yum install gcc libffi-devel openssl-devel

2. Install Scrapy

# pip install Scrapy
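
As a quick check after pip finishes, you can ask the environment which Scrapy version it sees. This is a rough sketch rather than anything from the original notes: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata` (the session in this article ran on Python 2).

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package="Scrapy"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("Scrapy is not installed in this environment")
else:
    print("Scrapy version:", version)
```

A result of `None` usually means pip installed into a different interpreter than the one running the check.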

3. Test Scrapy

(ENV1)[eason@localhost ENV1]$ scrapy shell "https://public.tableau.com/zh-cn/s/gallery"
2016-11-13 11:32:22 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
2016-11-13 11:32:22 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-11-13 11:32:22 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-11-13 11:32:22 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-11-13 11:32:22 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-11-13 11:32:22 [scrapy] INFO: Enabled item pipelines:
[]
2016-11-13 11:32:22 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6024
2016-11-13 11:32:22 [scrapy] INFO: Spider opened
2016-11-13 11:32:25 [scrapy] DEBUG: Crawled (200) (referer: None)
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler
[s]   item       {}
[s]   request
[s]   response   <200 https://public.tableau.com/zh-cn/s/gallery>
[s]   settings
[s]   spider
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
>>> response.xpath("//div[@class='media-viz']/div[2]/h3[@class='media-viz__title']/a").extract()
[u'\u62b1\u6028\u5929\u6c14\uff1a\u56e0\u964d\u6c34\u800c\u5ef6\u8bef\u7684\u7f8e\u56fd\u822a\u73ed', u'\u6700\u9ad8\u6cd5\u9662\u7684\u672a\u6765', u'\u5168\u7403\u6838\u80fd\u4f7f\u7528\u60c5\u51b5', u'\u300a\u6eda\u77f3\u6742\u5fd7\u300b\u201c\u5386\u53f2 500 \u4f73\u4e13\u8f91\u201d', u'\u5386\u53f2\u4e0a\u7684\u91cd\u8981\u751f\u65e5', u'\u544a\u8bc9\u6211 Will \u7684\u60c5\u51b5', u'\u7f8e\u56fd\u5883\u5185 50 \u5e74\u95f4\u72af\u7f6a\u60c5\u51b5', u'NFL \u7684\u5386\u53f2', u'\u6fd2\u5371\u52a8\u7269\u8003\u5bdf', u'The History of the Single Season Home Run Record']
>>>
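
The `\uXXXX` sequences in the result are not corruption. The Python 2 shell shows the repr of a list of unicode strings, which escapes every non-ASCII character. The sketch below, written for Python 3 and not part of the original session, copies two of the titles from the output and shows that printing a string yields readable text, while a hypothetical helper `escape_like_python2_repr` reproduces the escaped form.

```python
# Two titles copied from the shell output above, written as escaped literals.
titles = [
    "\u6700\u9ad8\u6cd5\u9662\u7684\u672a\u6765",
    "NFL \u7684\u5386\u53f2",
]


def escape_like_python2_repr(text):
    # The unicode_escape codec turns each non-ASCII character back into
    # a backslash-u sequence and leaves plain ASCII characters untouched.
    return text.encode("unicode_escape").decode("ascii")


for title in titles:
    # print() writes the characters themselves, not their escapes.
    print(title)
    # The escaped form matches what the Python 2 shell displayed.
    print(escape_like_python2_repr(title))
```

In the Scrapy shell itself, looping over the extracted list and printing each item gives the same readable result, provided the terminal encoding supports Chinese characters.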

4. Everything works as expected. Done!

