Scrapy Study Notes 1

2017-11-09, 法号无涯

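1. Scrapy is built on Twisted, an asynchronous networking framework. This makes it very efficient, because it can keep many requests in flight at the same time.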

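A synchronous crawler is like phoning 100 people: you call them one at a time and wait for each person to pick up before dialing the next. An asynchronous one keeps many calls going at once instead of sitting idle while it waits.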
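The phone-call analogy can be made concrete with a small timing experiment. This sketch uses Python's standard asyncio, not Twisted (Scrapy's actual engine). Each "call" is simulated by a hypothetical 10 ms sleep that stands in for waiting on a network reply.

```python
import asyncio
import time

# Illustrative only: Scrapy uses Twisted, not asyncio. CALL_DELAY is a made-up
# stand-in for the time spent waiting for the other side to answer.
CALL_DELAY = 0.01  # seconds


async def call(person: int) -> int:
    await asyncio.sleep(CALL_DELAY)  # waiting, not using the CPU
    return person


async def call_one_by_one(n: int) -> list[int]:
    # Each call must finish before the next one starts.
    return [await call(i) for i in range(n)]


async def call_all_at_once(n: int) -> list[int]:
    # All calls are started together, so their waiting periods overlap.
    return list(await asyncio.gather(*(call(i) for i in range(n))))


def timed(coro_fn, n: int) -> tuple[list[int], float]:
    # Run one strategy to completion and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = timed(call_one_by_one, 100)
    _, conc_time = timed(call_all_at_once, 100)
    print(f"one by one: {seq_time:.2f}s, concurrently: {conc_time:.2f}s")
```

The sequential version needs at least 100 × 10 ms = 1 s. The concurrent version finishes in roughly the time of a single call.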

Other tools: lxml, urllib2 (replaced by requests), Beautiful Soup, Selenium

For simple tasks the tools above are enough, and there is no need to bring in Scrapy.
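For example, a small one-off extraction needs only lxml. This is a minimal sketch that assumes the page has already been downloaded. The HTML string and the `post` class name are made up for illustration.

```python
from lxml import html

# Hypothetical page content. In a real script you would usually download it
# first, e.g. with requests.get(url).text.
PAGE = """
<html><body>
  <h1><a href="/">My Blog</a></h1>
  <ul>
    <li class="post"><a href="/p/1">First post</a></li>
    <li class="post"><a href="/p/2">Second post</a></li>
  </ul>
</body></html>
"""


def post_titles(page: str) -> list[str]:
    tree = html.fromstring(page)  # parse the HTML into an element tree
    # XPath: the text of every <a> inside an <li class="post">
    return [str(t) for t in tree.xpath('//li[@class="post"]/a/text()')]


if __name__ == "__main__":
    print(post_titles(PAGE))  # ['First post', 'Second post']
```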

Under Desktop/courses/scrapy-udemy I created an env directory to use as a virtualenv and installed Scrapy into it with: pip install scrapy

Creating a project

Command: scrapy startproject <project_name>

Adding a spider: scrapy genspider <spider_name> <domain>

One project can contain several spiders. The command scrapy list shows which spiders it has.

scrapy shell opens the interactive Scrapy shell (scrapy shell <url> opens it with that page already fetched)

fetch('url') downloads a page inside the shell; the result is then available as response

response.css('h1'): the <h1> elements, as a list of selectors

response.css('h1::text'): only the text directly inside each <h1>

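response.xpath('h1') returns an empty list []: a path without a leading / or // is relative to the current node, which here is the root <html> element, and <h1> is not its direct child.

response.xpath('//h1') returns the expected <h1> elements, because // searches the whole document at any depth.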

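response.xpath('//h1/a'): the <a> elements directly inside each <h1>

response.xpath('//h1/a/text()'): the text nodes of those <a> elements

response.xpath('//h1/a/text()').extract(): all matched text, as a list of strings

response.xpath('//h1/a/text()').extract_first(): only the first matched string (None if nothing matches)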

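response.xpath('//*[@class="tag"]'): any element whose class attribute is exactly "tag"

response.xpath('//*[@class="tag-item"]/a/text()'): the link text inside elements whose class is "tag-item"

response.xpath('//*[@class="tag-item"]/a/text()').extract(): the same link text, as a list of strings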