Web Scraping Without Writing Code, Part 1: Getting Started
2018-10-19
This walkthrough scrapes the Zhihu answers of 慧航 (Hui Hang) as its example~
1. Open Web Scraper
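Right-click the page and choose Inspect to open the browser's developer tools, then switch to the Web Scraper tab.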
![](https://img.haomeiwen.com/i11013023/4e7f5d619b9da458.png)
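Note: make sure the developer tools panel is docked at the bottom of the browser window. If it is not, click the control outlined in red to move it there.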
![](https://img.haomeiwen.com/i11013023/192b3b931c8fb487.png)
2. Create a sitemap
![](https://img.haomeiwen.com/i11013023/3b89c2891f9a4982.png)
- sitemap name: any name you like
- start url: the URL of the page you are currently on
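Behind the form, a sitemap is just a small JSON document, which Web Scraper can export and import. The Python sketch below builds a dict with the two fields set in this step. The field names `_id` and `startUrl` are an assumption based on the exported format, and the sitemap name and profile URL are hypothetical placeholders, not the real page used in this walkthrough.

```python
import json


def build_sitemap(name, start_url):
    """Build a dict shaped like an exported Web Scraper sitemap.

    Assumption: the field names mirror Web Scraper's sitemap export;
    selectors are added later, in step 3.
    """
    return {
        "_id": name,              # "sitemap name": any name you like
        "startUrl": [start_url],  # "start url": the page currently open
        "selectors": [],          # filled in when selectors are added
    }


# Hypothetical values, for illustration only.
sitemap = build_sitemap(
    "zhihu-answers",
    "https://www.zhihu.com/people/example/answers",
)
print(json.dumps(sitemap, indent=2, ensure_ascii=False))
```

Seeing the sitemap as plain data makes it clear why it can be exported, shared, and re-imported instead of being rebuilt by hand each time.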
3. Add a selector
![](https://img.haomeiwen.com/i11013023/d67dc39c9ff6c226.png)
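- multiple: check this to capture every matching element on the page, not just the first one
- delay: wait time in milliseconds (2000-5000 is a reasonable range)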
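To make these two options concrete, here is a plain-Python sketch using only the standard library. It is not Web Scraper's own code, just an illustration of the idea. A link selector collects each link's text and URL. With `multiple` on it keeps every match, and with it off it keeps only the first. `polite_delay_ms` is a hypothetical helper that picks a wait inside the 2000-5000 ms range suggested above. The HTML snippet is made up.

```python
import random
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def select_links(html, multiple=True):
    """Mimic the "multiple" option: all matches, or only the first."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links if multiple else parser.links[:1]


def polite_delay_ms(low=2000, high=5000):
    """Hypothetical helper: pick a random wait (ms) between requests."""
    return random.randint(low, high)


if __name__ == "__main__":
    sample = (
        '<h2><a href="/answer/1">Answer one</a></h2>'
        '<h2><a href="/answer/2">Answer two</a></h2>'
    )
    print(select_links(sample))                  # both links
    print(select_links(sample, multiple=False))  # first link only
    print(polite_delay_ms())
```

Randomizing the wait inside a range, rather than using one fixed value, is a common way to keep request timing less mechanical; either way, a longer delay is gentler on the site.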
4. Start scraping
![](https://img.haomeiwen.com/i11013023/acca9190cb34bbb9.png)
![](https://img.haomeiwen.com/i11013023/f15ec85544f685a2.png)
![](https://img.haomeiwen.com/i11013023/d49279b098031226.png)
Note: the scraping window can be minimized, but do not close it. You can also run several scraping jobs at the same time.
5. Export the scraped data
![](https://img.haomeiwen.com/i11013023/08198828be72beac.png)
Sample output
![](https://img.haomeiwen.com/i11013023/1af6e099c055f777.png)
- web-scraper-order: the order of the scraped rows
- web-scraper-start-url: the start url the row was scraped from
- title-link: the text of the link
- title-link-href: the actual URL the link points to
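Once the data is exported as CSV, it can be processed further with a few lines of Python. The sketch below reads the export with the standard `csv` module and pulls out each link's text and URL. The column names match the ones listed above, but every value in the sample is made up.

```python
import csv
import io

# Hypothetical export: the header matches the columns described above,
# but all row values are invented for illustration.
SAMPLE_CSV = """web-scraper-order,web-scraper-start-url,title-link,title-link-href
1-1,https://www.zhihu.com/people/example/answers,Answer one,https://www.zhihu.com/question/1/answer/1
1-2,https://www.zhihu.com/people/example/answers,Answer two,https://www.zhihu.com/question/2/answer/2
"""


def load_links(csv_text):
    """Return (link text, link URL) pairs from a Web Scraper CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["title-link"], row["title-link-href"]) for row in reader]


for text, url in load_links(SAMPLE_CSV):
    print(text, url)
```

For a real file, open it with `open(path, newline="", encoding="utf-8-sig")` and pass the file object to `csv.DictReader`. The `utf-8-sig` codec reads UTF-8 correctly whether or not the file starts with a byte-order mark.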
More in the next part of this series~