Data acquisition: web crawler practice
Introductory crawler articles
https://zhuanlan.zhihu.com/p/24669128
https://zhuanlan.zhihu.com/p/24769534
https://zhuanlan.zhihu.com/p/25200262
https://zhuanlan.zhihu.com/p/26257790
User-Agent and dynamic IP (proxy) settings
http://lawtech0902.com/2017/06/11/scrapy-useragent-proxyip/
https://zhuanlan.zhihu.com/p/29733174
https://github.com/hellysmile/fake-useragent
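As a rough illustration of what rotating the User-Agent and the proxy IP means, here is a minimal standard-library sketch. It is not the Scrapy middleware described in the links above; the User-Agent strings, the proxy addresses, and the helper names `build_request` and `build_opener` are all made up for this example.

```python
import random
import urllib.request

# Hypothetical pool of User-Agent strings; replace with real, current ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]

# Hypothetical proxy addresses (documentation-only IP range, not real proxies).
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]


def build_request(url):
    """Return a Request carrying a randomly chosen User-Agent header."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def build_opener():
    """Return an opener that routes HTTP(S) traffic through a random proxy."""
    proxy = random.choice(PROXIES)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


request = build_request("http://example.com/")
opener = build_opener()
# opener.open(request) would actually send the request; it is left out
# here so the sketch runs without network access.
```

In a Scrapy project the same idea is usually implemented in a downloader middleware that sets the header and the proxy on each outgoing request.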
Download delay and disabling cookies
https://blkstone.github.io/2016/03/02/crawler-anti-anti-cheat/
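In Scrapy, both the delay and the cookie behaviour are controlled from the project's settings.py. The fragment below is a sketch with example values (the 2-second delay is an assumption, not a recommendation from the linked article):

```python
# settings.py (fragment)

# Wait about 2 seconds between requests to the same site (example value).
DOWNLOAD_DELAY = 2

# Vary the actual wait between 0.5x and 1.5x of DOWNLOAD_DELAY so the
# request timing looks less regular.
RANDOMIZE_DOWNLOAD_DELAY = True

# Do not store or send cookies, which makes session-based tracking harder.
COOKIES_ENABLED = False
```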
Handling Ajax with PhantomJS and Selenium
https://my.oschina.net/lewisgong/blog/872257
https://chaycao.github.io/2016/08/19/Scrapy-Selenium-Phantomjs/
Page parsing: Beautiful Soup, XPath, CSS selectors
https://cuiqingcai.com/1319.html
python
Installing lxml
https://pypi.org/project/lxml/#files
pip install lxml-4.2.1-cp27-cp27m-win_amd64.whl
https://blog.csdn.net/g1apassz/article/details/46574963
https://blog.csdn.net/acingdreamer/article/details/53348649
Upgrading pip
python -m pip install --upgrade pip
Creating and using requirements.txt
https://blog.csdn.net/orangleliu/article/details/60958525
Python path and imports
https://blog.csdn.net/tony_wong/article/details/18044273
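A small sketch of how the import search path works: a directory that is added to `sys.path` becomes importable. The module name `helper_mod` and the temporary directory are invented for this example.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module (hypothetical name).
package_dir = tempfile.mkdtemp()
with open(os.path.join(package_dir, "helper_mod.py"), "w") as f:
    f.write("GREETING = 'hello from helper_mod'\n")

# Put the directory at the front of the import search path.
if package_dir not in sys.path:
    sys.path.insert(0, package_dir)

# Make sure the import system notices the freshly written file.
importlib.invalidate_caches()
helper_mod = importlib.import_module("helper_mod")
print(helper_mod.GREETING)
```

Setting the PYTHONPATH environment variable has the same effect without changing the code.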
Scrapy installation error: Microsoft Visual C++ 14.0 is required...
https://blog.csdn.net/nima1994/article/details/74931621?locationNum=10&fps=1
Scrapy shell
https://blog.csdn.net/laoyang360/article/details/52809927
Scrapy runtime error: ImportError: No module named win32api
https://blog.csdn.net/u013687632/article/details/57075514
XPath
https://blog.csdn.net/manongpengzai/article/details/77109600
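To give a feel for XPath expressions, the sketch below uses the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and needs well-formed markup. It stands in for lxml or Scrapy selectors, which accept full XPath on real-world HTML. The sample page is invented.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample page (made up for this example).
PAGE = """<html><body>
<div class="post"><a href="/a">First</a></div>
<div class="post"><a href="/b">Second</a></div>
<div class="ad"><a href="/x">Ad</a></div>
</body></html>"""

root = ET.fromstring(PAGE)

# Select every <a> that is a direct child of a <div class="post">,
# at any depth below the root.
links = root.findall(".//div[@class='post']/a")

hrefs = [a.get("href") for a in links]
titles = [a.text for a in links]
print(hrefs, titles)
```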
Python logging
https://blog.csdn.net/chosen0ne/article/details/7319306
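A minimal sketch of the standard logging module: one named logger, one handler, one format string. The logger name `crawler_demo` is arbitrary, and the in-memory stream is used only so the result can be inspected; a real crawler would more likely log to the console or a file.

```python
import io
import logging

# Collect log records in memory instead of a file (for demonstration only).
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("crawler_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)               # drop DEBUG, keep INFO and above
logger.addHandler(handler)
logger.propagate = False                    # do not also send to the root logger

logger.debug("parsed one node")             # filtered out by the INFO level
logger.info("fetched %d pages", 3)
logger.warning("retrying %s", "http://example.com/")

output = stream.getvalue()
print(output)
```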
Scrapy link extractor
https://www.jianshu.com/p/ff9125650697
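The idea behind a link extractor can be sketched with the standard library alone: collect the `href` of every `<a>` tag, make it absolute, and keep only URLs that match a pattern. This is not Scrapy's LinkExtractor, just an illustration of the concept; the class name `LinkCollector` and the sample HTML are invented, and the `allow` argument only loosely mirrors the Scrapy parameter of the same name.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect unique absolute links whose URL matches a regex."""

    def __init__(self, base_url, allow=r""):
        super().__init__()
        self.base_url = base_url
        self.allow = re.compile(allow)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        # Keep the link only if it matches the pattern and is not a duplicate.
        if self.allow.search(url) and url not in self.links:
            self.links.append(url)


collector = LinkCollector("http://example.com/list/", allow=r"/item/\d+")
collector.feed(
    '<a href="/item/1">one</a> <a href="../about">about</a> '
    '<a href="/item/2">two</a> <a href="/item/1">duplicate</a>'
)
print(collector.links)
```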
Starting the crawler
Change to the project's root directory and run the following command to start the spider:
scrapy crawl dmoz