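Scrapy Study Notes (0)
He who has no name can devote himself wholly to practicing the sword.
This post fills in the basics: installing the Scrapy framework and using scrapy shell.
Installation order: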
1. Python 3.6
2. python -m pip install pywin32
3. python -m pip install lxml
4. python -m pip install setuptools
5. python -m pip install zope.interface
6. Download the precompiled Twisted wheel that matches your Python version and platform, then install it:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
pip install .\Twisted-18.7.0-cp36-cp36m-win_amd64.whl
python -m pip install Twisted
7. python -m pip install pyOpenSSL
8. python -m pip install scrapy
Recommended terminal: PowerShell
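Once the steps above are done, it can help to confirm that every package is actually importable before moving on. The following is a minimal sketch, not part of the original notes: the function name `check_installed` and the mapping from pip names to import names (for example `pywin32` to `win32api`, `pyOpenSSL` to `OpenSSL`) are assumptions chosen for illustration.

```python
import importlib.util

# Assumed mapping: pip package name -> module name used at import time.
REQUIRED = {
    "pywin32": "win32api",
    "lxml": "lxml",
    "setuptools": "setuptools",
    "zope.interface": "zope.interface",
    "Twisted": "twisted",
    "pyOpenSSL": "OpenSSL",
    "scrapy": "scrapy",
}


def check_installed(modules):
    """Return {pip_name: bool}, True when the module can be located."""
    status = {}
    for pip_name, module_name in modules.items():
        try:
            status[pip_name] = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "zope") is itself missing.
            status[pip_name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_installed(REQUIRED).items():
        print(f"{name:15} {'OK' if ok else 'MISSING'}")
```

Any package reported as MISSING can be reinstalled with the corresponding `python -m pip install` command from the list.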
Using scrapy shell for a preliminary analysis before writing the spider:
1. scrapy shell http://www.weather.com.cn/weather1d/101250101.shtml
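2. print(response) to check the response status of the page
3. response.body to check whether any content came back
4. view(response) opens the downloaded page in a browser so you can check that it matches the original page
5. response.xpath('//p[contains(@class,"tem")]/span/text()').extract() to extract the information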
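6. response.xpath('//p[contains(@class,"tem")]/span/text()').re(r'\d{1,2}')[0] to extract with a regular expression (here, a one- or two-digit number). Use a raw string and at least one digit in the quantifier: a trailing \' escapes the closing quote and breaks the command, and \d{0,2} can match an empty string.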
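The regular expression in the last step can be tried outside scrapy shell. The sketch below uses only the standard `re` module; for a pattern without groups, the selector's `.re()` behaves much like `re.findall`. The sample strings and the helper `first_number` are hypothetical stand-ins for whatever the XPath actually returns.

```python
import re

# Hypothetical sample strings, standing in for the text nodes the XPath returns.
samples = ["25", "8", "-3"]


def first_number(text, pattern=r"\d{1,2}"):
    """Return the first match of `pattern` in `text`, or None if nothing matches."""
    matches = re.findall(pattern, text)
    return matches[0] if matches else None


for s in samples:
    print(s, "->", first_number(s))

# With {0,2} the pattern is allowed to match zero digits, so on "-3"
# the first match is the empty string found before the minus sign.
print(repr(re.findall(r"\d{0,2}", "-3")[0]))
```

Returning None instead of indexing with [0] directly avoids an IndexError when the page contains no digits at all.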