Python Scrapy crawler framework
2018-02-28
proud2008
Reference: [Summary of publishing crawlers with scrapyd]
Install two tools with pip: scrapyd and scrapyd-client.
1. Run the server
PS C:\WINDOWS\system32> scrapyd
2018-03-01T15:35:58+0800 [-] Loading c:\users\administrator\appdata\local\programs\python\python36-32\lib\site-packages\scrapyd\txapp.py...
2018-03-01T15:35:59+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2018-03-01T15:35:59+0800 [-] Loaded.
2018-03-01T15:35:59+0800 [twisted.application.app.AppLogger#info] twistd 17.9.0 (c:\users\administrator\appdata\local\programs\python\python36-32\python.exe 3.6.4) starting up.
2018-03-01T15:35:59+0800 [twisted.application.app.AppLogger#info] reactor class: twisted.internet.selectreactor.SelectReactor.
2018-03-01T15:35:59+0800 [-] Site starting on 6800
2018-03-01T15:35:59+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x060E07F0>
2018-03-01T15:35:59+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=16, runner='scrapyd.runner'
2018-03-01T15:36:11+0800 [twisted.python.log#info] "127.0.0.1" - - [01/Mar/2018:07:36:10 +0000] "GET / HTTP/1.1" 200 699 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
2018-03-01T15:36:11+0800 [twisted.python.log#info] "127.0.0.1" - - [01/Mar/2018:07:36:10 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "http://127.0.0.1:6800/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
Open http://127.0.0.1:6800/ in a browser. If the Scrapyd page loads, the server is running normally.
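The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming the server from the log above is listening on http://127.0.0.1:6800/ and exposes the daemonstatus.json endpoint described in the Scrapyd API documentation. The function names are made up for this illustration.

```python
import json
from urllib.request import urlopen

# Address taken from the startup log above; adjust if your server differs.
SCRAPYD_URL = "http://127.0.0.1:6800/"


def daemon_status_url(base_url: str) -> str:
    """Build the daemonstatus.json URL for a Scrapyd base URL."""
    return base_url.rstrip("/") + "/daemonstatus.json"


def is_healthy(payload: dict) -> bool:
    """Treat the daemon as healthy when the JSON reply has "status": "ok"."""
    return payload.get("status") == "ok"


def check_server(base_url: str = SCRAPYD_URL, timeout: float = 5.0) -> bool:
    """Query a running server. This needs Scrapyd to be up and reachable."""
    with urlopen(daemon_status_url(base_url), timeout=timeout) as response:
        return is_healthy(json.loads(response.read().decode("utf-8")))
```

Calling check_server() while scrapyd is running should return True. If the server is down, urlopen raises an error instead.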
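2. Package the project on the client and upload it to the server
The scrapyd-deploy command-line client packages the project as an egg file and publishes it to the server. This relies on setuptools.
Note: on Windows the scrapyd-deploy command may not be found. In that case, add the following file to the Python\Scripts folder of the Python installation directory:
scrapyd-deploy.bat
with the following contents: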
@echo off
rem Forward all command-line arguments (%*) to the scrapyd-deploy script.
"C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\python.exe" "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\Scripts\scrapyd-deploy" %*
In the project's scrapy.cfg,
remove the # in front of url in the [deploy] section to uncomment it,
as follows:
[settings]
default = scrapy1.settings
[deploy]
url = http://localhost:6800/
project = scrapy1
List the available deploy targets, that is, the url configured in the [deploy] section:
>scrapyd-deploy -l
default http://localhost:6800/
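To see how the [deploy] section maps to the target list above, here is a rough sketch that reads the same scrapy.cfg content with Python's standard configparser. It is not the actual scrapyd-deploy implementation. The helper name read_deploy_targets is hypothetical, and the handling of named sections such as [deploy:name] follows the scrapyd-client documentation as I understand it.

```python
from configparser import ConfigParser

# Same content as the scrapy.cfg shown above.
SCRAPY_CFG = """\
[settings]
default = scrapy1.settings

[deploy]
url = http://localhost:6800/
project = scrapy1
"""


def read_deploy_targets(cfg_text: str) -> dict:
    """Return {target_name: {option: value}} for every deploy section."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy":
            name = "default"  # a bare [deploy] section is the default target
        elif section.startswith("deploy:"):
            name = section.split(":", 1)[1]  # [deploy:staging] -> "staging"
        else:
            continue  # skip unrelated sections such as [settings]
        targets[name] = dict(parser.items(section))
    return targets


targets = read_deploy_targets(SCRAPY_CFG)
for name, options in targets.items():
    print(name, options["url"])
```

With the configuration above this prints one line, default http://localhost:6800/, which matches the listing from scrapyd-deploy -l.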
Deploy the client package:
scrapyd-deploy <target> -p <project> --version <version>
>scrapyd-deploy default -p scrapy1
Packing version 1519890216
Deploying to project "scrapy1" in http://localhost:6800/addversion.json
Server response (200):
{"node_name": "SC-201711261536", "status": "ok", "project": "scrapy1", "version": "1519890216", "spiders": 8}
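The server response is plain JSON, so it is easy to check in a script. The sketch below parses the response shown above and decodes the version number, on the assumption that scrapyd-deploy used the current Unix timestamp because no --version was given. The function name summarize_deploy is invented for this example.

```python
import json
from datetime import datetime, timezone

# Server response copied from the deploy output above.
RESPONSE = (
    '{"node_name": "SC-201711261536", "status": "ok", "project": "scrapy1", '
    '"version": "1519890216", "spiders": 8}'
)


def summarize_deploy(response_text: str) -> dict:
    """Validate an addversion.json reply and decode its timestamp version."""
    data = json.loads(response_text)
    if data.get("status") != "ok":
        raise RuntimeError("deploy failed: %r" % data)
    # Assumption: the version is a Unix timestamp (seconds since the epoch).
    packed_at = datetime.fromtimestamp(int(data["version"]), tz=timezone.utc)
    return {
        "project": data["project"],
        "spiders": data["spiders"],
        "packed_at": packed_at,
    }


summary = summarize_deploy(RESPONSE)
print(summary["project"], summary["spiders"])
print(summary["packed_at"].isoformat())  # 2018-03-01T07:43:36+00:00
```

07:43:36 UTC is 15:43:36 in the +0800 zone used by the server log, a few minutes after the server was started, which fits the timestamp interpretation.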
3. Test
The service is called through HTTP requests. For details, see:
http://scrapyd.readthedocs.io/en/latest/api.html
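As an illustration of such a call, the sketch below builds requests for the schedule.json and listspiders.json endpoints from the API documentation linked above, using only the standard library. The project name scrapy1 comes from this article. The spider name example_spider is a placeholder, since the article does not name any of the 8 deployed spiders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Server address from the scrapy.cfg shown earlier.
SCRAPYD_URL = "http://localhost:6800/"


def build_schedule_request(project: str, spider: str,
                           base_url: str = SCRAPYD_URL) -> Request:
    """Build a POST request that asks Scrapyd to run one spider."""
    body = urlencode({"project": project, "spider": spider}).encode("utf-8")
    return Request(base_url.rstrip("/") + "/schedule.json",
                   data=body, method="POST")


def build_listspiders_url(project: str, base_url: str = SCRAPYD_URL) -> str:
    """Build the GET URL that lists the spiders of a deployed project."""
    query = urlencode({"project": project})
    return base_url.rstrip("/") + "/listspiders.json?" + query


# "example_spider" is a placeholder; use one of your own spider names.
request = build_schedule_request("scrapy1", "example_spider")
print(request.get_method(), request.full_url)
print(build_listspiders_url("scrapy1"))
```

Nothing is sent until the request is passed to urllib.request.urlopen, so the sketch can be inspected without a running server. With the server up, urlopen(request) should return a JSON body whose status field reports whether the job was accepted.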