
pyspider: a distributed crawler on Docker Swarm

2017-06-02  彭金虎



1. Starting point


I have been learning web crawling for a while and have used Scrapy, so I wanted to try another crawler framework. I picked pyspider because I also wanted to use it to learn about distributed crawling, and since Docker has matured, Docker was the natural choice for putting it all together.

2. Setting up the swarm


Create the nodes

Create three docker machines:

$ docker-machine create --driver virtualbox manager1
$ docker-machine create --driver virtualbox worker1
$ docker-machine create --driver virtualbox worker2

Run the following command to list the newly created docker machines and their corresponding IP addresses:

$ docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
default    *        virtualbox   Running   tcp://192.168.99.100:2376           v17.05.0-ce
manager1   -        virtualbox   Running   tcp://192.168.99.101:2376           v17.05.0-ce
worker1    -        virtualbox   Running   tcp://192.168.99.102:2376           v17.05.0-ce
worker2    -        virtualbox   Running   tcp://192.168.99.103:2376           v17.05.0-ce

Create the swarm

Log in to manager1:

$ docker-machine ssh manager1

Run the following command to initialize a new swarm:

docker@manager1:~$ docker swarm init --advertise-addr 192.168.99.101

Swarm initialized: current node (wpf2jcvhhvfosv3c9ac6c50dh) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-69wvyxsrnjtm11z38eus20tm0z9cof2ks9khzyv7fdo8it0dln-drdoszuykjp1uvhmn2spaa8vj \
    192.168.99.101:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Join the nodes to the swarm

Log in to worker1 and join it to the swarm:

docker@worker1:~$ docker swarm join \
>     --token SWMTKN-1-69wvyxsrnjtm11z38eus20tm0z9cof2ks9khzyv7fdo8it0dln-drdoszuykjp1uvhmn2spaa8vj \
>     192.168.99.101:2377
This node joined a swarm as a worker.

Log in to worker2 and join it to the swarm:

docker@worker2:~$ docker swarm join \
>     --token SWMTKN-1-69wvyxsrnjtm11z38eus20tm0z9cof2ks9khzyv7fdo8it0dln-drdoszuykjp1uvhmn2spaa8vj \
>     192.168.99.101:2377
This node joined a swarm as a worker.

Check the current swarm status

docker@manager1:~$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
k926754fhudg5tu51rnlp2fdj     worker2             Ready               Active
q1seyrwugtdceqd515tmp8ph3     worker1             Ready               Active
wpf2jcvhhvfosv3c9ac6c50dh *   manager1            Ready               Active              Leader

At this point, the three-node swarm is up and running.

3. Writing docker-compose.yml


For the full docker-compose.yml, refer to:

docker-compose.yml
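
Since the linked file is not reproduced here, below is a minimal sketch of roughly what it contains, reconstructed from the service names that appear in the deploy output in section 4 (scheduler, fetcher, processor, result-worker, webui, phantomjs, mysql, redis on a cars network). The images, credentials, connection strings, and ports are assumptions based on pyspider's standard Docker setup, not my exact file; adapt them to your own environment:

version: "3"

networks:
  cars:                          # deployed by the stack as the overlay network myspider_cars

services:
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ALLOW_EMPTY_PASSWORD=yes   # assumption: empty root password, for the demo only
    networks: [cars]

  redis:
    image: redis:3
    networks: [cars]

  phantomjs:
    image: binux/pyspider:latest
    command: phantomjs                   # the image's entrypoint is pyspider, so this runs "pyspider phantomjs"
    networks: [cars]

  scheduler:
    image: binux/pyspider:latest
    command: >
      --taskdb "mysql+taskdb://root:@mysql:3306/taskdb"
      --projectdb "mysql+projectdb://root:@mysql:3306/projectdb"
      --resultdb "mysql+resultdb://root:@mysql:3306/resultdb"
      --message-queue "redis://redis:6379/0"
      scheduler
    networks: [cars]

  fetcher:
    image: binux/pyspider:latest
    command: >
      --message-queue "redis://redis:6379/0"
      --phantomjs-proxy "phantomjs:25555"
      fetcher --xmlrpc
    networks: [cars]

  # processor and result-worker follow the same pattern as fetcher,
  # pointing at the same message queue and MySQL databases.

  webui:
    image: binux/pyspider:latest
    command: >
      --taskdb "mysql+taskdb://root:@mysql:3306/taskdb"
      --projectdb "mysql+projectdb://root:@mysql:3306/projectdb"
      --resultdb "mysql+resultdb://root:@mysql:3306/resultdb"
      --message-queue "redis://redis:6379/0"
      webui --scheduler-rpc "http://scheduler:23333/"
    ports:
      - "5000:5000"                      # pyspider web UI
    networks: [cars]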

Notes

4. Deploying the services


Deploy

Log in to manager1 and run the following command:

docker@manager1:~$ docker stack deploy -c docker-compose.yml myspider
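
To confirm that the stack converged, the services and their replica counts can be listed from manager1 (a single service can be inspected with docker service ps):

docker@manager1:~$ docker stack services myspider
docker@manager1:~$ docker service ps myspider_scheduler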

Notes

Swarm does not deploy services in any strict order, so mysql and redis may come up later than the services that depend on them. Each service therefore needs a restart_policy in its deploy section (see the sketch after the creation log below). The creation log shows the arbitrary order:

docker@manager1:~$ docker stack deploy -c docker-compose.yml myspider
Creating network myspider_cars
Creating service myspider_fetcher
Creating service myspider_processor
Creating service myspider_result-worker
Creating service myspider_webui
Creating service myspider_redis
Creating service myspider_mysql
Creating service myspider_scheduler
Creating service myspider_phantomjs
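
A minimal sketch of the restart_policy mentioned above, using the version 3 compose format; the delay and attempt values are only illustrative starting points:

  scheduler:
    image: binux/pyspider:latest
    # ... command and networks as before ...
    deploy:
      restart_policy:
        condition: on-failure   # restart if pyspider exits because mysql/redis is not ready yet
        delay: 5s               # wait 5 seconds between restart attempts
        max_attempts: 10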

5. Reflections


Since I had little experience with swarm stacks, I was tempted at one point to give up on swarm and switch to k8s. Fortunately I stuck with it, which is how this write-up came about; it lists some of the pitfalls I have hit so far.

