Customizing Scrapy download pipelines
2017-08-01
汤汤汤汤汤雪林
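When Scrapy's standard download pipelines do not meet your needs, you can write a custom pipeline.
Only the file download and image download pipelines are covered here.
To extend FilesPipeline (or ImagesPipeline for images), you only need to override the following two methods: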
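get_media_requests(self, item, info)  # returns a Request object for each file (image) to download
# Called once the Requests above have finished downloading; use it to fill in the files or images field
item_completed(self, results, item, info)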
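The results argument of item_completed is a list of two-element tuples, one per request: a success flag, then either a dict describing the downloaded file or a failure object. The sketch below only illustrates that shape in plain Python. The sample URLs, paths and checksums are made up, successful_paths is a hypothetical helper, and a plain Exception stands in for the Twisted Failure that Scrapy actually passes.

```python
# Hypothetical sample of the `results` argument passed to item_completed().
# Each entry is a (success, info) tuple. On failure Scrapy passes a Twisted
# Failure; a plain Exception is used here as a stand-in.
results = [
    (True, {"url": "http://example.com/a.jpg", "path": "full/a.jpg", "checksum": "abc123"}),
    (False, Exception("download failed")),
    (True, {"url": "http://example.com/b.jpg", "path": "full/b.jpg", "checksum": "def456"}),
]


def successful_paths(results):
    """Return the storage paths of the downloads that succeeded."""
    return [info["path"] for ok, info in results if ok]


paths = successful_paths(results)
print(paths)
```

Filtering on the success flag first matters because the second element is only a dict when the download succeeded.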
Example:
pipelines.py
import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class MyImagePipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Issue one download request per image URL on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # results is a list of (success, info) tuples; keep the paths of successful downloads.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("item contains no images")
        item['image_paths'] = image_paths
        return item
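A custom pipeline only runs once it is enabled in the project settings. Below is a minimal sketch of settings.py. It assumes the project package is called myproject (a hypothetical name), and the storage directory is a placeholder. ITEM_PIPELINES and IMAGES_STORE are standard Scrapy settings; a FilesPipeline subclass would use FILES_STORE instead.

```python
# settings.py (sketch; "myproject" and the storage path are placeholders)

# Enable the custom pipeline. The number sets its order relative to other
# pipelines: lower values run earlier.
ITEM_PIPELINES = {
    "myproject.pipelines.MyImagePipeline": 300,
}

# Directory where downloaded images are stored.
IMAGES_STORE = "/path/to/image/dir"
```

The image pipeline also depends on the Pillow library, so it needs to be installed in the same environment as Scrapy.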