
Python Learning Notes: Day 18

2020-03-10  Peng_001

Scrapy 009: Using Item Containers and Storing in JSON, XML and CSV

# items.py
import scrapy


class QuotetutorialItem(scrapy.Item):
    # define the fields for your item here:
    title = scrapy.Field()
    author = scrapy.Field()
    tag = scrapy.Field()
# spider file
import scrapy

from ..items import QuotetutorialItem


class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        all_div_quotes = response.css('div.quote')

        for quote in all_div_quotes:
            # create a fresh item for each quote
            items = QuotetutorialItem()

            # note: 'span.text::text', not '.span.text::text' (span is a tag, not a class);
            # and no trailing commas here — a trailing comma wraps each list in a tuple
            title = quote.css('span.text::text').extract()
            author = quote.css('.author::text').extract()
            tag = quote.css('.tag::text').extract()

            items['title'] = title
            items['author'] = author
            items['tag'] = tag

            yield items
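The section title mentions storing in JSON, XML and CSV; with the spider above, Scrapy's built-in feed exports handle that from the command line, picking the format from the file extension (the output filenames here are arbitrary):

```shell
# run the spider and export the yielded items in three formats
scrapy crawl quotes -o quotes.json
scrapy crawl quotes -o quotes.xml
scrapy crawl quotes -o quotes.csv
```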

Scrapy 010: Pipelines in Web Scraping

ITEM_PIPELINES = {
   'quotetutorial.pipelines.QuotetutorialPipeline': 300,
}
# The lower the number, the higher the pipeline's priority.

You can add other pipelines to this dict and control their relative order with the number.
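A sketch of what that looks like, with a hypothetical CleanTextPipeline added (its name and priority are assumptions for illustration, not part of the tutorial):

```python
# settings.py fragment: two pipelines, ordered by their numbers (valid range 0-1000)
ITEM_PIPELINES = {
    'quotetutorial.pipelines.QuotetutorialPipeline': 300,
    'quotetutorial.pipelines.CleanTextPipeline': 100,   # hypothetical; runs first
}

# Scrapy calls process_item on each pipeline in ascending order of the number:
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)  # CleanTextPipeline path comes before QuotetutorialPipeline
```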

class QuotetutorialPipeline(object):
    def process_item(self, item, spider):
        print("Pipelines :")
        return item
# Adding a print in the pipeline file and seeing it in Scrapy's log output confirms that the data passes through the pipeline.
class QuotetutorialPipeline(object):
    def process_item(self, item, spider):
        print("Pipelines :" + item['title'][0])
        return item

This originally raised TypeError: can only concatenate str (not "list") to str. The cause was the trailing commas after extract() in the spider: they wrapped each extracted list in a tuple, so item['title'][0] was still a list rather than a string. With the commas removed, item['title'][0] is a str and the print works.
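The gotcha can be reproduced in plain Python, no Scrapy needed (the sample string stands in for what .extract() returns):

```python
extracted = ['sample quote text']  # what .extract() returns: a list of strings

# With a trailing comma (as in the original spider), the list gets wrapped in a tuple:
title = extracted,
item_title = title[0]              # still a list, not a str
try:
    print("Pipelines :" + item_title)
except TypeError as e:
    print(e)                       # can only concatenate str (not "list") to str

# Without the trailing comma, [0] indexes into the list and yields a str:
title = extracted
print("Pipelines :" + title[0])
```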
