Head First Scrapy

3 Scrapy Crawling (Part 2)

2017-11-10  法号无涯

With what we covered earlier, we can already write a simple spider and then refine it step by step.

# -*- coding: utf-8 -*-
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # Every quote on the page sits in an element with class="quote"
        quotes = response.xpath('//*[@class="quote"]')
        for quote in quotes:
            # Relative XPaths (starting with ".") search only inside this quote
            text = quote.xpath('.//*[@class="text"]/text()').extract_first()
            author = quote.xpath('.//*[@itemprop="author"]/text()').extract()
            # The tags are stored in the content attribute of a <meta> element
            tags = quote.xpath('.//*[@itemprop="keywords"]/@content').extract()

            # print() function syntax, as required by Python 3
            print('\n')
            print(text)
            print(author)
            print(tags)
            print('\n')

From the root directory of the Scrapy project, run:
scrapy crawl quotes
