Web Crawling with the Scrapy Framework (2)

2019-05-27  猛犸象和剑齿虎

Do not delete any of the files that the framework generates for the project.

Getting started with the framework

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class MyspiderItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()  # song title
    artist = scrapy.Field()  # artist
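
The spider:

# -*- coding: utf-8 -*-
import scrapy


class MusicspiderSpider(scrapy.Spider):
    name = 'musicspider'  # name used to identify the spider
    allowed_domains = ['htqyy.com']  # domains the spider is allowed to crawl
    start_urls = ['http://www.htqyy.com/top/musicList/hot?pageIndex=0&pageSize=20']  # URL where the crawl starts

    def parse(self, response):
        filename = 'music.html'
        data = response.body  # raw response body (bytes)
        # The framework has already made the request; we only save the result locally.
        with open(filename, 'wb') as f:
            f.write(data)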
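
The start URL carries two query parameters, pageIndex and pageSize, and allowed_domains limits which hosts the spider may visit. The sketch below, using only the standard library, shows one way to build URLs for further pages and to check them against the allowed domain. It rests on assumptions: that the site serves the next page when pageIndex increases by 1, and that a host is allowed when it equals a listed domain or is a subdomain of it. The helper names page_urls and is_allowed are hypothetical, and the check is a simplified illustration, not Scrapy's own code.

```python
from urllib.parse import urlencode, urlparse

BASE_URL = 'http://www.htqyy.com/top/musicList/hot'
ALLOWED_DOMAINS = ['htqyy.com']


def page_urls(page_count, page_size=20):
    """Build one list URL per page, starting at pageIndex=0.

    Assumption: the site pages through results by incrementing pageIndex.
    """
    return [
        BASE_URL + '?' + urlencode({'pageIndex': index, 'pageSize': page_size})
        for index in range(page_count)
    ]


def is_allowed(url, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the URL's host is a listed domain or a subdomain of one.

    Simplified illustration of domain filtering, not Scrapy's implementation.
    """
    host = urlparse(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in allowed_domains)


if __name__ == '__main__':
    for url in page_urls(3):
        print(url, is_allowed(url))
```

If the pagination assumption holds for the site, a list built this way could be assigned to start_urls in place of the single hard-coded URL.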

Open a command window in the project folder.


Enter: scrapy crawl musicspider

The spider requests the start URL and saves the response as music.html in the current directory.