
Specifying Column Order in Scrapy CSV Output

2016-07-28  向右奔跑

When Scrapy writes scraped data to a CSV file, the columns do not follow the order in which the fields are defined in items.py.

from scrapy import Field,Item

class JsuserItem(Item):

    author = Field()
    url = Field()
    title = Field()
    reads = Field()
    comments = Field()
    likes = Field()
    rewards = Field()


So how can the CSV file be written with the columns in a specified order?

1) Add the file csv_item_exporter.py to the spiders directory

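# Current import paths; scrapy.conf and scrapy.contrib.exporter
# are no longer available in recent Scrapy releases.
from scrapy.exporters import CsvItemExporter
from scrapy.utils.project import get_project_settings


class MyProjectCsvItemExporter(CsvItemExporter):
    """CSV exporter that reads the delimiter and column order from settings.py."""

    def __init__(self, *args, **kwargs):
        settings = get_project_settings()

        # Column delimiter; falls back to a comma when CSV_DELIMITER is not set.
        kwargs['delimiter'] = settings.get('CSV_DELIMITER', ',')

        # Columns are written in the order listed in FIELDS_TO_EXPORT.
        fields_to_export = settings.get('FIELDS_TO_EXPORT', [])
        if fields_to_export:
            kwargs['fields_to_export'] = fields_to_export

        super(MyProjectCsvItemExporter, self).__init__(*args, **kwargs)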

2) In settings.py

# jsuser is the project name
FEED_EXPORTERS = {
    'csv': 'jsuser.spiders.csv_item_exporter.MyProjectCsvItemExporter',
}

FIELDS_TO_EXPORT = [
    'author',
    'title',
    'url',
    'reads',
    'comments',
    'likes',
    'rewards',
]

The next time the spider runs, the columns are written in the specified order.
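Depending on the Scrapy version in use, a custom exporter may not be needed just for column order: Scrapy documents a built-in FEED_EXPORT_FIELDS setting that takes a list of field names. The fragment below is a minimal settings.py sketch, assuming a Scrapy release that supports this setting. It is an alternative to the approach in this post, not part of it.

```python
# settings.py (sketch; assumes a Scrapy release that supports FEED_EXPORT_FIELDS)
# With this setting, the default CSV exporter writes the columns in this order,
# so the FEED_EXPORTERS override is not required for ordering alone.
FEED_EXPORT_FIELDS = [
    'author',
    'title',
    'url',
    'reads',
    'comments',
    'likes',
    'rewards',
]
```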


You can also set the delimiter used in the CSV file in settings.py

CSV_DELIMITER = "\t"
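
To see what a fixed field list and a custom delimiter do to the output without running a crawl, here is a small standard-library sketch. It does not use Scrapy: csv.DictWriter stands in for the exporter, and the helper rows_to_csv and the sample row are hypothetical and made up for illustration. Only the field names come from JsuserItem above.

```python
import csv
import io

# Same order as FIELDS_TO_EXPORT in settings.py.
FIELDS_TO_EXPORT = ['author', 'title', 'url', 'reads', 'comments', 'likes', 'rewards']


def rows_to_csv(rows, fieldnames, delimiter=','):
    """Serialize dict rows to CSV text, with columns in `fieldnames` order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=delimiter,
        lineterminator='\n',
    )
    writer.writeheader()    # header row follows `fieldnames`
    writer.writerows(rows)  # each row is reordered to match the header
    return buffer.getvalue()


# Hypothetical sample item; the keys are deliberately out of order.
sample_rows = [{
    'url': 'http://example.com/p/1',
    'likes': 3,
    'title': 'Hello',
    'author': 'alice',
    'rewards': 0,
    'comments': 2,
    'reads': 100,
}]

output = rows_to_csv(sample_rows, FIELDS_TO_EXPORT, delimiter='\t')
print(output)
```

Whatever order the keys have in each row, the header and the values come out in the order given by the field list, separated by tabs.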