Scrapy CSV output repeating fields

import scrapy


class Wotd(scrapy.Item):
    word = scrapy.Field()
    definition = scrapy.Field()
    sentence = scrapy.Field()
    translation = scrapy.Field()


class WotdSpider(scrapy.Spider):
    name = 'wotd'
    # allowed_domains takes bare domain names, not URL paths
    allowed_domains = ['www.spanishdict.com']
    start_urls = ['http://www.spanishdict.com/wordoftheday/']
    custom_settings = {
        # specifies exported fields and their order
        'FEED_EXPORT_FIELDS': ['word', 'definition', 'sentence', 'translation'],
    }

    def parse(self, response):
        jobs = response.xpath('//div[@class="sd-wotd-text"]')
        for job in jobs:
            item = Wotd()
            item['word'] = job.xpath('.//a[@class="sd-wotd-headword-link"]/text()').extract_first()
            item['definition'] = job.xpath('.//div[@class="sd-wotd-translation"]/text()').extract_first()
            item['sentence'] = job.xpath('.//div[@class="sd-wotd-example-source"]/text()').extract_first()
            item['translation'] = job.xpath('.//div[@class="sd-wotd-example-translation"]/text()').extract_first()
            yield item

1 answer

User

#1 · Posted 2024-09-24 22:30:27

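First, you have not shared your current project structure, so it is hard to say exactly where this should go in your specific case.

Assume your project is named my_project. In the main project directory (the one that contains settings.py), create a file exporters.py with the following content: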

import scrapy.exporters

class NoHeaderCsvItemExporter(scrapy.exporters.CsvItemExporter):
    def __init__(self, file, join_multivalued=', ', **kwargs):
        super(NoHeaderCsvItemExporter, self).__init__(file=file, include_headers_line=False, join_multivalued=join_multivalued, **kwargs)

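The class NoHeaderCsvItemExporter inherits from the standard CSV exporter and only specifies that we do not want a header line in the output.

Next, you must register the new exporter class for the CSV format, either in settings.py or in the spider's custom_settings. Following your current approach, which is the latter option, it would be: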

[code snippet missing from the source]
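Scrapy's FEED_EXPORTERS setting maps a feed format name to the import path of an exporter class, so registering the custom exporter for csv might look like the sketch below. It assumes the project package is named my_project and the class lives in my_project/exporters.py, as described above; the FEED_EXPORT_FIELDS list is carried over from the question's spider. In the spider, this dict would be the custom_settings class attribute.

```python
# A minimal sketch, not the answer's original snippet.
# Assumption: the exporter is importable as my_project.exporters.NoHeaderCsvItemExporter.
custom_settings = {
    # specifies exported fields and their order (from the question)
    'FEED_EXPORT_FIELDS': ['word', 'definition', 'sentence', 'translation'],
    # use the header-less exporter whenever the feed format is csv
    'FEED_EXPORTERS': {
        'csv': 'my_project.exporters.NoHeaderCsvItemExporter',
    },
}
```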

Note that with this class no header line is written to the CSV at all, not even on the first export.
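If a header is still wanted once, the usual idea is to write it only when the output file is new or empty. The sketch below illustrates that check with Python's standard csv module rather than Scrapy's exporter, assuming the repeated header comes from appending to the same file across runs. The function name append_rows and the file name in the usage note are hypothetical.

```python
import csv
import os

FIELDS = ['word', 'definition', 'sentence', 'translation']


def append_rows(path, rows, fields=FIELDS):
    """Append dict rows to a CSV file; write the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows('wotd.csv', [row]) on successive runs would then keep a single header line at the top of the file. The same emptiness check could presumably be applied inside a CsvItemExporter subclass, but that variant is not shown in the answer.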
