Problem exporting scraping results

Posted 2024-10-04 03:22:39


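I wrote a simple spider to scrape details from a website. When I run it from the console I get output, but if I use -o filename.json to write the results to a file, the file contains only a single [. What should I do?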

My spider looks like this:

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        title = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        print(title)

My console output:

[u"\n Asst JKT8810 Full Sleeve Self Design Men's Cotton ", u' ', u"\n Justanned Full Sleeve Solid Men's Bomber ", u' ', u"\n Pepe Sleeveless Solid Men's ", u' ', u"\n Platinum Studio Sleeveless Solid Men's Nehru ", u' ', u"\n Yepme Sleevele ss Solid Men's ", u' ', u'\n Love Leather ', u" Full Sleeve Solid Men's Puleather Ja...\n ", u"\n Justanned Full Sleeve Solid Men's Bomber ", u' ', u"\n Oceanic Full Sleeve Self Design Men's ", u' ', u"\n Dooda Full Sleeve Solid Men's ", u' ', u"\n Bare Skin Full Sleeve Self Design Men's ", u' ', u"\n Asst Full Sleeve Solid Women's ", u' ', u"\n Locomotive F ull Sleeve Men's ", u' ', u"\n Justanned Full Sleeve Solid Women's Leather ", u' ', u' ', u"\n Wrangler Sleeveless Solid Men's ", u' ', u"\n TSX Sleeveless Solid Men's Bomber ", u' ']

But when I run scrapy crawl spider_name -o filename.json, I don't get the same output in the file.


1 answer
User
#1 · Posted 2024-10-04 03:22:39

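This happens because the callback needs to return (or yield) Item instances. The feed export behind -o only writes what parse() yields, whereas print only writes to the console: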

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        titles = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        for title in titles:
            item = TutorialItem()
            item['title'] = title
            yield item
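
The extracted list shown in the question contains whitespace-only entries and titles padded with newlines and spaces. If those are unwanted in the exported file, one option is to clean the strings before building items. The following is a minimal sketch under that assumption; `iter_clean_titles` is a hypothetical helper name, not part of Scrapy or of the answer above.

```python
def iter_clean_titles(raw_titles):
    """Yield each title with surrounding whitespace removed.

    Hypothetical helper: entries that are empty after stripping
    (for example u' ') are skipped entirely.
    """
    for raw in raw_titles:
        title = raw.strip()
        if title:
            yield title


if __name__ == "__main__":
    # Sample values copied from the console output in the question.
    sample = [
        u"\n Asst JKT8810 Full Sleeve Self Design Men's Cotton ",
        u" ",
        u"\n Love Leather ",
    ]
    print(list(iter_clean_titles(sample)))
```

Inside parse(), the loop could then iterate over iter_clean_titles(titles) instead of titles, so that only non-empty, trimmed titles are assigned to item['title'] and yielded.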
