Scrapy is not parsing data

Posted 2024-06-28 20:04:57

I am new to Scrapy and I am trying to save my favorite team's scores to a JSON file. However, my JSON file stays empty.

Here is my code:

import scrapy
from scrapy.crawler import CrawlerProcess


class SoccerwaySpider(scrapy.Spider):
    name="Soccerway"
    start_urls = ['https://fr.soccerway.com/teams/france/olympique-de-marseille/890/']

    def start_requests(self):
        headers= {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield scrapy.Request(url, headers=headers, callback=self.parse)

    def parse(self,response):
        yield
        {
        'score':str.strip(response.css("table.matches").css('td.score-time.score').css('a::text').get()),
        }

process = CrawlerProcess(settings={
    "FEEDS": {
        "Soccerway.json": {"format": "json"},
    },
})
process.crawl(SoccerwaySpider)
process.start()

Thanks in advance.


2 answers

The problem is that the { is in the wrong place. It must be on the same line as yield:

yield {
    'score': ...,
}

If you put it on the next line, Python treats the code as two separate statements:

# statement 1 - a bare yield, which yields None
yield

# statement 2 - builds a dictionary that is never assigned or yielded, so it is discarded
{
    'score': ...,
}
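The difference can be checked outside Scrapy with two plain generators. This is a minimal sketch: the function names split_yield and joined_yield and the literal score are made up for illustration and do not come from the spider.

```python
def split_yield():
    # A bare yield produces None; the dict on the following lines is a
    # separate expression statement whose value is thrown away.
    yield
    {
        'score': '2 - 2',
    }


def joined_yield():
    # With the brace on the same line, the dict is the yielded value.
    yield {
        'score': '2 - 2',
    }


split_items = list(split_yield())
joined_items = list(joined_yield())
print(split_items)   # [None]
print(joined_items)  # [{'score': '2 - 2'}]
```

The first generator only ever produces None, which is consistent with the empty JSON file: the callback never hands Scrapy an item to export.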

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class SoccerwaySpider(scrapy.Spider):
    name = "Soccerway"
    start_urls = ['https://fr.soccerway.com/teams/france/olympique-de-marseille/890/']
    custom_settings={"FEEDS": {"Soccerway.json": {"format": "json"}}}

    def start_requests(self):
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'
        }
        for url in self.start_urls:
            yield scrapy.Request(url, headers=headers, callback=self.parse)

    def parse(self, response):
        yield {
            'score': str.strip(response.css("table.matches").css('td.score-time.score').css('a::text').get()),
        }


if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    # Pass the spider class; a name string is only resolved inside a Scrapy project
    process.crawl(SoccerwaySpider)
    process.start()

Soccerway.json:

[
{"score": "2 - 2"}
]
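One thing to watch in the parse method: .get() returns None when the selector matches nothing, and str.strip(None) raises a TypeError. A minimal sketch of a guard follows, assuming a hypothetical helper named clean_score that is not part of Scrapy or of the answers here.

```python
from typing import Optional


def clean_score(raw: Optional[str]) -> Optional[str]:
    """Strip whitespace from a scraped score, passing None through.

    clean_score is a hypothetical helper used only for illustration.
    """
    if raw is None:
        # Nothing matched the selector; keep None instead of raising.
        return None
    return raw.strip()


print(clean_score("\n  2 - 2  \n"))  # 2 - 2
print(clean_score(None))             # None
```

Inside parse, this would wrap the existing selector chain in place of str.strip, so a page without a matching score cell yields a null score rather than an exception.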
