My Scrapy code only crawls pages (DEBUG: Crawled (200)) but does not scrape any data

Posted 2024-05-19 18:18:58


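My Scrapy code only follows the links on the page but does not scrape any data. I am trying to scrape some data about the coronavirus pandemic for my project (the country name, the cities in that country, then the number of cases, the number of deaths, and so on). The only output in cmd is DEBUG: Crawled (200). I am trying to scrape it from the Worldometers website. (I am new to Scrapy and do not know much about it; I have provided an image link as a reference for the output.)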

# -*- coding: utf-8 -*-
import scrapy
import logging


class CountriesSpider(scrapy.Spider):
    name = 'countries'
    allowed_domains = ['www.worldometers.info']
    start_urls = ['http://www.worldometers.info/coronavirus/']

    def parse(self, response):
        # Every link inside a table cell is treated as a country link.
        countries = response.xpath("//td/a")
        for country in countries:
            country_name = country.xpath(".//text()").get()
            country_link = country.xpath(".//@href").get()
            # Build an absolute URL to access the country page.
            absolute_url = response.urljoin(country_link)
            # Or, more directly:
            # yield response.follow(url=country_link, callback=self.parse_country)
            yield scrapy.Request(url=absolute_url, callback=self.parse_country)

    def parse_country(self, response):
        rows = response.xpath("(//table[@class = 'table table-bordered table-hover table-responsive usa_table_countries dataTable no-footer'])[1]/tbody/tr")
        for row in rows:
            city = row.xpath(".//td[1]/text()").get()
            cases = row.xpath(".//td[2]/text()").get()
            deaths = row.xpath(".//td[4]/text()").get()
            active_cases = row.xpath(".//td[6]/text()").get()

            yield {
                "City": city,
                "Total_Number_of_Cases": cases,
                "Deaths": deaths,
                "Active_Cases": active_cases,
            }
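
One thing that may be worth checking (an assumption, not something the post confirms): classes such as dataTable and no-footer are normally added in the browser by the DataTables JavaScript plugin, so the HTML that Scrapy downloads could carry a shorter class attribute than the one copied from the browser's inspector. In that case an exact match on @class selects nothing and only Crawled (200) lines appear. The sketch below uses an invented HTML snippet and a hypothetical helper, has_class_token, to show how exact matching and token matching differ. It uses only the standard library rather than Scrapy itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the HTML as served, before any JavaScript runs.
# The class list and the row values are invented for illustration only.
SERVED_HTML = """
<html><body>
<table class="table table-bordered table-hover usa_table_countries">
<tbody>
<tr><td>Example City</td><td>100</td><td>5</td><td>3</td><td>1</td><td>90</td></tr>
</tbody>
</table>
</body></html>
""".strip()

# Class string as copied from the browser's inspector in the spider above.
BROWSER_CLASS = ("table table-bordered table-hover table-responsive "
                 "usa_table_countries dataTable no-footer")


def has_class_token(element, token):
    """Return True if `token` is one of the element's space-separated classes."""
    return token in element.get("class", "").split()


root = ET.fromstring(SERVED_HTML)
tables = root.findall(".//table")

# Exact string comparison, the analogue of @class = '...' in XPath.
exact = [t for t in tables if t.get("class") == BROWSER_CLASS]

# Token comparison: only requires that one class name is present.
by_token = [t for t in tables if has_class_token(t, "usa_table_countries")]

# Read every cell of every row from the first token-matched table.
rows = [[td.text for td in tr.findall("td")]
        for tr in by_token[0].findall("./tbody/tr")]

print("exact matches:", len(exact))
print("token matches:", len(by_token))
print("rows:", rows)
```

In a Scrapy selector, the looser match is commonly written as //table[contains(@class, 'usa_table_countries')]. Opening the page in scrapy shell and trying both expressions against response would show whether this is really the cause. It may also be worth testing the path without /tbody, since browsers can insert that element even when the served HTML lacks it.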



Tags: text, self, url, get, parse, response, table, link