My Scrapy code only crawls pages (DEBUG: Crawled (200)) but does not scrape any data

Posted 2024-05-19 18:18:58

Male | A programmer who enjoys writing Python code.

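My Scrapy code only follows the links on the page and does not scrape any data. I am trying to collect data on the coronavirus pandemic for my project (the country name, the cities in that country, then the number of cases, deaths, and so on). The only output in cmd is DEBUG: Crawled (200). I am trying to scrape it from the Worldometers website. (I am new to Scrapy and do not know much; I have included an image link showing the output for reference.)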

# -*- coding: utf-8 -*-
import scrapy
import logging


class CountriesSpider(scrapy.Spider):
    name = 'countries'
    allowed_domains = ['www.worldometers.info']
    start_urls = ['http://www.worldometers.info/coronavirus/']

    # Callbacks must be methods of the spider class, so they are indented here.
    def parse(self, response):
        # Select every link that sits inside a table cell.
        countries = response.xpath("//td/a")
        for country in countries:
            country_name = country.xpath(".//text()").get()
            country_link = country.xpath(".//@href").get()
            # Build an absolute URL for the country page.
            absolute_url = response.urljoin(country_link)
            # Alternatively: yield response.follow(url=country_link, callback=self.parse_country)
            yield scrapy.Request(url=absolute_url, callback=self.parse_country)

    def parse_country(self, response):
        # Rows of the first table whose class attribute matches this exact string.
        rows = response.xpath("(//table[@class = 'table table-bordered table-hover table-responsive usa_table_countries dataTable no-footer'])[1]/tbody/tr")
        for row in rows:
            city = row.xpath(".//td[1]/text()").get()
            cases = row.xpath(".//td[2]/text()").get()
            deaths = row.xpath(".//td[4]/text()").get()
            active_cases = row.xpath(".//td[6]/text()").get()

            yield {
                "City": city,
                "Total_Number_of_Cases": cases,
                "Deaths": deaths,
                "Active_Cases": active_cases,
            }
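
One thing that may be worth checking in `parse`: the `//td/a` query matches every link inside any table cell, not only country links. The sketch below uses only the standard library to show how such links could be filtered and turned into absolute URLs. The sample `hrefs` list and the `country/` prefix are assumptions made for illustration, not values taken from the real page, and `country_urls` is a hypothetical helper.

```python
from urllib.parse import urljoin

# Hypothetical values: the page URL and hrefs a "//td/a" query might return.
PAGE_URL = "https://www.worldometers.info/coronavirus/"
hrefs = ["country/us/", "country/india/", "/world-population/", "#news", None]


def country_urls(page_url, hrefs):
    """Keep only links that look like country pages and make them absolute."""
    urls = []
    for href in hrefs:
        if not href or not href.startswith("country/"):
            continue  # skip anchors, unrelated links, and <a> tags without href
        urls.append(urljoin(page_url, href))
    return urls


for url in country_urls(PAGE_URL, hrefs):
    print(url)
```

Inside a spider, `response.urljoin(href)` resolves a relative link against the response URL in the same way, and `response.follow(href, callback=self.parse_country)` accepts the relative link directly.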

[Image: screenshot of the console output]
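
A possible reason `parse_country` yields nothing, offered as an assumption rather than a confirmed diagnosis: the XPath compares the full `class` string as seen in the browser, and class names such as `dataTable` and `no-footer` are typically added by the DataTables script after the page loads, so the HTML that Scrapy downloads may not contain them. A `<tbody>` element may also be missing from the raw HTML. The sketch below reproduces the effect with the standard library on a made-up HTML snippet; `RAW_HTML` and `has_class` are hypothetical and not taken from the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the HTML a server might send
# before any JavaScript runs (not copied from the real page).
RAW_HTML = """
<html><body>
  <table id="usa_table_countries_today" class="table table-bordered table-hover">
    <tr><th>State</th><th>Cases</th></tr>
    <tr><td>New York</td><td>100</td></tr>
    <tr><td>Texas</td><td>50</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# Exact match on a class string copied from the browser's inspector,
# including tokens that a script adds after the page loads.
browser_class = "table table-bordered table-hover dataTable no-footer"
exact = root.findall(f".//table[@class='{browser_class}']/tbody/tr")


def has_class(element, token):
    """Return True if `token` is one of the element's space-separated classes."""
    return token in element.get("class", "").split()


# Looser match: test for a single class token and do not require <tbody>.
tables = [t for t in root.iter("table") if has_class(t, "table-hover")]
rows = [
    [cell.text for cell in tr.findall("td")]
    for tr in tables[0].findall(".//tr")
    if tr.findall("td")  # skip the header row, which only has <th> cells
]

print(len(exact))
print(rows)
```

In a Scrapy spider, a comparable looser query would be something like `response.xpath("//table[contains(@class, 'usa_table_countries')]//tr")`. Running `scrapy shell` against a country URL and trying the query there is a quick way to see what the downloaded HTML actually contains.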

Tags: text self url get parse response table link

0 answers

No answers yet
