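My Scrapy spider only crawls the links on the page; it doesn't scrape any data. For a project, I'm trying to collect coronavirus pandemic data from the Worldometers site: country names, the cities in each country, and then the number of cases, deaths, and so on. The only output I get in cmd is lines like "DEBUG: Crawled (200)". I'm new to Scrapy and don't know much about it yet. I've included a link to a screenshot of the output for reference.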
# -*- coding: utf-8 -*-
import scrapy
import logging
class CountriesSpider(scrapy.Spider):
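    name = 'countries'
    allowed_domains = ['www.worldometers.info']
    start_urls = ['http://www.worldometers.info/coronavirus/']

    def parse(self, response):
        countries = response.xpath("//td/a")
        for country in countries:
            country_name = country.xpath(".//text()").get()
            country_link = country.xpath(".//@href").get()
            if not country_link:
                continue
            # To access the country link, resolve it against the current page URL
            absolute_url = response.urljoin(country_link)
            # Pass the country name on so each item can include it (cb_kwargs needs Scrapy 1.7+)
            # Or do it directly: yield response.follow(country_link, callback=self.parse_country, ...)
            yield scrapy.Request(url=absolute_url, callback=self.parse_country,
                                 cb_kwargs={"country": country_name})

    def parse_country(self, response, country):
        # Match one class token with contains(): "dataTable no-footer" is added by
        # JavaScript (DataTables) in the browser, so it is missing from the HTML that
        # Scrapy downloads and an exact @class comparison matches nothing.
        rows = response.xpath("(//table[contains(@class, 'usa_table_countries')])[1]/tbody/tr")
        for row in rows:
            # normalize-space() also returns text nested inside <a> tags and trims whitespace
            city = row.xpath("normalize-space(./td[1])").get()
            cases = row.xpath("normalize-space(./td[2])").get()
            deaths = row.xpath("normalize-space(./td[4])").get()
            active_cases = row.xpath("normalize-space(./td[6])").get()
            yield {
                "Country": country,
                "City": city,
                "Total_Number_of_Cases": cases,
                "Deaths": deaths,
                "Active_Cases": active_cases
            }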
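Seeing every request logged as "Crawled (200)" with no items usually means the pages downloaded fine but the XPath in parse_country matched nothing. A likely cause is the exact @class = '...' comparison. The "dataTable no-footer" classes appear in the browser's inspector because the DataTables JavaScript adds them; they are not in the raw HTML that Scrapy receives. The sketch below illustrates the difference using only the standard library (html.parser). RAW_HTML is a hypothetical, simplified stand-in for the real page. TableRows, extract and to_int are illustrative names, not Scrapy APIs. The column positions assume the same td[1], td[2], td[4] and td[6] layout the spider uses.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for the HTML Scrapy downloads: note that the
# class attribute lacks "dataTable no-footer", which only JavaScript would add.
RAW_HTML = """
<table class="table table-bordered table-hover table-responsive usa_table_countries">
  <tbody>
    <tr><td><a href="/x">New York</a></td><td>1,234</td><td>+5</td><td>56</td><td></td><td>789</td></tr>
    <tr><td>Texas</td><td>900</td><td></td><td>12</td><td></td><td>300</td></tr>
  </tbody>
</table>
"""

# The class string as it appears in the browser's inspector (after JavaScript ran).
BROWSER_CLASS = ("table table-bordered table-hover table-responsive "
                 "usa_table_countries dataTable no-footer")


class TableRows(HTMLParser):
    """Collect the text of every <td>, row by row, inside tables whose
    class attribute satisfies the predicate `want`."""

    def __init__(self, want):
        super().__init__()
        self.want = want
        self.depth = 0      # > 0 while inside a matching <table>
        self.rows = []      # list of rows, each a list of cell strings
        self.cell = None    # text fragments of the current <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth or self.want(dict(attrs).get("class") or ""):
                self.depth += 1
        elif self.depth and tag == "tr":
            self.rows.append([])
        elif self.depth and tag == "td" and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag == "td" and self.cell is not None:
            # Join fragments (including text inside <a>) and collapse whitespace
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def exact(class_attr):
    # Mirrors XPath @class = '...': whole-string equality.
    return class_attr == BROWSER_CLASS


def has_token(class_attr):
    # A stricter, token-based version of contains(@class, 'usa_table_countries').
    return "usa_table_countries" in class_attr.split()


def to_int(text):
    """Turn '1,234' into 1234; blank cells become None."""
    text = text.replace(",", "").strip()
    return int(text) if text else None


def extract(html, want):
    parser = TableRows(want)
    parser.feed(html)
    parser.close()
    items = []
    for cells in parser.rows:
        if len(cells) < 6:  # skip header or malformed rows
            continue
        items.append({
            "City": cells[0],
            "Total_Number_of_Cases": to_int(cells[1]),
            "Deaths": to_int(cells[3]),
            "Active_Cases": to_int(cells[5]),
        })
    return items


print(extract(RAW_HTML, exact))           # []
print(len(extract(RAW_HTML, has_token)))  # 2
```

In Scrapy itself, the equivalent change is an XPath like (//table[contains(@class, 'usa_table_countries')])[1]/tbody/tr. Running scrapy shell on one country URL and trying the selector against response is a quick way to check what the downloaded HTML actually contains before you rerun the whole spider.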