Unable to get any requests from links in Scrapy

Posted 2024-04-28 06:20:21


The problem is that no request is ever made to final_url, so the opening-hours data is never scraped from the linked pages.

class YellSpider(scrapy.Spider):
    name = 'yell'
    base_url = 'https://www.yell.com{}'
    start_urls = ['https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=770796459&keywords=hospitals&location=united+kingdom']

    def parse(self, response):
        for data in response.css('div.row.businessCapsule--mainRow'):
            title = data.css('.text-h2::text').get()
            business_url = data.css('a.businessCapsule--title::attr(href)').get()
            final_url = self.base_url.format(business_url)
            avg_rating = response.css('span.starRating--average::text').get()

    def parse_site(self,response):
        req = scrapy.Request(final_url, callback=self.parse_site)
        opening_hours  = response.css('strong::text').get().strip()

            items= {
                'Title': title ,
                'Title Url' : final_url,
                'Average Rating': avg_rating,
                'Hours': opening_hours
            }
            yield items
        pass

1 answer
User
Answer #1 · posted 2024-04-28 06:20:21

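The code below should fix the problem you are seeing. It turns out you never sent any request to the parse_site method, which is why it was never called: the scrapy.Request was created inside parse_site itself and never yielded. Yield the request from parse instead, and pass the values you have already scraped along via cb_kwargs.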

import scrapy


class YellSpider(scrapy.Spider):
    name = 'yell'
    base_url = 'https://www.yell.com{}'
    start_urls = ['https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=770796459&keywords=hospitals&location=united+kingdom']

    def parse(self, response):
        for data in response.css('div.row.businessCapsule--mainRow'):
            title = data.css('.text-h2::text').get()
            business_url = data.css('a.businessCapsule--title::attr(href)').get()
            final_url = self.base_url.format(business_url)
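            # Scope the rating to this listing (assumes the rating sits inside
            # the listing row); response.css() would return the first rating
            # on the page for every listing.
            avg_rating = data.css('span.starRating--average::text').get()
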
            yield scrapy.Request(
                final_url, 
                callback=self.parse_site,
                cb_kwargs={
                    "title":title,
                    "final_url":final_url,
                    "avg_rating":avg_rating,
                }
            )

    def parse_site(self, response, title, final_url, avg_rating):
        # .get() returns None when nothing matches, so guard before .strip()
        opening_hours = response.css('strong::text').get()
        opening_hours = opening_hours.strip() if opening_hours else ""

        items = {
            'Title': title,
            'Title Url': final_url,
            'Average Rating': avg_rating,
            'Hours': opening_hours,
        }
        yield items
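
To see why the original spider never reached parse_site, here is a toy model of a crawl loop in plain Python. It is not Scrapy's real engine: `Request`, `crawl`, and the three callback functions are hypothetical stand-ins written only for illustration. What it tries to show is that a request object is only processed if a callback yields it, and that the cb_kwargs dictionary arrives in the next callback as keyword arguments.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for scrapy.Request (illustration only)."""
    url: str
    callback: Callable
    cb_kwargs: dict = field(default_factory=dict)


def crawl(start_url, start_callback):
    """Toy crawl loop: only Request objects that a callback yields get scheduled."""
    queue = deque([Request(start_url, start_callback)])
    items = []
    while queue:
        request = queue.popleft()
        response = f"<fake page for {request.url}>"  # no network access here
        for result in request.callback(response, **request.cb_kwargs):
            if isinstance(result, Request):
                queue.append(result)  # yielded request: will be followed
            else:
                items.append(result)  # anything else is treated as an item
    return items


def parse_site(response, title):
    # Receives `title` as a keyword argument taken from cb_kwargs.
    yield {"Title": title, "Page": response}


def broken_parse(response):
    # Mirrors the question: the request is created but never yielded,
    # so the crawl loop never sees it.
    Request("https://example.com/biz/1", parse_site, {"title": "A"})
    yield from ()  # yields nothing


def fixed_parse(response):
    # Mirrors the answer: yield the request and hand over the scraped field.
    yield Request("https://example.com/biz/1", parse_site, {"title": "A"})


broken_items = crawl("https://example.com/search", broken_parse)
fixed_items = crawl("https://example.com/search", fixed_parse)
print(broken_items)
print(fixed_items)
```

With the broken callback the item list stays empty, while the fixed callback produces one item whose title came through cb_kwargs.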

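A side note on building final_url: base_url.format(business_url) produces https://www.yell.comNone when a listing has no link, because .get() returns None in that case. Below is a small sketch of a more defensive approach using the standard library's urljoin. The helper name `absolute_url`, the shortened search URL, and the sample path are made up for illustration. Inside a spider, response.urljoin(href) performs the same kind of resolution against the page URL.

```python
from typing import Optional
from urllib.parse import urljoin

# Shortened, illustrative search URL (not the exact one from the question).
SEARCH_PAGE = "https://www.yell.com/ucs/UcsSearchAction.do?keywords=hospitals"


def absolute_url(page_url: str, href: Optional[str]) -> Optional[str]:
    """Resolve a scraped href against the page URL; return None if it is missing."""
    if not href:
        return None
    return urljoin(page_url, href)


# Hypothetical relative link, as a selector's .get() might return it.
example = absolute_url(SEARCH_PAGE, "/biz/some-hospital-123/")
missing = absolute_url(SEARCH_PAGE, None)
print(example)
print(missing)
```

In the spider you could then skip the listing (for example with `continue`) whenever the resolved URL is None, rather than requesting a malformed address.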