Crawling with Scrapy is unsuccessful

Posted 2024-09-30 22:26:51


I'm trying to scrape all product names from the site https://www.kalkhoff-bikes.com/ with Scrapy, but the results are not as expected. What am I doing wrong? My first attempt:

import scrapy

class ToScrapeSpider(scrapy.Spider):
    name = 'Kalkhoff_1'

    start_urls = [
        'https://www.kalkhoff-bikes.com/'
    ]
    allowed_domains = [
        'kalkhoff-bikes.com'
    ]

    def parse(self, response):
        for item in response.css('ul.navMain__subList--sub > li.navMain__subItem'):
            yield {
                'Name': item.css("span.navMain__subText::text").get(),
            }

        for href in response.css('li.navMain__item a::attr(href)'):
            yield response.follow(href, self.parse)

After that, I read that if the content is loaded dynamically, the solution is Splash. So I tried this:

import scrapy
from scrapy_splash import SplashRequest

class ToScrapeSpider(scrapy.Spider):
    name = 'Kalkhoff_2'

    start_urls = [
        'https://www.kalkhoff-bikes.com/'
    ]
    allowed_domains = [
        'kalkhoff-bikes.com'
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse,
                endpoint='render.html',
                args={'wait':0.5},
            )

    def parse(self, response):
        for item in response.css('ul.navMain__subList--sub > li.navMain__subItem'):
            yield {
                'Name': item.css("span.navMain__subText::text").get(),
            }

        for href in response.css('li.navMain__item a::attr(href)'):
            yield response.follow(href, self.parse)
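Note that `SplashRequest` only works if scrapy-splash is wired into the project's settings.py and a Splash instance is actually running (e.g. started with `docker run -p 8050:8050 scrapinghub/splash`). A minimal settings sketch following the scrapy-splash README, assuming Splash listens on localhost:8050:

```python
# settings.py — minimal scrapy-splash wiring
# (assumes a Splash instance is reachable at localhost:8050)
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make request fingerprinting aware of Splash arguments.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```

Without these settings the request is sent as a normal Scrapy request and the response is the same unrendered HTML as in the first attempt.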

Unfortunately, I still don't get all the product names. Am I on the right track?
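Independent of the rendering question, the recursive `parse` above re-yields the same menu entries on every page it visits, so the output will contain many duplicates. A small, hypothetical helper that the spider could use to yield each name only once; the names below are made-up examples:

```python
class SeenNames:
    """Tracks names already yielded so a spider emits each one once."""

    def __init__(self):
        self._seen = set()

    def is_new(self, name):
        # Normalize whitespace so 'Endeavour 5.B ' and 'Endeavour 5.B'
        # count as the same name; reject empty/None values outright.
        key = (name or '').strip()
        if not key or key in self._seen:
            return False
        self._seen.add(key)
        return True


# In parse(), one would filter with:  if self.seen.is_new(name): yield {'Name': name}
seen = SeenNames()
names = ['Endeavour 5.B', 'Endeavour 5.B ', 'Image 3.B', None]
unique = [n.strip() for n in names if n and seen.is_new(n)]
# → ['Endeavour 5.B', 'Image 3.B']
```

This removes repeats but not non-product entries; since the selector matches every sub-menu item, service or info links may still need to be filtered out separately.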

