Using the same function to scrape/parse multiple categories and subcategories

Posted 2024-05-05 19:35:58

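I have working (for the most part) code that scrapes an e-commerce site. I start from one URL and scrape the main categories, then go one level down into a subcategory and do the same thing again, until I land on a product page.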

It looks like this:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example_bot"  # the name used to run the bot
    start_urls = ["https://......html"]

    def parse(self, response):
        # Follow every category link: one level below the landing page
        for link in response.css('div.mvNavSub ul li a::attr(href)').extract():
            yield response.follow(link, callback=self.parse_on_categories)

    def parse_on_categories(self, response):
        # Follow every subcategory link: two levels below the landing page
        for link in response.css('div.mvNavSub ul li a::attr(href)').extract():
            yield response.follow(link, callback=self.parse_on_subcategories)

    def parse_on_subcategories(self, response):
        ...  # (same code as above)

    def parse_data(self, response):
        ...  # (parse data)

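I noticed that in some parts of the site I have to go deeper into the subcategories than in others before I can parse the products. Since I always reuse the same code to crawl categories, I was wondering whether it is possible to reuse the first function until there are no more categories to crawl. Here is what I tried: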

def parse(self, response):
    for link in response.css('div.mvNavSub ul li a::attr(href)').extract():
        yield response.follow(link, callback=self.parse_on_categories)

def parse_on_categories(self, response):
    if response.css('div.mvNavSub ul li a::attr(href)').extract():  # if there are categories to crawl
        self.parse(response)
    else:
        self.parse_data(response)

def parse_data(self, response):
    ...  # (parse data)

If there are categories left to crawl, I want parse_on_categories to call the first function. If there are none, it should call parse_data.

So far I can't get it to work, so if you could put me on the right track I would really appreciate it :) Thanks

1 Answer

#1 · Posted 2024-05-05 19:35:58

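You have to yield whatever results come out of the parse() and parse_data() methods. Calling self.parse(response) on its own only creates a generator object that is never run, so the requests it would produce never reach Scrapy.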
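To see why a bare call has no effect, here is a minimal sketch using plain Python generators, with no Scrapy involved. The function names and the `log` list are made up for this illustration and are not part of the spider.

```python
log = []  # records which generator bodies actually ran (illustrative only)


def parse_data(page):
    # A generator function: its body runs only when something iterates over it.
    log.append(f"parsed {page}")
    yield {"page": page}


def broken_dispatch(page):
    # Mirrors the attempt in the question: the generator object is created
    # and immediately discarded, so parse_data's body never executes.
    parse_data(page)


def working_dispatch(page):
    # Iterating over the generator and re-yielding hands its results upward.
    for item in parse_data(page):
        yield item


broken_result = broken_dispatch("a")
working_items = list(working_dispatch("b"))
```

Here `broken_result` is `None` and `log` never records page "a", while `working_items` holds the item produced for page "b". That is why the dispatching method has to iterate over the chosen callback and yield each result itself.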

def parse_on_categories(self, response):
    if response.css('div.mvNavSub ul li a::attr(href)').extract():
        callback = self.parse
    else:
        callback = self.parse_data

    for r in callback(response):
        yield r
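
On Python 3.3 or later, the loop at the end can also be written as `yield from callback(response)`. The sketch below shows that delegation on a stand-in for the site, so it runs without Scrapy and is not the spider itself. `SITE` is a hypothetical dictionary mapping each page to the category links found on it, plain strings take the place of requests and responses, and `crawl` is a made-up mini scheduler standing in for Scrapy's engine.

```python
# Hypothetical site layout: page -> category links found on that page.
# Pages with no links stand for product pages.
SITE = {
    "home": ["shoes", "bags"],
    "shoes": ["boots"],
    "boots": [],
    "bags": [],
}


def parse_data(page):
    # Stand-in for extracting product data from a page.
    yield {"product_page": page}


def parse(page):
    # Stand-in for following each category link; here the link itself is yielded.
    for link in SITE[page]:
        yield link


def parse_on_categories(page):
    # Pick the callback, then delegate to it with `yield from`.
    callback = parse if SITE[page] else parse_data
    yield from callback(page)


def crawl(start):
    # Strings are treated as links to visit next, dicts as scraped items.
    pending, items = [start], []
    while pending:
        page = pending.pop(0)
        for result in parse_on_categories(page):
            if isinstance(result, str):
                pending.append(result)
            else:
                items.append(result)
    return items


items = crawl("home")
```

Because one function handles every level, the depth of the category tree no longer matters: "boots" sits one level deeper than "bags", yet both end up in `parse_data`.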
