How can I scrape item names from a site and, while doing so, loop into each item's link to parse its description?

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'killertools'
        start_urls = [
            'https://www.killertools.com/Dent-Removal-Aluminum-Steel_c_11.html',
        ]

        def parse(self, response):
            for item in response.css('div.name'):
                yield {'Name': item.xpath('a/text()').get()}

            next_page = response.css('div.paging a:nth-child(4)::attr("href")').get()
            if next_page is not None:
                yield response.follow(next_page, self.parse)

My attempt at also following each item link and parsing its description:

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'killertools'
        start_urls = [
            'https://www.killertools.com/Dent-Removal-Aluminum-Steel_c_11.html',
        ]

        def parse(self, response):
            for item in response.css('div.name'):
                yield {'Name': item.xpath('a/text()').get()}

            # Problem: .get() returns only the FIRST product link, and with no
            # callback the detail page is handed back to parse() itself.
            detail_page = response.css('div.name a::attr("href")').get()
            if detail_page is not None:
                yield response.follow(detail_page)

            # Problem: this runs on the current (listing) response, not on the
            # product page requested above.
            for detail in response.css('div.item'):
                yield {'Description': detail.xpath('p/strong/text()').get()}

            next_page = response.css('div.paging a:nth-child(4)::attr("href")').get()
            if next_page is not None:
                yield response.follow(next_page, self.parse)

1 answer

Anonymous user

Reply #1 · posted 2024-10-03 02:38:34

For starters, I want to scrape the item names of all products in the first category, from both pages if there is more than one page of products to browse through.

Suggestion 0

Try to wrap your head around XPath.

Suggestion 1

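At the bottom of the page, next to the pagination links, you'll see a "View All" link. It appends ?viewall=1 to the original URL. For the URL you provided, that puts all 21 items on a single page, so it seems you no longer need to worry about pagination.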

Suggestion 2

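To get the product descriptions, you could use a two-step process:

1. Collect the URLs of the product pages.

       response.xpath('//div[contains(@class, "product-item")]/div[@class="name"]/a/@href').getall()

   should get you pretty close.
2. Check the URLs. You may need to prefix them with the base URL; urllib is your friend.
3. Run a second spider that processes all those links and reads the description from each one.
   On the product page you will find a well-structured description:

       response.xpath('//div[@itemprop="description"]/ul/li/text()').getall()

   gives you all the bullet-point lines, neatly lined up.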

Suggestion 3

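Be kind! Don't hammer their site with unnecessary requests. While testing, your custom settings should include 'HTTPCACHE_ENABLED': True. See the HTTPCACHE settings (HttpCacheMiddleware) in the Scrapy documentation for details.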

Good luck, and have fun!
