I am trying to scrape ice-cream data from https://threetwinsicecream.com/products/ice-cream/. It looks like a very simple site, yet I cannot get my spider to work; I suspect a (JavaScript) popup is blocking my access. A condensed version of my scraper code is attached below:
```python
import scrapy


class NutritionSpider(scrapy.Spider):
    name = 'nutrition'
    allowed_domains = ['threetwinsicecream.com']
    start_urls = ['http://threetwinsicecream.com/']

    def parse(self, response):
        products = response.xpath("//div[@id='pints']/div[2]/div")
        for product in products:
            name = product.xpath(".//a/p/text()").extract_first()
            link = product.xpath(".//a/@href").extract_first()
            yield scrapy.Request(
                url=link,
                callback=self.parse_products,
                meta={
                    "name": name,
                    "link": link
                }
            )

    def parse_products(self, response):
        name = response.meta["name"]
        link = response.meta["link"]
        serving_size = response.xpath("//div[@id='nutritionFacts']/ul/li[1]/text()").extract_first()
        calories = response.xpath("//div[@id='nutritionFacts']/ul/li[2]/span/text()").extract_first()
        yield {
            "Name": name,
            "Link": link,
            "Serving Size": serving_size,
            "Calories": calories
        }
```
I came up with a workaround, but it requires manually writing out the links to all the ice-cream flavors, as shown below. I have also tried disabling JavaScript on the site, but that did not seem to work either:
```python
def parse(self, response):
    urls = [
        "https://threetwinsicecream.com/products/ice-cream/madagascar-vanilla/",
        "https://threetwinsicecream.com/products/ice-cream/sea-salted-caramel/",
        ...
    ]
    for url in urls:
        yield scrapy.Request(
            url=url,
            callback=self.parse_products
        )

def parse_products(self, response):
    pass
```
Is there a way to bypass the popup using Scrapy, or do I have to use another tool such as Selenium? Thanks for your help.
The spider you posted works fine, at least on my machine. The only thing I had to change was

    start_urls = ['http://threetwinsicecream.com/']

to

    start_urls = ['https://threetwinsicecream.com/products/ice-cream/']
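One more thing worth checking while you are at it: if the `href` attributes on the listing page ever come back as relative paths, `scrapy.Request` will refuse them, and you would want `response.urljoin(link)` before yielding the request. A minimal stdlib sketch of what that resolution does (the relative `href` here is a hypothetical example; Scrapy's `Response.urljoin` is essentially a wrapper over `urllib.parse.urljoin`):

```python
from urllib.parse import urljoin

# Listing page the spider starts from
base = "https://threetwinsicecream.com/products/ice-cream/"

# Hypothetical relative href as it might appear in the listing markup
href = "madagascar-vanilla/"

# response.urljoin(href) in Scrapy resolves relative links like this
print(urljoin(base, href))
# -> https://threetwinsicecream.com/products/ice-cream/madagascar-vanilla/

# Already-absolute hrefs pass through unchanged
print(urljoin(base, "https://threetwinsicecream.com/products/ice-cream/sea-salted-caramel/"))
```

If the links are already absolute (as in your manual workaround list), this is a no-op, so it is safe to apply unconditionally.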
When you run into problems like this, you can use Scrapy's `open_in_browser` function to see the page exactly as Scrapy sees it. It is covered in the Scrapy documentation under "Debugging Spiders".
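A quick sketch of how you would use it (the `try/except` is only there so the snippet doesn't crash where Scrapy isn't installed):

```python
# open_in_browser dumps the exact HTML body Scrapy downloaded to a temp
# file and opens it in your default browser (it also injects a <base> tag
# so styles and images resolve). Scrapy itself never executes JavaScript,
# so this shows the markup your XPaths actually run against.
try:
    from scrapy.utils.response import open_in_browser
except ImportError:  # hedge: Scrapy may not be installed where this runs
    open_in_browser = None

# Inside a spider callback you would call it like this:
#
#   def parse(self, response):
#       open_in_browser(response)  # inspect the downloaded page
#       ...
```

If the product grid is visible in that dump, the popup is purely cosmetic browser-side JavaScript and your XPaths should work without Selenium.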