Scrapy XPath works in the shell but not in code

    import scrapy


    class MySpider(scrapy.Spider):
        name = 'MySpider'
        start_urls = [
            # WRONG URL was 'https://www.app4health.it/', SHOULD BE https://shop.app4health.it/ PROBLEM SOLVED!
            'https://shop.app4health.it/',
        ]

        def parse(self, response):
            self.logger.info('A response from %s just arrived!', response.url)
            print('PRE RISULTATI')
            risultati = response.xpath('//*[@id="nav"]/ol/li[*]/a/@href').extract()
            # risultati = response.css('li a::attr(href)').extract()
            # This works in scrapy shell, not in code
            # risultati = response.xpath('//*[@id="nav"]/ol/li[1]/a').extract()
            print(risultati)
            # extract() returns a list, but scrapy.Request takes a single URL string
            for pagineitems in risultati:
                next_page = response.urljoin(pagineitems)
                print('NEXT PAGE', next_page)
                # The request is dropped as already seen unless dont_filter=True bypasses the duplicate filter
                yield scrapy.Request(url=next_page, callback=self.prodotti, dont_filter=True)

        def prodotti(self, response):
            # A callback should yield items/requests or return None, not an int
            self.logger.info('A REEEESPONSEEEEEE from %s just arrived!', response.url)

2 answers

User

Answer 1 · edited 2024-10-01 09:18:31

In addition to Desperado's answer, you can also use CSS selectors, which are much simpler and are sufficient for your use case:

    $ scrapy shell "https://shop.app4health.it/"
    In [1]: response.css('.level0 .level-top::attr(href)').extract()
    Out[1]: 
    ['https://shop.app4health.it/sonno',
     'https://shop.app4health.it/monitoraggio-e-diagnostica',
     'https://shop.app4health.it/terapia',
     'https://shop.app4health.it/integratori-alimentari',
     'https://shop.app4health.it/fitness',
     'https://shop.app4health.it/benessere',
     'https://shop.app4health.it/ausili',
     'https://shop.app4health.it/prodotti-in-offerta',
     'https://shop.app4health.it/kit-regalo']

The scrapy shell command is ideal for debugging problems like this.

User

Answer 2 · edited 2024-10-01 09:18:31

    //nav[@id="mmenu"]//ul/li[contains(@class,"level0")]/a[contains(@class,"level-top")]/@href

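Use this XPath. Also, look at the page's "view source" before writing an XPath: Scrapy receives the raw HTML returned by the server, which can differ from the DOM shown in the browser's developer tools after JavaScript has run.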
