I am trying to understand how Scrapy works, and I want to know how to stop the spider once a condition is met. Using the Scrapy tutorial as an example: once the author Pablo Neruda has been scraped, the spider should not continue to the next page. It can finish scraping the current page, but it should not follow the pagination link. Any help would be appreciated.
import scrapy

class AuthorSpider(scrapy.Spider):
    name = 'aq1'
    start_urls = ['http://quotes.toscrape.com/']
    stop_page = 0

    def parse(self, response):
        author_page_links = response.css('.author + a')
        yield from response.follow_all(author_page_links, self.parse_author)

        if AuthorSpider.stop_page == 0:
            pagination_links = response.css('li.next a')
            yield from response.follow_all(pagination_links, self.parse)
        else:
            pagination_links = " "
            yield from response.follow_all(pagination_links, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            return response.css(query).get(default='').strip()

        yield {
            'Name': extract_with_css('h3.author-title::text'),
        }

        if extract_with_css('h3.author-title::text') == "Pablo Neruda":
            AuthorSpider.stop_page = 1
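The behaviour the spider is aiming for (finish the current page, then stop following pagination) can be sketched as a plain-Python generator, independent of Scrapy. The page data and `crawl` function below are hypothetical, just to illustrate the stop-flag pattern:

```python
# Hypothetical site data: page number -> (authors on that page, next page or None).
PAGES = {
    1: (["Albert Einstein", "Jane Austen"], 2),
    2: (["Pablo Neruda", "Mark Twain"], 3),
    3: (["Steve Martin"], None),
}

def crawl(start=1, target="Pablo Neruda"):
    """Yield every author on each page, but stop paginating once
    the target author has been seen on the current page."""
    stop = False
    page = start
    while page is not None:
        authors, next_page = PAGES[page]
        for name in authors:
            yield name            # finish scraping the whole page...
            if name == target:
                stop = True       # ...but remember that we should stop
        page = None if stop else next_page  # only then skip the next page

scraped = list(crawl())
print(scraped)
```

Note that in a real Scrapy run this flag pattern is less predictable, because requests are scheduled asynchronously and already-queued pages may still be processed. Scrapy's own mechanism for shutting down from a callback is to raise `scrapy.exceptions.CloseSpider`, which tells the engine to stop scheduling new requests and close the spider.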