I'm trying to scrape this page and follow the links to the next pages: https://www.chavesnamao.com.br/carros/usados/brasil/
<a href="javascript:;" onclick="goToPage(2); return false;" rel="next" title="Página 2">2</a>
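The link to the next page is generated by this goToPage() function. The function generates a query string in which pg is the requested page parameter. Is there a systematic way to implement a Rule that follows these pages and calls a parse function, or do I need to hard-code a loop that increments the pg parameter and sends a request each time? That is my question; my spider is below.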
import scrapy
from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):
    name = 'myspider'
    start_urls = [
        'https://www.chavesnamao.com.br/carros/usados/brasil/#{%22pg%22:%221%22,%22or%22:%22%22,%22c%22:%22%22,%22e%22:%22%22,%22r%22:%2250%22,%22view%22:%22tabela%22}',
    ]

    # Note: CrawlSpider uses parse() internally to apply its rules,
    # so overriding it here would bypass any Rule added later.
    def parse(self, response):
        #logging.warning('parse function called on %s', response.url)
        #from scrapy.shell import inspect_response
        #inspect_response(response, self)
        # Collect the detail-page link of every listing on this page
        items = response.css('div.view ul#listagem li div a.description::attr(href)').getall()
        for item in items:
            yield scrapy.Request(
                url='{}'.format(item),
                method='GET',
                callback=self.parse_items,
                headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
            )

    def parse_items(self, response):
        # Crawling the item without any problem here
        pass
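For the hard-coded loop option, a minimal sketch of the bookkeeping might look like the following. It reads the page number out of the onclick value of the rel="next" link and rebuilds the same set of parameters that appears in the fragment of start_urls. The helper names next_page_number and page_state are hypothetical, and the sketch deliberately does not build a request URL: the part of a URL after # is never sent to the server, so the request that goToPage() actually makes is not shown here and would have to be looked up in the browser's network tab.

```python
import re

# Matches the page number in onclick="goToPage(2); return false;"
GO_TO_PAGE_RE = re.compile(r'goToPage\((\d+)\)')


def next_page_number(onclick):
    """Return N from a goToPage(N) onclick value, or None if absent."""
    match = GO_TO_PAGE_RE.search(onclick or '')
    return int(match.group(1)) if match else None


def page_state(page, per_page=50):
    """Rebuild the parameters seen in the start_urls fragment for a page.

    Assumption: only 'pg' changes between pages; the other values are
    copied from the fragment in start_urls.
    """
    return {
        'pg': str(page),
        'or': '',
        'c': '',
        'e': '',
        'r': str(per_page),
        'view': 'tabela',
    }


if __name__ == '__main__':
    onclick = 'goToPage(2); return false;'
    page = next_page_number(onclick)
    print(page, page_state(page))
```

Inside parse, the onclick value could be read with response.css('a[rel="next"]::attr(onclick)').get(), and the loop would stop once next_page_number returns None.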