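I have an initial link from which I can get the number of pages. How do I get the URL of the start link so I can build the URLs of the following pages?
Pagination is not working: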
DEBUG: Crawled (404) <GET https://www.healthgrades.com/api3/&pageNum=2> (referer: https://www.healthgrades.com/api3/usearch?where=CA&sessionId=%7BsessionId%7D&requestId=%7BrequestId%7D&sort.provider=bestmatch&source=init&what=%7Bspecialty%7D&category=provider&cid&debug=false&debugParams=false&isPsr=false&isFsr=false&isFirstRequest=true&userLocalTime=23%3A55)
Spider:
def start_requests(self):
    yield scrapy.Request('https://www.healthgrades.com/api3/usearch?where=CA&sessionId={sessionId}&requestId={requestId}' +
                         '&sort.provider=bestmatch&source=init&what={specialty}&category=provider&cid&debug=false&d' +
                         'ebugParams=false&isPsr=false&isFsr=false&isFirstRequest=true&userLocalTime=23%3A55',
                         callback=self.pagination)

def pagination(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    totalPages = jsonresponse['search']['searchResults']['totalPages']
    for page in range(1, totalPages):
        page = '&pageNum=%s' % page
        yield scrapy.Request(urljoin(response.request.url, page), callback=self.profile_link)
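The 404 in the log is consistent with how `urljoin` resolves relative references: `'&pageNum=2'` is treated as a relative path, so it replaces the last path segment (`usearch`) and drops the query string of the start URL. A minimal sketch of one possible way around this, using only the standard library, is shown here. The helper names `page_url` and `page_urls` are hypothetical, the shortened `BASE` URL is only for illustration, and it is an assumption that the API is 1-based and that the start request already returns page 1.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit, parse_qsl, urlencode

# Shortened stand-in for the start URL (illustration only).
BASE = "https://www.healthgrades.com/api3/usearch?where=CA&category=provider"

# urljoin treats '&pageNum=2' as a relative path: 'usearch?...' is replaced.
broken = urljoin(BASE, "&pageNum=2")


def page_url(url, page):
    """Return url with its pageNum query parameter set to page (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop any existing pageNum so the parameter is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "pageNum"]
    query.append(("pageNum", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(url, total_pages):
    """URLs for pages 2..total_pages, assuming the start URL itself is page 1."""
    return [page_url(url, n) for n in range(2, total_pages + 1)]
```

Inside the `pagination` callback, the start URL is available as `response.url` (or `response.request.url`), so the loop could become something like `for url in page_urls(response.url, totalPages): yield scrapy.Request(url, callback=self.profile_link)`. Simply concatenating `response.url + '&pageNum=%s' % page` would also avoid the `urljoin` problem; rebuilding the query string is just a little more robust if the URL already carries a `pageNum`.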