Scrapy: How do I parse multiple pages?

Published 2024-09-27 19:26:29

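I have an initial link from which I can get the total number of pages. How do I get the URL of that starting link so I can request the other pages?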

Pagination does not work:

DEBUG: Crawled (404) <GET https://www.healthgrades.com/api3/&pageNum=2> (referer: https://www.healthgrades.com/api3/usearch?where=CA&sessionId=%7BsessionId%7D&requestId=%7BrequestId%7D&sort.provider=bestmatch&source=init&what=%7Bspecialty%7D&category=provider&cid&debug=false&debugParams=false&isPsr=false&isFsr=false&isFirstRequest=true&userLocalTime=23%3A55)
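
The 404 in the log above is consistent with how `urljoin` resolves a relative reference: a string such as `&pageNum=2` is treated as a relative path, so it replaces the last path segment (`usearch`) and the base query string is dropped. A minimal sketch of that behaviour, using a shortened version of the search URL for readability (the shortened query string is only an illustration):

```python
from urllib.parse import urljoin

# Shortened stand-in for the search URL in the question (illustrative only).
base = 'https://www.healthgrades.com/api3/usearch?where=CA&source=init'

# urljoin treats '&pageNum=2' as a relative path: 'usearch' and the
# query string of the base URL are both discarded.
joined = urljoin(base, '&pageNum=2')

# Plain concatenation keeps the path and the existing query parameters.
appended = base + '&pageNum=2'

print(joined)
print(appended)
```

The first printed URL has the same shape as the one in the 404 log line, which suggests the parameter should be appended to the full URL rather than joined to it.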

Spider:

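import json

import scrapy

def start_requests(self):
    # Note: {sessionId}, {requestId} and {specialty} are never filled in,
    # so they are sent literally (see %7BsessionId%7D in the log).
    yield scrapy.Request('https://www.healthgrades.com/api3/usearch?where=CA&sessionId={sessionId}&requestId={requestId}' +
                         '&sort.provider=bestmatch&source=init&what={specialty}&category=provider&cid&debug=false&d' +
                         'ebugParams=false&isPsr=false&isFsr=false&isFirstRequest=true&userLocalTime=23%3A55',
                         callback=self.pagination)

def pagination(self, response):
    # response.text replaces the deprecated response.body_as_unicode()
    jsonresponse = json.loads(response.text)
    totalPages = jsonresponse['search']['searchResults']['totalPages']

    # Assumes pages are numbered from 1; +1 so the last page is included
    for page in range(1, totalPages + 1):
        # Append the parameter to the full URL. urljoin() would treat
        # '&pageNum=N' as a relative path and drop 'usearch?...',
        # which is what produced the 404.
        yield scrapy.Request(response.url + '&pageNum=%s' % page, callback=self.profile_link)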
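
String concatenation works here only because the starting URL already has a query string and no `pageNum` parameter. A slightly more defensive option is to rewrite the query string itself. The sketch below uses only the standard library; `with_page` is a hypothetical helper name, not part of Scrapy, and the example URL is a placeholder:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return `url` with its pageNum query parameter set to `page`.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    parts = urlsplit(url)
    # keep_blank_values=True keeps value-less parameters such as 'cid'
    # (they are re-emitted as 'cid=').
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'pageNum']  # drop any existing pageNum
    query.append(('pageNum', str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL, used only to show the behaviour.
print(with_page('https://example.com/api3/usearch?where=CA&pageNum=1', 3))
```

Inside `pagination`, the request could then be built as `scrapy.Request(with_page(response.url, page), callback=self.profile_link)`. Note that this approach re-encodes the query string, so a bare `cid` becomes `cid=`; whether the API treats the two the same is an assumption that would need checking.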
