I want to crawl consecutive pages, so when the current page has a next page I use scrapy to yield a new Request, but I found that the callback of that Request is never called. Here are my code and the output:
next page url: http://www.wowsai.com/index.php?app=store&act=credit&id=682376&page=2#module
2015-05-06 10:00:47+0800 [spider22] INFO: Closing spider (finished)
def parse(self, response):
    ...
    # 2. get the next page url and trigger another request to parse it
    if "page" not in response.url:
        nextpage_url = 'http://www.wowsai.com/' + sel.xpath('//div[@id="pageBox"]/a[1]/@href').extract()[0]
    else:
        nextpage_url = 'http://www.wowsai.com/' + sel.xpath('//div[@id="pageBox"]/a[2]/@href').extract()[0]
    print "next page url:", nextpage_url
    yield Request(nextpage_url, callback=self.parsePage)

def parsePage(self, response):
    print response.url
    print "here is parsePage"
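As a side note on the URL construction above: concatenating the host by hand is fragile, because it breaks if the extracted href is already absolute or root-relative. The standard library's urljoin resolves the href against the current page URL instead. A minimal sketch with hypothetical values standing in for response.url and the xpath result (Python 3 import shown; the Python 2 Scrapy above would use `from urlparse import urljoin`):

```python
from urllib.parse import urljoin

# Hypothetical stand-ins for response.url and the extracted @href.
page_url = 'http://www.wowsai.com/index.php?app=store&act=credit&id=682376'
href = 'index.php?app=store&act=credit&id=682376&page=2#module'

# urljoin resolves the href relative to the current page, so it works
# whether the site emits relative paths, root-relative paths, or full URLs.
nextpage_url = urljoin(page_url, href)
print(nextpage_url)
# http://www.wowsai.com/index.php?app=store&act=credit&id=682376&page=2#module
```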