<p>Scrapy receives and tracks the cookies sent by servers, and sends them back on subsequent requests, just like any regular web browser does. For more information, see the Scrapy FAQ <a href="http://doc.scrapy.org/en/latest/faq.html#does-scrapy-manage-cookies-automatically" rel="nofollow">here</a>.</p>
<p>I can't see how you are paginating in your code, but it should look something like this:</p>
<pre><code>from scrapy import Spider, Request
from scrapy.selector import Selector

# adjust this import to wherever your EPGD item class is defined
# from yourproject.items import EPGD

class EPGD_spider(Spider):
    name = "EPGD"
    allowed_domains = ["epgd.biosino.org"]
    stmp = []
    term = "man"
    my_urls = ["http://epgd.biosino.org/EPGD/search/textsearch.jsp?textquery=man&submit=Feeling+Lucky"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//tr[@class="odd"]|//tr[@class="even"]')
        for site in sites:
            item = EPGD()
            item['genID'] = map(unicode.strip, site.xpath('td[1]/a/text()').extract())
            item['taxID'] = map(unicode.strip, site.xpath('td[2]/a/text()').extract())
            item['familyID'] = map(unicode.strip, site.xpath('td[3]/a/text()').extract())
            item['chromosome'] = map(unicode.strip, site.xpath('td[4]/text()').extract())
            item['symbol'] = map(unicode.strip, site.xpath('td[5]/text()').extract())
            item['description'] = map(unicode.strip, site.xpath('td[6]/text()').extract())
            yield item

        yield Request('http://epgd.biosino.org/EPGD/search/textsearch.jsp?currentIndex=10',
                      callback=self.parse_second_url)

    def parse_second_url(self, response):
        # do your thing
        pass
</code></pre>
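<p>One caveat: <code>map(unicode.strip, ...)</code> is Python 2 idiom. Under Python 3 there is no <code>unicode</code> type and <code>map</code> returns a lazy iterator, so the equivalent is a list comprehension over plain <code>str</code> values. A minimal sketch (the sample strings below just stand in for what <code>.extract()</code> would return):</p>

```python
# Sample values standing in for the output of site.xpath(...).extract()
extracted = [u'  BRCA1 ', u'\n9606  ']

# Python 3 replacement for map(unicode.strip, extracted):
# strip whitespace from every string and keep a concrete list
cleaned = [s.strip() for s in extracted]
print(cleaned)  # ['BRCA1', '9606']
```

<p>Storing a concrete list in the item also avoids the pitfall of yielding an already-exhausted iterator when the item is later serialized.</p>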
<p>The second request will carry the cookies from the first request.</p>