<p>Scrapy receives and tracks the cookies sent by servers, and sends them back on subsequent requests, just like any regular web browser does. For more information, see the Scrapy FAQ <a href="http://doc.scrapy.org/en/latest/faq.html#does-scrapy-manage-cookies-automatically" rel="nofollow">here</a>.</p>
<p>I can't see how you are paginating in your code, but it should look something like this:</p>
<pre><code>from scrapy import Spider, Request
from scrapy.selector import Selector

# adjust this import to wherever your EPGD item class is defined
# from yourproject.items import EPGD

class EPGD_spider(Spider):
    name = "EPGD"
    allowed_domains = ["epgd.biosino.org"]
    stmp = []
    term = "man"
    my_urls = ["http://epgd.biosino.org/EPGD/search/textsearch.jsp?textquery=man&submit=Feeling+Lucky"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//tr[@class="odd"]|//tr[@class="even"]')
        for site in sites:
            item = EPGD()
            item['genID'] = map(unicode.strip, site.xpath('td[1]/a/text()').extract())
            item['taxID'] = map(unicode.strip, site.xpath('td[2]/a/text()').extract())
            item['familyID'] = map(unicode.strip, site.xpath('td[3]/a/text()').extract())
            item['chromosome'] = map(unicode.strip, site.xpath('td[4]/text()').extract())
            item['symbol'] = map(unicode.strip, site.xpath('td[5]/text()').extract())
            item['description'] = map(unicode.strip, site.xpath('td[6]/text()').extract())
            yield item

        yield Request('http://epgd.biosino.org/EPGD/search/textsearch.jsp?currentIndex=10',
                      callback=self.parse_second_url)

    def parse_second_url(self, response):
        # do your thing
        pass
</code></pre>
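<p>One caveat: <code>map(unicode.strip, ...)</code> is Python 2 idiom. Under Python 3 there is no <code>unicode</code> type and <code>map</code> returns a lazy iterator, so the equivalent is a list comprehension over plain <code>str</code> values. A minimal sketch (the sample strings below just stand in for what <code>.extract()</code> would return):</p>

```python
# Sample values standing in for the output of site.xpath(...).extract()
extracted = [u'  BRCA1 ', u'\n9606  ']

# Python 3 replacement for map(unicode.strip, extracted):
# strip whitespace from every string and keep a concrete list
cleaned = [s.strip() for s in extracted]
print(cleaned)  # ['BRCA1', '9606']
```

<p>Storing a concrete list in the item also avoids the pitfall of yielding an already-exhausted iterator when the item is later serialized.</p>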
<p>The second request will carry the cookies from the first request.</p>