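<p>You need some kind of database or file to store the results of one spider and read them from another spider. For example, using a plain text file:</p>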
<pre><code>from scrapy import Spider, Request


class FirstSpider(Spider):
    """First spider crawls something and stores URLs in a file, one URL per line."""
    name = 'first'
    start_urls = ['someurl']
    storage_file = 'urls.txt'

    def parse(self, response):
        # Resolve relative hrefs so the second spider receives absolute URLs
        urls = [response.urljoin(href) for href in response.xpath('//a/@href').extract()]
        # Append mode: links from every parsed page accumulate in the same file
        with open(self.storage_file, 'a') as f:
            f.write('\n'.join(urls) + '\n')


class SecondSpider(Spider):
    """Second spider opens this file and crawls every line in it."""
    name = 'second'

    def start_requests(self):
        # The with-block closes the file once every line has been read
        with open(FirstSpider.storage_file) as f:
            for line in f:
                url = line.strip()
                if not url:  # skip empty lines
                    continue
                yield Request(url)
</code></pre>
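<p>If you prefer a database to a text file, here is a minimal sketch assuming SQLite through Python's standard <code>sqlite3</code> module. The helpers <code>save_urls</code> and <code>load_urls</code> and the path <code>urls.db</code> are hypothetical names for illustration, not part of Scrapy:</p>

<pre><code>import sqlite3

# Hypothetical schema: one table, the URL itself is the primary key
SCHEMA = 'CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)'


def save_urls(db_path, urls):
    """Store URLs in SQLite; URLs already in the table are silently skipped."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(SCHEMA)
            conn.executemany(
                'INSERT OR IGNORE INTO urls (url) VALUES (?)',
                ((u.strip(),) for u in urls if u.strip()),  # drop blank entries
            )
    finally:
        conn.close()


def load_urls(db_path):
    """Return every stored URL in the order it was first inserted."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)  # so reading an empty database does not fail
        rows = conn.execute('SELECT url FROM urls ORDER BY rowid').fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == '__main__':
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), 'urls.db')
    save_urls(path, ['http://example.com/a', 'http://example.com/b'])
    save_urls(path, ['http://example.com/b', 'http://example.com/c'])  # 'b' is skipped
    print(load_urls(path))
    # ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
</code></pre>

<p>In <code>FirstSpider.parse</code> you would call <code>save_urls('urls.db', urls)</code> instead of writing to the text file, and in <code>SecondSpider.start_requests</code> you would loop over <code>load_urls('urls.db')</code> and yield a <code>Request</code> for each URL. Unlike the append-only text file, the primary key on <code>url</code> stops a link that appears on several pages from being crawled twice.</p>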