<p>Simple and straightforward :)</p>
<p>Check the <a href="http://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script" rel="noreferrer">official documentation</a>. I would make one change so that the spider runs only when you execute <code>python myscript.py</code>, and not every time you import something from the script. Just add an <code>if __name__ == "__main__"</code> guard:</p>
<pre><code>import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    # Your spider definition goes here; Scrapy requires every spider to have a name
    name = "myspider"


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished
</code></pre>
<p>Now save the file as <code>myscript.py</code> and run <code>python myscript.py</code>.</p>
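<p>To see why the guard matters, here is a minimal sketch that does not need Scrapy or network access. It writes a hypothetical module <code>demo_myscript.py</code> in which <code>run_crawl()</code> (also hypothetical) stands in for the <code>process.crawl(...)</code> / <code>process.start()</code> calls. Importing the module leaves the "crawl" untouched, while running it as a script (simulated with the standard library's <code>runpy.run_path(..., run_name="__main__")</code>) triggers it.</p>

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for myscript.py: the crawl is replaced by appending
# to a list, so the example runs without Scrapy or network access.
MODULE_SOURCE = '''
calls = []

def run_crawl():
    # stands in for process.crawl(MySpider); process.start()
    calls.append("crawl started")

if __name__ == "__main__":
    run_crawl()
'''


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_myscript.py"
        path.write_text(MODULE_SOURCE)

        # 1. Import: __name__ is "demo_myscript", so the guarded block is skipped.
        sys.path.insert(0, tmp)
        try:
            module = importlib.import_module("demo_myscript")
            imported_calls = list(module.calls)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_myscript", None)

        # 2. Run as a script: __name__ is "__main__", so the crawl runs.
        script_globals = runpy.run_path(str(path), run_name="__main__")
        script_calls = list(script_globals["calls"])

    return imported_calls, script_calls


if __name__ == "__main__":
    imported, as_script = demo()
    print("on import:", imported)
    print("as script:", as_script)
```

<p>The same rule applies to the Scrapy script: other modules can import <code>MySpider</code> from <code>myscript.py</code> without starting a crawl.</p>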
<p>Enjoy!</p>