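<p>You only need to create an ordinary Python script and then use Scrapy's <a href="http://doc.scrapy.org/en/latest/topics/commands.html#std:command-runspider" rel="nofollow"><code>runspider</code></a> command, which lets you run a spider without creating a project.</p>
<p>For example, you can create a file <code>stackoverflow_spider.py</code> with the following content:</p>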
<pre><code>import scrapy
from scrapy.loader import ItemLoader


class QuestionItem(scrapy.Item):
    idx = scrapy.Field()
    title = scrapy.Field()


class StackoverflowSpider(scrapy.Spider):
    name = 'SO'
    start_urls = ['http://stackoverflow.com']

    def parse(self, response):
        # Each matched element is one question summary on the front page.
        questions = response.css('#question-mini-list .question-summary')
        for i, elem in enumerate(questions):
            # Scope the loader to this element so the relative XPath applies to it.
            loader = ItemLoader(item=QuestionItem(), selector=elem)
            loader.add_value('idx', i)
            loader.add_xpath('title', ".//h3/a/text()")
            yield loader.load_item()
</code></pre>
<p>Then, provided Scrapy is installed correctly, you can run it with:</p>
<pre><code>scrapy runspider stackoverflow_spider.py -o questions-items.json
</code></pre>
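<p>If you want to post-process the exported file, keep in mind that an <code>ItemLoader</code> without output processors collects every field as a list, so each exported item will likely look like <code>{"idx": [0], "title": ["..."]}</code>. The following is a minimal sketch, using only the standard library, of flattening that shape. The helper name <code>flatten_items</code> and the sample titles are hypothetical and only illustrate the idea:</p>

```python
import json

# Hypothetical sample of what questions-items.json might contain;
# the titles are made up for illustration.
SAMPLE_OUTPUT = """[
{"idx": [0], "title": ["How do I parse JSON in Python?"]},
{"idx": [1], "title": ["What does the yield keyword do?"]}
]"""


def flatten_items(raw_json):
    """Replace each list-valued field with its first element (hypothetical helper)."""
    items = json.loads(raw_json)
    flat = []
    for item in items:
        # Keep the first collected value of each field, or None if the list is empty.
        flat.append({key: (values[0] if values else None)
                     for key, values in item.items()})
    return flat


if __name__ == "__main__":
    for question in flatten_items(SAMPLE_OUTPUT):
        print(question["idx"], question["title"])
```

<p>To apply it to a real run, read the file first, for example <code>flatten_items(open("questions-items.json").read())</code>. Setting an output processor on the loader is another way to get single values, at the cost of changing the spider itself.</p>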