<p>你的xpath有个问题-它们应该是相对的。代码如下:</p>
<pre><code>from scrapy.item import Item, Field
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
class CraigslistItem(Item):
title = Field()
link = Field()
class CraigslistSpider(BaseSpider):
name = "craigslist_unique"
allowed_domains = ["craiglist.org"]
start_urls = [
"http://sfbay.craigslist.org/search/sof?zoomToPosting=&query=&srchType=A&addFour=part-time",
"http://newyork.craigslist.org/search/sof?zoomToPosting=&query=&srchType=A&addThree=internship",
"http://seattle.craigslist.org/search/sof?zoomToPosting=&query=&srchType=A&addFour=part-time"
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select("//span[@class='pl']")
items = []
for site in sites:
item = CraigslistItem()
item['title'] = site.select('.//a/text()').extract()[0]
item['link'] = site.select('.//a/@href').extract()[0]
items.append(item)
return items
</code></pre>
<p>如果通过以下方式运行:</p>
^{pr2}$
<p>你会看到的输出.json公司名称:</p>
<pre><code>{"link": "/sby/sof/3824966457.html", "title": "HR Admin/Tech Recruiter"}
{"link": "/eby/sof/3824932209.html", "title": "Entry Level Web Developer"}
{"link": "/sfc/sof/3824500262.html", "title": "Sr. Ruby on Rails Contractor @ Funded Startup"}
...
</code></pre>
<p>希望有帮助。在</p>