我可以把craiglist的第一页废了。但Linkextractor并没有从其他页面获取数据。我在定义规则时是不是做错了什么?在
import scrapy
from craiglist.items import craiglistItem
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
class ExampleSpider(CrawlSpider):
name = "craiglist"
allowed_domains = ["craiglist.org"]
start_urls = (
'http://sfbay.craigslist.org/search/npo',
)
rules = [
Rule(LinkExtractor(restrict_xpaths='//a[@class="button next"]'), callback='parse', follow= True)
]
def parse(self, response):
titles = response.selector.xpath('//*[@id="sortable-results"]/ul/li/p')
items = []
for title in titles:
item = craiglistItem()
item["title"] = title.select("a/text()").extract()
item["link"] = title.select("a/@href").extract()
items.append(item)
return items
我已经修改了代码,现在它可以正常工作了。下面是工作代码。在
相关问题 更多 >
编程相关推荐