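I'm trying to get a very simple spider working. I get a NotImplementedError from parse, essentially the same as in this question: Why does scrapy throw an error for me when trying to spider and parse a site?
The difference is that I am inheriting from CrawlSpider.
Here is what I followed to the letter: https://github.com/scrapy/scrapy/blob/master/docs/topics/spiders.rst#crawlspider-example
That led me to this code: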
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class SiteSpider(CrawlSpider):
    name = 'sdreader'
    allowed_domains = ['sandiegoreader.com']
    start_urls = ['http://www.sandiegoreader.com/events/all/']
    rules = [Rule(LinkExtractor(allow=['/events/2015/.+', '/events/ongoing/.+']), 'parse_event')]

    def parse_event(self, response):
        # EventItem is not defined in this snippet; presumably a scrapy.Item
        # subclass declared elsewhere in the project.
        event = EventItem()
        event['name'] = response.xpath('//*[@id="content"]/div[2]/h2/text()').extract()
        return event
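I see this in the log:
[log output not preserved]
It must be user error somewhere. I have looked at the CrawlSpider source, and it seems to do what I expect: it implements parse and inherits from Spider.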
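To make the inheritance point concrete, here is a rough sketch of the mechanism described above. It uses plain-Python stand-ins, not Scrapy's actual classes: BaseSpider, RuleDrivenSpider, PlainSpider, the (fragment, callback name) rule format, the dict-shaped response, and the example URL are all hypothetical names invented for illustration. The only idea it tries to show is that a base class can leave parse as a stub that raises NotImplementedError, while a subclass implements parse and dispatches to the callback named in a rule.

```python
# Simplified stand-ins for illustration only; these are NOT Scrapy's classes.

class BaseSpider:
    """Mimics a base spider whose default parse() is an unimplemented stub."""

    def parse(self, response):
        raise NotImplementedError


class RuleDrivenSpider(BaseSpider):
    """Mimics a crawl-style spider: parse() is implemented and dispatches to rule callbacks."""

    rules = []  # hypothetical format: list of (url_fragment, callback_name) pairs

    def parse(self, response):
        results = []
        for fragment, callback_name in self.rules:
            if fragment in response["url"]:
                # The callback is given by name and resolved to a bound method here.
                callback = getattr(self, callback_name)
                results.append(callback(response))
        return results


class EventSpider(RuleDrivenSpider):
    rules = [("/events/2015/", "parse_event")]

    def parse_event(self, response):
        return {"name": response["title"]}


class PlainSpider(BaseSpider):
    """Inherits straight from the stub base and never implements parse()."""


# Hypothetical response object: a dict standing in for a real HTTP response.
response = {"url": "http://www.sandiegoreader.com/events/2015/example/", "title": "Example"}

events = EventSpider().parse(response)

try:
    PlainSpider().parse(response)
    base_raised = False
except NotImplementedError:
    base_raised = True
```

In this model, the error can only appear when the parse that actually runs is the stub from the base class, which is why it is worth checking which class the running spider really inherits from and which method each rule's callback resolves to.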
Try changing the callback in Rule to: