NameError: name 'Rule' is not defined in Python Scrapy

Posted 2024-10-01 13:34:14


I have the following script to recursively crawl a website:

#!/usr/bin/python 
import scrapy
from scrapy.selector import Selector
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner

class GivenSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/",
        # "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        # "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]
    rules = (Rule(LinkExtractor(allow=r'/'), callback=parse, follow=True),)

    def parse(self, response):
        select = Selector(response)
        titles = select.xpath('//a[@class="listinglink"]/text()').extract()
        print ' [*] Start crawling at %s ' % response.url
        for title in titles:
            print '\t %s' % title


#configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

d = runner.crawl(GivenSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()

When I run it, I get NameError: name 'Rule' is not defined.

2 Answers

Loïc Faure-Lacroix is right. However, in the current version of Scrapy (1.6), you need to import Rule from scrapy.spiders like this:

from scrapy.spiders import Rule

See documentation for more information

If you look at the documentation and search for the word "Rule", you will find:

http://doc.scrapy.org/en/0.20/topics/spiders.html?highlight=rule#crawling-rules

Since you did not import anything, it is clear that Rule is not defined.

 class scrapy.contrib.spiders.Rule(link_extractor, callback=None, cb_kwargs=None, follow=None, process_links=None, process_request=None)

So, in theory, you should be able to import the Rule class with from scrapy.contrib.spiders import Rule.
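
As a side note on why the error shows up before any crawling starts: Python executes a class body at definition time, so a name used there must already exist. The following sketch reproduces this with plain Python and no Scrapy at all; BrokenSpider and define_spider_without_import are hypothetical names used only for this demonstration.

```python
def define_spider_without_import():
    """Define a class whose body uses a name that was never imported."""

    class BrokenSpider:
        # Stand-in for the question's spider: Rule is looked up as soon as
        # the class body runs, and nothing has imported or defined it.
        rules = (Rule("placeholder"),)

    return BrokenSpider


error_message = None
try:
    define_spider_without_import()
except NameError as exc:
    # The class statement itself raises; no instance is ever created.
    error_message = str(exc)

print(error_message)
```

The same mechanism applies to callback=parse in the question: parse is defined below the rules line, so once Rule is imported that bare name would be the next one to fail. Rule accepts the callback as a string with the method name, which avoids the problem.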
