Scrapy: trying to download each link in an index page as a complete HTML file fails

import scrapy
import urlparse
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class EasySpider(CrawlSpider):
    name = 'easy'
    allowed_domains = ['web']
    start_urls = ['http://www.example.com/index.html']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//*[@class="foobar"]//a/@href'), callback='parse_item')
    )

def parse_item(self, response):
    filename = urlparse.urljoin(response.url, url)
    with open(filename, 'wb') as f:
        f.write(response.body)
    return

1 answer

User

#1 · Posted 2024-09-29 08:28:56

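Your problem is that parse_item is defined outside the class instead of inside it, so it never becomes a method of your spider and callback='parse_item' has nothing to call. Two other details also matter: rules has to be a tuple, so a single Rule needs a trailing comma, and restrict_xpaths should select the <a> elements themselves, not their @href attributes.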

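import os
from urllib.parse import urlparse  # Python 3 replacement for the old urlparse module

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class EasySpider(CrawlSpider):
    name = 'easy'
    # Must match the crawled host, otherwise extracted links are dropped as offsite
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/index.html']

    # The trailing comma makes `rules` a tuple; restrict_xpaths selects the <a> elements
    rules = (
        Rule(LinkExtractor(restrict_xpaths='//*[@class="foobar"]//a'),
             callback='parse_item'),
    )

    # Defined inside the class, so callback='parse_item' resolves to this method
    def parse_item(self, response):
        # Name each file after the last URL path segment so pages don't overwrite one another
        filename = os.path.basename(urlparse(response.url).path) or 'index.html'
        with open(filename, 'wb') as f:
            f.write(response.body)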
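CrawlSpider turns callback='parse_item' into a method by looking the name up as an attribute of the spider instance, so a module-level function with the same name is never found. Separately, parentheses alone do not create a tuple, so rules = (Rule(...)) without a trailing comma is just a single Rule, and iterating over it fails. The plain-Python sketch below (no Scrapy needed) illustrates both pitfalls. FakeRule and resolve_callback are hypothetical stand-ins, not Scrapy's API, and resolve_callback assumes that a missing name resolves to None, a simplification of the real lookup.

```python
# Plain-Python illustration of two pitfalls in the question's spider (no Scrapy needed).


class FakeRule:
    """Hypothetical stand-in for scrapy.spiders.Rule: it only stores the callback name."""

    def __init__(self, callback=None):
        self.callback = callback


def resolve_callback(spider, callback):
    """Hypothetical, simplified version of resolving a string callback on a spider.

    Assumption: a name that is not an attribute of the spider resolves to None.
    """
    if callback is None or callable(callback):
        return callback
    return getattr(spider, callback, None)


class SpiderWithMethod:
    def parse_item(self, response):
        return f"saved {response}"


class SpiderWithoutMethod:
    pass


def parse_item(self, response):
    # Defined at module level, so it is NOT an attribute of SpiderWithoutMethod.
    return f"saved {response}"


# Pitfall 1: the callback must be a method of the spider class.
found = resolve_callback(SpiderWithMethod(), "parse_item")
missing = resolve_callback(SpiderWithoutMethod(), "parse_item")
print(found("page.html"))  # saved page.html
print(missing)  # None

# Pitfall 2: the trailing comma, not the parentheses, makes a tuple.
rules_without_comma = (FakeRule(callback="parse_item"))
rules_with_comma = (FakeRule(callback="parse_item"),)
print(type(rules_without_comma).__name__)  # FakeRule
print(type(rules_with_comma).__name__)  # tuple

try:
    for rule in rules_without_comma:
        pass
except TypeError as exc:
    print(exc)  # 'FakeRule' object is not iterable
```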