Dynamic start URL values

class SiteFeedConstructor(CrawlSpider, FeedConstructor):
    name = "Data_Feed"
    start_urls = ['http://www.cnn.com/']

    def __init__(self, *args, **kwargs):
        FeedConstructor.__init__(self, **kwargs)
        kwargs = {}
        super(SiteFeedConstructor, self).__init__(*args, **kwargs)
        self.name = str(self.config_json.get('name', 'Missing value'))
        self.start_urls = str(self.config_json.get('start_urls', 'Missing value'))
        self.start_urls = self.start_urls.split(",")

1 Answer

User

#1 · Posted on 2024-09-30 22:13:51

Instead of defining __init__(), override the start_requests() method:

This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, the make_requests_from_url() is used instead to create the Requests. This method is also called only once from Scrapy, so it’s safe to implement it as a generator.

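from scrapy.spiders import CrawlSpider

# FeedConstructor is the asker's own mixin; it is expected to set self.config_json.
class SiteFeedConstructor(CrawlSpider, FeedConstructor):
    name = "Data_Feed"

    def start_requests(self):
        # Read the configured values when the spider opens, not at class definition time.
        self.name = str(self.config_json.get('name', 'Missing value'))
        for url in str(self.config_json.get('start_urls', 'Missing value')).split(","):
            # make_requests_from_url() is deprecated since Scrapy 1.4;
            # on newer versions, yield scrapy.Request(url) instead.
            yield self.make_requests_from_url(url)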
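
One detail worth guarding against: with the 'Missing value' default, a config that has no start_urls entry makes the loop request the literal string "Missing value", and entries such as " http://example.com/ " keep their surrounding whitespace. Below is a minimal sketch of a helper that cleans the list first. The name iter_start_urls and the example URLs are hypothetical, and config_json is assumed to behave like a plain dict, as it does in the code above.

```python
def iter_start_urls(config_json, key="start_urls"):
    """Yield cleaned URLs from a comma-separated config value.

    Hypothetical helper: `config_json` is assumed to be a plain dict.
    """
    raw = config_json.get(key)
    if not raw:
        # Missing or empty value: yield nothing instead of a placeholder string.
        return
    for part in str(raw).split(","):
        url = part.strip()  # drop whitespace left around the commas
        if url:             # skip empty entries such as a trailing comma
            yield url


config = {"name": "Data_Feed", "start_urls": "http://www.cnn.com/, http://example.com/ ,"}
print(list(iter_start_urls(config)))
print(list(iter_start_urls({})))
```

Inside start_requests() the loop could then iterate over iter_start_urls(self.config_json) and yield one request per URL, so a missing setting produces no requests rather than one invalid request.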
