Dynamic start_urls values

Posted 2024-09-30 22:13:51


I'm new to this. I've written a spider that works fine with the start_urls value it is initialized with.

It also works fine if I assign a literal in the code in __init__:

{self.start_urls = 'http://something.com'}

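However, when I read the value in from a JSON file and build a list from it, I keep getting the same error: Missing %20

I feel like I'm missing something obvious, either about Scrapy or about Python, since I'm a newbie.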

class SiteFeedConstructor(CrawlSpider, FeedConstructor):

    name = "Data_Feed"
    start_urls = ['http://www.cnn.com/']

    def __init__(self, *args, **kwargs):

        FeedConstructor.__init__(self, **kwargs)
        kwargs = {}
        super(SiteFeedConstructor, self).__init__(*args, **kwargs)

        self.name = str(self.config_json.get('name', 'Missing value'))
        self.start_urls = str(self.config_json.get('start_urls', 'Missing value'))
        self.start_urls = self.start_urls.split(",")

Error:

^{pr2}$

1 answer
User
#1 · Posted 2024-09-30 22:13:51

Instead of defining __init__(), override the start_requests() method:

This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, the make_requests_from_url() is used instead to create the Requests. This method is also called only once from Scrapy, so it’s safe to implement it as a generator.

class SiteFeedConstructor(CrawlSpider, FeedConstructor):
    name = "Data_Feed"

    def start_requests(self):
        self.name = str(self.config_json.get('name', 'Missing value'))
        for url in str(self.config_json.get('start_urls', 'Missing value')).split(","):
            yield self.make_requests_from_url(url)
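The error text itself is not shown above, so the following is only a sketch of one thing worth checking, not a confirmed diagnosis. Calling str() on a value that is already a JSON list produces text like "['http://a', 'http://b']", and splitting a comma-separated string leaves leading spaces on each piece; either can yield malformed URLs. The helper name parse_start_urls and the sample config are hypothetical, introduced here for illustration.

```python
import json


def parse_start_urls(raw):
    """Normalize a config value into a list of clean URL strings.

    Hypothetical helper: accepts either a comma-separated string
    or a list of strings, and strips surrounding whitespace.
    """
    if isinstance(raw, str):
        candidates = raw.split(",")
    else:
        # Already a list (or missing): do not call str() on it,
        # which would embed brackets and quotes in the text.
        candidates = list(raw or [])
    cleaned = (str(u).strip() for u in candidates)
    # Drop entries that are empty after stripping.
    return [u for u in cleaned if u]


# Assumed sample config; the real file layout is not shown in the question.
config_json = json.loads(
    '{"name": "Data_Feed",'
    ' "start_urls": "http://www.cnn.com/, http://something.com"}'
)
urls = parse_start_urls(config_json.get("start_urls", []))
print(urls)
```

Inside start_requests(), the loop could then iterate over parse_start_urls(self.config_json.get('start_urls', [])) instead of splitting the raw string directly.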
