How can I customize the path for multiple file URLs in a single item when uploading to S3?

{'file_urls': ['http://i.stack.imgur.com/tKsDb.png', 'http://i.stack.imgur.com/NAGkl.png'], 'files': [{'checksum': 'b0974ea6c88740bed353206b279e0827', 'path': 'full/762f5682798c5854833316fa171c71166e284630.jpg', 'url': 'http://i.stack.imgur.com/tKsDb.png'}, {'checksum': '9a42f7bd1dc45840312fd49cd08e6a5c', 'path': 'full/615eabb7b61e79b96ea1ddb34a2ef55c8e0f7ec3.jpg', 'url': 'http://i.stack.imgur.com/NAGkl.png'}]}

1 answer

User

#1 · posted 2024-10-02 00:43:18

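Yes, this is possible, as you can see from the source code of Scrapy's files pipeline.

You will see that it has several methods that can be overridden. One of them is the file_path method, which you can override by adding it to your pipeline class, like this: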

    # Needs at module level: from scrapy.http import Request
    def file_path(self, request, response=None, info=None):
        # start of deprecation warning block (can be removed in the future)
        def _warn():
            from scrapy.exceptions import ScrapyDeprecationWarning
            import warnings
            warnings.warn('FilesPipeline.file_key(url) method is deprecated, '
                          'please use file_path(request, response=None, info=None) instead',
                          category=ScrapyDeprecationWarning, stacklevel=1)

        # check if called from file_key with url as first argument
        if not isinstance(request, Request):
            _warn()
            url = request
        else:
            url = request.url

        # detect if file_key() method has been overridden
        if not hasattr(self.file_key, '_base'):
            _warn()
            return self.file_key(url)
        # end of deprecation warning block

        # Modify the file path HERE to your own custom path
        filename = request.meta['filename']
        media_ext = 'jpg'
        return '%s/%s/%s.%s' % \
            (request.meta['image_category'],
                request.meta['image_month'],
                filename, media_ext)

The result will be a path of the form:

    <image_category>/<image_month>/<filename>.jpg

Look at the last few lines of the code (this is the only code I added; the rest is there because the method is copied from the Scrapy source):

    # Modify the file path HERE to your own custom path 
    filename = request.meta['filename']
    media_ext = 'jpg'
    return '%s/%s/%s.%s' % \
        (request.meta['image_category'],
            request.meta['image_month'],
            filename, media_ext)

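These lines return the custom path. The path depends on values you supply: in the spider you can collect metadata for each image, such as its filename, its category, the date it was taken, or anything else, and then use those values in the pipeline to build a custom directory.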
