Custom images pipeline settings.py

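import scrapy
from scrapy.pipelines.images import ImagesPipeline  # Scrapy 1.0+ location


class MyImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Schedule one download request per image URL on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # some processing...
        return item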

1 answer

User

#1 · Posted 2024-07-04 06:24:41

Both image pipelines process the image_urls field of the item, which is why the images are downloaded twice.

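I would try using a single pipeline and fix whatever errors come up in it, so that one self-contained component handles all of the image processing. In particular, you need to handle the inheritance from ImagesPipeline more carefully.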
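As a rough sketch of what "a single pipeline" could look like in settings.py: only the custom subclass is registered, and the stock ImagesPipeline is left out so each URL is fetched once. The module path `myproject.pipelines` and the storage directory are assumptions made for illustration, not values taken from the question.

```python
# settings.py (sketch): register exactly one image pipeline.
ITEM_PIPELINES = {
    # Hypothetical module path; adjust it to the project's own package.
    "myproject.pipelines.MyImagesPipeline": 1,
    # Deliberately no entry for scrapy.pipelines.images.ImagesPipeline:
    # the subclass already inherits its download logic.
}

# Placeholder directory where downloaded images are stored.
IMAGES_STORE = "/path/to/images"
```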

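Regarding the KeyError: the ImagesPipeline.item_completed method is in charge of updating the images field in the items, so if you override it without calling the parent implementation, that field will not be available when you need it.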

To fix this in your pipeline, you can update it as follows:

class MyImagesPipeline(ImagesPipeline):
    ...

    def item_completed(self, results, item, info):
        item = super(MyImagesPipeline, self).item_completed(results, item, info)

        # some processing...
        return item
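To illustrate why the call to the parent method matters, here is a minimal sketch that runs without Scrapy. The classes `BasePipeline`, `OverridesWithoutSuper`, and `OverridesWithSuper` are hypothetical stand-ins, as is the `image_count` key; they only mimic the pattern in which the parent's item_completed fills in the images field from the successful download results.

```python
# Stand-in classes only: they mimic the pattern, not Scrapy's real internals.
class BasePipeline:
    """Hypothetical parent whose item_completed fills in item['images']."""

    def item_completed(self, results, item, info):
        # Keep the result data of every successful download.
        item["images"] = [data for ok, data in results if ok]
        return item


class OverridesWithoutSuper(BasePipeline):
    def item_completed(self, results, item, info):
        # The parent method never runs, so 'images' is never set.
        return item


class OverridesWithSuper(BasePipeline):
    def item_completed(self, results, item, info):
        # Let the parent populate 'images' first.
        item = super().item_completed(results, item, info)
        # 'images' is now safe to read in the custom processing.
        item["image_count"] = len(item["images"])
        return item


# One successful and one failed download, as (success, data) pairs.
results = [(True, {"path": "full/a.jpg"}), (False, None)]
broken = OverridesWithoutSuper().item_completed(results, {}, None)
fixed = OverridesWithSuper().item_completed(results, {}, None)
print("images" in broken)
print(fixed["images"])
```

Reading `broken["images"]` in later code would raise the KeyError, while `fixed` carries the field because the parent ran before the custom processing.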

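I recommend reading the ImagesPipeline code (in Scrapy 1.0 it lives in scrapy/pipelines/images.py, and in earlier versions in scrapy/contrib/pipeline/images.py, but the code is practically the same) to fully understand what happens inside it.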
