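I want to deduplicate image files on disk. I have a JSON file that describes pairs of duplicates (the output of DuplicateImageFinder). Because there are often more than two copies of the same image, an automatic deletion rule applied to each pair could end up unlinking every instance of an image. A sample JSON file looks like this: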
{"images" : [
{"image1": "./folder1/IMG_013251.jpg", "image2": "./folder3/IMG_013251.jpg", "similarity": 100},
{"image1": "./folder1/IMG_013251.jpg", "image2": "./folder5/IMG-WA0149.jpg", "similarity": 100},
{"image1": "./folder1/IMG-WA0149.jpg", "image2": "./folder4/IMG-WA0125.jpg", "similarity": 100},
{"image1": "./folder5/IMG-WA0149.jpg", "image2": "./folder4/IMG-WA0125.jpg", "similarity": 100},
{"image1": "./folder2/IMG-WA0149.jpg", "image2": "./folder3/IMG-WA0125.jpg", "similarity": 100},
{"image1": "./folder3/IMG_045262.jpg", "image2": "./folder8/IMG_013251.jpg", "similarity": 100},
{"image1": "./folder4/IMG-WA0024.jpg", "image2": "./folder1/IMG-WA0079.jpg", "similarity": 100},
{"image1": "./folder5/IMG-WA0130.jpg", "image2": "./folder4/IMG-WA0024.jpg", "similarity": 100}]}
My first thought was to rewrite the JSON so that it looks like this, but I can't work out the logic:
{"images" : [
{"image1": "./folder1/IMG_013251.jpg", "image2": "./folder3/IMG_013251.jpg", "image3": "./folder5/IMG-WA0149.jpg", "similarity": 100},
{"image1": "./folder1/IMG-WA0149.jpg", "image2": "./folder4/IMG-WA0125.jpg", "image3": "./folder5/IMG-WA0149.jpg", "similarity": 100},
{"image1": "./folder2/IMG-WA0149.jpg", "image2": "./folder3/IMG-WA0125.jpg", "similarity": 100},
{"image1": "./folder3/IMG_045262.jpg", "image2": "./folder8/IMG_013251.jpg", "similarity": 100},
{"image1": "./folder4/IMG-WA0024.jpg", "image2": "./folder1/IMG-WA0079.jpg", "image3": "./folder5/IMG-WA0130.jpg", "similarity": 100}]}
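My initial approach was to create two lists and compare each element against the others, putting the duplicates into a dictionary. I tried this, but it didn't give me useful output. I also looked at the dict.update() method, but I'm not sure how to identify the duplicate dicts in the first place. How else could I approach this?
Thank you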
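One approach is to compute equivalence sets.
Basically, assuming the similarity relation is transitive (if A matches B and B matches C, then A matches C), you iterate over the list of pairs and build the sets of all equivalent pictures. Then you keep one instance from each set and unlink the others.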
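For example, based on your data and assuming transitivity, the sets would be:
{"./folder1/IMG_013251.jpg", "./folder3/IMG_013251.jpg", "./folder5/IMG-WA0149.jpg", "./folder4/IMG-WA0125.jpg", "./folder1/IMG-WA0149.jpg"}
{"./folder2/IMG-WA0149.jpg", "./folder3/IMG-WA0125.jpg"}
{"./folder3/IMG_045262.jpg", "./folder8/IMG_013251.jpg"}
{"./folder4/IMG-WA0024.jpg", "./folder1/IMG-WA0079.jpg", "./folder5/IMG-WA0130.jpg"}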
From each set, you can choose one instance to keep and unlink the others.
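The keep-one, unlink-the-rest step could be sketched roughly as below. The helper names `plan_removals` and `unlink_all` are made up for illustration, the rule "keep the alphabetically first path" is only an assumption (any rule works as long as exactly one file per set survives), and the two sample sets are taken from the question's data. The dry-run default is there so nothing is deleted by accident.

```python
import os


def plan_removals(groups):
    """Split each equivalence set into one path to keep and the rest to remove.

    Assumption: the alphabetically first path in each set is the keeper.
    """
    keep, remove = [], []
    for group in groups:
        ordered = sorted(group)
        keep.append(ordered[0])
        remove.extend(ordered[1:])
    return keep, remove


def unlink_all(paths, dry_run=True):
    """Unlink every path; with dry_run=True only report what would happen."""
    for path in paths:
        if dry_run:
            print("would unlink", path)
        else:
            os.unlink(path)


# Two of the sets from the question's data, used here as sample input.
groups = [
    {"./folder2/IMG-WA0149.jpg", "./folder3/IMG-WA0125.jpg"},
    {"./folder4/IMG-WA0024.jpg", "./folder1/IMG-WA0079.jpg",
     "./folder5/IMG-WA0130.jpg"},
]

keep, remove = plan_removals(groups)
unlink_all(remove)  # dry run: prints the paths, deletes nothing
```

Because every set keeps exactly one path, no image can lose all of its copies, which is the failure mode a per-pair deletion rule runs into.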
One way to compute the equivalence sets with your data layout is:
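A minimal sketch, assuming the JSON has the layout shown in the question (an "images" list of dicts with "image1" and "image2" keys). The function name `equivalence_sets` is hypothetical, and this is a simple set-merging loop rather than an optimized union-find, which should be fine for lists of moderate size.

```python
import json


def equivalence_sets(pairs):
    """Group image paths into disjoint sets of mutually equivalent images.

    Assumes similarity is transitive: if A matches B and B matches C,
    then A, B and C all end up in the same set.
    """
    sets = []  # disjoint sets of paths found so far
    for pair in pairs:
        merged = {pair["image1"], pair["image2"]}
        remaining = []
        for group in sets:
            if group & merged:
                # This group shares a path with the new pair: absorb it.
                merged |= group
            else:
                remaining.append(group)
        remaining.append(merged)
        sets = remaining
    return sets


# A shortened sample in the question's layout (three of its pairs).
data = json.loads("""
{"images": [
{"image1": "./folder2/IMG-WA0149.jpg", "image2": "./folder3/IMG-WA0125.jpg", "similarity": 100},
{"image1": "./folder4/IMG-WA0024.jpg", "image2": "./folder1/IMG-WA0079.jpg", "similarity": 100},
{"image1": "./folder5/IMG-WA0130.jpg", "image2": "./folder4/IMG-WA0024.jpg", "similarity": 100}]}
""")

for group in equivalence_sets(data["images"]):
    print(sorted(group))
```

Note that this ignores the "similarity" field; if some pairs score below 100 you may want to filter them out before grouping, since transitivity is a much weaker assumption for near-duplicates.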