Hash table collision in a Python set?

```
type(data_chunks)
<class 'set'>
len(data_chunks)
43130
same = [x for x in data_chunks if x.md5==chunk.md5]
[<Model.Chunk.Chunk o...x0DB40870>, <Model.Chunk.Chunk o...x0DB40870>]
len(same)
2
same[0] is same[1]
True
same[0] == same[1]
True
len(set(same))
1
```

```python
class Chunk(object):
    def __init__(self, md5, size=None, compressedMd5=None,
                 # ... (more elements)
                 product_id=None):
        self.md5 = md5
        self.product_id = product_id
        # (etc.)

    def __eq__(self, other):
        if self.compressedMd5:
            return self.compressedMd5 == other.compressedMd5 and self.product_id == other.product_id
        return self.md5 == other.md5 and self.product_id == other.product_id

    def __hash__(self):
        return self.name.__hash__()

    @property
    def name(self):
        return self.compressedMd5 if self.compressedMd5 is not None else self.md5
```

```python
sfChunk = Chunk(
    sfCompressedContentMD5,  # yes I see that this is compressed md5 - it was intended for some reason I don't know
    size=sfSize,
    compressedMd5=sfCompressedContentMD5,
    compressedSize=sfCompressedSize,
    product_id=productId
)
if sfChunk not in data_chunks:  # purely sanity check
    data_chunks.add(sfChunk)
```

2 Answers

User
Answer 1 · edited 2024-09-30 02:28:01

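One problem is that __eq__ is not commutative for a pair of objects where one has compressedMd5 and the other does not (i.e. its compressedMd5 is set to None). This means it is possible to construct two objects a and b such that a == b while b != a.
A related problem is that __eq__ and __hash__ are inconsistent in similar situations (__eq__ refuses to look at other.compressedMd5 if self.compressedMd5 is None).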
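A minimal sketch of that asymmetry. It assumes the elided part of `__init__` stores `compressedMd5` on the instance (the question does not show that line), and the md5 strings and product id are invented for illustration:

```python
class Chunk(object):
    def __init__(self, md5, compressedMd5=None, product_id=None):
        self.md5 = md5
        self.compressedMd5 = compressedMd5  # assumed; this line is elided in the question
        self.product_id = product_id

    def __eq__(self, other):
        if self.compressedMd5:
            return (self.compressedMd5 == other.compressedMd5
                    and self.product_id == other.product_id)
        return self.md5 == other.md5 and self.product_id == other.product_id

    def __hash__(self):
        return self.name.__hash__()

    @property
    def name(self):
        return self.compressedMd5 if self.compressedMd5 is not None else self.md5


a = Chunk("abc", product_id=1)                       # no compressedMd5
b = Chunk("abc", compressedMd5="xyz", product_id=1)  # has compressedMd5

a_eq_b = (a == b)  # a has no compressedMd5, so it compares md5 and product_id
b_eq_a = (b == a)  # b compares its compressedMd5 "xyz" against None
print(a_eq_b, b_eq_a)    # → True False
print(a.name, b.name)    # → abc xyz  (the two hashes are computed from different strings)
```

So `a == b` holds while the hashes are computed from different strings, which breaks the rule that objects comparing equal must hash equal.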
Mutability could also be an issue, as the following example shows:
```python
class Chunk(object):
    def __init__(self, md5):
        self.md5 = md5

    def __hash__(self):
        return hash(self.md5)

s = set()
chunk = Chunk('42')
s.add(chunk)
chunk.md5 = '123'
s.add(chunk)
print(s)
```
On my computer, this produces set([<__main__.Chunk object at 0x106d03390>, <__main__.Chunk object at 0x106d03390>]), i.e. the same object appears in the set twice.
Something similar could happen in your code if you change md5, or set/unset/change compressedMd5.
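The following sketch ties this to the symptoms in the question: the `not in` sanity check is defeated once the hash has changed, and rebuilding a set from a list collapses the duplicates again. It reuses the simplified one-attribute `Chunk` from the example above and assumes CPython 3; the variable names are illustrative:

```python
class Chunk(object):
    def __init__(self, md5):
        self.md5 = md5

    def __hash__(self):
        return hash(self.md5)


data_chunks = set()
chunk = Chunk("42")
data_chunks.add(chunk)

chunk.md5 = "123"  # the hash changes while the object is already stored

# The lookup uses the new hash and misses the entry stored under the old one
# (unless the two strings happen to hash identically, which is very unlikely).
found_by_lookup = chunk in data_chunks
# A plain scan ignores hashes and does find the object.
found_by_scan = any(x is chunk for x in data_chunks)

if chunk not in data_chunks:  # same sanity check as in the question
    data_chunks.add(chunk)

same = [x for x in data_chunks if x is chunk]
print(found_by_lookup, found_by_scan)  # → False True
print(len(same), len(set(same)))       # → 2 1
```

Building the new set from a list recomputes every hash, so both entries land in the same slot and compare identical. That matches `len(same)` being 2 while `len(set(same))` is 1.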

User
Answer 2 · edited 2024-09-30 02:28:01

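OK, I know this is behavior that is hard to reproduce reliably, so all we can offer are suggestions about what might cause such a problem...
In addition to what NPE already mentioned, you do have a potential problem in that Chunk is mutable: the md5 and compressedMd5 attributes can be changed at any time, so the result of hash(chunk) is not guaranteed to be stable. You may want to check your codebase for gremlins here. If you find any code that updates one of these attributes after a chunk has been added to the set, that is probably the culprit. FWIW, remember that Python never implicitly copies anything, so this: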
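```python
chunks = set()

def make_chunk(md5, *args, **kwargs):
    c = Chunk(md5, *args, **kwargs)
    chunks.add(c)  # the set stores a reference to c, not a copy
    return c

def do_something_bad(chunk):
    chunk.md5 = "something else"  # mutates the very object stored in chunks

def main():
    c = make_chunk("some md5")
    # ... lots of code here
    do_something_bad(c)
```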
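will be reflected in chunks and mess everything up (note: yes, you most probably already know that, but it is a very common gotcha for people coming from more mainstream languages).
Note: this is only a problem if something actually changes one of these attributes, of course, but making them read-only would still be safer (well, at least by Python's definition of "read-only" and "safe", that is xD).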
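One possible way to make the attributes read-only, as a hedged sketch rather than a drop-in replacement: store the values in underscore-prefixed fields, expose them through properties without setters, and derive both `__eq__` and `__hash__` from one key tuple so they cannot disagree. The `_key` helper and the choice to compare all three fields are assumptions made here, not something the question specifies:

```python
class Chunk(object):
    """Hypothetical read-only variant; attribute names follow the question."""

    def __init__(self, md5, compressedMd5=None, product_id=None):
        self._md5 = md5
        self._compressedMd5 = compressedMd5
        self._product_id = product_id

    # Properties without setters: assigning to them raises AttributeError.
    @property
    def md5(self):
        return self._md5

    @property
    def compressedMd5(self):
        return self._compressedMd5

    @property
    def product_id(self):
        return self._product_id

    def _key(self):
        # A single tuple drives both __eq__ and __hash__ (assumed semantics).
        return (self._md5, self._compressedMd5, self._product_id)

    def __eq__(self, other):
        if not isinstance(other, Chunk):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


c = Chunk("abc", product_id=1)
try:
    c.md5 = "changed"
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

twin = Chunk("abc", product_id=1)
print(assignment_blocked, c == twin, len({c, twin}))  # → True True 1
```

Nothing stops code from assigning to `c._md5` directly, so this is read-only by convention only.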
