<p>Following <a href="https://stackoverflow.com/users/129600/acushner">acushner</a>'s answer, this can be made to work if I can compute a hash of the content of each element in the dataset.</p>
<pre><code>import pickle
from collections import defaultdict

def tupelize_dict(ds):
    t = {}                 # signature -> value
    d = defaultdict(list)  # signature -> keys that share that value
    for k, v in ds.items():
        h = pickle.dumps(v)  # signature of the value's content
        t[h] = v
        d[h].append(k)
    return {tuple(v): t[k] for k, v in d.items()}
</code></pre>
<p>This solution is much faster than my original proposal.</p>
<p>To test it, I built a large set of random nested dictionaries and ran <code>cProfile</code> on both implementations:</p>
<pre><code>original: 204.9 seconds
new: 6.4 seconds
</code></pre>
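<p>For anyone who wants to run a similar measurement, here is a minimal sketch of how it could be set up. The generator <code>random_nested_dict</code>, the helper <code>group_by_pickle</code>, the dataset size and the seed are all assumptions made for illustration; this is not the script that produced the timings above.</p>

```python
import cProfile
import pickle
import pstats
import random
from collections import defaultdict


def random_nested_dict(rng, depth=2, width=2):
    """Hypothetical generator: small nested dicts, so duplicate values are likely."""
    if depth == 0:
        return rng.randint(0, 2)
    return {f"k{i}": random_nested_dict(rng, depth - 1, width) for i in range(width)}


def group_by_pickle(ds):
    """Group keys whose values pickle to identical bytes."""
    groups = defaultdict(list)
    for key, value in ds.items():
        groups[pickle.dumps(value)].append(key)
    return groups


rng = random.Random(0)  # fixed seed so that runs are repeatable
dataset = {i: random_nested_dict(rng) for i in range(2000)}

profiler = cProfile.Profile()
profiler.enable()
groups = group_by_pickle(dataset)
profiler.disable()

# Print the five most expensive calls, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

<p>Profiling each implementation on the same <code>dataset</code> keeps the comparison fair.</p>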
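<p><strong>Edit:</strong></p>
<p>I realized that <code>pickle.dumps</code> does not work for some dictionaries, because the order of the keys can change internally for obscure reasons (see this <a href="https://stackoverflow.com/questions/40976060/how-to-hash-a-dictionary-in-python">question</a>).</p>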
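<p>One way this can show up, sketched here for current CPython where dicts keep insertion order: two dicts that compare equal can still pickle to different bytes if their keys were inserted in a different order. This may not be the exact case I ran into, but it illustrates the same failure.</p>

```python
import pickle

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}  # same content, different insertion order

same_content = (a == b)                             # dict equality ignores order
same_pickle = (pickle.dumps(a) == pickle.dumps(b))  # pickle follows iteration order

print(same_content, same_pickle)
```

<p>Because the pickled bytes differ, <code>a</code> and <code>b</code> would end up in separate groups even though they hold the same data.</p>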
<p>One workaround is to sort the keys of every dict before taking its representation:</p>
<pre><code>import collections
import collections.abc
import copy

def faithfulrepr(od):
    """Return a repr of od that does not depend on dict key order."""
    od = copy.deepcopy(od)  # the list branch mutates in place, so work on a copy
    if isinstance(od, collections.abc.Mapping):
        res = collections.OrderedDict()
        for k, v in sorted(od.items()):  # fixed key order
            res[k] = faithfulrepr(v)
        return repr(res)
    if isinstance(od, list):
        for i, v in enumerate(od):
            od[i] = faithfulrepr(v)
        return repr(od)
    return repr(od)

def tupelize_dict(ds):
    taxonomy = {}  # signature -> value
    binder = collections.defaultdict(list)  # signature -> keys with that value
    for key, value in ds.items():
        signature = faithfulrepr(value)
        taxonomy[signature] = value
        binder[signature].append(key)
    def tu(keys):
        # Several keys become a sorted tuple; a single key stays as it is.
        return tuple(sorted(keys)) if len(keys) > 1 else keys[0]
    return {tu(keys): taxonomy[s] for s, keys in binder.items()}
</code></pre>
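<p>If the values happen to be JSON-serializable with string keys, a shorter variant of the same idea might use <code>json.dumps(..., sort_keys=True)</code> as the signature. This is a swapped-in technique, not the code above; the name <code>group_equal_values</code> and the sample data are made up for illustration, and unlike <code>tu</code> it always returns tuples, even for a single key.</p>

```python
import json
from collections import defaultdict


def group_equal_values(ds):
    """Sketch: group keys whose values serialize to the same canonical JSON."""
    values = {}
    keys_by_signature = defaultdict(list)
    for key, value in ds.items():
        # Assumption: every value is JSON-serializable and uses string keys.
        signature = json.dumps(value, sort_keys=True)
        values[signature] = value
        keys_by_signature[signature].append(key)
    return {tuple(sorted(keys)): values[s] for s, keys in keys_by_signature.items()}


data = {
    "a": {"x": 1, "y": [1, 2]},
    "b": {"y": [1, 2], "x": 1},  # same content as "a", different key order
    "c": {"x": 2},
}
result = group_equal_values(data)
print(result)
```

<p>Here <code>"a"</code> and <code>"b"</code> end up under the single key <code>("a", "b")</code>, while <code>"c"</code> stays on its own.</p>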