查找关键字以相同前缀开头的字典值的更有效方法

网友

1楼 · 编辑于 2024-09-28 21:22:57

sortedContainerslib有一个SortedDict实现，一旦对dict进行了排序，就可以左等分查找起始位置，右等分查找最后一个位置，然后使用irange获取范围内的键：

from sortedcontainers import SortedDict
from operator import itemgetter
from itertools import takewhile


d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
  "key2":"valD", "key2-22":"valE","key3":"foo" }

key = "key2"
d = SortedDict(sorted(d.items(), key=itemgetter(0)))
start = d.bisect_left(key)
print([d[key] for key in takewhile(lambda x: x.startswith("key2"), d.irange(d.iloc[start]]))
['valD', 'valE']

一旦您维护了一个Sortedict，使用Sortedict会更有效：

^{pr2}$

网友

2楼 · 编辑于 2024-09-28 21:22:57

您可以避免生成dict.keys()（在python2.x中）生成的中间列表：

result = [d[key] for key in d if key.startswith(query)]

但是您很可能希望使用trie而不是字典，这样您就可以找到与具有公共前缀的键相关联的所有值（trie类似于基于前缀的树）。在

Here您可以找到一些不同的尝试实现。在

A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". (source wikipedia)

让我们比较一下不同解决方案的时间安排：

^{pr2}$

# dict without keys()
%timeit [d[s] for s in d if s.startswith(query)]

    100 loops, best of 3: 7.83 ms per loop

# 11.72% improvement

# PyTrie (https://pypi.python.org/pypi/PyTrie/0.2)
import pytrie
pt = pytrie.Trie(d)

%timeit [pt[s] for s in pt.iterkeys(query)]

    1000 loops, best of 3: 320 µs per loop

# 96.36% improvement

# datrie (https://pypi.python.org/pypi/datrie/0.7)
import datrie
dt = datrie.Trie('0123456789')
for key, val in d.iteritems():
    dt[unicode(key)] = val

%timeit [dt[s] for s in dt.keys(unicode(query))]

    10000 loops, best of 3: 162 µs per loop

# 98.17% improvement

网友

3楼 · 编辑于 2024-09-28 21:22:57

您可以使用suffix tree：

#!/usr/bin/env python2
from SuffixTree import SubstringDict # $ pip install https://github.com/JDonner/SuffixTree/archive/master.zip

d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
      "key2":"valD", "key2-22":"valE" }

a = '\n' # anchor
prefixes = SubstringDict()
for key, value in d.items(): # populated the tree *once*
    prefixes[a + key] = value # assume there is no '\n' in key

for query in ["key1", "key2"]: # perform queries
    print query, prefixes[a + query]

输出

^{pr2}$

输出

相关问题更多 >

编程相关推荐

热门问题

热门文章

查找关键字以相同前缀开头的字典值的更有效方法

输出

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >