高度优化的搜索树(红黑、展开和排序列表),带有可选的扩展(动态顺序统计、间隔树等)
Banyan的Python项目详细描述
这个包提供了python的set和dict(带有可选的扩展)的排序插入版本。由于是基于树的,它们在简单的查找和修改方面不如基于散列的容器(如python的内置)高效。相反:
- (常见情况:)对于修改和查找与排序迭代交错的情况,它们的效率远远高于它们。
- (不太常见的情况是:)通过可选的tree augmentation,它们在利用底层树结构(例如,该项在集合中的顺序位置是什么)进行其他类型的有用查询时的效率远远高于它们。哪个几何区间与此区间重叠?).
功能
支持针对不同用例的高性能算法:
- Red-black trees for normal cases
- Splay trees for temporal locality cases (i.e., when only a small subset of items will actually be accessed in any time period)
- Sorted lists for infrequent-update cases
提供pythonic接口:
允许使用附加节点元数据和树方法的可选tree augmentation:
- Dynamic order statistics allow queries for the k-th item
- Interval trees allow geometric querying
- Any user-defined algorithm can be easily plugged in
Note
This feature can be almost entirely ignored if all that is needed are efficient sorted drop-in replacemnt containers for Python’s sets and dicts.
优化实现(请参阅联机文档中的Performance部分):
- C++ templated backend drives the implementation. C++ template metaprogramming is used to avoid run-time queries along the search path
- Homogeneous-key trees optimization
几个快速示例
注意
以下示例假设首先键入:
>>> from banyan import *
选择一个适合设置的算法,并获得python内置项的下拉排序替换:
A (red-black tree) general drop-in replacement for Python’s set:
>>> t = SortedSet([2, 3, 1]) >>> t SortedSet([1, 2, 3]) >>> assert 2 in t >>> t.add(4) >>> len(t) 4 >>> t.add(4) >>> len(t) 4 >>> t.remove(4) >>> len(t) 3 >>> t.remove(4) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "banyan/__init__.py", line 299, in remove self._tree.erase(item) KeyError: 4
一个基于splay的排序的drop-in替换python的dict,针对临时局部访问进行了优化:
>>> t = SortedDict([(2, 'b'), (3, 'c'), (1, 'a')], alg = SPLAY_TREE) >>> print(list(t)) [1, 2, 3] >>> assert 1 in t >>> assert 4 not in t >>> # Now access the key 2 for awhile >>> t[2] = 'e' >>> t[2] = 'f' >>> t[2] = 'g' >>> t[2] = 'a' >>> t[2] = 'b' >>> t[2] = 'c' >>> t[2] = 'd' >>> t[2] = 'e'
python的frozenset的(基于排序列表的)排序内存效率下拉列表:
>>> t = FrozenSortedSet(['hao', 'jiu', 'mei', 'jian']) >>> assert 'hao' in t >>> assert 'zaijian' not in t >>> t.add('zaijian') Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'FrozenSortedSet' object has no attribute 'add'
指定比较条件-例如,使用带有小写比较的字符串字典:
- Using the newer-style ^{tt1}$ parameter:
>>> t = SortedDict(key = str.lower) >>> t['hao'] = 3 >>> t['Hao'] = 4 >>> t SortedDict({'Hao': 4})
- Using the older-style (pre-Py3K) ^{tt2}$ parameter:
>>> t = SortedDict(compare = lambda x, y: cmp(str.lower(x), str.lower(y))) >>> t['hao'] = 3 >>> t['Hao'] = 4 >>> t SortedDict({'Hao': 4})
- 使用范围和切片:
>>> import string >>> >>> t = SortedDict(zip(string.ascii_lowercase, string.ascii_uppercase)) >>> >>> # Erase everything starting at 'e' >>> del t['e': ] >>> t SortedDict({'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}) >>> >>> # View the items between 'b' and 'd' >>> t.items('b', 'd') ItemsView([('b', 'B'), ('c', 'C')]) >>> >>> # View the values of the keys between 'a' and 'c', in reverse order >>> t.values('a', 'c', reverse = True) ValuesView(['B', 'A']) >>> >>> # Set the three values of the keys between 'a' and 'd' to 2 >>> t['a': 'd'] = [2, 2, 2] >>> t SortedDict({'a': 2, 'b': 2, 'c': 2, 'd': 'D'})
易于使用的同构密钥优化:
>>> # Simply specify the key type as key_type - no other changes are needed. >>> t = SortedSet([1, 2, 3], key_type = int) >>> assert 2 in t >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], key_type = str) >>> assert 'hola' not in t
- 使用pythonic版本的C++/STL’s lower_bound/upper_bound:
>>> from itertools import * >>> >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian']) >>> t SortedSet(['hao', 'jian', 'jiu', 'mei']) >>> assert 'hao' in t >>> >>> # Find the key after 'hao' >>> keys = t.keys('hao', None) >>> next(islice(keys, 1, None)) 'jian'
利用树结构可获得其他高效功能:
- Use an overlapping-interval updator for creating a data structure that can efficiently answer overlapping queries:
>>> t = SortedSet([(1, 3), (2, 4), (-2, 9)], updator = OverlappingIntervalsUpdator) >>> >>> print(t.overlap_point(-5)) [] >>> print(t.overlap_point(5)) [(-2, 9)] >>> print(t.overlap_point(3.5)) [(-2, 9), (2, 4)] >>> >>> print(t.overlap((-10, 10))) [(-2, 9), (1, 3), (2, 4)]
For high performance, combine this with homogeneous-keys optimization:
>>> t = SortedSet(key_type = (int, int), updator = OverlappingIntervalsUpdator) >>> t.add((1, 3)) >>> t.overlap_point(2) [(1, 3)] >>> >>> t = SortedSet(key_type = (float, float), updator = OverlappingIntervalsUpdator) >>> t.add((1.0, 3.0)) >>> t.overlap_point(2.5) [(1.0, 3.0)]
- Use a rank updator for creating a data structure that can efficiently answer order queries:
>>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], updator = RankUpdator) >>> t SortedSet(['hao', 'jian', 'jiu', 'mei']) >>> >>> # 'hao' is item no. 0 >>> t.kth(0) 'hao' >>> t.order('hao') 0 >>> >>> # 'mei' is item no. 3 >>> t.kth(3) 'mei' >>> t.order('mei') 3
- Use a min-gap updator for creating a data structur that can efficiently answer smallest-gap queries:
>>> t = SortedSet([1, 3, 2], updator = MinGapUpdator) >>> t SortedSet([1, 2, 3]) >>> t.min_gap() 1 >>> t.remove(2) >>> t SortedSet([1, 3]) >>> t.min_gap() 2
下载、安装、文档和错误跟踪
包裹在PyPI。
通常使用python库的设置。类型:
^{tt5}$
or
^{tt6}$
Note
To install this package from the source distribution, the system must have a C++ compiler installed. The setup script will invoke this compiler.
Using Python 2.x on Windows will attempt to invoke Visual Studio 2008. If you are using a later version, download and extract the archive; then, from within the Banyan directory, use
^{tt7}$
or
^{tt8}$
(for Visual Studio 2010 and 2012, respectively), followed by
^{tt9}$
文档位于PyPI Docs,也可以在发行版的“docs”目录中找到。
错误跟踪在Google Code上。
更改
Version | Date | Description |
---|---|---|
0.1.5 | 05/04/2013 | Faster red-black tree iteration; minor documentation bugfixes |
0.1.4 | 01/04/2013 | User key-function specification optimization; performance tests for dictionary types; better warnings for user mismatched policies |
0.1.3.5 | 30/3/2013 | OverlappingIntervalsUpdator: more regression tests + performance improvements + performance comparison tests |
0.1.3 | 28/03/2013 | OverlappingIntervalsUpdator implemented; minor documentation bugfixes |
0.1.2 | 26/03/2013 | Efficient C++ RankUpdator and MinGapUpdator; MinMaxUpdator out; major internal simplification |
0.1.0 | 17/03/2013 | Initial release |
计划功能
- 改进文档和性能文档w.r.t.密钥类型和策略规范。
- 对用户不匹配的策略给出更好的警告
- 实现优先级搜索树更新程序