高度优化的搜索树(红黑、展开和排序列表),带有可选的扩展(动态顺序统计、间隔树等)

Banyan的Python项目详细描述


这个包提供了python的setdict(带有可选的扩展)的排序插入版本。由于是基于树的,它们在简单的查找和修改方面不如基于散列的容器(如python的内置)高效。相反:

  • (常见情况:)对于修改和查找与排序迭代交错的情况,它们的效率远远高于它们。
  • (不太常见的情况是:)通过可选的tree augmentation,它们在利用底层树结构(例如,该项在集合中的顺序位置是什么)进行其他类型的有用查询时的效率远远高于它们。哪个几何区间与此区间重叠?).

功能

  • 支持针对不同用例的高性能算法:

  • 提供pythonic接口:

    • Collections are almost entirely drop-in sorted replacements for Python’s set and dict
    • User defined ^{tt1}$ functions (or older style ^{tt2}$ functions) are supported
    • Methods take slices and ranges where applicable
    • Pickle is supported
    • Reverse-order views are provided
  • 允许使用附加节点元数据和树方法的可选tree augmentation

    • Dynamic order statistics allow queries for the k-th item
    • Interval trees allow geometric querying
    • Any user-defined algorithm can be easily plugged in

    Note

    This feature can be almost entirely ignored if all that is needed are efficient sorted drop-in replacemnt containers for Python’s sets and dicts.

  • 优化实现(请参阅联机文档中的Performance部分):

    • C++ templated backend drives the implementation. C++ template metaprogramming is used to avoid run-time queries along the search path
    • Homogeneous-key trees optimization

几个快速示例

注意

以下示例假设首先键入:

>>> from banyan import *
  • 选择一个适合设置的算法,并获得python内置项的下拉排序替换:

    • A (red-black tree) general drop-in replacement for Python’s set:

      >>> t = SortedSet([2, 3, 1])
      >>> t
      SortedSet([1, 2, 3])
      >>> assert 2 in t
      >>> t.add(4)
      >>> len(t)
      4
      >>> t.add(4)
      >>> len(t)
      4
      >>> t.remove(4)
      >>> len(t)
      3
      >>> t.remove(4)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "banyan/__init__.py", line 299, in remove
          self._tree.erase(item)
      KeyError: 4
      
    • 一个基于splay的排序的drop-in替换python的dict,针对临时局部访问进行了优化:

      >>> t = SortedDict([(2, 'b'), (3, 'c'), (1, 'a')], alg = SPLAY_TREE)
      >>> print(list(t))
      [1, 2, 3]
      >>> assert 1 in t
      >>> assert 4 not in t
      >>> # Now access the key 2 for awhile
      >>> t[2] = 'e'
      >>> t[2] = 'f'
      >>> t[2] = 'g'
      >>> t[2] = 'a'
      >>> t[2] = 'b'
      >>> t[2] = 'c'
      >>> t[2] = 'd'
      >>> t[2] = 'e'
      
    • python的frozenset的(基于排序列表的)排序内存效率下拉列表:

      >>> t = FrozenSortedSet(['hao', 'jiu', 'mei', 'jian'])
      >>> assert 'hao' in t
      >>> assert 'zaijian' not in t
      >>> t.add('zaijian')
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      AttributeError: 'FrozenSortedSet' object has no attribute 'add'
      
  • 指定比较条件-例如,使用带有小写比较的字符串字典:

    • Using the newer-style ^{tt1}$ parameter:
    >>> t = SortedDict(key = str.lower)
    >>> t['hao'] = 3
    >>> t['Hao'] = 4
    >>> t
    SortedDict({'Hao': 4})
    
    • Using the older-style (pre-Py3K) ^{tt2}$ parameter:
    >>> t = SortedDict(compare = lambda x, y: cmp(str.lower(x), str.lower(y)))
    >>> t['hao'] = 3
    >>> t['Hao'] = 4
    >>> t
    SortedDict({'Hao': 4})
    
  • 使用范围和切片:
    >>> import string
    >>>
    >>> t = SortedDict(zip(string.ascii_lowercase, string.ascii_uppercase))
    >>>
    >>> # Erase everything starting at 'e'
    >>> del t['e': ]
    >>> t
    SortedDict({'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'})
    >>>
    >>> # View the items between 'b' and 'd'
    >>> t.items('b', 'd')
    ItemsView([('b', 'B'), ('c', 'C')])
    >>>
    >>> # View the values of the keys between 'a' and 'c', in reverse order
    >>> t.values('a', 'c', reverse = True)
    ValuesView(['B', 'A'])
    >>>
    >>> # Set the three values of the keys between 'a' and 'd' to 2
    >>> t['a': 'd'] = [2, 2, 2]
    >>> t
    SortedDict({'a': 2, 'b': 2, 'c': 2, 'd': 'D'})
    
  • 易于使用的同构密钥优化:

    >>> # Simply specify the key type as key_type - no other changes are needed.
    >>> t = SortedSet([1, 2, 3], key_type = int)
    >>> assert 2 in t
    >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], key_type = str)
    >>> assert 'hola' not in t
    
  • 使用pythonic版本的C++/STL’s lower_bound/upper_bound
    >>> from itertools import *
    >>>
    >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'])
    >>> t
    SortedSet(['hao', 'jian', 'jiu', 'mei'])
    >>> assert 'hao' in t
    >>>
    >>> # Find the key after 'hao'
    >>> keys = t.keys('hao', None)
    >>> next(islice(keys, 1, None))
    'jian'
    
  • 利用树结构可获得其他高效功能:

    • Use an overlapping-interval updator for creating a data structure that can efficiently answer overlapping queries:
    >>> t = SortedSet([(1, 3), (2, 4), (-2, 9)], updator = OverlappingIntervalsUpdator)
    >>>
    >>> print(t.overlap_point(-5))
    []
    >>> print(t.overlap_point(5))
    [(-2, 9)]
    >>> print(t.overlap_point(3.5))
    [(-2, 9), (2, 4)]
    >>>
    >>> print(t.overlap((-10, 10)))
    [(-2, 9), (1, 3), (2, 4)]
    

    For high performance, combine this with homogeneous-keys optimization:

    >>> t = SortedSet(key_type = (int, int), updator = OverlappingIntervalsUpdator)
    >>> t.add((1, 3))
    >>> t.overlap_point(2)
    [(1, 3)]
    >>>
    >>> t = SortedSet(key_type = (float, float), updator = OverlappingIntervalsUpdator)
    >>> t.add((1.0, 3.0))
    >>> t.overlap_point(2.5)
    [(1.0, 3.0)]
    
    • Use a rank updator for creating a data structure that can efficiently answer order queries:
    >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], updator = RankUpdator)
    >>> t
    SortedSet(['hao', 'jian', 'jiu', 'mei'])
    >>>
    >>> # 'hao' is item no. 0
    >>> t.kth(0)
    'hao'
    >>> t.order('hao')
    0
    >>>
    >>> # 'mei' is item no. 3
    >>> t.kth(3)
    'mei'
    >>> t.order('mei')
    3
    
    • Use a min-gap updator for creating a data structur that can efficiently answer smallest-gap queries:
    >>> t = SortedSet([1, 3, 2], updator = MinGapUpdator)
    >>> t
    SortedSet([1, 2, 3])
    >>> t.min_gap()
    1
    >>> t.remove(2)
    >>> t
    SortedSet([1, 3])
    >>> t.min_gap()
    2
    

下载、安装、文档和错误跟踪

  • 包裹在PyPI

  • 通常使用python库的设置。类型:

    ^{tt5}$

    or

    ^{tt6}$

    Note

    To install this package from the source distribution, the system must have a C++ compiler installed. The setup script will invoke this compiler.

    Using Python 2.x on Windows will attempt to invoke Visual Studio 2008. If you are using a later version, download and extract the archive; then, from within the Banyan directory, use

    ^{tt7}$

    or

    ^{tt8}$

    (for Visual Studio 2010 and 2012, respectively), followed by

    ^{tt9}$

  • 文档位于PyPI Docs,也可以在发行版的“docs”目录中找到。

  • 错误跟踪在Google Code上。

更改

Changes
VersionDateDescription
0.1.505/04/2013Faster red-black tree iteration; minor documentation bugfixes
0.1.401/04/2013User key-function specification optimization; performance tests for dictionary types; better warnings for user mismatched policies
0.1.3.530/3/2013OverlappingIntervalsUpdator: more regression tests + performance improvements + performance comparison tests
0.1.328/03/2013OverlappingIntervalsUpdator implemented; minor documentation bugfixes
0.1.226/03/2013Efficient C++ RankUpdator and MinGapUpdator; MinMaxUpdator out; major internal simplification
0.1.017/03/2013Initial release

计划功能

  • 改进文档和性能文档w.r.t.密钥类型和策略规范。
  • 对用户不匹配的策略给出更好的警告
  • 实现优先级搜索树更新程序

欢迎加入QQ群-->: 979659372 Python中文网_新手群

推荐PyPI第三方库


热门话题
java Glassfish3+Webstart+JNLP不会启动   支持单元格选择的java行   java FTP组织。阿帕奇。平民网格式错误的ServerReplyException:截断的服务器回复:“220”   java为什么公用文件夹下的更改会触发自动重新加载?   使用Java graphics相对于线条摆动打印线条标签   java从代码中的任何位置访问HttpContext、HttpServletRequest和HttpServletResponse   java如何跳过JSON中的第一个元素   Java 安卓列表字符串带有数字和国家/地区字母的排序字符串   从文件创建响应elasticsearch响应时发生java错误   java哈希表与关键词的获取   amazon web服务如何获取特定的lambda度量统计信息,比如使用java的iteratorAge   从异步任务返回对象时出现java问题   java bufferedReader。readLine()无法读取整个文件行   如何在JavaSpring引导应用程序中跟踪用户的登录   java为什么从基类继承的子类方法不能打印自身字段的值?