Python setix package - PyPI

Fast data structures for finding intersecting sets and similar strings

Detailed description of the setix Python project

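setix
=====

At its core, the setix module provides a "set intersection index": an inverted-index data structure that stores sets of symbols, quickly finds the stored sets that intersect a given set, and ranks them by number of intersections or by a similarity measure.

setix.trgm provides a wrapper for indexing strings. It implements a trigram index compatible with the PostgreSQL extension pg_trgm.

Examples:

.. code-block:: python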

    import setix

    ix = setix.SetIntersectionIndex()
    ix.add((1, 2, 3))
    ix.add((1, 2, 4))
    ix.add((2, 3, 4))

    ix.find((1, 2), 1).get_list()
    # returns [(2, [(1, 2, 3)]),
    #          (2, [(1, 2, 4)]),
    #          (1, [(2, 3, 4)])]
    # (the order of the first two results may vary, because their scores are equal)
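To make the idea of an inverted index for set intersection concrete, here is a minimal pure-Python sketch. It is not setix's implementation: the class name TinySetIndex and its method signatures are hypothetical, chosen only for illustration. It maps each symbol to the ids of the stored sets containing it, then counts how many query symbols hit each stored set.

```python
from collections import defaultdict


class TinySetIndex:
    """Toy inverted index (hypothetical, not the setix API)."""

    def __init__(self):
        self._sets = []                    # stored sets, addressed by position
        self._postings = defaultdict(set)  # symbol -> ids of sets containing it

    def add(self, symbols):
        set_id = len(self._sets)
        self._sets.append(tuple(symbols))
        for symbol in set(symbols):
            self._postings[symbol].add(set_id)

    def find(self, symbols, threshold=1):
        """Return (intersection count, stored set) pairs, best first."""
        counts = defaultdict(int)
        for symbol in set(symbols):
            # .get avoids creating empty posting lists for unknown symbols
            for set_id in self._postings.get(symbol, ()):
                counts[set_id] += 1
        hits = [(n, set_id) for set_id, n in counts.items() if n >= threshold]
        # Highest count first; ties broken by insertion order for determinism
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return [(n, self._sets[set_id]) for n, set_id in hits]


ix = TinySetIndex()
ix.add((1, 2, 3))
ix.add((1, 2, 4))
ix.add((2, 3, 4))
print(ix.find((1, 2)))
# prints [(2, (1, 2, 3)), (2, (1, 2, 4)), (1, (2, 3, 4))]
```

Only the posting lists of the query's symbols are visited, so the cost of a query depends on how many stored sets share a symbol with it rather than on the total number of stored sets.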

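.. code-block:: python

    import setix.trgm

    ix = setix.trgm.TrigramIndex()
    ix.add("strength")
    ix.add("strenght")
    ix.add("strength and honor")

    ix.find("stremgth", threshold=1).get_list()
    # returns [...,
    #          (4, ["strenght"])]

    ix.find_similar("stremgth", threshold=0.1).get_list()
    # returns [(0.5,  ["strength"]),            # 6 intersections / (9 total + 9 total - 6)
    #          (0.29, ["strenght"]),            # 4 intersections / (9 total + 9 total - 4)
    #          (0.27, ["strength and honor"])]  # 6 intersections / (9 total + 19 total - 6)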
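The similarity scores annotated above follow the pattern intersections / (total + total - intersections). The sketch below reproduces them in plain Python, assuming pg_trgm-style trigram extraction (lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space). The helper names trigrams and similarity are hypothetical and are not part of setix.

```python
import re


def trigrams(text):
    """Return the set of pg_trgm-style trigrams of text (assumed rules)."""
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by the size of the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


query = "stremgth"
for phrase in ("strength", "strenght", "strength and honor"):
    shared = len(trigrams(query) & trigrams(phrase))
    print(phrase, shared, round(similarity(query, phrase), 2))
# prints:
# strength 6 0.5
# strenght 4 0.29
# strength and honor 6 0.27
```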

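In general, to search for phrases that contain a misspelled form of the query word, pass a threshold of -3*n, where n is the number of typos; a single substituted character changes at most three trigrams.

.. code-block:: python

    ix.find("stremgth", threshold=-3).get_list()
    # returns [(6, ["strength and honor"]),
    #          (6, ["strength"])]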
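One reading of the negative threshold that is consistent with the outputs above is "the number of query trigrams, minus the allowance". The sketch below works through that arithmetic. This interpretation is an assumption inferred from the example results, not documented setix behaviour, and the helpers trigrams and min_shared are hypothetical.

```python
import re


def trigrams(text):
    """pg_trgm-style trigrams: pad each word with two leading spaces and one trailing space."""
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def min_shared(query, threshold):
    """Assumed rule: a negative threshold is relative to the query's trigram count."""
    if threshold < 0:
        return len(trigrams(query)) + threshold
    return threshold


query = "stremgth"
needed = min_shared(query, -3)  # 9 query trigrams - 3 = 6
phrases = ["strength", "strenght", "strength and honor"]
kept = [p for p in phrases if len(trigrams(query) & trigrams(p)) >= needed]
print(needed, kept)
# prints 6 ['strength', 'strength and honor']
```

Under this reading, "strenght" is dropped because it shares only 4 trigrams with the query, which corresponds to more than one substituted character.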

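Benchmarks
==========

A benchmark is included in tests/dvd_db_test.py.

Results on an Athlon II running at 2.6 GHz:

Python 2.7

.. code-block:: none

    In [1]: import tests.dvd_db_test
    Loading database...
    Extracted 240577 titles
    Memory used by data: 107.8MB
    Building index...
    CPU time used: 43.1s
    Unique trigrams indexed: 11352
    Unique phrases indexed: 228620
    Memory used by index: 80.9MB

    In [2]: %timeit list(tests.dvd_db_test.titles.find("daft-punk", 8))
    10 loops, best of 3: 27.8 ms per loop

    In [3]: %timeit list(tests.dvd_db_test.titles.find("daft-punk", 1))
    10 loops, best of 3: 86.4 ms per loop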

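.. code-block:: none

    In [1]: import tests.dvd_db_test
    Loading database...
    Extracted 240577 titles
    Memory used by data: 108.8MB
    Building index...
    CPU time used: 45.8s
    Unique trigrams indexed: 11352
    Unique phrases indexed: 228620
    Memory used by index: 86.2MB

    In [2]: %timeit list(tests.dvd_db_test.titles.find("daft punk", 8))
    10 loops, best of 3: 27.9 ms per loop

    In [3]: %timeit list(tests.dvd_db_test.titles.find("daft punk", 1))
    10 loops, best of 3: 86.3 ms per loop

The list of DVD titles used in the benchmark was obtained from http://www.hometheaterinfo.com/dvdlist.htm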

setix 0.8.3
