利用后缀数组和lcp快速查找文本中的子串

2024-10-06 07:40:43 发布

您现在位置：Python中文网/ 问答频道 /正文

6558

网友

男 | 程序猿一只，喜欢编程写python代码。

我试图在巨大的文本中找到包含子串（作为输入）的单词。文本如下：*america*python*erica*escape*。。示例：Input：“rica”=>；输出：美国，erica

我使用后缀数组。在

我的伪代码是：

firstChar=input[0] // the first character of input
suffixArray=getSuffixArray(text) // suffix array
result=[]

for every index of suffix array which points to firstChar:
    length=len(input)
    indexText=text[suffixArray[index]]
    indexes=[]

    if input in text[indexText: indexText+length]:
        word=find whole word containig this index between '*' 
        result.append(word)

这行得通，但太慢了。LCP数组应该可以改进algorhitm的运行时，但我不知道如何改进。你能给我个建议吗？在

提前谢谢！在

Tags： of text 文本 input index 数组 result array

1条回答

网友

1楼 · 发布于 2024-10-06 07:40:43

后缀数组的免费Python代码位于Effcient way to find longest duplicate string。它在一台个人电脑上最多能工作1亿个字符。在

利用后缀数组和lcp快速查找文本中的子串

相关问题更多 >

编程相关推荐

热门问题

热门文章

利用后缀数组和lcp快速查找文本中的子串

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >