Python: How to speed up this exhaustive method? Trie?

Posted 2024-09-27 23:25:15


I have a corpus made up of lines of sentences with no spaces between the words:

thenextdayonmayanarchistsstagedarallyatchicagoshaymarketsquare
abombwasthrownbyanunknownpartyneartheconclusionoftherallykillinganofficer
intheensuingpanicpoliceopenedfireonthecrowdandeachother
sevenpoliceofficersandatleastfourworkerswerekilled

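I need to split each sentence using the words in a dictionary, for example: {'the': 1, 'next': 2, 'thenext': 3, ...}. The numbers are just frequencies and do not matter here.

The output should be a list of every possible segmentation, like: [[the, next, day...], [thenext, day...]...]

Here is my code (filter_worddict is the dictionary):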

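def segment(sentence):
    # Base case: an empty remainder has exactly one segmentation, the empty list.
    if sentence == '':
        yield []
        return
    # Try every dictionary word as a prefix (this full scan runs at every step).
    for w in filter_worddict:
        if sentence.startswith(w):
            for rest in segment(sentence[len(w):]):
                yield [w] + rest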

with open('sentences.txt', 'r') as f4, open('result.txt', 'w') as f5:
    for line4 in f4:
        line4 = line4.strip()
        # Write each segmentation as it is generated instead of building a list first.
        for corpusline in segment(line4):
            f5.write(str(corpusline) + '\n')

How can this code be sped up? The last time I ran it on a corpus (under 30 MB) with a 5 MB dictionary, it took 48 hours.
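One direction that might help, as a minimal sketch rather than a measured fix: the code above scans the whole dictionary at every position, whereas testing only the prefixes up to the longest word's length needs a handful of hash lookups per position. The names make_segmenter and toy_worddict are hypothetical and only for illustration. Note that the number of segmentations can still grow exponentially with sentence length, so the output itself may remain the bottleneck.

```python
def make_segmenter(worddict):
    """Return a generator function that enumerates all segmentations."""
    # The longest dictionary word bounds how many prefixes need testing.
    max_len = max((len(w) for w in worddict), default=0)

    def segment(sentence, start=0):
        # Base case: nothing left to split, so there is one (empty) segmentation.
        if start == len(sentence):
            yield []
            return
        # Test at most max_len prefixes instead of scanning the whole dictionary.
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if word in worddict:  # average O(1) dict membership test
                for rest in segment(sentence, end):
                    yield [word] + rest

    return segment


# Toy dictionary for illustration only (the frequencies are irrelevant here).
toy_worddict = {'the': 1, 'next': 2, 'thenext': 3, 'day': 4}
segment = make_segmenter(toy_worddict)
results = list(segment('thenextday'))
```

Passing an index (start) instead of slicing sentence[len(w):] also avoids copying the remainder of the string at every recursion level.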

I looked around and found tries and PyTrie, which seem like a promising solution. But I don't know how to apply them in this case. Thanks in advance!
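For what it's worth, here is a rough sketch of how a trie could be applied here. It uses plain nested dicts rather than PyTrie, so no third-party API is assumed; build_trie, prefixes, END, and the toy dictionary are all hypothetical names made up for this illustration. The idea is that a single walk down the trie finds every dictionary word starting at a given position and stops as soon as no word can match.

```python
END = object()  # sentinel key marking the end of a word in the trie


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def prefixes(trie, sentence, start):
    """Yield each end index where sentence[start:end] is a dictionary word."""
    node = trie
    for i in range(start, len(sentence)):
        node = node.get(sentence[i])
        if node is None:
            return  # no dictionary word continues along this path
        if END in node:
            yield i + 1


def segment(trie, sentence, start=0):
    """Enumerate every segmentation of sentence[start:] into dictionary words."""
    if start == len(sentence):
        yield []
        return
    for end in prefixes(trie, sentence, start):
        for rest in segment(trie, sentence, end):
            yield [sentence[start:end]] + rest


# Toy dictionary for illustration only (the frequencies are ignored).
toy_trie = build_trie({'the': 1, 'next': 2, 'thenext': 3, 'day': 4})
trie_results = list(segment(toy_trie, 'thenextday'))
```

The trie would be built once before reading sentences.txt and then reused for every line.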


0 answers

No answers yet
