How to efficiently apply spelling correction to a large text corpus in Python

from autocorrect import spell
import re

WORD = re.compile(r'\w+')

def reTokenize(doc):
    tokens = WORD.findall(doc)
    return tokens

text = ["Hi, welcmoe to speling.", "This is jsut an exapmle, but cosnider a veri big coprus."]

def spell_correct(text):
    sptext = []
    for doc in text:
        sptext.append(' '.join([spell(w).lower() for w in reTokenize(doc)]))
    return sptext

print(spell_correct(text))

1 answer

Anonymous user

Answer #1 · Posted on 2024-10-01 17:34:18

As @khelwood said in the comments, you should use autocorrect.Speller:

from autocorrect import Speller
import re

# Create the speller once and reuse it for every word
spell = Speller(lang="en")
# A token is a run of word characters; punctuation is dropped
WORD = re.compile(r'\w+')

def reTokenize(doc):
    tokens = WORD.findall(doc)
    return tokens

text = ["Hi, welcmoe to speling.", "This is jsut an exapmle, but cosnider a veri big coprus."]

def spell_correct(text):
    sptext = []
    for doc in text:
        # Correct each token, lowercase it, and rejoin the document with spaces
        sptext.append(' '.join([spell(w).lower() for w in reTokenize(doc)]))
    return sptext

print(spell_correct(text))

#Output
#['hi welcome to spelling', 'this is just an example but consider a veri big corpus']
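On a large corpus most tokens repeat, so calling the speller once per token does a lot of redundant work. A minimal sketch, assuming only a callable that maps one word to its correction (for example a Speller instance), is to cache corrections per distinct lowercased word with functools.lru_cache. The names make_corpus_corrector, toy_correct and TOY_FIXES are hypothetical; the toy corrector merely stands in for a real one so the example runs without extra packages.

```python
import re
from functools import lru_cache
from typing import Callable, Iterable, List, Optional

WORD = re.compile(r'\w+')


def make_corpus_corrector(correct: Callable[[str], Optional[str]], maxsize: int = 100_000):
    """Hypothetical helper: wrap a per-word corrector so each distinct word is corrected once."""

    @lru_cache(maxsize=maxsize)
    def cached(word: str) -> str:
        # Fall back to the word itself if the corrector returns None or ""
        return (correct(word) or word).lower()

    def spell_correct(docs: Iterable[str]) -> List[str]:
        # Lowercase before the lookup so 'Hi' and 'hi' share one cache entry
        return [' '.join(cached(w.lower()) for w in WORD.findall(doc)) for doc in docs]

    return spell_correct


# Hypothetical stand-in for a real corrector such as Speller(lang="en")
TOY_FIXES = {"welcmoe": "welcome", "speling": "spelling", "jsut": "just"}
calls = []  # records every word the underlying corrector actually sees


def toy_correct(word: str) -> str:
    calls.append(word)
    return TOY_FIXES.get(word, word)


spell_correct = make_corpus_corrector(toy_correct)
docs = ["Hi, welcmoe to speling.", "Jsut welcmoe, jsut speling!"]
result = spell_correct(docs)
print(result)      # ['hi welcome to spelling', 'just welcome just spelling']
print(len(calls))  # 5: each distinct word reached the corrector only once
```

With autocorrect installed, passing Speller(lang="en") to make_corpus_corrector should behave much like spell_correct above, except that words are lowercased before correction. maxsize bounds how many distinct words are kept in memory.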

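As an alternative, you can use a list comprehension for a bit more speed, and you can also use the pyspellchecker library (module spellchecker), which in this case also gets the word 'veri' right: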

from spellchecker import SpellChecker
import re

WORD = re.compile(r'\w+')
spell = SpellChecker()

def reTokenize(doc):
    tokens = WORD.findall(doc)
    return tokens

text = ["Hi, welcmoe to speling.", "This is jsut an exapmle, but cosnider a veri big coprus."]

def spell_correct(text):
    # In recent pyspellchecker versions correction() can return None when it
    # finds no candidate, so fall back to the original token in that case
    sptext = [' '.join([(spell.correction(w) or w).lower() for w in reTokenize(doc)]) for doc in text]
    return sptext

print(spell_correct(text))

Output:

['hi welcome to spelling', 'this is just an example but consider a very big corpus']
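The pyspellchecker documentation describes it as based on Peter Norvig's spelling corrector: it looks for known words within an edit distance of 2 (deletions, transpositions, replacements, insertions) and prefers the one that occurs most often in its word-frequency list. Frequency ranking of this kind may be why it turns 'veri' into 'very'. Below is a minimal pure-Python sketch of that idea, not pyspellchecker's actual code; COUNTS is a tiny hypothetical frequency table standing in for the library's real dictionary.

```python
from collections import Counter
from typing import Optional, Set

LETTERS = 'abcdefghijklmnopqrstuvwxyz'


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str) -> Set[str]:
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correction(word: str, counts: Counter) -> Optional[str]:
    """Return the most frequent known word within edit distance 2, or None."""
    word = word.lower()
    if word in counts:
        return word
    # Try distance 1 first; only generate distance-2 candidates if needed
    for edits in (edits1, edits2):
        known = [w for w in edits(word) if w in counts]
        if known:
            # Highest count wins; the word itself breaks ties deterministically
            return max(known, key=lambda w: (counts[w], w))
    return None


# Hypothetical word frequencies standing in for a real dictionary
COUNTS = Counter({"very": 50, "vary": 5, "big": 20, "example": 12, "consider": 9, "corpus": 8})

print(correction("veri", COUNTS))    # very
print(correction("vory", COUNTS))    # very (beats 'vary', also one edit away)
print(correction("coprus", COUNTS))  # corpus
print(correction("qzxw", COUNTS))    # None
```

Checking distance-1 candidates before distance-2 ones keeps the common case cheap, since edits2 produces many thousands of strings even for short words.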
