A micro utility for generating plausible misspellings
Mrs. Spellings
A micro utility for programmatically generating plausible misspellings
Installation
From PyPI
pip install mrs-spellings
From source
python -m pip install git+https://github.com/CircArgs/mrs_spellings.git
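Use cases
- Generating misspellings to replace during text cleaning, with low overhead
- During training, to make your model more robust to misspelled input
- During testing, as part of test-time augmentation (TTA)
- Supplementing existing solutions for out-of-vocabulary words, or for words that do not appear in an existing replacement dictionary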
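The text-cleaning use case can be sketched as a reverse lookup table from generated misspellings back to the intended word. This is a minimal illustration, not part of the package: `one_char_deletes`, `build_correction_table`, and `clean` are hypothetical names, and the stand-in generator only deletes a single character. In practice the `generate` argument could presumably wrap a call such as `MrsWord(word).delete()`.

```python
def one_char_deletes(word):
    """Stand-in generator (hypothetical): every string made by deleting one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def build_correction_table(vocabulary, generate=one_char_deletes):
    """Map each generated misspelling back to the vocabulary word it came from."""
    vocab = set(vocabulary)
    table, ambiguous = {}, set()
    for word in sorted(vocab):
        for variant in generate(word):
            if variant in vocab:
                continue  # a valid word is never treated as a misspelling
            if variant in table and table[variant] != word:
                ambiguous.add(variant)  # two different words produce this variant
            table[variant] = word
    for variant in ambiguous:
        del table[variant]  # drop variants that cannot be resolved to one word
    return table


def clean(text, table):
    """Replace known misspellings token by token; leave everything else as-is."""
    return " ".join(table.get(token, token) for token in text.split())


table = build_correction_table(["hello", "world", "word"])
print(clean("helo wrld", table))  # hello world
```

Dropping ambiguous variants keeps the table conservative: a misspelling that could come from two vocabulary words is left untouched rather than guessed.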
Usage
Three main methods are currently supported:
In [1]: from mrs_spellings import MrsWord, MrsSpellings
# methods return MrsSpellings
In [2]: MrsWord("hello").swap()
Out[2]: {'ehllo', 'hello', 'helol', 'hlelo'}
In [3]: MrsWord("hello").delete(number_deletes=1)
Out[3]: {'ello', 'hell', 'helo', 'hllo'}
In [4]: MrsWord("hello").qwerty_swap(max_distance=1)
Out[4]: {'gello', 'h3llo', 'hdllo', 'he,lo', 'he:lo', ... 'jello', 'nello', 'yello'}
# simply chain methods
In [5]: MrsWord("hello").swap().delete()
Out[5]: {'ehll', 'ehlo', 'ello', ... 'hllo', 'hlol', 'lelo'}
# MrsWord is a string
In [6]: MrsWord("Hello") + " " + MrsWord("World")
Out[6]: 'Hello World'
In [7]: MrsWord("Hello {}").format("world")
Out[7]: 'Hello world'
# MrsSpellings work as sets
In [8]: MrsWord("hello").swap().union(MrsWord("world").delete())
Out[8]: {'ehllo', 'hello', 'helol', 'hlelo', 'orld', 'wold', 'word', 'worl', 'wrld'}
In [9]: MrsWord("hello").delete(1) - MrsWord("hello").delete(1)
Out[9]: set()
In [10]: " ".join(MrsWord("Hello").qwerty_swap())
Out[10]: 'Helko Hdllo Yello He,lo Helll Hellp Hel,o Nello Heklo Hrllo H3llo Gello Heolo He:lo Helli Hell9 Heloo Hel:o Jello Hwllo'
Methods
Delete
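As a rough illustration of what deletion-based misspellings look like, here is a minimal standalone sketch. It is not the package's implementation: `delete_variants` is a hypothetical name, the default of 1 is an assumption, and the sketch assumes `number_deletes` means removing exactly that many characters.

```python
from itertools import combinations


def delete_variants(word, number_deletes=1):
    """Return every string formed by removing exactly `number_deletes` characters."""
    if not 0 <= number_deletes <= len(word):
        raise ValueError("number_deletes must be between 0 and len(word)")
    variants = set()
    # Choose which positions to drop, then rebuild the word without them.
    for removed in combinations(range(len(word)), number_deletes):
        dropped = set(removed)
        variants.add("".join(c for i, c in enumerate(word) if i not in dropped))
    return variants


print(sorted(delete_variants("hello")))  # ['ello', 'hell', 'helo', 'hllo']
```

Because the result is a set, deleting either `l` from "hello" contributes only one entry, `'helo'`.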
Swap
Signature: MrsWord.swap()
Docstring:
swap some consecutive characters
Args: none
Returns:
    MrsSpellings (set): all possible misspellings that form as a result of swapping consecutive characters
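The behaviour described by this docstring can be sketched in a few lines of plain Python. This is only an illustration under the assumption that exactly one adjacent pair is swapped per variant; `swap_variants` is a hypothetical name, not the package's code.

```python
def swap_variants(word):
    """Return every string formed by swapping one pair of adjacent characters."""
    chars = list(word)
    variants = set()
    for i in range(len(chars) - 1):
        swapped = chars.copy()
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        variants.add("".join(swapped))
    return variants


print(sorted(swap_variants("hello")))  # ['ehllo', 'hello', 'helol', 'hlelo']
```

Swapping two identical neighbours (the double `l`) reproduces the original word, which is why `'hello'` itself appears in the result.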
Swap based on QWERTY (taxicab) distance
Signature: MrsWord.qwerty_swap(max_distance=1)
Docstring:
swap characters with their qwerty neighbors
Args:
    max_distance (int): the max distance (taxi-cab) of keys on the keyboard to swap
        e.g. `max_distance=1` then "g" could become one of ["f", "h"]
             `max_distance=2` then "g" could become one of ['f', 'h', 't', 'y', 'v', 'b']
Note:
    The number of swaps possible increases with distance, however the increase is not always uniform.
    For example, the 3rd set of keys from g is ['6', 'd', 'j'] while the second was ['t', 'y', 'v', 'b']
Returns:
    MrsSpellings (set): all possible misspellings that form as a result of swapping characters with qwerty neighbors
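What is QWERTY distance?
QWERTY distance is the distance between keys on a typical keyboard. This package makes the following assumptions:
- Each row is offset by half a key
- L1 distance is a good estimate of the natural travel distance between keys on a keyboard
- Holding the shift key adds to the distance
Below is an example of what these assumptions produce. The keys nearest to the g key, grouped by equal distance (in ascending order out to the farthest), are: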
[['f','h'],['t','y','v','b'],['6','d','j'],['r','u','c','n'],['^','5','7','s','k'],['e','i','x','m'],['%','&','4','8','a','l'],['w','o','z','<'],['$','*','3','9',':'],['q','p',','],['#','(','2','0',';'],['[','>'],['@',')','1','-','"'],[']','.'],['!','_','`','=',"'"],['\\','?'],['~','+','{'],['/'],['}'],['|']]
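The three assumptions can be turned into a small distance function. The sketch below is an illustration, not the package's implementation: the US key layout, the row start positions, the shift cost of 1, and the names `ROWS`, `SHIFT_COST`, `key_positions`, `qwerty_distance`, and `neighbors_by_distance` are all assumptions. The first several groups it produces for g agree with the list above; later groups need not, since the package's exact layout is not documented here.

```python
from collections import defaultdict

# Assumed US QWERTY layout: (unshifted keys, shifted keys, x position of the first key).
# The 1, q, a, z keys sit at x = 0, 0.5, 1.0, 1.5: each row is offset by half a key.
ROWS = [
    ("`1234567890-=", "~!@#$%^&*()_+", -1.0),
    ("qwertyuiop[]\\", "QWERTYUIOP{}|", 0.5),
    ("asdfghjkl;'", 'ASDFGHJKL:"', 1.0),
    ("zxcvbnm,./", "ZXCVBNM<>?", 1.5),
]
SHIFT_COST = 1.0  # assumed extra distance for holding shift


def key_positions():
    """Map each character to (x, row, needs_shift)."""
    positions = {}
    for row, (plain, shifted, start) in enumerate(ROWS):
        for col, (low, high) in enumerate(zip(plain, shifted)):
            positions[low] = (start + col, row, 0)
            if not high.isalpha():  # skip capitals; only shifted symbols are kept
                positions[high] = (start + col, row, 1)
    return positions


def qwerty_distance(a, b, positions):
    """L1 (taxicab) distance between two keys, plus the shift cost if it differs."""
    ax, ay, a_shift = positions[a]
    bx, by, b_shift = positions[b]
    return abs(ax - bx) + abs(ay - by) + SHIFT_COST * abs(a_shift - b_shift)


def neighbors_by_distance(key):
    """Group all other keys by distance from `key`, nearest group first."""
    positions = key_positions()
    groups = defaultdict(list)
    for other in positions:
        if other != key:
            groups[qwerty_distance(key, other, positions)].append(other)
    return [groups[d] for d in sorted(groups)]


print(neighbors_by_distance("g")[:3])
# [['f', 'h'], ['t', 'y', 'v', 'b'], ['6', 'd', 'j']]
```

Under these assumptions the groups sit at distances 1, 1.5, 2, and so on, which shows why group sizes do not grow uniformly: half-key row offsets put the diagonal neighbours between the horizontal ones.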