Are NLTK's Spanish tagging results really this bad?

*************** START TAGGING FOR LINE 6 ***************
************************************************************
Current line contents before tagging-> mejor ve a la sucursal de Juan Pablo II es la que menos gente tiene y no te tardas nada
Unigram tagger-> [('@yadimota', None), ('@ContactoBanamex', None), ('mejor', 'aq0cs0'), ('ve', 'vmip3s0'), ('a', 'sps00'), ('la', 'da0fs0'), ('sucursal', 'ncfs000'), ('de', 'sps00'), ('Juan', 'np0000p'), ('Pablo', None), ('II', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('que', 'pr0cn000'), ('menos', 'rg'), ('gente', 'ncfs000'), ('tiene', 'vmip3s0'), ('y', 'cc'), ('no', 'rn'), ('te', 'pp2cs000'), ('tardas', None), ('nada', 'pi0cs000')]
Bigram tagger-> [('@yadimota', None), ('@ContactoBanamex', None), ('mejor', None), ('ve', None), ('a', None), ('la', None), ('sucursal', None), ('de', None), ('Juan', None), ('Pablo', None), ('II', None), ('es', None), ('la', None), ('que', None), ('menos', None), ('gente', None), ('tiene', None), ('y', None), ('no', None), ('te', None), ('tardas', None), ('nada', None)]
Trigram tagger-> [('@yadimota', None), ('@ContactoBanamex', None), ('mejor', None), ('ve', None), ('a', None), ('la', None), ('sucursal', None), ('de', None), ('Juan', None), ('Pablo', None), ('II', None), ('es', None), ('la', None), ('que', None), ('menos', None), ('gente', None), ('tiene', None), ('y', None), ('no', None), ('te', None), ('tardas', None), ('nada', None)]
************************************************************

*************** START TAGGING FOR LINE 7 ***************
************************************************************
Current line contents before tagging-> He levantado ya varios reporte pero no resuelven nada
Unigram tagger-> [('He', 'vaip1s0'), ('levantado', 'vmp00sm'), ('ya', 'rg'), ('varios', 'di0mp0'), ('reporte', 'vmsp1s0'), ('pero', 'cc'), ('no', 'rn'), ('resuelven', None), ('nada', 'pi0cs000')]
Bigram tagger-> [('He', None), ('levantado', None), ('ya', None), ('varios', None), ('reporte', None), ('pero', None), ('no', None), ('resuelven', None), ('nada', None)]
Trigram tagger-> [('He', None), ('levantado', None), ('ya', None), ('varios', None), ('reporte', None), ('pero', None), ('no', None), ('resuelven', None), ('nada', None)]

*************** START TAGGING FOR LINE 8 ***************
************************************************************
Current line contents before tagging-> Es lamentable el servicio que brindan
Unigram tagger-> [('@ContactoBanamex', None), ('Es', 'vsip3s0'), ('lamentable', 'aq0cs0'), ('el', 'da0ms0'), ('servicio', 'ncms000'), ('que', 'pr0cn000'), ('brindan', None)]
Bigram tagger-> [('@ContactoBanamex', None), ('Es', None), ('lamentable', None), ('el', None), ('servicio', None), ('que', None), ('brindan', None)]
Trigram tagger-> [('@ContactoBanamex', None), ('Es', None), ('lamentable', None), ('el', None), ('servicio', None), ('que', None), ('brindan', None)]

2 Answers

User

Answer 1 · edited 2024-09-28 21:25:19

The spaghetti tagger (https://code.google.com/p/spaghetti-tagger/) was created simply as a guide showing how easily a scalable tagger can be built with the NLTK corpora and tagging modules.

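It is not a state-of-the-art system, as stated on the site. For that, I recommend a state-of-the-art tagger such as FreeLing (http://nlp.lsi.upc.edu/freeling/). If you need one, I would be happy to write a proper Python wrapper class for FreeLing.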

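Back to your question: as Francis hinted (https://groups.google.com/forum/#!topic/nltk-users/FtqksaZLLvY), first go through the tutorial at http://nltk.googlecode.com/svn/trunk/doc/howto/tag.html. You will then see that the backoff parameter will probably solve your problem: without it, a bigram or trigram tagger returns None for every context it did not see in training, and once one tag is None every later context in the sentence is unseen too.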
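To make the backoff idea concrete, here is a minimal sketch of chaining NLTK's n-gram taggers. The two training sentences are toy data invented for illustration (in practice you would train on a tagged Spanish corpus), the helper name build_backoff_tagger is hypothetical, and the default tag "ncms000" for unknown words is only an assumption, not a recommendation.

```python
from nltk.tag import BigramTagger, DefaultTagger, TrigramTagger, UnigramTagger

# Toy training data, invented for illustration only.
# A real setup would use a tagged Spanish corpus instead.
train_sents = [
    [("es", "vsip3s0"), ("la", "da0fs0"), ("sucursal", "ncfs000")],
    [("no", "rn"), ("te", "pp2cs000"), ("tardas", "vmip2s0"), ("nada", "pi0cs000")],
]


def build_backoff_tagger(sents, default_tag="ncms000"):
    """Hypothetical helper: trigram -> bigram -> unigram -> default."""
    # default_tag is an assumed catch-all for words never seen in training.
    default = DefaultTagger(default_tag)
    unigram = UnigramTagger(sents, backoff=default)
    bigram = BigramTagger(sents, backoff=unigram)
    return TrigramTagger(sents, backoff=bigram)


# Without backoff, an unseen context yields None, and the None then
# poisons the context of every following token in the sentence.
no_backoff = BigramTagger(train_sents).tag(["la", "sucursal"])

# With backoff, each tagger defers to the next one when it has no answer.
tagger = build_backoff_tagger(train_sents)
tagged = tagger.tag(["la", "sucursal", "no", "brindan"])

print(no_backoff)
print(tagged)
```

The order matters: the most specific tagger sits at the top of the chain and the least specific at the bottom, so a token only reaches the default tag when nothing else recognizes it.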

Disclaimer: I wrote spaghetti.py (https://spaghetti-tagger.googlecode.com/svn/spaghetti.py).

User

Answer 2 · edited 2024-09-28 21:25:19

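In my opinion, Jacob Perkins's tutorial blog posts on part-of-speech tagging with NLTK are among the better online resources. He first builds a simple backoff n-gram tagger, then looks at adding regular-expression and affix-based tagging, then Brill tagging, and finally a full classifier-based tagger. The posts are clear and easy to follow, and they include some useful performance comparisons.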
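As a rough illustration of the regular-expression and affix steps mentioned above (not code from those posts), the sketch below places an AffixTagger and a RegexpTagger underneath a UnigramTagger. The training sentences are invented toy data, the "MENTION" tag for Twitter handles is a made-up label rather than part of any standard tagset, and the suffix length and minimum stem length are arbitrary assumptions.

```python
from nltk.tag import AffixTagger, RegexpTagger, UnigramTagger

# Toy training data, invented for illustration only.
train_sents = [
    [("ellos", "pp3mp000"), ("tienen", "vmip3p0"), ("gente", "ncfs000")],
    [("no", "rn"), ("vienen", "vmip3p0")],
]

# "MENTION" is a hypothetical tag for Twitter handles such as @ContactoBanamex.
patterns = [(r"^@\w+$", "MENTION")]

# Last resort: pattern rules for tokens no statistical tagger recognizes.
regexp = RegexpTagger(patterns)

# Guess from the last two characters of words with at least five characters
# (affix_length and min_stem_length are arbitrary choices here).
affix = AffixTagger(train_sents, affix_length=-2, min_stem_length=3, backoff=regexp)

# Exact word lookup first, then the suffix guess, then the regex rules.
unigram = UnigramTagger(train_sents, backoff=affix)

tagged = unigram.tag(["@ContactoBanamex", "no", "resuelven", "nada"])
print(tagged)
```

Here "resuelven" never appears in the training data but is tagged from its "-en" suffix, while "nada" is too short for the affix tagger and matches no pattern, so it stays None unless a DefaultTagger is added at the end of the chain.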

Start here and work through to part 4: http://streamhacker.com/2008/11/03/part-of-speech-tagging-with-nltk-part-1/
