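<p>I am trying to build a tagger performance comparison for Spanish. My current script is a modified version of <a href="https://stackoverflow.com/a/20817106/526801">this</a> one, although I have tried another version with very similar results.</p>
<p>I am using the cess_esp corpus and have created a Unigram, Bigram, Trigram, and Brill tagger from it, training each tagger on the corpus's tagged sentences.</p>
<p>My concern is the performance of the bigram and trigram taggers: judging from the results, they do not seem to work at all.</p>
<p>For example, here is some output from my script:</p>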
<pre><code>*************** START TAGGING FOR LINE 6 ****************************************************************************************************************************************
Current line contents before tagging-> mejor ve a la sucursal de Juan Pablo II es la que menos gente tiene y no te tardas nada
Unigram tagger-> [('@yadimota', None), ('@ContactoBanamex', None), ('mejor', 'aq0cs0'), ('ve', 'vmip3s0'), ('a', 'sps00'), ('la', 'da0fs0'), ('sucursal', 'ncfs000'), ('de', 'sps00'), ('Juan', 'np0000p'), ('Pablo', None), ('II', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('que', 'pr0cn000'), ('menos', 'rg'), ('gente', 'ncfs000'), ('tiene', 'vmip3s0'), ('y', 'cc'), ('no', 'rn'), ('te', 'pp2cs000'), ('tardas', None), ('nada', 'pi0cs000')]
Bigram tagger-> [('@yadimota', None), ('@ContactoBanamex', None), ('mejor', None), ('ve', None), ('a', None), ('la', None), ('sucursal', None), ('de', None), ('Juan', None), ('Pablo', None), ('II', None), ('es', None), ('la', None), ('que', None), ('menos', None), ('gente', None), ('tiene', None), ('y', None), ('no', None), ('te', None), ('tardas', None), ('nada', None)]
Trigram tagger-> [('@yadimota', None), ('@ContactoBanamex', None), ('mejor', None), ('ve', None), ('a', None), ('la', None), ('sucursal', None), ('de', None), ('Juan', None), ('Pablo', None), ('II', None), ('es', None), ('la', None), ('que', None), ('menos', None), ('gente', None), ('tiene', None), ('y', None), ('no', None), ('te', None), ('tardas', None), ('nada', None)]
****************************************************************************************************************************************
*************** START TAGGING FOR LINE 7 ****************************************************************************************************************************************
Current line contents before tagging-> He levantado ya varios reporte pero no resuelven nada
Unigram tagger-> [('He', 'vaip1s0'), ('levantado', 'vmp00sm'), ('ya', 'rg'), ('varios', 'di0mp0'), ('reporte', 'vmsp1s0'), ('pero', 'cc'), ('no', 'rn'), ('resuelven', None), ('nada', 'pi0cs000')]
Bigram tagger-> [('He', None), ('levantado', None), ('ya', None), ('varios', None), ('reporte', None), ('pero', None), ('no', None), ('resuelven', None), ('nada', None)]
Trigram tagger-> [('He', None), ('levantado', None), ('ya', None), ('varios', None), ('reporte', None), ('pero', None), ('no', None), ('resuelven', None), ('nada', None)]
*************** START TAGGING FOR LINE 8 ****************************************************************************************************************************************
Current line contents before tagging-> Es lamentable el servicio que brindan
Unigram tagger-> [('@ContactoBanamex', None), ('Es', 'vsip3s0'), ('lamentable', 'aq0cs0'), ('el', 'da0ms0'), ('servicio', 'ncms000'), ('que', 'pr0cn000'), ('brindan', None)]
Bigram tagger-> [('@ContactoBanamex', None), ('Es', None), ('lamentable', None), ('el', None), ('servicio', None), ('que', None), ('brindan', None)]
Trigram tagger-> [('@ContactoBanamex', None), ('Es', None), ('lamentable', None), ('el', None), ('servicio', None), ('que', None), ('brindan', None)]
</code></pre>
<p>The bigram and trigram taggers are trained as shown in the linked answer, which, incidentally, is the more direct way described in the NLTK book:</p>
^{pr2}$
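<p>Do you know if I am missing something? Shouldn't the bigram and trigram taggers do better than the unigram tagger? Should I use backoff taggers for the bigram and trigram taggers?</p>
<p>Thanks!
Alejandro</p>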
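<p>To make the backoff question concrete, here is a toy, pure-Python sketch of how I understand an n-gram tagger to work. It is not NLTK's code: the class <code>ToyNgramTagger</code>, its methods, and the two training sentences are invented for illustration (only the tag strings are copied from the output above), and unlike NLTK it does not prune contexts that the backoff tagger already handles. The assumption is that the lookup key is the previous n-1 tags plus the current word, so one untagged token (such as a Twitter mention) puts <code>None</code> into every later context and the misses cascade unless there is a backoff tagger.</p>

```python
from collections import Counter, defaultdict


class ToyNgramTagger:
    """Toy n-gram tagger: context = (previous n-1 tags, current word)."""

    def __init__(self, n, train_sents, backoff=None):
        self.n = n
        self.backoff = backoff
        counts = defaultdict(Counter)
        for sent in train_sents:
            history = []
            for word, tag in sent:
                counts[self._context(history, word)][tag] += 1
                history.append(tag)
        # Keep only the most frequent tag seen for each context.
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def _context(self, history, word):
        k = self.n - 1
        prev_tags = tuple(history[-k:]) if k else ()
        return (prev_tags, word)

    def choose_tag(self, history, word):
        tag = self.table.get(self._context(history, word))
        if tag is None and self.backoff is not None:
            # Unseen context: defer to the less specific tagger.
            tag = self.backoff.choose_tag(history, word)
        return tag

    def tag(self, words):
        history = []
        for word in words:
            history.append(self.choose_tag(history, word))
        return list(zip(words, history))


# Made-up training data; tags borrowed from the cess_esp output above.
train = [
    [("el", "da0ms0"), ("servicio", "ncms000"), ("es", "vsip3s0"), ("lamentable", "aq0cs0")],
    [("la", "da0fs0"), ("sucursal", "ncfs000"), ("es", "vsip3s0"), ("mejor", "aq0cs0")],
]

uni = ToyNgramTagger(1, train)
bi_alone = ToyNgramTagger(2, train)
bi_backoff = ToyNgramTagger(2, train, backoff=uni)

words = ["@ContactoBanamex", "el", "servicio", "es", "lamentable"]
print("unigram        ->", uni.tag(words))
print("bigram alone   ->", bi_alone.tag(words))    # every tag is None
print("bigram+backoff ->", bi_backoff.tag(words))  # recovers after the mention
```

<p>In this sketch the standalone bigram tagger tags the sentence correctly when the leading mention is dropped, but returns <code>None</code> for every token when it is present, which looks a lot like my output. If NLTK behaves the same way, the equivalent there would be passing <code>backoff=</code> when constructing the bigram and trigram taggers.</p>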