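I am trying to use StanfordSegmenter to segment a piece of Chinese text, but I ran into the problem in the title. I first downloaded Stanford Word Segmenter version 3.5.2 from http://nlp.stanford.edu/software/segmenter.shtml
Then I wrote a Python script: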
import os
# Point NLTK at the Java executable
os.environ['JAVAHOME'] = "C:/Program Files/Java/jdk1.8.0_102/bin/java.exe"
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# All paths are relative to the current working directory
segmenter = StanfordSegmenter(
    path_to_jar="./stanford-segmenter-2015-12-09/stanford-segmenter-3.6.0.jar",
    path_to_slf4j="./stanford-segmenter-2015-12-09/slf4j-api.jar",
    path_to_sihan_corpora_dict="./stanford-segmenter-2015-12-09/data",
    path_to_model="./stanford-segmenter-2015-12-09/data/pku.gz",
    path_to_dict="./stanford-segmenter-2015-12-09/data/dict-chris6.ser.gz")
sentence = u"这是斯坦福中文分词器测试"
segmenter.segment(sentence)
But I got the following error:
[error traceback not preserved]
What am I doing wrong? Thanks.
I think this is a bug in the implementation. I ran into a similar problem. To work around the error, try
segmenter._stanford_jar = "./stanford-segmenter-2015-12-09/stanford-segmenter-3.6.0.jar"
If that does not work, try giving the segmenter the full (absolute) path to the segmenter jar file.
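In case it helps, here is a rough sketch of the full-path idea. It only builds the path string and does not need NLTK or Java to run. The helper name build_classpath is made up for illustration. Including slf4j-api.jar alongside the segmenter jar is my assumption (that the private _stanford_jar attribute is used as a Java classpath), so check it against your NLTK version.

```python
import os

def build_classpath(*jar_paths):
    """Join jar paths into one classpath string, converting each to an absolute path.

    Hypothetical helper for illustration; not part of NLTK.
    """
    return os.pathsep.join(os.path.abspath(p) for p in jar_paths)

# Directory name taken from the question; adjust to where you unpacked the segmenter.
SEGMENTER_DIR = "./stanford-segmenter-2015-12-09"

classpath = build_classpath(
    os.path.join(SEGMENTER_DIR, "stanford-segmenter-3.6.0.jar"),
    # Assumption: the slf4j jar belongs on the same classpath.
    os.path.join(SEGMENTER_DIR, "slf4j-api.jar"),
)
print(classpath)

# With the segmenter object from the question, this would then be applied as:
# segmenter._stanford_jar = classpath
```

Absolute paths take the current working directory out of the picture, which is the usual reason a relative jar path is found in one session but not in another.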