How do I tokenize a list of words with NLTK?

# file readdata.py
from globalvariable import *
import os

class readdata:
    def dataAyat(self):
        global kalimatayat
        fo = open(os.path.join('E:\dataset', 'dataset.txt'), "r")
        line = []
        for line in fo.readlines():
            datatxt = line.rstrip('\n').split('\t')
            newdatatxt = [x.split('\t') for x in datatxt]
            kalimatayat.append(newdatatxt)
            print newdatatxt

readdata().dataAyat()

[['this' , 'is' , 'string' , '1' , ',' , 'first' , 'sentence' , '.'],['this' , 'is' , 'string' , '2' , ',' , 'first' , 'sentence' , '.']]
[['this' , 'is' , 'string' , '1' , ',' , 'second' , 'sentence' , '.'],['this' , 'is' , 'string' , '2' , ',' , 'second' , 'sentence' , '.']]

1 answer

User

#1 · Posted on 2024-10-01 02:35:29

To tokenize a list of sentences, iterate over it, tokenize each sentence, and collect the results in a list:

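import nltk

# word_tokenize needs the Punkt data once, e.g. nltk.download('punkt')
# (newer NLTK releases use 'punkt_tab')
data = [[['this is string 1, first sentence.'], ['this is string 2, first sentence.']],
        [['this is string 1, second sentence.'], ['this is string 2, second sentence.']]]
results = []
for sentence in data:
    sentence_results = []
    for s in sentence:
        # s is a one-element list, so tokenize the string it contains
        sentence_results.append(nltk.word_tokenize(s[0]))
    results.append(sentence_results)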
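The same loop can be written more compactly as a nested list comprehension. The sketch below is a hedged illustration, not NLTK's own method: the helper names `simple_tokenize` and `tokenize_nested` are hypothetical, and the default tokenizer is a stand-in regex (`\w+|[^\w\s]+`, the same idea as `nltk.wordpunct_tokenize`) so the example runs without downloading NLTK data.

```python
import re
from typing import Callable, List

# Stand-in tokenizer (assumption): runs of word characters or runs of
# punctuation, similar in spirit to nltk.wordpunct_tokenize.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def simple_tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text)


def tokenize_nested(data, tokenize: Callable[[str], List[str]] = simple_tokenize):
    # data is a list of groups; each group holds one-element lists that
    # contain the raw sentence string, as in the example above.
    return [[tokenize(s[0]) for s in group] for group in data]


data = [[['this is string 1, first sentence.'], ['this is string 2, first sentence.']],
        [['this is string 1, second sentence.'], ['this is string 2, second sentence.']]]

results = tokenize_nested(data)
print(results[0][0])
# ['this', 'is', 'string', '1', ',', 'first', 'sentence', '.']
```

Passing `tokenize=nltk.word_tokenize` should give the loop's behavior; the regex default only keeps this sketch dependency-free.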

The result will look like this:

[[['this' , 'is' , 'string' , '1' , ',' , 'first' , 'sentence' , '.'],  
  ['this' , 'is' , 'string' , '2' , ',' , 'first' , 'sentence' , '.']], 
[['this' , 'is' , 'string' , '1' , ',' , 'second' , 'sentence' , '.'],
  ['this' , 'is' , 'string' , '2' , ',' , 'second' , 'sentence' , '.']]]
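Applied to the question's file, the nested structure can be built directly while reading. This is a minimal sketch that assumes, as the question's `split('\t')` suggests, that each line of `dataset.txt` holds one group of tab-separated sentences. `read_tokenized` is a hypothetical helper, the regex tokenizer is a stand-in for `nltk.word_tokenize`, and the UTF-8 encoding in the comment is an assumption.

```python
import io
import re
from typing import Iterable, List

# Stand-in tokenizer (assumption); replace with nltk.word_tokenize if available.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def read_tokenized(lines: Iterable[str]) -> List[List[List[str]]]:
    """One group per input line, one token list per tab-separated sentence."""
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sentences = line.split("\t")
        results.append([_TOKEN_RE.findall(s) for s in sentences])
    return results


# Demo with an in-memory file. For the real dataset, something like:
# with open(os.path.join(r"E:\dataset", "dataset.txt"), encoding="utf-8") as fo:
#     kalimatayat = read_tokenized(fo)
sample = io.StringIO(
    "this is string 1, first sentence.\tthis is string 2, first sentence.\n"
    "this is string 1, second sentence.\tthis is string 2, second sentence.\n"
)
groups = read_tokenized(sample)
for group in groups:
    print(group)
```

Splitting on the tab once is enough; the question's second `x.split('\t')` only wraps each already-split sentence in a one-element list instead of tokenizing it.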
