Python: text problem?

import nltk
from nltk.corpus import opinion_lexicon
from nltk.tokenize.simple import (LineTokenizer, line_tokenize)

poswords = set(opinion_lexicon.words("positive-words.txt"))
negwords = set(opinion_lexicon.words("negative-words.txt"))

f=open("paulryan.txt", "rU")
raw = f.read()
token= nltk.line_tokenize(raw)
print(token)

def finddemons():
    for x in token:
        y = token.words()
        percpos = len([w for w in token if w in poswords ]) / len(y)
        percneg = len([w for w in token if w in negwords ]) / len(y)
        print(x, "pos:", round(percpos, 3), "neg:", round(percneg, 3))

finddemons()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in finddemons
AttributeError: 'list' object has no attribute 'words'

1 answer

User

#1 · Posted 2024-10-03 19:29:46

I suggest reading the file line by line, then running word_tokenize on each line:

from nltk.tokenize import word_tokenize

for line in f:
    tokens = word_tokenize(line)

You are right about lowercasing the text before looking words up in the lexicon:

for line in f:
    tokens = word_tokenize(line.lower())
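
Combined with the percentages from the question, the per-line loop could be sketched as below. The traceback in the question comes from calling .words() on token, which is a plain list of lines, so the sketch tokenizes each line itself and counts over that line's tokens. To stay runnable without downloading any corpora, the tokenizer and word sets are passed in as parameters: line_sentiment, simple_tokenize, and the tiny sample sets are names made up for illustration, and in the real script you would pass word_tokenize and the poswords / negwords sets from the question instead.

```python
import re

def simple_tokenize(text):
    """Stand-in tokenizer (assumption): runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text)

def line_sentiment(line, poswords, negwords, tokenize=simple_tokenize):
    """Return (positive share, negative share) of the tokens in one line."""
    tokens = tokenize(line.lower())
    if not tokens:  # blank line: avoid dividing by zero
        return 0.0, 0.0
    pos = sum(1 for w in tokens if w in poswords)
    neg = sum(1 for w in tokens if w in negwords)
    return pos / len(tokens), neg / len(tokens)

# Tiny hypothetical word sets, only for demonstration.
sample_pos = {"good", "great"}
sample_neg = {"bad", "awful"}

for line in ["A GOOD plan, a great team.", "", "Bad idea"]:
    percpos, percneg = line_sentiment(line, sample_pos, sample_neg)
    print(repr(line), "pos:", round(percpos, 3), "neg:", round(percneg, 3))
```

The empty-line guard matters when reading a file line by line, since blank lines would otherwise raise ZeroDivisionError.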

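You could even try lemmatizing the tokens with WordNet, since the opinion lexicon's vocabulary is not very rich. This matters especially if you work with tweets, where words often appear in varied forms.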
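
A rough sketch of the lemmatizing idea: count a token as a hit if either the token itself or its lemma is in the lexicon. The lemmatizer is passed in as a function so the example runs without the WordNet data; naive_lemma, count_hits, and the sample set are hypothetical stand-ins, not part of NLTK.

```python
def naive_lemma(word):
    """Hypothetical stand-in for a real lemmatizer: strips a plural 's'."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def count_hits(tokens, lexicon, lemmatize=naive_lemma):
    """Count tokens found in the lexicon either as-is or after lemmatizing."""
    return sum(1 for w in tokens if w in lexicon or lemmatize(w) in lexicon)

# Tiny hypothetical lexicon, only for demonstration.
sample_pos = {"win", "benefit"}
tokens = ["big", "wins", "and", "benefits", "for", "all"]

print(count_hits(tokens, sample_pos, lemmatize=lambda w: w))  # exact matching only
print(count_hits(tokens, sample_pos))                         # with lemmatizing
```

With NLTK you would pass WordNetLemmatizer().lemmatize from nltk.stem in place of naive_lemma. It needs the wordnet corpus downloaded, and it treats words as nouns unless a pos argument is given, so verb forms may need pos="v".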
