Printing a term frequency list (with a distribution)

>>> import codecs
>>> fileObj = codecs.open("/Users/shannonmcgregor/Desktop/ALLstories.txt", "r", "Latin-1")
>>> chattanooga_stories = fileObj.read()
>>> import nltk
>>> from nltk.corpus import stopwords
>>> lowered_stories = chattanooga_stories.lower()
>>> word_list = lowered_stories.split()
>>> filtered_stories = [w for w in word_list if not w in stopwords.words('english')]
>>> fdist = nltk.FreqDist(w.lower() for w in filtered_stories)
>>> print(fdist)
<FreqDist with 7031 samples and 19893 outcomes>
>>> top_2k = [ ]
>>> top_2k = fdist.most_common(2000)
>>> fdist.plot(2000, cumulative=True)

1 answer

User

#1 · Posted on 2024-09-28 22:25:54

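When you use most_common(), you do get the counts for the various words: it returns a list of (word, count) tuples in sorted order, most common first. In older NLTK 2.x releases the items method returned the same sorted list; in NLTK 3, where FreqDist behaves like a collections.Counter, items() is not guaranteed to be sorted, so iterate over the result of most_common() directly.

fdist = nltk.FreqDist(filtered_stories)    # filtered_stories is already lowercase
print(fdist)
top_2k = fdist.most_common(2000)    # list of (word, count) tuples, most frequent first
for word, count in top_2k:          # print each term with its frequency
    print(word, count)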
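
For anyone who wants to try the idea without the original file or the NLTK corpora, here is a minimal sketch that uses only the standard library. It relies on collections.Counter, whose most_common() works the same way as the FreqDist method in NLTK 3. The STOPWORDS set, the term_frequencies helper and the sample sentence are all made up for illustration; with NLTK installed you would presumably use stopwords.words('english') instead.

```python
from collections import Counter

# Hypothetical stand-in for stopwords.words('english'); a set keeps lookups fast.
STOPWORDS = {"the", "a", "of", "in", "and", "is"}


def term_frequencies(text, n):
    """Return the n most frequent non-stopword terms as (word, count) tuples."""
    words = text.lower().split()
    filtered = [w for w in words if w not in STOPWORDS]
    # most_common(n) sorts by descending count, like FreqDist.most_common(n).
    return Counter(filtered).most_common(n)


# Illustrative sample text, not taken from the original stories file.
sample = "The river is wide and the river is deep in the valley of the river"
top_terms = term_frequencies(sample, 3)

# Print one term per line, followed by its count.
for word, count in top_terms:
    print(f"{word}\t{count}")
```

Building the stopword collection once as a set, rather than calling stopwords.words('english') inside the list comprehension as in the question, also avoids rebuilding the list for every word.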
