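I have created a list of words from a dataframe and removed the stop words from it. I want to create a list of the words whose frequency is greater than some value n. How can I do this?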
Here is my code to generate the list:
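from itertools import chain
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# wineData (a pandas DataFrame) and stop_words are defined elsewhere
tokenizer = RegexpTokenizer(r"\w+(?:[-']\w+)?")
wineData['description'] = wineData['description'].apply(lambda x: str.lower(x))
wineDataTokenized = wineData['description'].apply(
    lambda x: [el for el in tokenizer.tokenize(x) if el not in stop_words])
filteredList = chain.from_iterable(wineDataTokenized)
frequencyList = FreqDist(filteredList)
# This keeps every distinct word, regardless of how often it occurs
highFreq = list(frequencyList.keys())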
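One possible way to keep only the words that occur more than n times is to filter the (word, count) pairs of the frequency table. The sketch below uses `collections.Counter` so that it runs without NLTK; since NLTK's `FreqDist` is a subclass of `Counter`, the same comprehension should apply to `frequencyList` unchanged. The helper name `words_above` and the sample tokens are made up for illustration and are not part of the original code.

```python
from collections import Counter


def words_above(freqs, n):
    """Return the words whose count in `freqs` is strictly greater than n."""
    return [word for word, count in freqs.items() if count > n]


# Stand-in tokens (hypothetical data, replacing the tokenized wine descriptions)
tokens = ["cherry", "oak", "cherry", "plum", "oak", "cherry"]
frequencyList = Counter(tokens)

# Keep only the words that appear more than once
highFreq = words_above(frequencyList, 1)
print(highFreq)  # ['cherry', 'oak']
```

Use `count >= n` instead if words that occur exactly n times should be included.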
Source: https://programminghistorian.org/en/lessons/counting-frequencies