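I'm working on a word cloud problem. I thought my solution met the requirements, since it produces a word cloud with no uninteresting words or punctuation, but apparently it doesn't. I can't tell what I'm missing.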
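The script needs to process the text, remove punctuation, ignore case, ignore words that are not made up entirely of letters, count the frequencies, and ignore uninteresting or irrelevant words. The dictionary is the output of the `calculate_frequencies` function, and the wordcloud module then generates the image from that dictionary.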
My code:
def calculate_frequencies(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
        "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
        "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
        "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
        "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just", \
        "in", "for", "so" ,"on", "says", "not", "into", "because", "could", "out", "up", "back", "about"]
    # LEARNER CODE START HERE
    frequencies = {}
    words = file_contents.split()
    final_words = []
    for item in words:
        item = item.lower()
        if item in punctuations:
            words = words.replace(item, "")
        if item not in uninteresting_words and item.isalpha()==True:
            final_words.append(item)
    for final in final_words:
        if final not in frequencies:
            frequencies[final]=0
        else:
            frequencies[final]+=1
    #wordcloud
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(frequencies)
    return cloud.to_array()
As mentioned, I don't think your code will run: `words` is a list, and `.replace` is not a valid `list` method.
To get the counts more simply:
For punctuation, see "Best way to strip punctuation from a string".
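One common approach from that thread is `str.translate` with a deletion table. Below is a minimal sketch, assuming Python 3 and assuming `string.punctuation` (ASCII punctuation) is an acceptable stand-in for the course's `punctuations` string; `strip_punctuation` is a hypothetical helper name. Note that it operates on a string, so it belongs on `file_contents` before `.split()`, not on the `words` list.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: string.punctuation stands in for the course's own
# `punctuations` string; pass that string instead for an exact match.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Hypothetical helper: return `text` with ASCII punctuation removed."""
    # str.translate works on a string, so call it on file_contents
    # before .split(), not on the resulting list of words.
    return text.translate(_PUNCT_TABLE)


print(strip_punctuation("Hello, world! It's 3:00 p.m."))  # Hello world Its 300 pm
```

Because `It's` becomes `Its`, contractions collapse into a single alphabetic token that then passes `isalpha()`; whether that is what the assignment wants is a judgment call.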
For counting, use `Counter` (from the `collections` module).
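Combining the punctuation and counting pointers above, here is a minimal sketch of a function that returns the plain dictionary that `cloud.generate_from_frequencies(...)` expects. It is not the course's reference solution: `count_interesting_words` is a hypothetical name, the stop-word list is passed in as a parameter (the template defines it inside `calculate_frequencies`), and `string.punctuation` again replaces the template's `punctuations` string. Using `Counter` also avoids a separate bug in the question's loop, which initializes each word's first occurrence to `0`, so every count comes out one too low.

```python
import string
from collections import Counter


def count_interesting_words(file_contents, uninteresting_words):
    """Hypothetical helper: map each interesting word to its frequency.

    Assumptions: the stop-word list is passed in rather than defined
    inside the function, and string.punctuation replaces the course's
    `punctuations` string.
    """
    # Remove punctuation from the whole string, then split into words.
    cleaned = file_contents.translate(str.maketrans("", "", string.punctuation))
    stop = set(uninteresting_words)  # set membership tests are O(1) on average
    words = (word.lower() for word in cleaned.split())
    # Keep purely alphabetic words that are not stop words; Counter
    # gives each new word a count of 1 on its first occurrence.
    return dict(Counter(w for w in words if w.isalpha() and w not in stop))


text = "The cat and the hat. The CAT sat, 2 times!"
print(count_interesting_words(text, ["the", "and"]))
# {'cat': 2, 'hat': 1, 'sat': 1, 'times': 1}
```

Inside the course template, you would call this with the template's `uninteresting_words` list and pass the result to `cloud.generate_from_frequencies(...)`.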
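For WordCloud itself, you don't need to count anything, because it does the counting for you. Note that it has a `stopwords` parameter and a `process_text` function, which by default uses a regular-expression pattern that ignores punctuation: https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html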