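I am currently analyzing text data and extracting nouns from a corpus.
Yes, I am a newbie; I am here to learn and to fix my mistakes.
When I build a word cloud from the extracted nouns column, the cloud shows only letters and symbols rather than whole words.
The word cloud itself is not my main concern, but since I am going on to further text analysis and topic modeling and aim to develop a predictive model, I want to make sure the column has no problems that would affect that later analysis.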
from textblob import TextBlob

def get_nouns(text):
    # Tag each token with its part of speech and keep singular common nouns ("NN")
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)
from collections import Counter
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                      height=500,
                      max_words=50,
                      max_font_size=100,
                      relative_scaling=0.5,
                      colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17, 14))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The column containing the nouns from the data frame:
0 ['lot']
1 ['weapon', 'gun', 'instance']
2 ['drive', 'drive', 'car']
3 ['felt', 'guy', 'stage']
4 ['price', 'launch', 'ryse', 'son', 'ip', 'cryt...
5 ['drivatar', 'crash', 'guy', 'track', 'use', '...
6 ['spark', 'thing']
7 ['stream', 'player', 'linux', 'start', 'stream...
8 ['kill', 'game', 'absolute', 'shit']
9 ['breed', 'stealth', 'horse', 'duck']
10 ['beach', 'duty']
11 []
12 ['europe', 'guess']
13 ['power', 'cloud', 'god']
14 ['gameplay', 'footage', 'zoom']
15 []
16 ['stream', 'play', 'game', 'week', 'gdex', 'co...
17 ['edit']
19 ['halo', 'clip', 'lot', 'journey']
21 ['thing', 'master', 'chief', 'shawl', 'help', ...
22 ['respect', 'respawn', 'trailer', 'gameplay', ...
Name: nouns, Length: 7523, dtype: object
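Your code is fine. There must be an error somewhere in the preprocessing pipeline that is not shown here.
See the full working example below, based on your code: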
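What follows is not the answerer's own example. It is a minimal, standard-library-only sketch of one preprocessing issue that would produce exactly this symptom. It assumes (this is a guess about the asker's data, not something the post confirms) that each cell of the nouns column holds the string form of a list, for instance after the data frame was saved to CSV and read back. The quoted items in the printed column may hint at this, since pandas normally prints real lists of strings without quotes. The names `rows_as_text` and `flatten` are hypothetical.

```python
import ast
from collections import Counter

# Hypothetical sample: each cell is the *string* form of a list,
# as can happen after a CSV round trip.
rows_as_text = [
    "['weapon', 'gun', 'instance']",
    "['drive', 'drive', 'car']",
    "[]",
]

def flatten(rows):
    """Collect every item from every row into one flat list."""
    items = []
    for row in rows:
        # Extending with a str adds one character at a time.
        items.extend(row)
    return items

# Counting over the raw strings yields letters and symbols, not words.
char_counts = Counter(flatten(rows_as_text))

# Parse each cell back into a real list before counting.
rows_as_lists = [
    ast.literal_eval(cell) if isinstance(cell, str) else cell
    for cell in rows_as_text
]
word_counts = Counter(flatten(rows_as_lists))

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

To check whether this applies to the real data, inspect `type(df_unique['nouns'].iloc[0])`. If it is `str`, something like `df_unique['nouns'].apply(ast.literal_eval)` should restore the lists before the frequencies are counted.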