Is there a way to improve the VADER sentiment analyzer?

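    import gc

    import pandas as pd
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sid = SentimentIntensityAnalyzer()
    # c, conn: an open DB-API cursor/connection (%s placeholders, e.g. psycopg2); s: the search term
    c.execute(
        "select body, creation_date, group_id from posts "
        "where (substring(lower(body) from (%s)) = (%s)) and language = 'en' "
        "order by creation_date DESC",
        (s, s),  # query parameters go to execute(), not inside the SQL string
    )
    conn.commit()
    if c.rowcount > 0:
        dump_fetched = c.fetchall()
        textsSql = pd.DataFrame(dump_fetched, columns=['body', 'created_at', 'group_id'])
        del dump_fetched
        gc.collect()
        texts = textsSql['body'].values
        # here, some data manipulation: steps listed above
        polarity_ = [sid.polarity_scores(s)['compound'] for s in texts]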

1 Answer

Anonymous user

#1 · Posted on 2024-10-01 13:39:17

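1. You don't need to remove stop words; nltk + VADER already handles them (words that are not in VADER's lexicon add nothing to the compound score).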
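Removing stop words can even hurt. NLTK's English stop-word list contains the negations "no", "nor" and "not" and the booster "very", and VADER reads all of them: it reverses (and scales down) the valence of a word that follows a negation, and intensifies a word that follows a booster. The sketch below uses a small hand-picked subset of that list (an assumption, standing in for `stopwords.words('english')`); `remove_stopwords` is a hypothetical helper showing what a typical filtering step does to the input.

```python
# Hand-picked subset of NLTK's English stop-word list; every word here is in it.
STOPWORDS_SUBSET = {"this", "is", "a", "the", "very", "no", "nor", "not"}


def remove_stopwords(text: str, stopwords=STOPWORDS_SUBSET) -> str:
    """Typical preprocessing step: drop every token found in the stop-word set."""
    return " ".join(w for w in text.split() if w.lower() not in stopwords)


print(remove_stopwords("this is not good"))       # good
print(remove_stopwords("The food is very good"))  # food good
```

After filtering, "this is not good" is reduced to "good", which VADER would score as positive instead of negative.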

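2. You don't need to remove punctuation either: besides the processing overhead, removing it changes VADER's polarity calculation, because VADER treats marks such as "!" as emphasis. So keep the punctuation in.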

    >>> from nltk.sentiment.vader import SentimentIntensityAnalyzer
    >>> s = SentimentIntensityAnalyzer()
    >>> txt = "this is superb!"
    >>> s.polarity_scores(txt)
    {'neg': 0.0, 'neu': 0.313, 'pos': 0.687, 'compound': 0.6588}
    >>> txt = "this is superb"
    >>> s.polarity_scores(txt)
    {'neg': 0.0, 'neu': 0.328, 'pos': 0.672, 'compound': 0.6249}
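The difference the "!" makes in that session can be reproduced by hand. VADER sums the word valences, pushes the sum further from zero by 0.292 per exclamation mark (counting at most 4), and normalizes it with x / sqrt(x² + 15). The sketch below is a simplification that ignores question marks and VADER's other rules, and it assumes "superb" has a lexicon valence of 3.1, a value that reproduces both compound scores above ("this" and "is" contribute 0).

```python
import math

ALPHA = 15             # VADER's normalization constant
EXCLAIM_BOOST = 0.292  # emphasis added per "!" (VADER counts at most 4)


def normalize(score: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded valence sum into (-1, 1), as VADER's compound does."""
    return score / math.sqrt(score * score + alpha)


def compound(valence_sum: float, exclamations: int = 0) -> float:
    """Simplified compound score: word valences plus "!" emphasis only."""
    boost = min(exclamations, 4) * EXCLAIM_BOOST
    # Emphasis pushes the sum further from zero in the direction it already has.
    if valence_sum > 0:
        valence_sum += boost
    elif valence_sum < 0:
        valence_sum -= boost
    return round(normalize(valence_sum), 4)


SUPERB = 3.1  # assumed lexicon valence of "superb"
print(compound(SUPERB))                  # 0.6249  ("this is superb")
print(compound(SUPERB, exclamations=1))  # 0.6588  ("this is superb!")
```

Stripping the "!" therefore lowers the compound score of an emphatic sentence, which is why punctuation removal changes the results and not just the run time.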

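3. You should also add sentence tokenization, since it improves accuracy: score each sentence, then derive the paragraph's polarity from the sentence scores. See the example here: https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vaderSentiment.py#L517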
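A minimal sketch of the per-sentence approach: split the text into sentences, score each one, and average the compound scores (averaging is one simple way to combine them). nltk's `sent_tokenize` needs the punkt model to be downloaded, so the regex splitter `naive_sentences` below is a hypothetical stand-in; `paragraph_compound` is likewise an illustrative name. The scorer is passed in, so with VADER it would be `lambda t: sid.polarity_scores(t)['compound']`.

```python
import re
from statistics import mean
from typing import Callable, List


def naive_sentences(paragraph: str) -> List[str]:
    """Rough splitter: break after '.', '!' or '?' followed by whitespace.

    Hypothetical stand-in for nltk.sent_tokenize; it keeps the punctuation,
    which VADER uses for emphasis.
    """
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def paragraph_compound(
    paragraph: str,
    score: Callable[[str], float],
    split: Callable[[str], List[str]] = naive_sentences,
) -> float:
    """Score every sentence on its own, then average the compound scores."""
    sentences = split(paragraph)
    if not sentences:
        return 0.0
    return round(mean(score(s) for s in sentences), 4)


# Usage with a stub scorer (swap in VADER in practice)
fake_scores = {"Great food!": 0.6, "Terrible service.": -0.4, "Ok?": 0.1}
print(paragraph_compound("Great food! Terrible service. Ok?", fake_scores.get))  # 0.1
```

With nltk and punkt installed, the call would look like `paragraph_compound(text, lambda t: sid.polarity_scores(t)['compound'], split=nltk.sent_tokenize)`.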

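4. The polarity calculations are completely independent of each other, so you can run them in a multiprocessing pool of a small size, say 10, for a good speed-up of this line: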

    polarity_ = [sid.polarity_scores(s)['compound'] for s in texts]
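A sketch of point 4 under these assumptions: `parallel_scores` and `vader_compound` are hypothetical helper names; each worker process builds its own `SentimentIntensityAnalyzer` on first use instead of having one pickled into every task; and `chunksize=256` is an arbitrary starting value to tune. `Pool.map` returns results in input order, so the scores still line up with `texts`. Since `polarity_scores` is pure Python, threads would be held back by the GIL; separate processes are what provide the speed-up.

```python
from multiprocessing import Pool
from typing import Callable, Iterable, List

_sid = None  # analyzer created lazily in whichever process calls vader_compound


def vader_compound(text: str) -> float:
    """Compound VADER score; the analyzer is built once per process on first call."""
    global _sid
    if _sid is None:
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        _sid = SentimentIntensityAnalyzer()
    return _sid.polarity_scores(text)["compound"]


def parallel_scores(
    texts: Iterable[str],
    score: Callable[[str], float],
    processes: int = 10,
    chunksize: int = 256,
) -> List[float]:
    """Apply `score` to every text in a pool of worker processes.

    `score` must be a module-level (picklable) function. Results are
    returned in the same order as `texts`.
    """
    texts = list(texts)
    if processes <= 1 or len(texts) < 2:
        # Too little work to be worth starting a pool.
        return [score(t) for t in texts]
    with Pool(processes=processes) as pool:
        return pool.map(score, texts, chunksize=chunksize)
```

In the question's code the list comprehension would become `polarity_ = parallel_scores(texts, vader_compound, processes=10)`. Where workers are started with spawn (the default on Windows and macOS), that call must sit under an `if __name__ == "__main__":` guard.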
